大数据 (Big Data), 2025, Vol. 11, Issue (6): 123-142, 20. DOI: 10.11959/j.issn.2096-0271.2025074
Multi-group merging algorithm for solving data Shuffle and data skew of securities companies
曹亚坤 (Cao Yakun) 1, 唐小勇 (Tang Xiaoyong) 1
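Author information
- 1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, Hunan 410114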
Abstract
In the securities industry, the processing and analysis of user data are critical technologies that significantly affect business decision-making and risk control. However, the vast scale and complexity of user data in securities companies lead to heavy Shuffle operations and data skew in big data computations. Existing optimization methods either rely on hardware upgrades or are limited by domain-specific constraints, and fail to address the problem effectively. To resolve this, a multi-group merging algorithm (MGMA) based on user relationships is proposed, which improves computational efficiency and reduces resource consumption through effective grouping and optimization strategies. Experimental results show that, relative to the non-optimized (NO) control group, MGMA achieves a data skew rate of 20%, memory usage of 72%, and computation time of 61%. All three indicators surpass those of the other four optimization methods compared.
Keywords
Shuffle operations / data skew / preprocessing / data of securities companies
Classification
Computer and Automation
Cite this article
Cao Yakun, Tang Xiaoyong. Multi-group merging algorithm for solving data Shuffle and data skew of securities companies[J]. 大数据 (Big Data), 2025, 11(6): 123-142, 20.