Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2023, Vol. 35, Issue 6: 1154-1163, 10. DOI: 10.3979/j.issn.1673-825X.202211030310
Load balancing in MapReduce combined with computing capacity of nodes
Abstract
MapReduce is a widely used programming model in big data computing and offers significant benefits for compute-intensive tasks. However, the default hash partitioning method is prone to data skew and unbalanced load among nodes, which degrades overall computing performance and wastes cluster resources. This paper proposes a partitioning method that takes node computing capacity into account to solve the load balancing problem. First, an independent sampling job is executed using the reservoir sampling algorithm to extract a sample of the data to be processed, and the location and frequency of the keywords in the sample are counted. Second, based on these keyword statistics, a partitioning strategy is formulated that balances the load of each partition against the computing capacity of the nodes while also optimizing network overhead. Finally, the computation job is run with the whole dataset as input, and the established partitioning strategy is applied to the intermediate data to produce the final output. Experimental results show that the proposed method achieves a more balanced load among nodes and significantly improves the efficiency of job execution.
Keywords
load balancing/data skew/big data/sampling algorithm
Classification
Information Technology and Security Science
Cite this article
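HU Linfa (胡林发), FU Xiaodong (付晓东), LIU Li (刘骊), LIU Lijun (刘利军). Load balancing in MapReduce combined with computing capacity of nodes[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2023, 35(6): 1154-1163, 10.
Funding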
National Natural Science Foundation of China (62362043, 61962030)
Xingdian Talents Support Plan Project (KKXY202203008)
Yunnan Provincial Science and Technology Plan Project (202204BQ040010, 202205AF150003)
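
Illustrative sketch
The abstract describes two building blocks: reservoir sampling of the input, and a partitioning plan that matches partition load to node computing capacity. The Python sketch below is only a rough illustration of those two ideas, not the authors' algorithm. The function names `reservoir_sample` and `plan_partitions`, the greedy heaviest-key-first rule, and the relative capacity values are all assumptions made for this example. It also ignores keyword location and network overhead, which the paper's strategy takes into account.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: uniform random sample of up to k items from a stream."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = item
    return sample


def plan_partitions(key_counts, capacities):
    """Greedy plan (an assumption, not the paper's strategy).

    Keys are taken heaviest first; each goes to the node whose
    capacity-normalized load would be smallest after receiving it.
    Returns (key -> node index, per-node load).
    """
    loads = [0.0] * len(capacities)
    plan = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        node = min(
            range(len(capacities)),
            key=lambda n: (loads[n] + count) / capacities[n],
        )
        plan[key] = node
        loads[node] += count
    return plan, loads


if __name__ == "__main__":
    # Hypothetical key frequencies, as if counted from a sample.
    counts = {"a": 6, "b": 3, "c": 2, "d": 1}
    # Hypothetical relative capacities: node 0 is twice as fast as node 1.
    plan, loads = plan_partitions(counts, [2.0, 1.0])
    print(plan, loads)
```

With these hypothetical inputs, the faster node receives roughly twice the records of the slower one, so both nodes would finish at about the same time under the assumed capacities.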