Computer Technology and Development (计算机技术与发展), 2016, Vol. 26, Issue 9: 201-204, 4. DOI: 10.3969/j.issn.1673-629X.2016.09.045
Research on Handling Data Skew in MapReduce (MapReduce中数据倾斜解决方法的研究)
Abstract
With the rapid development of the mobile Internet and the Internet of Things, data volumes have grown explosively, ushering in the era of big data. As a distributed computing framework, MapReduce can process massive data sets and has become a focus of big data research. However, its performance depends on how the data is distributed. The default Hash partition function in MapReduce cannot guarantee load balancing when the data is skewed, and the completion time of a job is then determined by the node that has the most data to process. To address this problem, this paper uses sampling: before the user's job runs, a separate MapReduce job samples the input. Once the distribution of keys is known, the data is partitioned in a load-balanced way that also takes data locality into account. WordCount was tested on an experimental platform. The results show that sample-based partitioning outperforms Hash partitioning, and that sample-based partitioning with data locality performs much better than sample-based partitioning without it.
Keywords
big data/MapReduce/load balancing/sampling
Classification
Information Technology and Security Science
Cite this article
WANG Gang, LI Sheng'en. Research on Handling Data Skew in MapReduce [J]. Computer Technology and Development, 2016, 26(9): 201-204, 4.
Funding
National Natural Science Foundation of China (61170052)
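The sample-then-partition idea summarized in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: `zlib.crc32` stands in for the default Hash partition function, the greedy "heaviest key to lightest reducer" assignment is one plausible way to use the sampled key distribution (the abstract does not say how the partition is computed), and the data-locality step is omitted. All function names are hypothetical.

```python
import random
import zlib
from collections import Counter


def hash_partition(key, num_reducers):
    """Stable stand-in for a default hash partitioner (crc32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sample_key_counts(records, rate, rng):
    """Estimate per-key frequencies by keeping each record with probability `rate`."""
    return Counter(key for key in records if rng.random() < rate)


def build_partition_table(sampled_counts, num_reducers):
    """Greedy assignment (an assumption): heaviest keys first, each to the lightest reducer."""
    loads = [0] * num_reducers
    table = {}
    for key, count in sampled_counts.most_common():
        target = loads.index(min(loads))
        table[key] = target
        loads[target] += count
    return table


def reducer_loads(records, num_reducers, table=None):
    """Count records per reducer; keys missing from the table fall back to hashing."""
    loads = [0] * num_reducers
    for key in records:
        if table is not None and key in table:
            loads[table[key]] += 1
        else:
            loads[hash_partition(key, num_reducers)] += 1
    return loads


def make_skewed_words(num_keys, total, rng):
    """Synthetic WordCount-style input whose key frequencies follow a 1/rank law."""
    keys = [f"w{i}" for i in range(num_keys)]
    weights = [1.0 / (i + 1) for i in range(num_keys)]
    return rng.choices(keys, weights=weights, k=total)


if __name__ == "__main__":
    rng = random.Random(42)
    words = make_skewed_words(num_keys=1000, total=20000, rng=rng)
    table = build_partition_table(sample_key_counts(words, 0.1, rng), 4)
    print("hash partition loads:", reducer_loads(words, 4))
    print("sample-based loads:  ", reducer_loads(words, 4, table))
```

Running the script prints the per-reducer record counts for both strategies on synthetic skewed input, so the spread between the heaviest and lightest reducer can be compared directly. In a real Hadoop job the lookup table would be consulted from a custom partitioner rather than from a Python dictionary.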