Research on Handling Data Skew in MapReduce

Computer Technology and Development (计算机技术与发展), 2016, Vol. 26, Issue 9: 201-204 (4 pages). DOI: 10.3969/j.issn.1673-629X.2016.09.045

Wang Gang (王刚)¹, Li Sheng'en (李盛恩)¹

Author information

1. School of Computer Science and Technology, Shandong Jianzhu University, Jinan, Shandong 250101, China

Abstract

With the rapid development of the mobile Internet and the Internet of Things, data volumes are growing explosively, and we have entered the era of big data. As a distributed computing framework, MapReduce can process massive data sets and has become a focus of big data research. However, the performance of MapReduce depends on how the data is distributed. MapReduce's default hash partition function cannot guarantee load balancing when the data is skewed, so the completion time of a job is determined by the node that has the most data to process. To address this problem, this paper uses sampling: before the user's job runs, a separate MapReduce job samples the input. Once the key distribution is known, the data is partitioned in a load-balanced way that also takes data locality into account. The approach is tested on an experimental platform using WordCount as the example. The results show that sampling-based partitioning outperforms hash partitioning, and that sampling-based partitioning with data locality performs much better than sampling-based partitioning without it.
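To make the contrast in the abstract concrete, the following is a minimal single-process sketch, not the authors' implementation. It compares a hash partitioner with a partition table built from sampled key frequencies. Everything named here is an assumption made for illustration: the CRC32 stand-in for the default hash function, the greedy "heaviest key to the least loaded reducer" assignment, and the synthetic skewed word list. Data locality, which the paper also exploits, is not modeled.

```python
import random
import zlib
from collections import Counter


def hash_partition(key, num_reducers):
    """Stand-in for a default hash partitioner: hash(key) mod reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sample_key_counts(records, rate, seed=0):
    """Estimate key frequencies by keeping each record with probability `rate`."""
    rng = random.Random(seed)
    return Counter(k for k in records if rng.random() < rate)


def build_balanced_partition(sampled_counts, num_reducers):
    """Greedy assignment (an assumption, not the paper's algorithm):
    take keys from heaviest to lightest and give each one to the
    reducer with the smallest estimated load so far."""
    loads = [0] * num_reducers
    assignment = {}
    for key, count in sampled_counts.most_common():
        target = loads.index(min(loads))
        assignment[key] = target
        loads[target] += count
    return assignment


def reducer_loads(records, partition_of, num_reducers):
    """Count how many records each reducer would receive."""
    loads = [0] * num_reducers
    for key in records:
        loads[partition_of(key)] += 1
    return loads


def make_skewed_words(n, vocab_size, seed=1):
    """Hypothetical skewed input: word i is drawn with weight 1 / (i + 1)."""
    rng = random.Random(seed)
    vocab = [f"w{i}" for i in range(vocab_size)]
    weights = [1.0 / (i + 1) for i in range(vocab_size)]
    return rng.choices(vocab, weights=weights, k=n)


if __name__ == "__main__":
    R = 4
    words = make_skewed_words(20000, 200)
    table = build_balanced_partition(sample_key_counts(words, rate=0.1), R)
    by_hash = reducer_loads(words, lambda k: hash_partition(k, R), R)
    # Keys that never appeared in the sample fall back to hashing.
    by_sample = reducer_loads(
        words, lambda k: table.get(k, hash_partition(k, R)), R
    )
    print("hash partition loads:  ", by_hash)
    print("sample partition loads:", by_sample)
```

Running the script prints the per-reducer record counts under both schemes, so the spread between the lightest and heaviest reducer can be compared directly. In a real MapReduce deployment the sampling step would itself be a job and the table would be consulted by a custom partitioner; this sketch only illustrates the balancing idea.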

Key words

big data/MapReduce/load balancing/sampling

Classification

Information Technology and Security Science

Cite this article

Wang Gang, Li Sheng'en. Research on Handling Data Skew in MapReduce [J]. Computer Technology and Development, 2016, 26(9): 201-204.

Funding

National Natural Science Foundation of China (61170052)

Computer Technology and Development

OA / CSTPCD

ISSN: 1673-629X
