Research on Handling Data Skew in MapReduce

Wang Gang (王刚), Li Sheng'en (李盛恩)

Computer Technology and Development (计算机技术与发展), 2016, Vol. 26, Issue 9: 201-204 (4 pages). DOI: 10.3969/j.issn.1673-629X.2016.09.045

Wang Gang¹, Li Sheng'en¹

Author information

  • 1. School of Computer Science and Technology, Shandong Jianzhu University, Jinan, Shandong 250101, China

Abstract

With the rapid development of the mobile Internet and the Internet of Things, data volumes are growing explosively, and we have entered the era of big data. As a distributed computing framework, MapReduce can process massive data sets and has become a focus of big data research. However, the performance of MapReduce depends on how the data is distributed. The default Hash partition function in MapReduce cannot guarantee load balancing when the data is skewed, and the running time of a job is held back by the node that has more data to process. To address this problem, this paper uses sampling: before the user's job is executed, a separate MapReduce job samples the input. Once the distribution of keys has been learned, load-balanced data partitioning is achieved by taking data locality into account. The approach is tested with the WordCount example on an experimental platform. The results show that sample-based partitioning outperforms Hash partitioning, and that sample-based partitioning with data locality performs much better than sample-based partitioning without it.
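The contrast between Hash partitioning and partitioning based on sampled key frequencies can be illustrated with a small sketch. The abstract does not specify the assignment algorithm, so the greedy "heaviest key to the lightest reducer" rule below is an assumption chosen for illustration, not the authors' method. Data locality is not modeled, `zlib.crc32` merely stands in for a hash function, and the function names and word counts are hypothetical.

```python
import heapq
import zlib
from collections import Counter


def hash_partition(key_counts, num_reducers):
    """Per-reducer load when each key goes to hash(key) mod num_reducers."""
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        # crc32 is a deterministic stand-in for the framework's hash function.
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += count
    return loads


def sampled_partition(key_counts, num_reducers):
    """Greedy assignment from sampled counts: heaviest key first,
    always to the reducer with the smallest load so far."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending count, then by key, so the result is deterministic.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    loads = [0] * num_reducers
    for key, count in key_counts.items():
        loads[assignment[key]] += count
    return assignment, loads


# Hypothetical sampled word frequencies with a skewed distribution.
sample = Counter({"the": 50, "of": 30, "and": 20, "data": 10, "skew": 5, "map": 5})
assignment, balanced = sampled_partition(sample, 3)
hashed = hash_partition(sample, 3)
print("hash loads:   ", hashed)
print("sampled loads:", balanced)
```

With these made-up counts the greedy assignment yields loads of 50, 35 and 35 across three reducers. The heaviest key alone sets a floor of 50 on the largest load, so no partitioning of whole keys, Hash partitioning included, can do better on this sample.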

Key words

big data/MapReduce/load balancing/sampling

Classification

Information Technology and Security Science

Cite this article

Wang Gang, Li Sheng'en. Research on Handling Data Skew in MapReduce [J]. Computer Technology and Development, 2016, 26(9): 201-204.

Funding

National Natural Science Foundation of China (61170052)

Computer Technology and Development (OA; CSTPCD), ISSN 1673-629X