
Optimization of MapReduce Join Query Algorithms under Non-Uniform Data Distribution

Zhang Jingwei, Shang Hongjia, Qian Junyan, Zhou Ping, Yang Qing

Journal of Frontiers of Computer Science and Technology, 2017, Vol. 11, Issue 5: 752-767 (16 pages). DOI: 10.3778/j.issn.1673-9418.1604022

Join Query Optimization Based on MapReduce under Skewed Data

Zhang Jingwei (1), Shang Hongjia (2), Qian Junyan (1), Zhou Ping (1), Yang Qing (3)

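Author information

  • 1. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  • 2. Guangxi Cooperative Innovation Center of Cloud Computing and Big Data, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  • 3. Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin, Guangxi 541004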

Abstract

MapReduce, a classic distributed computing framework, can improve the performance of join queries over large-scale data. However, when the join attribute values are not uniformly distributed, the pure hash partitioning strategy of traditional MapReduce causes load imbalance across computing nodes, which degrades the performance of the overall task. To address the data skew problem in join queries, this paper studies join query optimization on the MapReduce computing framework. First, it experimentally analyzes the improved repartition join algorithm, examines the execution phases of join queries on the traditional MapReduce framework, and identifies the performance bottlenecks that arise when the data are not uniformly distributed. Based on these findings, it designs and implements an improved join query algorithm whose execution strategy integrates a combination segmentation method with an equilibrium partitioning method. The experimental results show that the proposed method provides a good solution for distributed join queries on large-scale skewed datasets, with excellent time performance and scalability.
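
The load imbalance described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the combination segmentation and equilibrium partitioning methods are not specified in this abstract, so the sketch swaps in a generic mitigation that spreads records of a heavily repeated ("hot") join key round-robin over all reducers. The function names, the hot-key threshold, and the synthetic dataset are all assumptions made for illustration. In a real join, the matching records of the other relation would also have to be replicated to every reducer that receives part of a hot key.

```python
from collections import Counter


def hash_loads(keys, num_reducers):
    """Records per reducer under plain hash partitioning.

    Keys are non-negative ints, so `key % num_reducers` stands in for
    hash(key) % num_reducers and stays deterministic across runs.
    """
    loads = [0] * num_reducers
    for key in keys:
        loads[key % num_reducers] += 1
    return loads


def find_hot_keys(keys, num_reducers):
    """Hypothetical rule: a key is hot if it alone exceeds a fair share."""
    fair_share = len(keys) / num_reducers
    return {k for k, count in Counter(keys).items() if count > fair_share}


def skew_aware_loads(keys, num_reducers):
    """Records per reducer when hot keys are spread round-robin.

    Non-hot keys still use hash partitioning. This only models the load
    on the skewed side of the join; the other side's rows for a hot key
    would need to be sent to all reducers so every match is still found.
    """
    hot = find_hot_keys(keys, num_reducers)
    loads = [0] * num_reducers
    next_slot = 0
    for key in keys:
        if key in hot:
            reducer = next_slot % num_reducers
            next_slot += 1
        else:
            reducer = key % num_reducers
        loads[reducer] += 1
    return loads


# Synthetic skewed input: key 0 occurs 9000 times, keys 1..1000 once each.
keys = [0] * 9000 + list(range(1, 1001))
print(hash_loads(keys, 4))        # [9250, 250, 250, 250]
print(skew_aware_loads(keys, 4))  # [2500, 2500, 2500, 2500]
```

Under plain hashing a single reducer receives 92.5% of the records, so the job's completion time is dominated by that one node; spreading the hot key evens out the load at the cost of replicating the matching rows of the other relation.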

Key words

join query/MapReduce/skewed data

Classification

Information Technology and Security Science

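Cite this article

Zhang Jingwei, Shang Hongjia, Qian Junyan, Zhou Ping, Yang Qing. Join Query Optimization Based on MapReduce under Skewed Data[J]. Journal of Frontiers of Computer Science and Technology, 2017, 11(5): 752-767, 16.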

Funding

The National Natural Science Foundation of China under Grant Nos. U1501252, 61363005, 61462017

The Natural Science Foundation of Guangxi under Grant Nos. 2014GXNSFAA118353, 2014GXNSFAA118390, 2014GXNSFDA118036

The High Level Innovation Team of Colleges and Universities in Guangxi and Outstanding Scholars Program Funding

The Program of Guangxi Cooperative Innovation Center of Cloud Computing and Big Data

The Guangxi Cooperative Innovation Center of IOT and Industrialization

Indexed in: OA, Peking University Core Journals, CSCD, CSTPCD

ISSN 1673-9418
