Journal of Jilin University (Science Edition), 2016, Vol. 54, Issue (6): 1383-1387, 5. DOI: 10.13413/j.cnki.jdxblxb.2016.06.35
Optimization Algorithm of Two Table Data Skew Join Based on MapReduce
Abstract
To address the problem that the Range partition algorithm cannot optimize the efficiency of two-table joins when the tables contain heavily skewed data, we propose an improved join algorithm for skewed data. The algorithm treats skewed and non-skewed data differently, distributes data to the Reduce nodes by replication and broadcasting, and completes the entire join in a single round of Map/Reduce tasks. It effectively balances the processing load across the Reduce nodes, thereby addressing the impact of heavily skewed data on two-table join performance. Comparison with the traditional partition join algorithm shows that the proposed algorithm is effective.
Key words
MapReduce / Range partition algorithm / data skew / optimization of join algorithm
Classification
Information Technology and Security Science
Cite this article
ZHAO Yulan. Optimization Algorithm of Two Table Data Skew Join Based on MapReduce [J]. Journal of Jilin University (Science Edition), 2016, 54(6): 1383-1387, 5.
Funding
National Natural Science Foundation of China (Grant No. 61303107)
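
Illustrative sketch

The abstract describes handling skewed and non-skewed keys differently and finishing the join in one Map/Reduce round. The single-process Python simulation below is a minimal sketch of that general idea, not the paper's actual algorithm, whose details are not given here. Everything in it is an assumption made for illustration: the function names `skew_join` and `_partition`, the `skew_threshold` parameter, detecting skew by exact counting on the left table, spreading skewed left rows round-robin, and broadcasting the matching right rows to every reducer.

```python
from collections import Counter, defaultdict
from itertools import product
import zlib


def _partition(key, num_reducers):
    # Deterministic hash partitioner (hypothetical helper for this sketch).
    return zlib.crc32(repr(key).encode("utf-8")) % num_reducers


def skew_join(left, right, num_reducers=4, skew_threshold=3):
    """Equi-join two lists of (key, value) pairs in one simulated round.

    Returns (joined_rows, left_rows_per_reducer).
    Assumption: only the left table is skewed.
    """
    # Assumption: skewed keys are found by exact counting; a real job
    # would more likely estimate frequencies from a sample.
    counts = Counter(key for key, _ in left)
    skewed = {key for key, n in counts.items() if n > skew_threshold}

    # One pair of buffers (left rows, right rows) per simulated reducer.
    reducers = [
        {"L": defaultdict(list), "R": defaultdict(list)}
        for _ in range(num_reducers)
    ]

    # Map phase, left table: skewed rows are spread round-robin,
    # all other rows follow the ordinary partitioner.
    next_slot = 0
    for key, value in left:
        if key in skewed:
            target = next_slot % num_reducers
            next_slot += 1
        else:
            target = _partition(key, num_reducers)
        reducers[target]["L"][key].append(value)

    # Map phase, right table: rows matching a skewed key are broadcast
    # (replicated) to every reducer, the rest go to a single reducer.
    for key, value in right:
        if key in skewed:
            targets = range(num_reducers)
        else:
            targets = [_partition(key, num_reducers)]
        for target in targets:
            reducers[target]["R"][key].append(value)

    # Reduce phase: each reducer joins only the rows it received.
    joined, loads = [], []
    for reducer in reducers:
        loads.append(sum(len(vals) for vals in reducer["L"].values()))
        for key, left_vals in reducer["L"].items():
            right_vals = reducer["R"].get(key, [])
            for lv, rv in product(left_vals, right_vals):
                joined.append((key, lv, rv))
    return joined, loads
```

Because every left row is sent to exactly one reducer while the right rows for a skewed key are available on all reducers, each matching pair is produced exactly once. The second return value reports how many left rows each simulated reducer received, which gives a rough view of how evenly a heavily repeated key is spread.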