Computer Engineering and Applications (计算机工程与应用), 2017, Vol. 53, Issue 12: 85-91, 7. DOI: 10.3778/j.issn.1002-8331.1606-0108
Improvement of job scheduling algorithm on Hadoop (original title: 改进的Hadoop作业调度算法)
Abstract
Distributed clusters face a load balancing problem, and Hadoop does not take differences in node performance into account. Although Hadoop has a load balancing mechanism, its effect is not ideal, so load imbalance often occurs at run time. To address this problem, this paper analyzes the Hadoop source code in depth to clarify how Hadoop works, and improves task scheduling in Yarn, Hadoop's resource management mechanism. It establishes new task scheduling rules and proposes a performance evaluation index for each node, covering both dynamic and static performance. On this basis, the paper improves Yarn's FairScheduler algorithm to form a scheduling algorithm that takes node performance into account. The Hadoop source code is recompiled, and comparative experiments on the Hadoop platform show that adding the node performance index effectively solves Hadoop's load balancing problem and greatly improves Hadoop's running efficiency.
Keywords
big data / Hadoop / Yarn / load balancing / FairScheduler algorithm
Classification
Information technology and security science
Cite this article
Feng Xingjie (冯兴杰), He Yang (贺阳). Improvement of job scheduling algorithm on Hadoop [J]. Computer Engineering and Applications, 2017, 53(12): 85-91, 7.
Funding
Joint Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (No. U1233113)
National Natural Science Foundation of China (No. 61301245, No. 61201414)