Computer Engineering and Applications, 2018, Vol. 54, Issue 4: 72-76 (5 pages). DOI: 10.3778/j.issn.1002-8331.1701-0238
Shuffle optimization for Spark cluster (一种Spark集群下的shuffle优化机制)
Abstract
Spark is an in-memory distributed processing framework. The large volume of data generated during the shuffle process places a heavy load on network transmission, and this has become one of the main bottlenecks of Spark performance. To address the I/O load imbalance across nodes caused by unbalanced data distribution, a restart policy based on task locality level is designed. Experiments verify that the optimization mechanism reduces task execution time and improves the efficiency of the shuffle process.
Key words
Spark cluster/shuffle process/data transfer/locality/schedule strategy
Classification
Information Technology and Security Science
Citation
Xiong Anping, Xia Yuchong, Yang Fangfang. 一种Spark集群下的shuffle优化机制 [Shuffle optimization for Spark cluster][J]. Computer Engineering and Applications, 2018, 54(4): 72-76.
Funding
Doctoral Start-up Fund of Chongqing University of Posts and Telecommunications (No. A2015-17).