Journal of Sun Yat-sen University (Natural Science Edition), 2017, Vol. 56, Issue 3: 46-56, 11. DOI: 10.13471/j.cnki.acta.snus.2017.03.008
Optimizing the MapReduce Collaborative Filtering Algorithm with Spark DAG
Optimization of collaborative filtering algorithm based on DAG Spark scheduling
Abstract
The scale of big data poses great challenges for data storage, management, and analysis, and efficient, low-cost big data processing technology has become a research hotspot in academia and industry. To improve the efficiency of collaborative filtering algorithms, the implementation of the algorithm under the MapReduce architecture is decomposed in order to analyze its defects. Because Spark is well suited to iterative and interactive tasks, this paper presents methods for improving execution efficiency by migrating the algorithm from the MapReduce platform to the Spark platform. The implementation flow of the algorithm in Spark is designed, and efficiency is further improved through parameter adjustment and memory optimization. Experimental results show that, based on Spark DAG scheduling, the algorithm reduces HDFS I/O operations by more than 65%, and that execution efficiency and energy efficiency increase by nearly 200% and 50%, respectively.
Keywords
collaborative filtering/MapReduce/Spark/algorithm optimization/energy consumption optimization
Classification
Computer Science and Automation
Cite this article
LIAO Bin, ZHANG Tao, YU Jiong, GUO Binglei, ZHANG Xuguang, LIU Yan. Optimization of collaborative filtering algorithm based on DAG Spark scheduling [J]. Journal of Sun Yat-sen University (Natural Science Edition), 2017, 56(3): 46-56, 11.
Funding
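National Natural Science Foundation of China (61562078, 61262088)
Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01B014)
Doctoral Start-up Fund of Xinjiang University of Finance and Economics (2015BS007)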