Computer Engineering, Issue (2): 31-34, 38, 5. DOI: 10.3969/j.issn.1000-3428.2014.02.007
Implementation of Parallel PageRank Algorithm Based on MapReduce
Abstract
The emergence of distributed Web crawling has greatly expanded the scale of available Web information. Because PageRank must process the link topology of the entire existing page set, CPU, I/O, and memory limits become a major bottleneck when the data reaches the TB or PB level. To address these problems, this paper proposes a parallel PageRank algorithm based on MapReduce. In a given iteration of the algorithm, the Map function processes the files containing the topology of the Web page graph, and the Reduce function calculates the pages' scores. The global Web page score serves as the convergence criterion that controls the iterations and yields a more precise page ranking. Experimental results show that the improved algorithm achieves better cluster performance and faster execution while preserving the overall ranking accuracy of the single-machine PageRank algorithm.
Key words
search engine/PageRank algorithm/MapReduce framework/parallel computing/Hadoop platform
Classification
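Information Technology and Security Science
Cite this article
Ping Yu, Xiang Yang, Zhang Bo, Huang Yinfei. Implementation of Parallel PageRank Algorithm Based on MapReduce[J]. Computer Engineering, 2014, (2): 31-34, 38, 5.
Funding
National Natural Science Foundation of China (61103069, 71170148); National Key Technology R&D Program (2012BAD35B01); Shanghai Science and Technology Innovation Program (11DZ1501703); Chenjia Town Smart Community and Intelligent Transportation Fund (11dz1210600)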
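Illustrative sketch
The abstract describes an iteration in which Map processes the page-graph topology, Reduce computes page scores, and a global score measure decides when to stop. The following is a minimal single-process sketch of that pattern, not the authors' implementation. Several details are assumptions that the abstract does not state: the damping factor of 0.85, the handling of pages without out-links, the use of the total absolute score change as the convergence measure, and all function names (`map_page`, `reduce_page`, `pagerank`). The sketch also assumes every link target appears as a key in the input graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; the abstract does not state one


def map_page(page, rank, out_links):
    """Map: re-emit the page's link list and send a rank share to each target."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("share", rank / len(out_links))


def reduce_page(page, values, n_pages, dangling_mass):
    """Reduce: sum the incoming shares for one page and apply damping."""
    incoming = 0.0
    for kind, payload in values:
        if kind == "share":
            incoming += payload
    # Assumption: rank held by pages without out-links is spread uniformly.
    rank = (1 - DAMPING) / n_pages + DAMPING * (incoming + dangling_mass / n_pages)
    return page, rank


def pagerank(graph, tol=1e-6, max_iter=200):
    """Iterate map/shuffle/reduce until the global score change drops below tol."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    iterations = 0
    for iterations in range(1, max_iter + 1):
        dangling = sum(ranks[p] for p, links in graph.items() if not links)
        # Shuffle: group the mapper output by key, as the framework would.
        grouped = defaultdict(list)
        for page, links in graph.items():
            for key, value in map_page(page, ranks[page], links):
                grouped[key].append(value)
        new_ranks = dict(
            reduce_page(page, values, n, dangling) for page, values in grouped.items()
        )
        # Global convergence measure (assumed): total absolute change in scores.
        delta = sum(abs(new_ranks[p] - ranks[p]) for p in graph)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks, iterations


if __name__ == "__main__":
    # Hypothetical four-page graph used only for illustration.
    demo = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    scores, used = pagerank(demo)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 4))
```

On a real Hadoop cluster each iteration would run as a separate job reading and writing files, and the convergence measure would have to be aggregated across reducers; the in-memory dictionary here only stands in for that shuffle step.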