Implementation of a Parallel PageRank Algorithm Based on MapReduce (基于MapReduce的并行PageRank算法实现)

Computer Engineering (计算机工程), Issue (2): 31-34, 38, 5. DOI: 10.3969/j.issn.1000-3428.2014.02.007

Implementation of Parallel PageRank Algorithm Based on MapReduce

Ping Yu (平宇) 1, Xiang Yang (向阳) 1, Zhang Bo (张波) 2, Huang Yinfei (黄寅飞) 3

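Author information

  • 1. Department of Computer Science and Technology, Tongji University, Shanghai 201804
  • 2. College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234
  • 3. Shanghai Stock Exchange, Shanghai 200120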

Abstract

The emergence of distributed Web crawlers has greatly expanded the scale of available Web information. Because PageRank must process the link topology of the entire crawled page set, CPU, I/O, and memory limits become a major bottleneck when the data reaches the TB or PB level. To address these problems, this paper proposes a parallel PageRank algorithm based on MapReduce. In each iteration, the Map function processes the files containing the topology of the Web page graph, and the Reduce function calculates the pages' scores. The global Web page score serves as the convergence criterion that controls the iterations, yielding a more precise page ranking. Experimental results show that the improved algorithm achieves better cluster performance and faster execution while preserving the overall ranking accuracy of the single-machine PageRank algorithm.
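The Map/Reduce split described in the abstract can be illustrated with a small single-process sketch. This is not the paper's implementation: it simulates the map, shuffle, and reduce phases in plain Python rather than on Hadoop. The function names (`map_page`, `reduce_page`, `pagerank`), the damping factor of 0.85, and the stopping rule (total absolute change in all page scores between iterations, standing in for the paper's global-score criterion, whose exact form the abstract does not give) are all assumptions made for illustration. The sketch also assumes every page has at least one out-link and that every link target is itself a key of the graph.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed value; the abstract does not state one


def map_page(page, rank, out_links):
    """Map: pass the link list through and emit a rank share per out-link."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("share", rank / len(out_links))


def reduce_page(values, num_pages):
    """Reduce: sum the incoming shares and apply the damping formula."""
    total = 0.0
    for kind, payload in values:
        if kind == "share":
            total += payload
    return (1.0 - DAMPING) / num_pages + DAMPING * total


def pagerank(graph, tol=1e-10, max_iter=200):
    """Iterate map -> shuffle -> reduce until the total score change < tol."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(max_iter):
        # Shuffle phase: group every emitted value by its key (the page).
        grouped = defaultdict(list)
        for page, links in graph.items():
            for key, value in map_page(page, ranks[page], links):
                grouped[key].append(value)
        new_ranks = {page: reduce_page(grouped[page], n) for page in graph}
        # Assumed convergence test: global change across all page scores.
        delta = sum(abs(new_ranks[p] - ranks[p]) for p in graph)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page graph: A links to B and C, B to C, C to A.
    demo = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(demo).items()):
        print(page, round(score, 4))
```

On a real cluster the grouping step would be performed by the framework's shuffle, and the link list emitted by the mapper would be written back out by the reducer so the next iteration can read it.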

Key words

search engine/PageRank algorithm/MapReduce framework/parallel computing/Hadoop platform

Classification

Information Technology and Security Science

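Cite this article

Ping Yu, Xiang Yang, Zhang Bo, Huang Yinfei. Implementation of Parallel PageRank Algorithm Based on MapReduce [J]. Computer Engineering (计算机工程), 2014, (2): 31-34, 38, 5.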

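Funding

National Natural Science Foundation of China (61103069, 71170148); National Key Technology R&D Program of China (2012BAD35B01); Shanghai Science and Technology Innovation Program (11DZ1501703); Chenjiazhen Smart Community and Intelligent Transportation Fund (11dz1210600)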

Computer Engineering (计算机工程)

Indexing: OA; Peking University Core Journals (北大核心); CSCD; CSTPCD

ISSN 1000-3428
