计算机科学与探索Issue(12):1422-1431,10.DOI:10.3778/j.issn.1673-9418.1405053
基于随机游走路径的分布式SimRank算法
Distributed SimRank Algorithm Based on Random Walk Path
摘要
Abstract
SimRank is a widely used model for computing similarity, it measures similarity between objects based on graph topology. With the rapid increase of data, the way of centralized SimRank is not applicable and current distributed SimRank approaches have some drawbacks in efficiency and scalability. This paper presents a two-stage distributed SimRank algorithm based on random walk path. The first stage is data preprocessing and a Find-K-Paths algorithm based on BSP (bulk synchronous parallel) model is proposed. The algorithm can effectively build the index information of random walk path and support the dynamic adding of new paths. The number of the generated paths can be reduced by threshold filtering. In the second stage, based on the index information, a distributed SimRank algorithm is proposed under MapReduce. The experiments demonstrate the feasibility and effectiveness of the proposed algorithm.关键词
分布式SimRank/随机游走路径/BSP模型/MapReduceKey words
distributed SimRank/random walk path/BSP model/MapReduce分类
信息技术与安全科学引用本文复制引用
刘恒,寇月,申德荣,王泰明,于戈..基于随机游走路径的分布式SimRank算法[J].计算机科学与探索,2014,(12):1422-1431,10.基金项目
The National Natural Science Foundation of China under Grant Nos.61472070,61033007(国家自然科学基金) (国家自然科学基金)
the National Basic Research Program of China under Grant No.2012CB316201(国家重点基础研究发展计划(973计划)) (国家重点基础研究发展计划(973计划)
the Fundamental Research Funds for the Central Universities of China under Grant Nos.110404007,130404015(中央高校基本科研业务费专项基金) (中央高校基本科研业务费专项基金)
the Spe-cialized Research Fund for the Doctoral Program of Higher Education of China under Grant No.20120042110028(高等学校博士学科点专项科研基金) (高等学校博士学科点专项科研基金)
the MOE-Intel Special Fund of Information Technology under Grant No. MOE-INTEL-2012-06(教育部-英特尔信息技术专项科研基金) (教育部-英特尔信息技术专项科研基金)