Journal of Guilin University of Electronic Technology, 2013, Vol. 33, Issue 2: 139-143, 5.
Research and implementation of Nutch Web sort algorithm based on Hadoop
Abstract
The Nutch search engine framework does not implement Google's PageRank page sort algorithm. To meet the growing demand for high-quality retrieval from search engines, the PageRank algorithm is analyzed and its validity is verified experimentally, a Hadoop distributed cluster is built, and the PageRank algorithm is implemented in the Nutch framework using the MapReduce distributed programming model. Experimental results show that, with the PageRank algorithm, the Nutch search engine system achieves higher accuracy and provides users with better retrieval services.
Key words
Hadoop cluster / MapReduce / Nutch / page sort algorithm / PageRank
Classification
Information Technology and Security Science
Cite this article
陶林, 谌超, 强保华, 王勇. Research and implementation of Nutch Web sort algorithm based on Hadoop [J]. Journal of Guilin University of Electronic Technology, 2013, 33(2): 139-143, 5.
Funding
National Natural Science Foundation of China (61163057)
Guangxi Natural Science Foundation (2012GXNSFAA053228)