Computer & Digital Engineering, 2018, Vol. 46, Issue 5: 874-878, 5. DOI: 10.3969/j.issn.1672-9722.2018.05.006
Research of Searching Strategy in Candidate Link Introducing Topic Link Blocking Factor
ZHOU Xue (周雪)¹, LIU Naiwen (刘乃文)²
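Author information
- 1. School of Information Science and Engineering, Shandong Normal University, Jinan 250014
- 2. Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan 250014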
Abstract
During crawling, the weight of each URL must be computed so that the crawl queue can be filled with links that meet the crawl conditions. The key problems are how to find the links most relevant to the topic and how to avoid the "topic drift" problem. Because anchor text is short, it cannot clearly show how relevant the linked page is to the topic. Building on the Shark-search algorithm and introducing related-link weights, the method uses the anchor texts of the sub-links within each page block to calculate that block's weight. Comparative experiments verify the effectiveness of the improved algorithm: it better distinguishes the relevance scores of links on the same page, improves the crawler's precision, and at the same time alleviates the "topic drift" problem.
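The abstract does not give the scoring formulas, so the following is only a rough illustration under stated assumptions. It implements the classic Shark-search potential score (an inherited score decayed by δ, plus a neighborhood score built from the anchor text, combined with weights β and γ) and adds a hypothetical block factor: the mean relevance of all sub-link anchor texts in the same block, used here in place of Shark-search's anchor-context score. The `block_weight` function, the bag-of-words cosine similarity, the whitespace tokenizer, and the default parameter values are all assumptions, not the paper's method.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: naive whitespace tokenizer; Chinese pages would need
    # real word segmentation.
    return text.lower().split()


def cosine_sim(query: str, text: str) -> float:
    """Bag-of-words cosine similarity in [0, 1] (assumed relevance measure)."""
    q, t = Counter(tokenize(query)), Counter(tokenize(text))
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0


def block_weight(query: str, anchors_in_block: list[str]) -> float:
    """Hypothetical block factor: mean relevance of the anchor texts of
    all sub-links inside one page block."""
    if not anchors_in_block:
        return 0.0
    return sum(cosine_sim(query, a) for a in anchors_in_block) / len(anchors_in_block)


def potential_score(query: str, parent_text: str, parent_inherited: float,
                    anchor: str, block_anchors: list[str],
                    delta: float = 0.5, beta: float = 0.8,
                    gamma: float = 0.5) -> float:
    # Inherited score as in Shark-search: a decayed copy of the parent's
    # relevance, or of the parent's own inherited score if it is irrelevant.
    parent_sim = cosine_sim(query, parent_text)
    inherited = delta * parent_sim if parent_sim > 0 else delta * parent_inherited
    anchor_score = cosine_sim(query, anchor)
    # Assumption: the block weight replaces the anchor-context score.
    neighborhood = (beta * anchor_score
                    + (1 - beta) * block_weight(query, block_anchors))
    return gamma * inherited + (1 - gamma) * neighborhood


query = "topic crawler"
parent = "a page about web crawler"
# Two links on the same page with the same uninformative anchor "more".
score_a = potential_score(query, parent, 0.0, "more",
                          ["topic crawler", "focused crawler", "more"])
score_b = potential_score(query, parent, 0.0, "more",
                          ["weather", "sports", "more"])
print(score_a > score_b)  # True
```

Plain anchor scoring gives both links the same value here, since "more" shares no terms with the query; the hypothetical block factor ranks the link in the topic-relevant block higher, which is the kind of within-page discrimination the abstract describes.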
Key words
page-block/Shark-search algorithm/link-structure/topic-relative link block
Classification
Information Technology and Security Science
Cite this article
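ZHOU Xue, LIU Naiwen. Research of Searching Strategy in Candidate Link Introducing Topic Link Blocking Factor[J]. Computer & Digital Engineering, 2018, 46(5): 874-878, 5.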