Computer Engineering and Applications, Issue (2): 116-119, 128 (5 pages). DOI: 10.3778/j.issn.1002-8331.1303-0512
Research on search strategy of web spider in topic-oriented search engines
Abstract
To address the low efficiency of traditional focused crawlers, a web spider should always choose the most valuable link to visit next, so keeping the search focused on a given topic is a key problem. Traditional methods compute only the relevance of the links themselves and ignore the relevance among unlabeled URLs. This paper proposes an algorithm based on a link model that combines seed URLs with unlabeled URLs to compute the relevance of the remaining URLs, and shows that the results are insensitive to the initial values of the iteration. Experimental results show that the new algorithm is more efficient than methods based on traditional algorithms.
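The abstract does not state the update rule, so the following is only a minimal sketch under explicit assumptions, not the authors' algorithm. It assumes that seed pages are scored with a Vector Space Model cosine similarity against a topic term vector, and that this relevance is spread to unlabeled URLs with a label-propagation-style iteration f ← αSf + (1 − α)y over the (undirected) link graph. The functions `cosine` and `propagate_relevance`, the parameter `alpha`, and the toy pages are hypothetical. The sketch illustrates the property the abstract mentions: with 0 ≤ α < 1 the update is a contraction, so any initial value converges to the same scores.

```python
import math
from typing import Dict, List, Set


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors (Vector Space Model)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_relevance(
    links: Dict[str, List[str]],
    seed_relevance: Dict[str, float],
    alpha: float = 0.8,
    initial: float = 0.0,
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> Dict[str, float]:
    """Spread topic relevance from seed URLs to unlabeled URLs over the link graph.

    Assumed update (label propagation): f <- alpha * S f + (1 - alpha) * y, where
    S is the row-normalised undirected adjacency matrix and y holds the seed
    relevance (0 for unlabeled URLs). For 0 <= alpha < 1 the update is a
    contraction, so the fixed point does not depend on `initial`.
    """
    # Build an undirected neighbour map; every URL seen becomes a node.
    neighbours: Dict[str, Set[str]] = {}
    for src, dsts in links.items():
        neighbours.setdefault(src, set())
        for dst in dsts:
            if dst != src:
                neighbours[src].add(dst)
                neighbours.setdefault(dst, set()).add(src)
    for url in seed_relevance:
        neighbours.setdefault(url, set())
    if not neighbours:
        return {}

    y = {u: seed_relevance.get(u, 0.0) for u in neighbours}
    f = {u: initial for u in neighbours}
    for _ in range(max_iter):
        new_f = {}
        for u, nbrs in neighbours.items():
            # Mean score of the neighbours, i.e. one row of S times f.
            spread = sum(f[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
            new_f[u] = alpha * spread + (1 - alpha) * y[u]
        delta = max(abs(new_f[u] - f[u]) for u in neighbours)
        f = new_f
        if delta < tol:
            break
    return f


# Toy example (hypothetical pages and term weights).
topic = {"crawler": 1.0, "search": 1.0, "engine": 0.5}
seed_page = {"crawler": 2.0, "search": 1.0}
links = {"seed": ["a"], "a": ["b"], "b": [], "c": []}
seeds = {"seed": cosine(topic, seed_page)}

s1 = propagate_relevance(links, seeds, initial=0.0)
s2 = propagate_relevance(links, seeds, initial=5.0)
# Visit unlabeled URLs in descending order of propagated relevance.
frontier = sorted((u for u in s1 if u not in seeds), key=s1.get, reverse=True)
print(frontier)  # ['a', 'b', 'c']
```

Running the iteration from two different starting values (`initial=0.0` and `initial=5.0`) yields the same scores up to the tolerance, which mirrors the insensitivity to initial values claimed in the abstract. Page `a`, one link away from the seed, ranks above `b`, and the disconnected page `c` receives no relevance.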
Key words: web spider / topic-oriented search engine / search strategy / Vector Space Model (VSM)
Classification: Information Technology and Security Science
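Cite this article: SHI Baoming, HE Yuanxiang, WU Chongzheng. Research on search strategy of web spider in topic-oriented search engines[J]. Computer Engineering and Applications, 2014, (2): 116-119, 128.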
Fund project: Research Capacity Improvement Program of Gansu Lianhe University (No.2012YBTS05).