Abstract
To address weaknesses in existing topic crawling strategies, this paper proposes a topic crawling strategy based on Wikipedia and web-page similarity analysis. The Wikipedia category tree is used to describe the topics, and the downloaded web pages are then processed accordingly. Finally, the priority of each candidate link is calculated by combining text relevance with Web link analysis. The experimental results indicate that the new method outperforms a traditional crawler in search results and topic relevance, and that its crawl rate is higher. The topic description method and the crawl strategy have some value for wider adoption, and the crawler is novel in particular for the field of genetically modified organisms.
Keywords
topic crawling/Wikipedia/text relativity/link analysis/similarity calculation
Classification
Information Technology and Security Science
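
The link-prioritization step summarized in the abstract (combining text relevance with link analysis to rank candidate links) could be sketched roughly as below. This is only an illustrative sketch, not the paper's implementation: the bag-of-words cosine similarity, the linear weighting with `alpha`, the externally supplied `link_score`, and the names `cosine_similarity`, `link_priority`, and `Frontier` are all assumptions introduced here, since the abstract does not give the actual formulas.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts.

    Assumption: whitespace tokenization stands in for whatever text
    processing the paper actually applies.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def link_priority(topic_text: str, anchor_text: str,
                  link_score: float, alpha: float = 0.5) -> float:
    """Combine text relevance and a link-analysis score into one priority.

    Assumptions: `link_score` is a value in [0, 1] produced by some
    link-analysis step, and `alpha` is a hypothetical mixing weight.
    """
    text_score = cosine_similarity(topic_text, anchor_text)
    return alpha * text_score + (1.0 - alpha) * link_score


class Frontier:
    """Priority queue of candidate URLs; highest priority is popped first."""

    def __init__(self) -> None:
        self._heap = []
        self._seen = set()

    def push(self, url: str, priority: float) -> None:
        # Skip URLs that were already queued once.
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated priority.
            heapq.heappush(self._heap, (-priority, url))

    def pop(self):
        neg_priority, url = heapq.heappop(self._heap)
        return url, -neg_priority
```

In this sketch, a topic description built from Wikipedia category terms would be passed as `topic_text`, and each extracted link would be pushed onto the `Frontier` with its computed priority so that the most promising links are fetched first.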