Topic crawling strategies based on Wikipedia and analysis of web-page similarity (基于维基百科和网页相似度分析的主题爬行策略)

LUAN Xia (栾霞), ZHAO Xiaonan (赵晓楠)

Modern Electronics Technique (现代电子技术), 2014, Issue 20: 35-37, 3.

Topic crawling strategies based on Wikipedia and analysis of web-page similarity

LUAN Xia (1), ZHAO Xiaonan (2)

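Author information

  • 1. Network Center, No. 323 Hospital of the Chinese People's Liberation Army, Xi'an, Shaanxi 710054, China
  • 2. Unit 68303 of the Chinese People's Liberation Army, Wuwei, Gansu 733000, China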

Abstract

To overcome the weaknesses of existing topic crawling strategies, this paper proposes a topic crawling strategy based on Wikipedia and web-page similarity analysis. The Wikipedia category tree is used to describe the topics, and the downloaded web pages are then processed. Finally, the priorities of the candidate links are calculated by combining text relevance with Web link analysis. The experimental results indicate that the new method outperforms the traditional crawler in both search results and topic relevance, and that its crawl rate is higher. The topic description method and the crawling strategy have some value for wider adoption; in particular, the crawler is novel in the field of genetically modified organisms.
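The abstract says that candidate links are prioritized by combining text relevance with link analysis, but it gives no formula. The following is only a minimal sketch of one way such a combination could look, assuming a weighted sum of a cosine text-similarity score and a precomputed link score. The names `cosine_similarity`, `term_vector`, `link_priority` and `Frontier`, the weight `alpha`, and the sample topic terms are all hypothetical and are not taken from the paper.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no similarity to anything
    return dot / (norm_a * norm_b)


def term_vector(text):
    """Naive bag-of-words vector: lowercase, whitespace-split term counts."""
    return Counter(text.lower().split())


def link_priority(topic_vec, anchor_text, parent_text, link_score, alpha=0.7):
    """Weighted sum of text relevance and a link-analysis score.

    Assumption: text relevance is the better of the anchor-text and
    parent-page similarities, and link_score is already scaled to [0, 1].
    """
    text_score = max(
        cosine_similarity(topic_vec, term_vector(anchor_text)),
        cosine_similarity(topic_vec, term_vector(parent_text)),
    )
    return alpha * text_score + (1 - alpha) * link_score


class Frontier:
    """Priority queue of candidate URLs; the highest priority is popped first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker that keeps insertion order stable

    def push(self, url, priority):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]


# Hypothetical topic terms, standing in for weights that would be
# derived from the Wikipedia category tree.
topic = {"transgenic": 1.0, "crop": 0.8, "gene": 0.6}

frontier = Frontier()
frontier.push("http://example.com/sports",
              link_priority(topic, "football scores", "", 0.5))
frontier.push("http://example.com/gm-crops",
              link_priority(topic, "transgenic crop trial", "", 0.5))
next_url = frontier.pop()  # the on-topic link is crawled first
```

In this sketch, `alpha` controls how much the text score dominates the link score; a real crawler would tune it experimentally and would likely replace the whitespace tokenizer with proper word segmentation.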

Key words

topic crawling/Wikipedia/text relevance/link analysis/similarity calculation

Classification

Information technology and security science

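Cite this article

LUAN Xia, ZHAO Xiaonan. Topic crawling strategies based on Wikipedia and analysis of web-page similarity [J]. Modern Electronics Technique (现代电子技术), 2014, (20): 35-37, 3.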

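Journal: Modern Electronics Technique (现代电子技术)

Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD

ISSN 1004-373X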
