Computer & Digital Engineering, 2012, Vol. 40, Issue (6): 76-78, 123, 4.
Research on Focused Crawler Based on Naive Bayes Algorithm
Pi Jing 1, Shao Xiongkai 1, Xiao Yafu 1
Author information
- 1. School of Computer Science, Hubei University of Technology, Wuhan 430068
Abstract
A focused crawler is a key part of a focused search engine. This paper proposes a method that uses the Naive Bayes algorithm to identify topics, and introduces the core parts of the focused crawler: generation of the seed URL collection, page analysis and feature extraction, and topic identification. The focused crawler based on the Naive Bayes algorithm is compared with a focused crawler based on link analysis and a thesaurus; the experimental results show that the Naive Bayes-based crawler achieves better accuracy and that the method is feasible. This lays a good foundation for topic-specific information collection.
Key words
Naive Bayes algorithm/ focused crawler/ topic correlativity/ information collection
Classification
Information technology and security science
Cite this article
Pi Jing, Shao Xiongkai, Xiao Yafu. Research on Focused Crawler Based on Naive Bayes Algorithm[J]. Computer & Digital Engineering, 2012, 40(6): 76-78, 123, 4.
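
The topic-identification step named in the abstract can be illustrated with a minimal sketch of a multinomial Naive Bayes text classifier. This is not the authors' implementation: the abstract gives no details of their features, smoothing, or training data, so the whitespace tokenizer, the Laplace smoothing parameter `alpha`, the class labels `relevant` / `irrelevant`, and the toy training pages below are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical feature extraction: lowercase alphabetic words only."""
    return [word for word in text.lower().split() if word.isalpha()]


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                # smoothing strength (assumed value)
        self.class_doc_counts = Counter() # number of training pages per class
        self.term_counts = {}             # class label -> Counter of term counts
        self.vocabulary = set()           # all terms seen during training

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            counts = self.term_counts.setdefault(label, Counter())
            for term in tokenize(text):
                counts[term] += 1
                self.vocabulary.add(term)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(term | class) for each class."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        terms = [t for t in tokenize(text) if t in self.vocabulary]
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            counts = self.term_counts[label]
            denominator = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(doc_count / total_docs)
            for term in terms:
                score += math.log((counts[term] + self.alpha) / denominator)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training pages (invented for this sketch; the topic here is football).
TRAIN_DOCS = [
    "football match league goal",
    "team wins football cup final",
    "stock market shares fall",
    "bank raises interest rates",
]
TRAIN_LABELS = ["relevant", "relevant", "irrelevant", "irrelevant"]

classifier = NaiveBayesTopicClassifier().fit(TRAIN_DOCS, TRAIN_LABELS)
print(classifier.predict("football league final"))
print(classifier.predict("stock market rates"))
```

In a focused crawler, a classifier of this kind would plausibly be applied to the text extracted from each fetched page, with only pages judged relevant contributing their outgoing links to the crawl queue; how the paper combines the score with its crawl strategy is not stated in the abstract.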