Computer Technology and Development (计算机技术与发展), 2012, Vol. 22, Issue 8: 48-52, 5.
Focused Crawling Based on Genetic Algorithms (基于遗传算法的主题爬虫)
Zhang Hailiang (张海亮) 1, Yuan Daohua (袁道华) 1
Author information
- 1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
Abstract
The search strategies currently used by focused crawlers cannot find an optimal solution in the global scope. Based on an analysis of genetic algorithms, this paper proposes a focused crawling method built on a genetic algorithm. The method introduces a PageRank algorithm combined with text content, computes the topic similarity of a page with the vector space model, and judges the importance of a web page from both the web link structure and its topic similarity. Genetic factors are selected according to page importance, and a fitness function is set to select the pages relevant to the topic. Compared with a plain focused crawler, the genetic-algorithm-based crawler obtains web pages that correlate more strongly with the topic, raises the importance of the pages it visits, and better satisfies users' demand for pages on the topics they are interested in. The problem described above is thus solved to a certain extent.
Keywords
genetic algorithm / crawler / focused crawler / topic similarity / web page importance
Classification
Information technology and security science
Cite this article
Zhang Hailiang, Yuan Daohua. Focused Crawling Based on Genetic Algorithms [J]. Computer Technology and Development, 2012, 22(8): 48-52, 5.
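
The abstract names three ingredients: topic similarity from the vector space model, page importance from link structure combined with that similarity, and a fitness function that selects topic-relevant pages. The abstract gives no formulas, so the following is only a minimal sketch under stated assumptions: the linear weighting `alpha`, the `threshold`, the function names, and the premise that each page already carries a link score normalized to [0, 1] are all hypothetical and are not taken from the paper. The genetic operators themselves are not shown.

```python
import math
from collections import Counter


def cosine_similarity(doc_terms, topic_terms):
    """Cosine between two term-frequency vectors (vector space model)."""
    a, b = Counter(doc_terms), Counter(topic_terms)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def fitness(link_score, similarity, alpha=0.5):
    """Assumed fitness: weighted mix of link importance and topic similarity."""
    return alpha * link_score + (1 - alpha) * similarity


def select_pages(pages, topic_terms, threshold=0.5, alpha=0.5):
    """Keep pages whose fitness reaches the threshold, best first.

    pages maps url -> (list of terms, link score in [0, 1]).
    """
    scored = []
    for url, (terms, link_score) in pages.items():
        sim = cosine_similarity(terms, topic_terms)
        score = fitness(link_score, sim, alpha)
        if score >= threshold:
            scored.append((url, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical pages: (terms on the page, precomputed link score).
    pages = {
        "http://example.com/ga": (["genetic", "algorithm", "crawler"], 0.8),
        "http://example.com/food": (["cooking", "recipe"], 0.9),
    }
    for url, score in select_pages(pages, ["genetic", "algorithm"]):
        print(url, round(score, 3))
```

In this sketch a page with a high link score but no topical overlap falls below the threshold, which mirrors the abstract's point that importance is judged from link structure and topic similarity together rather than from links alone.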