Focused Crawling Based on Genetic Algorithms

Zhang Hailiang (张海亮)¹, Yuan Daohua (袁道华)¹

Computer Technology and Development, 2012, Vol. 22, Issue 8, pp. 48-52 (5 pages).


Author information

  • 1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China

Abstract

The search strategies of existing focused crawlers cannot find an optimal solution over the global search space. Based on an analysis of genetic algorithms, this paper proposes a focused crawling method built on a genetic algorithm. The method combines the PageRank algorithm with page text content: it computes each page's topic similarity using the vector space model (VSM) and judges a page's importance from both the link structure of the web and its topic similarity. The genetic factors are then selected according to page importance, and a fitness function is defined to select pages relevant to the topic. Compared with a conventional focused crawler, the genetic-algorithm-based crawler retrieves pages that are more strongly related to the topic, raises the importance of the pages it visits, and better meets users' need to find topic pages they are interested in. To a certain extent, it therefore solves the problem described above.
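The abstract names the ingredients (VSM topic similarity, PageRank, a fitness function, genetic selection) but gives no formulas. The following is a minimal Python sketch of how they could fit together, not the authors' implementation. It assumes: term-frequency vectors compared by cosine similarity for the VSM step; a PageRank variant whose random-jump (teleport) distribution is proportional to topic similarity; a hypothetical fitness that linearly blends similarity with max-normalized PageRank through a weight `alpha`; and roulette-wheel (fitness-proportionate) selection as the GA selection operator. All function and parameter names (`cosine_similarity`, `topic_pagerank`, `select_pages`, `alpha`, `damping`) are illustrative. Crossover and mutation are omitted because the abstract does not say how individuals are encoded.

```python
from __future__ import annotations

import math
import random
from collections import Counter


def cosine_similarity(text: str, topic_terms: list[str]) -> float:
    """VSM step (assumed): cosine between term-frequency vectors of page and topic."""
    doc = Counter(text.lower().split())
    topic = Counter(t.lower() for t in topic_terms)
    dot = sum(doc[t] * w for t, w in topic.items())
    norm = math.sqrt(sum(v * v for v in doc.values())) * math.sqrt(
        sum(w * w for w in topic.values()))
    return dot / norm if norm else 0.0


def topic_pagerank(links: dict[str, list[str]], sim: dict[str, float],
                   damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """PageRank whose random jumps favor topic-similar pages (an assumption)."""
    pages = list(links)
    n = len(pages)
    total_sim = sum(sim[p] for p in pages)
    # Teleport distribution proportional to similarity; uniform if nothing matches.
    teleport = {p: sim[p] / total_sim if total_sim else 1 / n for p in pages}
    rank = {p: 1 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for p in pages:
            out = [q for q in links[p] if q in new]  # ignore links leaving the graph
            if out:
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: spread its rank along the teleport distribution.
                for q in pages:
                    new[q] += damping * rank[p] * teleport[q]
        rank = new
    return rank


def select_pages(candidates: list[str], sim: dict[str, float],
                 rank: dict[str, float], k: int, alpha: float = 0.6,
                 rng: random.Random | None = None) -> list[str]:
    """GA selection operator: roulette-wheel choice of up to k distinct pages."""
    rng = rng if rng is not None else random.Random()
    top = max(rank.values()) or 1.0
    # Hypothetical fitness: blend of topic similarity and max-normalized PageRank.
    pool = {p: alpha * sim[p] + (1 - alpha) * rank[p] / top for p in candidates}
    chosen: list[str] = []
    while pool and len(chosen) < k:
        total = sum(pool.values())
        if total <= 0:
            pick = rng.choice(sorted(pool))  # all remaining fitness is zero
        else:
            r = rng.uniform(0, total)
            acc = 0.0
            for pick, f in pool.items():
                acc += f
                if acc > r:
                    break
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen
```

In this sketch, raising `alpha` toward 1 makes the crawler favor topic relevance over link authority, while lowering it favors well-linked pages. Fitness-proportionate selection keeps some chance of visiting lower-scoring pages, which is the usual argument for using a GA instead of a purely greedy best-first frontier.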

Keywords

genetic algorithm / crawler / focused crawler / topic similarity / web page importance

Classification

Information Technology and Security Science

Citation

Zhang Hailiang, Yuan Daohua. Focused Crawling Based on Genetic Algorithms [J]. Computer Technology and Development, 2012, 22(8): 48-52.

Computer Technology and Development

OA · CSTPCD

ISSN 1673-629X
