
Improved context graph algorithm by using feature selection based on word frequency differentia

Zhang Yong (张永)¹, Wu Chongzheng (吴崇正)¹

Computer Engineering and Applications, Issue (10): 141-146, 6. DOI: 10.3778/j.issn.1002-8331.1207-0328

Author information

  • 1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

Abstract

To address the low efficiency of traditional focused crawlers, this paper analyzes Context Graph, a heuristic search algorithm for web crawling, and identifies its deficiencies. An optimization strategy is proposed that combines an improved TF-IDF with a feature selection method based on word frequency differentia, taking into account the importance of the different textual content of web pages. A new term-weighting method for text categorization is also described, which considers how feature words are distributed both among classes and within a class. Experimental results indicate that, compared with the other algorithms tested, this strategy crawls topic-relevant pages more efficiently.
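For orientation, the unmodified TF-IDF weighting that the abstract takes as its starting point can be sketched as follows. This is only the textbook baseline (term frequency normalized by document length, inverse document frequency as log(N/df)); it is not the improved weighting or the word-frequency-differentia feature selection proposed in the paper, whose formulas are not given on this page. The function name `tf_idf` and the toy corpus are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook TF-IDF (an assumption, not the paper's improved variant):
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of already-tokenized pages.
docs = [
    ["crawler", "topic", "page", "crawler"],
    ["topic", "page", "link"],
    ["page", "graph", "context"],
]
weights = tf_idf(docs)
```

In this toy corpus "page" occurs in every document, so its weight is zero everywhere, while "crawler" is both frequent in the first document and absent from the others, so it receives the highest weight there. A term that is common across all classes carries little topical information under this scheme, which is the kind of limitation that class-aware weighting is meant to address.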

Key words

focused crawler/Context Graph/search strategy/feature selection/TF-IDF

Classification

Information Technology and Security Science

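Cite this article

Zhang Yong, Wu Chongzheng. Improved context graph algorithm by using feature selection based on word frequency differentia [J]. Computer Engineering and Applications, 2014, (10): 141-146, 6.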

Computer Engineering and Applications

OA / PKU Core (北大核心) / CSCD / CSTPCD

ISSN 1002-8331
