Computer Engineering and Applications, Issue (10): 141-146, 6. DOI: 10.3778/j.issn.1002-8331.1207-0328
Improved Context Graph algorithm using feature selection based on word frequency differentia
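Zhang Yong 1, Wu Chongzheng 1
Author information
- 1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China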
Abstract
To address the low efficiency of traditional focused crawlers, the heuristic web crawler search algorithm Context Graph is analyzed. The Context Graph method, however, has deficiencies. An optimization strategy is proposed that adopts an improved TF-IDF and a feature selection method based on word frequency differentia, which takes the importance of different web textual content into account. A new term weighting method for text categorization is described, which considers the distribution of feature words both among classes and within a class. Compared with the other given algorithms, experimental results indicate that this strategy is more efficient in crawling topic pages.
Key words
focused crawler/Context Graph/search strategy/feature selection/TF-IDF
Classification
Information Technology and Security Science
Cite this article
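Zhang Yong, Wu Chongzheng. Improved Context Graph algorithm using feature selection based on word frequency differentia [J]. Computer Engineering and Applications, 2014, (10): 141-146, 6.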