CAAI Transactions on Intelligent Systems, Issue (4): 474-479, 6. DOI: 10.3969/j.issn.1673-4785.201305044
Chinese Web page feature extraction by optimizing comprehensive heuristics based on GA
Abstract
Feature extraction is the basis of technologies such as information retrieval, text classification, text clustering, and automatic summarization. Traditional feature extraction methods have difficulty evaluating feature words comprehensively and effectively. To address this shortcoming, this paper proposes a method for extracting Chinese Web page features that uses a genetic algorithm (GA) to optimize a combination of heuristic features. The method combines heuristics based on word frequency, word correlation, part of speech (POS), and position to evaluate candidate features comprehensively, and uses the GA to optimize the weight of each heuristic parameter. Experimental results on different test sets show that the proposed method effectively avoids the deviations of traditional extraction methods and obtains more representative features, which gives it some practical value.
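The abstract does not give the scoring formula, chromosome encoding, genetic operators, or fitness function, so the following is only a minimal sketch under stated assumptions. It assumes each candidate word already has four heuristic values normalized to [0, 1] (frequency, correlation, POS, position), combines them as a weighted sum, and evolves the weight vector with a real-coded GA (tournament selection, arithmetic crossover, Gaussian mutation, elitism). The fitness used here, mean precision@k against hand-labeled reference keywords, is a hypothetical stand-in. All names (`score`, `extract`, `fitness`, `optimize_weights`, `SAMPLE_DOCS`) and the toy English tokens are illustrative; real candidates would be segmented Chinese words.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical per-word heuristic values, each assumed pre-normalized to [0, 1]:
# (word frequency, word correlation, part of speech, position).
Heuristics = Tuple[float, float, float, float]
# One document: candidate words with their heuristics, plus assumed reference keywords.
Doc = Tuple[Dict[str, Heuristics], Set[str]]


def score(h: Heuristics, weights: Sequence[float]) -> float:
    """Combine the four heuristic values as a weighted sum (assumed linear form)."""
    return sum(w * x for w, x in zip(weights, h))


def extract(features: Dict[str, Heuristics], weights: Sequence[float], k: int) -> List[str]:
    """Return the k candidate words with the highest combined score."""
    ranked = sorted(features, key=lambda word: score(features[word], weights), reverse=True)
    return ranked[:k]


def fitness(weights: Sequence[float], docs: List[Doc], k: int) -> float:
    """Mean precision@k of the extracted words against the reference keywords."""
    total = 0.0
    for features, reference in docs:
        total += len(set(extract(features, weights, k)) & reference) / k
    return total / len(docs)


def normalize(v: List[float]) -> List[float]:
    """Rescale positive weights so they sum to 1."""
    s = sum(v)
    return [x / s for x in v]


def optimize_weights(docs: List[Doc], k: int = 2, pop_size: int = 20,
                     generations: int = 30, mutation_rate: float = 0.2,
                     seed: int = 0) -> List[float]:
    """Real-coded GA: tournament selection, arithmetic crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)
    n = 4
    # Seed the population with uniform weights so the result is never worse than them.
    population = [[1.0 / n] * n]
    population += [normalize([rng.random() + 1e-9 for _ in range(n)])
                   for _ in range(pop_size - 1)]

    def fit(w: Sequence[float]) -> float:
        return fitness(w, docs, k)

    def tournament() -> List[float]:
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fit(a) >= fit(b) else b

    best = max(population, key=fit)
    for _ in range(generations):
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            alpha = rng.random()
            # Arithmetic crossover keeps every weight positive.
            child = [alpha * x + (1 - alpha) * y for x, y in zip(p1, p2)]
            if rng.random() < mutation_rate:
                i = rng.randrange(n)
                child[i] = max(1e-9, child[i] + rng.gauss(0.0, 0.1))
            children.append(normalize(child))
        population = children
        best = max(population, key=fit)
    return best


# Toy, invented data: high-frequency but uninformative words compete with true keywords.
SAMPLE_DOCS: List[Doc] = [
    ({"page": (0.9, 0.2, 0.3, 0.1), "genetic": (0.5, 0.8, 0.9, 0.8),
      "algorithm": (0.6, 0.7, 0.9, 0.6), "this": (1.0, 0.1, 0.0, 0.2)},
     {"genetic", "algorithm"}),
    ({"feature": (0.4, 0.9, 0.8, 0.9), "extraction": (0.3, 0.8, 0.9, 0.7),
      "click": (1.0, 0.4, 0.6, 0.3), "home": (0.9, 0.3, 0.7, 0.5)},
     {"feature", "extraction"}),
]

if __name__ == "__main__":
    weights = optimize_weights(SAMPLE_DOCS)
    print([round(w, 3) for w in weights], fitness(weights, SAMPLE_DOCS, k=2))
```

Seeding the population with uniform weights and carrying the best individual forward each generation means the returned weights are never worse, under this assumed fitness, than weighting every heuristic equally. Swapping in a different fitness (for example, downstream classification accuracy) only requires replacing `fitness`.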
Key words
feature extraction/GA/text classification/text clustering/word frequency/word correlation
Classification
Information Technology and Security Science
Cite this article
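SHEN Gaofeng, GU Shumin. Chinese Web page feature extraction by optimizing comprehensive heuristics based on GA [J]. CAAI Transactions on Intelligent Systems, 2014, (4): 474-479, 6.
Funding
Basic and Frontier Technology Research Program of Henan Province (102300410266); Doctoral Research Fund of Zhengzhou University of Light Industry.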