Computer & Digital Engineering, 2012, Vol. 40, Issue (7): 6-8, 3.
Feature Selection Based on Dispersion Degree and Concentration Degree
Abstract
Feature selection is one of the key steps in text categorization, and the selected feature subset directly affects categorization results. First, two feature influence degrees were defined: the inter-class dispersion degree, for which larger values are better, and the intra-class concentration degree, for which larger values are also better. The two degrees were then combined into a new feature selection method, which evaluates each candidate feature from both perspectives and thereby yields a more representative feature subset. Simulation experiments show that the method improves text categorization performance to a certain extent.
Keywords
feature selection, text categorization, dispersion degree, concentration degree
Classification
Information Technology and Security Science
Cite this article
Chen Yanlong, Duan Hongyu. Feature Selection Based on Dispersion Degree and Concentration Degree [J]. Computer & Digital Engineering, 2012, 40(7): 6-8, 3.
Funding
Supported by the Henan Province Basic and Frontier Technology Research Program (No. 102300410266).
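The abstract names the two influence degrees but does not give their formulas. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a term's intra-class concentration is p(t|c), the fraction of class-c documents containing the term (taking the best class), and that its inter-class dispersion is the population standard deviation of p(t|c) across classes. The two are combined by a simple product and the top-k terms are kept. The function names `score_features` and `select_features` and the toy corpus are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def score_features(docs):
    """Score every term as (inter-class dispersion) x (intra-class concentration).

    docs is a list of (tokens, label) pairs. Both measures below are
    illustrative assumptions; the paper's own formulas are not reproduced here.
    """
    class_sizes = defaultdict(int)                # N_c: documents per class
    df = defaultdict(lambda: defaultdict(int))    # df[t][c]: class-c documents containing t
    for tokens, label in docs:
        class_sizes[label] += 1
        for term in set(tokens):                  # count each term once per document
            df[term][label] += 1

    classes = sorted(class_sizes)
    scores = {}
    for term, per_class in df.items():
        # p(t|c): fraction of class-c documents that contain the term
        p = [per_class.get(c, 0) / class_sizes[c] for c in classes]
        dispersion = pstdev(p)     # assumed: spread of p(t|c) across classes (larger is better)
        concentration = max(p)     # assumed: coverage within the best class (larger is better)
        scores[term] = dispersion * concentration
    return scores


def select_features(docs, k):
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    scores = score_features(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    (["goal", "match", "team"], "sports"),
    (["goal", "team", "coach"], "sports"),
    (["stock", "market", "team"], "finance"),
    (["stock", "bank"], "finance"),
]
print(select_features(docs, 3))  # ['goal', 'stock', 'team']
```

In this toy corpus, `team` occurs in both classes, so its dispersion is lower and it ranks below `goal` and `stock`, each of which covers one class completely and is absent from the other. The product is only one way to combine the two degrees; a weighted sum would also reward features that score well on both.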