Application Research of Computers (计算机应用研究), 2011, Vol. 28, Issue 12: 4532-4534, 3. DOI: 10.3969/j.issn.1001-3695.2011.12.035
New feature selection approach for imbalanced text classification (不平衡数据集上的文本分类特征选择新方法)
Abstract
When handling unbalanced data sets in text classification, traditional feature selection approaches tend to favor large categories and neglect small ones. To tackle this problem, this paper proposes a new feature selection approach, IPR. The approach considers how a feature is distributed between the positive class and the negative class, and combines four indicators that measure a feature's ability to distinguish categories. It addresses the poor fit of traditional feature selection to unbalanced data sets, improving the recognition rate of small categories without reducing performance on large categories. Experimental results show that it is an effective and feasible feature selection approach.
Keywords
unbalanced data sets / text classification / feature selection / positive class / negative class
Classification
Information technology and security science
Cite this article
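ZHANG Yufang (张玉芳), WANG Yong (王勇), XIONG Zhongyang (熊忠阳), LIU Ming (刘明). New feature selection approach for imbalanced text classification [J]. Application Research of Computers, 2011, 28(12): 4532-4534, 3.
Funding
Supported by the Graduate Innovation Fund of the Central Universities (CDJXS11180013)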