New feature selection approach for imbalanced text classification

Zhang Yufang (张玉芳), Wang Yong (王勇), Xiong Zhongyang (熊忠阳), Liu Ming (刘明)

Application Research of Computers (计算机应用研究), 2011, Vol. 28, Issue 12: 4532-4534, 3. DOI: 10.3969/j.issn.1001-3695.2011.12.035


Zhang Yufang 1, Wang Yong 1, Xiong Zhongyang 1, Liu Ming 1

Author information

  • 1. College of Computer Science, Chongqing University, Chongqing 400044, China

Abstract

When handling unbalanced data sets in text classification, traditional feature selection approaches tend to favor large categories and neglect small ones. To tackle this problem, this paper proposes a new feature selection approach, IPR. The approach considers how a feature is distributed between the positive class and the negative class, and combines four measures of a feature's ability to distinguish categories. It addresses the poor fit of traditional feature selection to unbalanced data sets and improves the recognition rate of small categories without reducing performance on large categories. Experimental results show that it is an effective and feasible feature selection approach.

Key words

unbalanced data sets / text classification / feature selection / positive class / negative class

Classification

Information technology and security science

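Cite this article

Zhang Yufang, Wang Yong, Xiong Zhongyang, Liu Ming. New feature selection approach for imbalanced text classification [J]. Application Research of Computers, 2011, 28(12): 4532-4534, 3.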

Funding

Supported by the Graduate Innovation Fund of the Central Universities (CDJXS11180013)

Application Research of Computers

Indexed in: OA, Peking University Core, CSCD, CSTPCD

ISSN 1001-3695
