New feature selection approach for imbalanced text classification
Open access (OA); indexed in Peking University Core Journals (北大核心), CSCD, CSTPCD
When text is classified on imbalanced data sets, traditional feature selection methods tend to bias the classifier toward the large classes and neglect the small ones. To address this problem, a new feature selection method, IPR (integrated probability ratio), is proposed. The method takes into account how each feature is distributed across the positive and negative classes, and it combines four indicators of feature-class relevance to score each feature term. It copes better than traditional feature selection methods with imbalanced data sets, raising the recognition rate of small classes without degrading classification performance on large classes. Experimental results show that the method is effective and feasible.
Zhang Yufang (张玉芳); Wang Yong (王勇); Xiong Zhongyang (熊忠阳); Liu Ming (刘明)
College of Computer Science, Chongqing University, Chongqing 400044 (all authors)
Information technology and security science
unbalanced data sets; text classification; feature selection; positive class; negative class
Application Research of Computers (《计算机应用研究》), 2011 (12)
pp. 4532-4534 (3 pages)
Supported by the Central Universities Graduate Innovation Fund (CDJXS11180013)