Study of text categorization on imbalanced data (不均衡数据集上文本分类方法研究)

Xie Nana (谢娜娜)¹, Fang Bin (房斌)¹, Wu Lei (吴磊)¹

Computer Engineering and Applications (计算机工程与应用), Issue (20): 118-121, 4. DOI: 10.3778/j.issn.1002-8331.1201-0299

Author information

  • 1. College of Computer Science, Chongqing University, Chongqing 400030, China

Abstract

Class imbalance is often encountered in real applications of automatic text classification. From the perspectives of optimized feature selection and classifier improvement, a new text classification method for imbalanced data sets is proposed. In the feature selection scheme, the positive and negative correlations between terms and categories are combined with the strength of class information. Then, at the data level, the imbalance of the training corpus is reduced by data resampling in order to lessen its effect on classification. Experimental results show that the new approach achieves better classification performance.

Key words

feature selection / CHI statistical approach / text categorization / imbalanced data / resampling

Classification

Information Technology and Security Science

Cite this article

Xie Nana, Fang Bin, Wu Lei. Study of text categorization on imbalanced data [J]. Computer Engineering and Applications, 2013, (20): 118-121, 4.

Funding

National Natural Science Foundation of China (No. 61173129).

Computer Engineering and Applications

OA / CSCD / CSTPCD

ISSN 1002-8331
