Computer Engineering and Applications (计算机工程与应用), 2011, Vol. 47, Issue (1): 139-143, 5. DOI: 10.3778/j.issn.1002-8331.2011.01.038
Imbalanced data sets classification method based on over-sampling technique (一种基于过抽样技术的非平衡数据集分类方法)
Abstract
Classification of data with imbalanced class distributions is a research focus in machine learning. To address the imbalance problem, in particular the poor predictive accuracy on the minority class, this paper presents an improved approach, AdaBoost-SVM-OBMS, which combines Boosting, an ensemble learning algorithm, with an improved over-sampling method based on misclassified samples. Using a support vector machine as the base classifier, the approach identifies the misclassified samples in each iteration and uses them to generate new samples separately for the majority and minority classes. The new samples are then added to the original training set to retrain the classification model, which improves the prediction of hard samples. The method is evaluated in terms of AUC, F-value, and G-mean on eight imbalanced data sets. The results indicate that the improved approach achieves high predictive performance on imbalanced data sets.
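The abstract names AUC, F-value, and G-mean as evaluation measures but does not give their formulas. The following is a minimal sketch of the usual definitions, not the authors' code. It assumes that the minority class is the positive class (label 1), that F-value is the F-measure with beta = 1 by default, and that AUC is computed as the fraction of correctly ranked positive/negative pairs. The function names are hypothetical.

```python
import math


def f_value(tp, fp, fn, beta=1.0):
    """F-measure of the positive (assumed minority) class; beta=1 is an assumption."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative confusion counts (made up): 10 minority, 90 majority samples.
    print(f_value(tp=8, fp=4, fn=2))
    print(g_mean(tp=8, fn=2, tn=86, fp=4))
    print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

These measures suit imbalanced data because plain accuracy can hide a useless classifier. With 10 minority and 90 majority samples, a model that always predicts the majority class is 90% accurate, yet `g_mean(tp=0, fn=10, tn=90, fp=0)` and `f_value(tp=0, fp=0, fn=10)` are both 0.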
Keywords
data mining / imbalanced data sets / Boosting / misclassified samples / support vector machine
Classification
Information Technology and Security Science
Cite this article
王春玉, 苏宏业, 渠瑜, 褚健. 一种基于过抽样技术的非平衡数据集分类方法 [J]. 计算机工程与应用, 2011, 47(1): 139-143, 5.
Funding
National High-Tech Research and Development Plan of China (863 Program), Grant Nos. 2008AA042902 and 2009AA04Z162
Programme of Introducing Talents of Discipline to Universities (111 Project), Grant No. B07031