
Imbalanced data sets classification method based on over-sampling technique

Wang Chunyu¹, Su Hongye¹, Qu Yu¹, Chu Jian¹

Computer Engineering and Applications, 2011, Vol. 47, Issue (1): 139-143, 5. DOI: 10.3778/j.issn.1002-8331.2011.01.038

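Author information

1. State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, China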

Abstract

Classification of data with imbalanced class distribution is a research focus in machine learning. To address the imbalance problem, especially the poor predictive accuracy on the minority class, this paper presents an improved approach, AdaBoost-SVM-OBMS, which combines Boosting, an ensemble learning algorithm, with an improved over-sampling method based on misclassified samples. Using a support vector machine as the base classifier, the approach identifies the misclassified samples in each iteration and uses them to generate new samples separately for the majority and minority classes. The new samples are added to the original training set to retrain the classification model, which improves the prediction of hard samples. The method is evaluated in terms of AUC, F-value, and G-mean on eight imbalanced data sets. Results indicate that the improved approach achieves high predictive performance on imbalanced data sets.
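The over-sampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how new samples are generated from the misclassified ones, so the interpolation rule below (a SMOTE-style random point between a misclassified sample and a randomly chosen sample of the same class) is an assumption, as are the function names `interpolate` and `oversample_misclassified`, the toy data, and the hypothetical base-classifier output `y_pred`. The boosting loop and the SVM base classifier are left out.

```python
import random

def interpolate(a, b, rng):
    """Return a random point on the segment between feature vectors a and b."""
    t = rng.random()
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

def oversample_misclassified(X, y, y_pred, rng):
    """Create one synthetic sample per misclassified training sample.

    Assumption (not from the paper): each misclassified sample is
    interpolated with a randomly chosen other sample of the same class.
    """
    new_X, new_y = [], []
    for i, (xi, yi, pi) in enumerate(zip(X, y, y_pred)):
        if yi == pi:
            continue  # correctly classified, so not a "hard" sample
        same_class = [X[j] for j in range(len(X)) if y[j] == yi and j != i]
        if not same_class:
            continue  # no same-class partner to interpolate with
        new_X.append(interpolate(xi, rng.choice(same_class), rng))
        new_y.append(yi)  # the synthetic sample keeps the true class label
    return new_X, new_y

# Toy data: label 1 is the minority class.
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1]]
y = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0]  # hypothetical output of a base classifier

rng = random.Random(0)
new_X, new_y = oversample_misclassified(X, y, y_pred, rng)

# The augmented set would be used to retrain the model in the next round.
X_aug, y_aug = X + new_X, y + new_y
print(len(new_X), new_y)
```

In this toy run one minority sample and one majority sample are misclassified, so one synthetic sample is generated for each class, which matches the abstract's statement that new samples are generated separately for both classes.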

Key words

data mining / imbalanced data sets / Boosting / misclassified samples / support vector machine

Classification

Information technology and security science

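Cite this article

Wang Chunyu, Su Hongye, Qu Yu, Chu Jian. Imbalanced data sets classification method based on over-sampling technique [J]. Computer Engineering and Applications, 2011, 47(1): 139-143, 5.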

Funding

National High-Tech Research and Development Plan of China (863 Program), Grant No. 2008AA042902 and No. 2009AA04Z162

111 Project (Programme of Introducing Talents of Discipline to Universities), Grant No. B07031

Computer Engineering and Applications

OA / CSCD / CSTPCD

ISSN: 1002-8331
