A Clustering-Based Enhanced Classification Algorithm for Imbalanced Data

Hu Xiaosheng¹, Zhang Runjing², Zhong Yong¹

Journal of Integration Technology (集成技术), 2014, Issue 2: 35-41, 7.

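Author information

  • 1. School of Electronics and Information Engineering, Foshan University, Foshan 528000
  • 2. Center of Information and Educational Technology, Foshan University, Foshan 528000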

Abstract

Imbalanced data exist widely in the real world, and their classification is a hot topic in the field of machine learning. To remedy the poor performance of traditional algorithms on the minority class of imbalanced datasets, a clustering-based enhanced AdaBoost algorithm is proposed. The algorithm first constructs balanced training sets by clustering-based undersampling: K-means clusters the majority class, the cluster centroids are extracted, and these centroids are merged with all minority-class instances to form a new balanced training set. Because too few minority samples would leave the training set too small and lower classification accuracy, SMOTE (Synthetic Minority Oversampling Technique) is combined with the clustering-based undersampling. Next, based on cost-sensitive learning theory, the misclassification loss function of the AdaBoost base classifier is modified so that samples of different classes receive asymmetric misclassification losses. Experimental results show that the proposed algorithm makes the training samples more representative and greatly increases the classification accuracy on the minority class while maintaining overall classification performance.
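The abstract names three ingredients (K-means undersampling of the majority class, SMOTE oversampling of the minority class, and a cost-sensitive AdaBoost) but gives no formulas. The Python sketch below shows one way they could fit together; it is not the authors' implementation. Assumptions not taken from the paper: the balanced size `target` (the minority class is SMOTE-oversampled up to it and the majority class is replaced by that many K-means centroids), the class costs `{+1: 2.0, -1: 1.0}`, decision stumps as base classifiers, and a cost-weighted AdaBoost update in which each sample's class cost scales both the computation of α and the weight update. All function names are hypothetical.

```python
import math
import random

# Hypothetical sketch, not the authors' code: cluster-based undersampling + SMOTE
# + a cost-weighted AdaBoost with decision stumps. Labels: +1 minority, -1 majority.

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's K-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def smote(minority, n_new, k=3, seed=0):
    """SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = sorted((p for p in minority if p is not x), key=lambda p: math.dist(x, p))
        nb = rng.choice(others[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

def build_balanced_set(majority, minority, target):
    """Hypothetical helper: SMOTE the minority up to `target`, replace the majority by `target` centroids."""
    pos = minority + smote(minority, max(0, target - len(minority)))
    neg = kmeans(majority, target)
    return pos + neg, [1] * len(pos) + [-1] * len(neg)

def stump_predict(stump, x):
    f, thr, pol = stump
    return pol if x[f] <= thr else -pol

def fit_stump(X, y, w):
    """Decision stump (feature, threshold, polarity) with minimum weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (f, thr, pol), err
    return best

def cost_sensitive_adaboost(X, y, cost, rounds=10):
    """AdaBoost whose alpha and weight update are scaled by a per-class cost (assumed form)."""
    c = [cost[yi] for yi in y]
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        h = [stump_predict(stump, x) for x in X]
        right = sum(ci * wi for ci, wi, hi, yi in zip(c, w, h, y) if hi == yi)
        wrong = sum(ci * wi for ci, wi, hi, yi in zip(c, w, h, y) if hi != yi)
        if wrong == 0:  # perfect stump: cap its weight and stop
            model.append((stump, 10.0))
            break
        if right <= wrong:  # no better than chance once costs are applied
            break
        alpha = 0.5 * math.log(right / wrong)
        model.append((stump, alpha))
        # every weight is multiplied by its class cost; misclassified samples
        # also grow by exp(alpha) and correct ones shrink by exp(-alpha)
        w = [ci * wi * math.exp(-alpha * yi * hi) for ci, wi, yi, hi in zip(c, w, y, h)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    score = sum(alpha * stump_predict(s, x) for s, alpha in model)
    return 1 if score >= 0 else -1

# Toy data: 40 majority points near (0, 0), 6 minority points near (3, 3).
rng = random.Random(1)
majority = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(40)]
minority = [[rng.gauss(3, 0.3), rng.gauss(3, 0.3)] for _ in range(6)]
X, y = build_balanced_set(majority, minority, target=10)
model = cost_sensitive_adaboost(X, y, cost={1: 2.0, -1: 1.0})
print(len(X), predict(model, [3, 3]), predict(model, [0, 0]))
```

Raising the minority cost in `cost` makes misclassified minority samples carry more weight into later boosting rounds; the paper's actual modification of the loss function may take a different form.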

Keywords

imbalanced data classification/K-means clustering/AdaBoost/ensemble learning

Subject category

Information Technology and Security Science

Cite this article

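Hu Xiaosheng, Zhang Runjing, Zhong Yong. A Clustering-Based Enhanced Classification Algorithm for Imbalanced Data [J]. Journal of Integration Technology, 2014, (2): 35-41, 7.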

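Funding

Guangdong Training Program for Outstanding Young Innovative Talents in Higher Education Institutions (2013LYM_0097); Research on an Intelligent Education Evaluation Index System for Foshan (DX20120220); Foshan University School-Level Research Project.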

Journal of Integration Technology

ISSN 2095-3135