Application Research of Computers, 2017, Vol. 34, Issue 11: 3229-3232, 3254, 5. DOI: 10.3969/j.issn.1001-3695.2017.11.006
AdaBoost-based class imbalance learning algorithm
Abstract
When dealing with imbalanced data sets, the borderline examples of the minority class are more easily misclassified. To reduce the impact of class imbalance on classifier performance, this paper presented an adaptive borderline-SMOTE (AB-SMOTE) algorithm. The AB-SMOTE algorithm adaptively sampled the boundary samples of the minority class, which improved the balance and effectiveness of the data sets. The AB-SMOTE algorithm was also combined with a data cleaning technique to form a new AdaBoost-based ensemble algorithm, ABTAdaBoost. The ABTAdaBoost algorithm consisted of three stages. In the first stage, it applied the AB-SMOTE algorithm to the training data sets to reduce their degree of imbalance. In the second stage, it used the Tomek links data cleaning technique to remove the noisy and overlapping instances introduced by the sampling method, which also improved the usability of the data. In the third stage, it used the AdaBoost algorithm to generate an ensemble classifier from N weak classifiers. The experiments used the J48 decision tree and naive Bayes, respectively, as the base classifier. The results show that, on 12 UCI data sets, the ABTAdaBoost algorithm has the best overall performance among the compared algorithms.
Key words
machine learning/class imbalance learning/ensemble learning/SMOTE/data cleaning techniques
Classification
Information technology and security science
Cite this article
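Qin Mengmei (秦孟梅), Qiu Jianlin (邱建林), Lu Pengcheng (陆鹏程), Chen Lulu (陈璐璐), Zhao Weikang (赵伟康). AdaBoost-based class imbalance learning algorithm [J]. Application Research of Computers, 2017, 34(11): 3229-3232, 3254, 5.
Funding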
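National Natural Science Foundation of China (NSF61202006/61272424)
Open Project of the State Key Laboratory for Novel Software Technology (KFKT2012B29)
Natural Science Foundation of Jiangsu Province (BK2010277)
Jiangsu Province Science and Technology Innovation Fund (BC2013167)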
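The first two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how AB-SMOTE adapts its sampling, so stage 1 below substitutes the standard borderline-SMOTE rule (a minority point is borderline when at least half, but not all, of its k nearest neighbours belong to the majority class). All function names, the toy data, and the parameters k, n_new, and seed are hypothetical. Stage 3 (AdaBoost over weak classifiers) is omitted; any standard AdaBoost implementation could presumably be trained on the cleaned data.

```python
import math
import random

def nearest(i, X, k, candidates=None):
    """Indices of the k points closest to X[i] (Euclidean), excluding i itself."""
    pool = range(len(X)) if candidates is None else candidates
    order = sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))
    return order[:k]

def borderline_minority(X, y, minority, k=3):
    """Minority points with at least half, but not all, of k neighbours in the majority."""
    out = []
    for i, label in enumerate(y):
        if label != minority:
            continue
        m = sum(y[j] != minority for j in nearest(i, X, k))
        if k / 2 <= m < k:
            out.append(i)
    return out

def oversample(X, y, minority, n_new, k=3, seed=0):
    """Stage 1 stand-in (assumption): interpolate from borderline minority points
    towards one of their nearest minority neighbours, as in borderline-SMOTE."""
    rng = random.Random(seed)
    border = borderline_minority(X, y, minority, k)
    mino = [i for i, label in enumerate(y) if label == minority]
    X_new, y_new = list(X), list(y)
    if not border or len(mino) < 2:
        return X_new, y_new
    for _ in range(n_new):
        i = rng.choice(border)
        j = rng.choice(nearest(i, X, k, mino))
        t = rng.random()
        X_new.append(tuple(a + t * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new

def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite-class points that are mutual nearest neighbours."""
    nn = [nearest(i, X, 1)[0] for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_links(X, y):
    """Stage 2: drop both members of every Tomek link."""
    drop = {i for pair in tomek_links(X, y) for i in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (hypothetical): label 0 is the majority class, label 1 the minority class.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (1.5, 1.5), (2.5, 2.5), (3, 3)]
y = [0, 0, 0, 0, 1, 1, 1]

X_res, y_res = oversample(X, y, minority=1, n_new=2)   # stage 1
X_clean, y_clean = remove_tomek_links(X_res, y_res)    # stage 2
```

On the toy data, the only borderline minority point is (1.5, 1.5), and before oversampling it forms a Tomek link with the majority point (1, 1). This sketch removes both members of each link; removing only the majority member is another common variant, and the abstract does not say which one the paper uses.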