Computer Engineering and Applications, 2013, Vol. 49, Issue (2): 184-187, 245, 5. DOI: 10.3778/j.issn.1002-8331.1109-0145
Research on classification for imbalanced dataset based on improved SMOTE
Abstract
Based on an analysis of the shortcomings of SMOTE (Synthetic Minority Over-sampling Technique), an improved SMOTE (SSMOTE) is presented. The key to SSMOTE lies in introducing the concepts of support and roulette wheel selection into SMOTE and in making full use of the distribution of heterogeneous (other-class) nearest neighbors to achieve fine control over the quality and quantity of the synthesized minority class samples. SSMOTE is combined with KNN (K-Nearest Neighbor) to handle classification on imbalanced datasets, and extensive experiments on the UCI datasets compare SSMOTE with algorithms from the relevant literature. The simulation results show that SSMOTE synthesizes minority class samples effectively and, with KNN, yields better classification performance on imbalanced datasets.
Key words
imbalanced datasets / classification / support / roulette wheel selection / Synthetic Minority Over-sampling Technique (SMOTE)
Classification
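Information Technology and Security Science
Cite this article
Wang Chaoxue, Pan Zhengmao, Dong Lili, Ma Chunsen, Zhang Xing. Research on classification for imbalanced dataset based on improved SMOTE [J]. Computer Engineering and Applications, 2013, 49(2): 184-187, 245, 5.
Funding
National Natural Science Foundation of China (No. 31170393)
Natural Science Project of the Education Department of Shaanxi Province (No. 2010JK620)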
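
The abstract names the ingredients of SSMOTE (a support measure, roulette wheel selection, and SMOTE-style synthesis) but does not define them. The following is a minimal Python sketch of how such a scheme could fit together, not the authors' algorithm. Everything beyond the abstract is an assumption: the function names (`nearest`, `roulette_select`, `ssmote_sketch`) are hypothetical, and the selection weight, taken here as one plus the number of other-class points among a minority sample's k nearest neighbors, is only a stand-in for the paper's support.

```python
import math
import random

def nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return order[:k]

def roulette_select(weights, rng):
    """Roulette wheel selection: index i is drawn with probability weights[i] / sum(weights)."""
    r = rng.random() * sum(weights)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r < cum:
            return i
    # Guard against floating-point round-off: fall back to the last positive weight.
    return max(i for i, w in enumerate(weights) if w > 0)

def ssmote_sketch(X, y, minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (needs at least two minority samples)."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    X_min = [X[i] for i in min_idx]
    # ASSUMPTION: stand-in for "support". Minority samples with more
    # other-class points among their k nearest neighbors get a larger weight.
    weights = [1.0 + sum(y[j] != minority for j in nearest(X, i, k))
               for i in min_idx]
    synthetic = []
    for _ in range(n_new):
        s = roulette_select(weights, rng)                # pick a seed sample
        nb = X_min[rng.choice(nearest(X_min, s, k))]     # pick a minority neighbor
        gap = rng.random()                               # SMOTE interpolation factor
        synthetic.append([a + gap * (b - a) for a, b in zip(X_min[s], nb)])
    return synthetic
```

Under these assumptions, the synthetic samples would be appended to the training set with the minority label before a KNN classifier is trained. The roulette wheel controls how many samples each minority point seeds, and the interpolation step is the standard SMOTE construction.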