Computer Engineering and Applications, 2019, Vol. 55, Issue 17: 68-75 (8 pages). DOI: 10.3778/j.issn.1002-8331.1804-0307
Imbalanced Data Processing Algorithm Based on Mixed Sampling
(Original Chinese title: 基于混合采样的不平衡数据集算法研究)
Abstract
To address the poor classification performance on imbalanced datasets, a novel classification algorithm based on mixed sampling (BSI) is proposed. The method first uses the coefficient of variation to identify samples in the sparse domain and in the dense domain, and then handles the two groups differently. For minority samples in the sparse domain, an oversampling method (BSMOTE) that improves on the SMOTE algorithm is proposed. For majority samples in the dense domain, an improved undersampling method (IS) is proposed. Experiments on six imbalanced datasets show that the algorithm achieves higher G-mean, F-value, and AUC scores and effectively improves overall classification performance on imbalanced datasets.
Keywords
imbalanced datasets / coefficient of variation / SMOTE algorithm / undersampling
Classification
Information technology and security science
Cite this article
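ZHANG Ming, HU Xiaohui, WU Jiaxin. Imbalanced data processing algorithm based on mixed sampling [J]. Computer Engineering and Applications, 2019, 55(17): 68-75.
Funding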
National Natural Science Foundation of China (No. 61163009)
Science and Technology Program of Gansu Province (No. 144NKCA040)