Research on an Oversampling Algorithm Based on OCkNN+ENN
Class imbalance learning (CIL) is one of the hot topics in machine learning (ML). Among CIL methods, SMOTE is regarded as a benchmark algorithm. Although SMOTE performs well on most class-imbalanced datasets, it has some problems, such as introducing noise and propagating it. Based on a study of SMOTE variants, a more robust and general algorithm, ONE-SMOTE, is proposed. The study finds that cleaning the data with edited nearest neighbor (ENN) removes noise effectively, and that a kNN-based one-class classifier (OCkNN) can detect the relative density distribution of the sample space and precisely locate each sample's relative density position and the class boundary. Oversampling guided by this position information preserves the density distribution of the original sample space well. Experimental results show that the algorithm can effectively improve classification accuracy.
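The pipeline described in the abstract (ENN cleaning, a kNN-based density estimate, then interpolation between minority samples) can be illustrated with a minimal pure-Python sketch. It is not the authors' ONE-SMOTE implementation, since the abstract gives no algorithmic details. The function names, the use of the inverse mean kNN distance as a stand-in for the OCkNN relative density, and the choice to draw seed samples in proportion to that density are all assumptions made for illustration.

```python
import math
import random


def neighbors(x, pool, k, skip=None):
    """Return up to k (distance, index) pairs of the points in pool nearest to x.

    The point at index `skip` (usually x itself) is excluded.
    """
    dists = sorted(
        (math.dist(x, q), j) for j, q in enumerate(pool) if j != skip
    )
    return dists[:k]


def enn_clean(X, y, k=3):
    """Edited nearest neighbour cleaning (assumed variant).

    A sample is kept only if a strict majority of its k nearest
    neighbours carries the same label. All decisions are made
    against the original, unedited data set.
    """
    keep = []
    for i, x in enumerate(X):
        nn = neighbors(x, X, k, skip=i)
        agree = sum(1 for _, j in nn if y[j] == y[i])
        if agree * 2 > len(nn):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_density(X, k=3):
    """Hypothetical density proxy: inverse mean distance to the k nearest
    same-class neighbours. Requires at least two samples in X."""
    scores = []
    for i, x in enumerate(X):
        nn = neighbors(x, X, k, skip=i)
        mean_d = sum(d for d, _ in nn) / len(nn)
        scores.append(1.0 / (mean_d + 1e-12))
    return scores


def oversample(X_min, n_new, k=3, seed=0):
    """SMOTE-style interpolation with density-weighted seed selection.

    Assumption: seeds are drawn in proportion to the density score, so
    synthetic points follow the density of the original minority class.
    """
    rng = random.Random(seed)
    weights = knn_density(X_min, k)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(X_min)), weights=weights)[0]
        _, j = rng.choice(neighbors(X_min[i], X_min, k, skip=i))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(X_min[i], X_min[j]))
        )
    return synthetic
```

In use, `enn_clean` would be applied to the full training set first, and `oversample` would then be called on the remaining minority-class samples only, so that noisy points removed by ENN never serve as interpolation seeds.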
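Zhang Aimin (张爱民); Yu Hualong (于化龙)
School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100
Computer and Automation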
class imbalance learning; SMOTE; ENN; OCkNN; relative density distribution information
Computer & Digital Engineering (《计算机与数字工程》) 2024 (005)
pp. 1275-1281, 1330 (8 pages)
Supported by the National Natural Science Foundation of China (No. 62176107) and the Natural Science Foundation of Jiangsu Province (No. BK20191457).