Survey of Research on SMOTE Type Algorithms (SMOTE类算法研究综述)
Abstract
The synthetic minority oversampling technique (SMOTE) has become one of the mainstream methods for handling unbalanced data because it deals effectively with minority samples, and many improved SMOTE algorithms have been proposed. However, very little existing research considers the popular algorithmic-level improvement methods. This paper therefore provides a more comprehensive analysis of existing SMOTE-type algorithms. First, the basic principles of SMOTE are elaborated in detail. The SMOTE-type algorithms are then systematically analyzed at two levels, the data level and the algorithmic level, and new ideas for hybrid improvements that combine both levels are introduced. Data-level improvements balance the data distribution by deleting or adding data through different operations during preprocessing; algorithmic-level improvements do not change the data distribution and mainly strengthen the focus on minority samples by modifying or creating algorithms. A comparison of the two kinds of methods shows that data-level methods are less restricted in their application, whereas algorithmic-level improvements generally have higher robustness. To provide more comprehensive basic research material on SMOTE-type algorithms, this paper finally lists the commonly used datasets and evaluation metrics and gives ideas for future research to better cope with the unbalanced data problem.
Keywords
unbalanced data / synthetic minority oversampling technique (SMOTE) / oversampling / supervised learning
Classification
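Information Technology and Security Science
Cite this article
WANG Xiaoxia, LI Leixiao, LIN Hao. Survey of Research on SMOTE Type Algorithms[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1135-1159 (25 pages).
Funding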
This work was supported by the National Natural Science Foundation of China (62362055), the Key Research and Development and Achievement Transformation Program of Inner Mongolia Autonomous Region (2022YFSJ0013, 2023YFHH0052), the Support Program for Young Scientific and Technological Talents in Higher Education Institutions in Inner Mongolia Autonomous Region (NJYT22084), the Natural Science Foundation of Inner Mongolia (2023MS06008), and the Special Funds for Transformation of Scientific and Technological Achievements of Inner Mongolia Autonomous Region (2020CG0073, 2021CG0033).