

Survey of Research on SMOTE Type Algorithms



The synthetic minority oversampling technique (SMOTE) has become one of the mainstream methods for handling unbalanced data because it deals effectively with minority-class samples, and many improved SMOTE algorithms have been proposed. However, few existing surveys consider the popular algorithm-level improvements. This paper therefore provides a more comprehensive analysis and summary of existing SMOTE-type algorithms. It first explains the basic principle of SMOTE in detail, then systematically reviews SMOTE-type algorithms at two levels, the data level and the algorithm level, and introduces new ideas for hybrid data-level and algorithm-level improvements. Data-level improvements balance the class distribution during preprocessing by deleting or adding samples through various operations; algorithm-level improvements leave the data distribution unchanged and instead increase the attention paid to minority-class samples by modifying existing algorithms or creating new ones. Comparing the two, data-level methods are less restricted in their applicability, whereas algorithm-level improvements are generally more robust. To provide more comprehensive foundational material on SMOTE-type algorithms, the paper finally lists commonly used datasets and evaluation metrics and suggests possible future research directions for coping better with unbalanced data.

Authors: WANG Xiaoxia (王晓霞), LI Leixiao (李雷孝), LIN Hao (林浩)

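Affiliations: College of Data Science and Application, Inner Mongolia University of Technology, Hohhot 010080; Inner Mongolia Autonomous Region Engineering Technology Research Center for Big-Data-Based Software Services, Hohhot 010080; School of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300384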

Subject category: Computers and Automation


Keywords: unbalanced data; synthetic minority oversampling technique (SMOTE); oversampling; supervised learning

Journal of Frontiers of Computer Science and Technology (计算机科学与探索), 2024, No. 5, pp. 1135-1159 (25 pages)

This work was supported by the National Natural Science Foundation of China (62362055), the Key Research and Development and Achievement Transformation Program of Inner Mongolia Autonomous Region (2022YFSJ0013, 2023YFHH0052), the Support Program for Young Scientific and Technological Talents in Higher Education Institutions in Inner Mongolia Autonomous Region (NJYT22084), the Natural Science Foundation of Inner Mongolia (2023MS06008), and the Special Funds for Transformation of Scientific and Technological Achievements of Inner Mongolia Autonomous Region (2020CG0073, 2021CG0033).

DOI: 10.3778/j.issn.1673-9418.2309079
