Computer Technology and Development (计算机技术与发展), 2025, Vol. 35, Issue 10: 18-27, 10. DOI: 10.20165/j.cnki.ISSN1673-629X.2025.0127
Oversampling Method for Imbalanced Data Based on Fuzzy Clustering and Ensemble Learning
Abstract
At present, methods for handling imbalanced data mainly focus on class distribution imbalance and usually adopt resampling to construct a more balanced dataset. However, compared with class distribution imbalance, inter-class overlap has a greater adverse impact on the classification performance of imbalanced data. Therefore, to address intra-class imbalance and inter-class overlap in imbalanced datasets, an imbalanced data oversampling method, FCEL, based on fuzzy clustering and ensemble learning is proposed. At the data level, SMOTE oversampling is first used to synthesize new samples. Soft clustering and an adaptive threshold are then employed to partition the data space into regions, and the partitioned regions are resampled to generate two sampling subsets. At the algorithm level, corresponding ensemble models are first constructed from the different sampling subsets, and a model selection algorithm then assigns a suitable model to each sample according to its distribution. Comparative experiments were conducted on 9 imbalanced datasets. The experimental results show that, compared with some existing typical methods, the average values of the four indicators Recall, F1, G-mean, and AUC of the FCEL method increase by at least 17.67, 0.09, 7.25, and 1.21 percentage points, and by at most 30.29, 4.62, 17.25, and 4.35 percentage points, respectively, indicating that the proposed method can effectively improve the classification accuracy of minority class samples.
Key words
imbalanced data classification/class overlap/oversampling/soft clustering/ensemble learning
Classification
Information technology and security science
Cite this article
Li Jin, Wang Biao. Oversampling Method for Imbalanced Data Based on Fuzzy Clustering and Ensemble Learning[J]. Computer Technology and Development, 2025, 35(10): 18-27, 10.
Funding
National Natural Science Foundation of China (11801436)
Natural Science Foundation of Shaanxi Province (2019JQ-346, 2025JC-YBMS-090)
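The abstract names SMOTE oversampling as the first data-level step and G-mean as one of the reported indicators. The following is a minimal sketch of those two generic building blocks only, not the FCEL method itself: the function names `smote_sketch` and `g_mean`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration, and the fuzzy clustering, adaptive threshold, and model selection stages are not reproduced here.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Illustrative SMOTE-style interpolation (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (t - b) for b, t in zip(base, target))
        )
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Binary G-mean: sqrt(recall on positives * recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


# Toy minority class: the four corners of the unit square (assumed data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=5, k=2, seed=1)
```

Because every synthetic point is a convex combination of two existing minority samples, plain SMOTE can place new points inside regions where the classes overlap, which is the kind of issue the abstract's region partitioning step is aimed at.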