Ensemble Classification Algorithm for Imbalanced Data with Dynamic Undersampling Based on Micro-Clusters

MENG Dongxia (孟东霞) 1, YAO Yifan (姚怡帆) 1, YANG Jing (杨旌) 1

Computer Engineering and Applications, 2026, Vol. 62, Issue (6): 110-121, 12. DOI: 10.3778/j.issn.1002-8331.2508-0339

Author information

  • 1. School of Financial Technology, Hebei Finance University, Baoding, Hebei 071051, China

Abstract

Ensemble undersampling is an effective approach to class imbalance. However, the performance of some methods is compromised by the loss of key information from the majority class or by the disruption of its internal distribution structure. To fully retain the distribution structure of the majority class and to improve the classification accuracy of the minority class, a Boosting-based ensemble algorithm for imbalanced data is proposed that dynamically samples from majority class micro-clusters. The algorithm constructs majority class micro-clusters using natural nearest neighbors, adaptively determines the sampling size for each micro-cluster based on the sample distribution, and then selects majority class samples according to sampling weights. The selected samples are combined with the minority class samples to train an initial base classifier. In each subsequent Boosting iteration, the algorithm updates the micro-cluster sampling configuration using the classification results of the preceding round and reselects majority class samples to train a new base classifier. The base classifiers are finally aggregated into a strong classifier through weighted integration. Comparative experiments on 22 datasets against four classical undersampling methods (Random Undersampling, Cluster Centroids, etc.) and eight mainstream ensemble undersampling methods (Self-paced Ensemble, CusBoost, Equalization Ensemble, etc.) show that the proposed method achieves superior performance in terms of F1-score, G-mean, and AUC.
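
The per-cluster sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the allocation rule or the weight definition, so the sketch assumes that each micro-cluster receives a share of the sampling budget proportional to its size and that samples are drawn without replacement with probability increasing in a given weight. The function names `allocate_quota`, `weighted_sample`, and `undersample` are hypothetical, and the micro-cluster assignment (built from natural nearest neighbors in the paper) is taken as an input rather than computed.

```python
import random
from collections import defaultdict


def allocate_quota(cluster_sizes, budget):
    """Split `budget` across clusters in proportion to size (assumed rule).

    Uses integer largest-remainder rounding so the quotas sum to the budget
    and no cluster is asked for more samples than it contains.
    """
    total = sum(cluster_sizes.values())
    budget = min(budget, total)
    quota = {c: budget * n // total for c, n in cluster_sizes.items()}
    remainder = {c: budget * n % total for c, n in cluster_sizes.items()}
    leftover = budget - sum(quota.values())
    # Give the remaining slots to the clusters with the largest remainders.
    for c in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        quota[c] += 1
    return quota


def weighted_sample(items, weights, k, rng):
    """Draw k distinct items; a larger positive weight makes selection more likely.

    Implemented with Efraimidis-Spirakis random keys u ** (1 / w).
    """
    keyed = [(rng.random() ** (1.0 / w), item) for item, w in zip(items, weights)]
    keyed.sort(reverse=True)
    return [item for _, item in keyed[:k]]


def undersample(cluster_of, weight_of, n_minority, seed=0):
    """Pick about n_minority majority samples, cluster by cluster.

    cluster_of: {sample_index: micro_cluster_id} for majority samples.
    weight_of:  {sample_index: positive sampling weight}.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for idx, cluster_id in cluster_of.items():
        members[cluster_id].append(idx)
    quota = allocate_quota({c: len(m) for c, m in members.items()}, n_minority)
    chosen = []
    for cluster_id, idxs in members.items():
        cluster_weights = [weight_of[i] for i in idxs]
        chosen += weighted_sample(idxs, cluster_weights, quota[cluster_id], rng)
    return sorted(chosen)


# Toy data: 10 majority samples in three micro-clusters of sizes 6, 3, and 1.
cluster_of = {i: (0 if i < 6 else 1 if i < 9 else 2) for i in range(10)}
weight_of = {i: 1.0 for i in range(10)}
picked = undersample(cluster_of, weight_of, n_minority=5)
```

In a Boosting loop, `weight_of` (and possibly the quotas) would be refreshed from the previous round's classification results before calling `undersample` again; how the paper performs that update is not stated in the abstract, so it is left out here.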

Key words

imbalanced data/natural nearest neighbor/micro-cluster/undersampling/ensemble learning

Classification

Information Technology and Security Science

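Cite this article

MENG Dongxia, YAO Yifan, YANG Jing. Ensemble classification algorithm for imbalanced data with dynamic undersampling based on micro-clusters[J]. Computer Engineering and Applications, 2026, 62(6): 110-121, 12.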

Funding

Open Project of the Hebei Key Laboratory of Financial Technology Application (2024005)

Youth Fund of the Hebei Education Department (QN2024200)

Computer Engineering and Applications

ISSN 1002-8331
