Journal of Chongqing Technology and Business University (Natural Science Edition), 2025, Vol. 42, Issue (3): pp. 34-43 (10 pages). DOI: 10.16055/j.issn.1672-058X.2025.0003.005
Research on Resampling Ensemble Classification Method for Imbalanced Data Streams
Abstract
Objective: Class imbalance and concept drift are two of the main challenges in data stream classification. When they occur simultaneously, they significantly degrade the performance of data stream classification algorithms. Because traditional data stream classification algorithms struggle to handle class imbalance and concept drift at the same time, a resampling ensemble model for imbalanced data streams was proposed.
Methods: First, a boundary oversampling method tailored to data streams was designed. Exploiting the property that a triangle's centroid lies inside the triangle, it synthesizes new minority samples within the region spanned by boundary samples. This augments the minority class in each data block while preserving the original data distribution as far as possible and avoiding the introduction of new concepts, which effectively alleviates class imbalance within the block. On this basis, a dynamic weighted ensemble model that uses the Matthews correlation coefficient (MCC) as member weights was designed by combining a time-decay strategy with a weighted ensemble strategy. This model addresses concept drift and improves the adaptability and robustness of the classification model.
Results: Experiments on three real-world data streams and six synthetic data streams showed that the proposed method effectively identifies both majority and minority classes in imbalanced data stream scenarios and has a stronger ability to detect and adapt to sudden and incremental concept drift. The overall performance of the classification model was superior to that of the comparison algorithms.
Conclusion: The experiments verify that the proposed method builds a robust classification model for imbalanced data streams, with clear advantages in handling class imbalance and in adapting to both types of concept drift.
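The two components described in the Methods can be illustrated with a minimal sketch. The abstract does not give the exact rules, so the following are assumptions: a minority sample counts as a boundary sample when at least one of its k nearest neighbours belongs to the majority class (a Borderline-SMOTE-style test); each triangle is formed by a boundary sample and two of its nearest minority neighbours; base classifiers are plain callables; and the time decay is taken to be exponential, exp(-decay × age). All names (`boundary_minority`, `centroid_oversample`, `DecayedMCCEnsemble`, `decay`, `max_members`) are hypothetical, and this is not the authors' implementation. Only the MCC formula itself is the standard definition.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, ...]
Classifier = Callable[[Sample], int]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def boundary_minority(X: List[Sample], y: List[int], minority: int, k: int = 5) -> List[int]:
    """Assumed boundary test: a minority sample with a majority sample among its k nearest neighbours."""
    found = []
    for i, xi in enumerate(X):
        if y[i] != minority:
            continue
        neigh = sorted((j for j in range(len(X)) if j != i), key=lambda j: _sq_dist(xi, X[j]))[:k]
        if any(y[j] != minority for j in neigh):
            found.append(i)
    return found


def centroid_oversample(X: List[Sample], y: List[int], minority: int, n_new: int,
                        k: int = 5, seed: int = 0) -> List[Sample]:
    """Hypothetical: place each synthetic sample at the centroid of a triangle whose vertices
    are one boundary minority sample and two of its nearest minority neighbours."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    anchors = boundary_minority(X, y, minority, k)
    if len(minority_idx) < 3 or not anchors:
        return []
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(anchors)
        near = sorted((j for j in minority_idx if j != a),
                      key=lambda j: _sq_dist(X[a], X[j]))[:max(k, 2)]
        b, c = rng.sample(near, 2)
        # centroid G = (A + B + C) / 3 lies inside triangle ABC
        synthetic.append(tuple((X[a][d] + X[b][d] + X[c][d]) / 3 for d in range(len(X[a]))))
    return synthetic


def mcc(y_true: List[int], y_pred: List[int], positive: int) -> float:
    """Matthews correlation coefficient; returns 0 when any marginal count is 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


class DecayedMCCEnsemble:
    """Hypothetical: weight = max(MCC on newest block, 0) * exp(-decay * age in blocks)."""

    def __init__(self, positive: int, max_members: int = 10, decay: float = 0.1):
        self.positive, self.max_members, self.decay = positive, max_members, decay
        self.members: List[list] = []  # each entry: [classifier, mcc, age_in_blocks]

    def _weight(self, member: list) -> float:
        return max(member[1], 0.0) * math.exp(-self.decay * member[2])

    def update(self, clf: Classifier, X_block: List[Sample], y_block: List[int]) -> None:
        # re-score existing members on the newest block and age them by one block
        for member in self.members:
            member[1] = mcc(y_block, [member[0](x) for x in X_block], self.positive)
            member[2] += 1
        # simplification: the new member is scored on the block it was trained on
        self.members.append([clf, mcc(y_block, [clf(x) for x in X_block], self.positive), 0])
        self.members.sort(key=self._weight, reverse=True)
        del self.members[self.max_members:]  # drop the lowest-weighted members

    def predict(self, x: Sample) -> int:
        votes: Dict[int, float] = {}
        for member in self.members:
            label = member[0](x)
            votes[label] = votes.get(label, 0.0) + self._weight(member)
        return max(votes, key=votes.get)
```

Clipping negative MCC to zero keeps members that perform worse than chance on the newest block from voting, and the decay factor lets recently trained members dominate after a drift; both choices are assumptions about how the time-decay and weighted-ensemble strategies are combined.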
Key words: imbalanced data stream / concept drift / ensemble learning / Matthews correlation coefficient
Classification: Computer and Automation
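Cite this article: ZHANG Tuyi, LIU Sanmin, CHEN Yanfei, YU Wentao, ZHU Jian. Research on Resampling Ensemble Classification Method for Imbalanced Data Streams[J]. Journal of Chongqing Technology and Business University (Natural Science Edition), 2025, 42(3): 34-43.
Funding: Natural Science Foundation of Anhui Province (2308085MF220); Key Natural Science Research Project of Anhui Higher Education Institutions (2022AH050972, KJ2021A0516).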