Big Data Research, 2025, Vol. 11, Issue 2: 107-126 (20 pages). DOI: 10.11959/j.issn.2096-0271.2025018
Imbalanced data stream classification method with limited labels
Abstract
Data stream classification is a crucial research area within data stream mining, whose core task is to swiftly capture concept drift in real-time incoming data streams and promptly adjust the classification model. Extreme learning machine offers advantages such as fast training speed and excellent generalization performance. However, existing data stream classification methods based on extreme learning machine often struggle to simultaneously address common challenges in data streams, such as multi-class imbalance, concept drift, and expensive labeling cost. To address this, an imbalanced data stream classification method with limited labels is proposed. We define a sample prediction certainty measure that combines the difference in predicted probabilities with information entropy, and introduce an uncertainty-based label request strategy. Furthermore, we define a sample importance measure based on class imbalance ratios and sample prediction errors, and propose an update and reconstruction mechanism for the classifier based on a concept drift index. Comparative experiments on six synthetic data streams and three real data streams demonstrate that the proposed method outperforms six existing data stream classification methods in classification performance.
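The abstract names the certainty and importance measures without giving their formulas. As a rough illustration only, the Python sketch below assumes one plausible form for each: a certainty score that averages the top-two probability margin with one minus the normalized Shannon entropy, a label request whenever that score falls below a threshold, and an importance weight that multiplies the sample's class imbalance ratio by one plus its prediction error. The function names, the weight `alpha`, the threshold `0.3`, and all three formulas are hypothetical and are not the authors' definitions; NumPy is assumed to be available.

```python
import numpy as np


def prediction_certainty(probs, alpha=0.5):
    """Hypothetical certainty score in [0, 1] for one sample (not the paper's formula).

    Mixes the gap between the two largest predicted probabilities with
    one minus the normalized Shannon entropy of the prediction.
    Requires at least two classes.
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()                      # renormalize to a valid distribution
    top2 = np.sort(p)[-2:]               # two largest probabilities, ascending
    margin = top2[1] - top2[0]           # large gap means a confident prediction
    nz = p[p > 0]                        # skip zeros to avoid log(0)
    entropy = -np.sum(nz * np.log(nz))
    h_norm = entropy / np.log(p.size)    # scale entropy to [0, 1]
    return alpha * margin + (1.0 - alpha) * (1.0 - h_norm)


def should_request_label(probs, threshold=0.3, alpha=0.5):
    """Hypothetical uncertainty-based rule: ask for a label only for uncertain samples."""
    return prediction_certainty(probs, alpha) < threshold


def sample_importance(true_class, probs, class_counts):
    """Hypothetical weight: rarer classes and larger prediction errors count more."""
    counts = np.asarray(class_counts, dtype=float)
    imbalance_ratio = counts.max() / max(counts[true_class], 1.0)
    error = 1.0 - float(probs[true_class])   # 0 when the true class gets probability 1
    return imbalance_ratio * (1.0 + error)


if __name__ == "__main__":
    print(prediction_certainty([0.5, 0.5]))          # maximally uncertain two-class case
    print(should_request_label([0.5, 0.5]))
    print(should_request_label([0.98, 0.01, 0.01]))
    print(sample_importance(1, [0.2, 0.8], [90, 10]))
```

Under these assumptions, a tie such as `[0.5, 0.5]` scores 0 and triggers a label request, while a sharply peaked prediction scores close to 1 and is used without a label. Averaging the two terms is just one design choice: the margin looks only at the top two classes, whereas the entropy reflects the spread over all classes, which matters in a multi-class setting.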
Keywords
data stream classification/multi-class imbalance/extreme learning machine/concept drift/expensive labeling cost
Classification
Computer and Automation
Cite this article
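LI Yanhong, LI Zhihua, ZHENG Jianxing, BAI Hexiang, GUO Xin. Imbalanced data stream classification method with limited labels[J]. Big Data Research, 2025, 11(2): 107-126.
Funding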
The National Natural Science Foundation of China (No. 62272286, No. 41871286)
The Fundamental Research Program of Shanxi Province (No. 202203021221001, No. 202203021221021)