Acta Automatica Sinica, 2024, Vol. 50, Issue 7: 1389-1401, 13. DOI: 10.16383/j.aas.c211246
Online Active Learning Method for Imbalanced Data Stream
Abstract
Data stream classification is an important research task in data stream mining. It aims to capture changing class structures from massive, ever-changing data. At present, almost no framework simultaneously addresses the common problems of data streams: multi-class imbalance, concept drift, outliers, and the high cost of labeling unlabeled samples. In this paper, we propose an online active learning method for imbalanced data streams (OALM-IDS). AdaBoost is an ensemble classification method that iteratively builds a strong classifier from multiple weak classifiers. AdaBoost.M2 further introduces a confidence degree for the weak classifiers and is suited to static data. In the proposed method, we first define an importance measure for training samples based on the imbalance ratio and an adaptive forgetting factor, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of the ensemble classifier. Then, we propose an adaptive adjustment method for the margin threshold matrix, which optimizes the label request strategy. Finally, we define the adaptive forgetting factor from a concept drift index, bringing the degree of concept drift into the model construction process so that the model can be rebuilt after a drift. Comparative experiments on six artificial data streams and four real data streams show that the proposed online active learning method achieves better classification performance than five existing learning methods for imbalanced data streams.
Keywords
Active learning / data stream classification / multi-class imbalance / concept drift
Cite this article
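Li Yanhong (李艳红), Ren Lin (任霖), Wang Suge (王素格), Li Deyu (李德玉). Online active learning method for imbalanced data stream [J]. Acta Automatica Sinica, 2024, 50(7): 1389-1401, 13.
Funding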
Supported by the National Natural Science Foundation of China (62076158, 62072294, 41871286) and the Shanxi Key Research and Development Program (201903D421041)
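
Illustrative sketch
The abstract names two ideas that are easier to picture in code: weighting a training sample by how rare its class is under a forgetting factor, and requesting a label only when the prediction margin is small. The Python sketch below is a simplified illustration of those two ideas, not the OALM-IDS algorithm itself. The names `DecayedClassCounts`, `margin`, and `should_request_label` are hypothetical, the forgetting factor is fixed rather than adapted to a drift index, and a single scalar threshold stands in for the margin threshold matrix.

```python
from collections import defaultdict


class DecayedClassCounts:
    """Per-class counts with exponential forgetting (hypothetical helper)."""

    def __init__(self, forgetting=0.99):
        # Assumption: a fixed factor; the paper adapts it to concept drift.
        self.forgetting = forgetting
        self.counts = defaultdict(float)

    def update(self, label):
        # Decay every existing count, then count the newly labeled sample.
        for key in self.counts:
            self.counts[key] *= self.forgetting
        self.counts[label] += 1.0

    def importance(self, label):
        # Rarer classes get a larger weight: largest count / this class count.
        # Call only for a label that has already been passed to update().
        largest = max(self.counts.values())
        return largest / self.counts[label]


def margin(probs):
    """Gap between the two largest predicted class probabilities."""
    top = sorted(probs.values(), reverse=True)
    return top[0] - top[1]


def should_request_label(probs, threshold):
    # Ask for the true label only when the prediction is uncertain.
    return margin(probs) < threshold
```

In a streaming loop, one would call `should_request_label` on each incoming prediction and, when it returns true, obtain the label, call `update`, and train the ensemble with the weight returned by `importance`.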