Online Active Learning Method for Imbalanced Data Stream (非平衡数据流在线主动学习方法)

Open access; indexed in the Peking University Core Journals list (北大核心) and CSTPCD

Abstract

Data stream classification is an important research task in data stream mining; its goal is to capture changing class structures from massive, continuously evolving data. At present, almost no framework simultaneously addresses four problems common in data streams: multi-class imbalance, concept drift, outliers, and the high cost of labeling samples. To this end, we propose an online active learning method for imbalanced data streams (OALM-IDS). AdaBoost is an ensemble classification method that iteratively combines multiple weak classifiers into a strong classifier, and AdaBoost.M2 additionally introduces the confidence of the weak classifiers; such methods are typically used for static data. First, we define an importance measure for training samples based on the imbalance ratio and an adaptive forgetting factor, which makes AdaBoost.M2 applicable to imbalanced data streams and improves the performance of the ensemble classifier. Second, we propose an adaptive adjustment method for the marginal threshold matrix, which optimizes the label request strategy. Finally, we incorporate the degree of concept drift into model construction by defining an adaptive forgetting factor based on a concept drift index, which enables model reconstruction after drift. Comparative experiments on six artificial data streams and four real data streams show that the proposed method achieves better classification performance than five other learning methods for imbalanced data streams.
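The abstract names two ingredients without giving their formulas: a sample importance measure driven by class imbalance and a forgetting factor, and a margin-based label request rule with per-class-pair thresholds. The Python sketch below is only an illustration of those general ideas, not the OALM-IDS definitions. The class `DecayedClassCounter`, the function `should_request_label`, the inverse-frequency weighting, and the fixed values for the forgetting factor and thresholds are all assumptions made here; the paper adapts the forgetting factor and the thresholds over time, which this sketch does not attempt.

```python
class DecayedClassCounter:
    """Hypothetical helper: exponentially decayed class frequencies."""

    def __init__(self, forgetting=0.99):
        # Assumed fixed forgetting factor in (0, 1]; 1.0 means no forgetting.
        self.forgetting = forgetting
        self.counts = {}

    def update(self, label):
        # Decay every existing count, then add the newly labeled sample.
        for key in self.counts:
            self.counts[key] *= self.forgetting
        self.counts[label] = self.counts.get(label, 0.0) + 1.0

    def importance(self, label):
        # Assumed weighting: inverse of the class share, scaled so that
        # perfectly balanced classes each get weight 1.0.
        total = sum(self.counts.values())
        count = self.counts.get(label, 0.0)
        if total == 0.0 or count == 0.0:
            return 1.0
        share = count / total
        return 1.0 / (len(self.counts) * share)


def should_request_label(probs, thresholds, default_threshold=0.2):
    """Request a label when the top-two posterior margin is small.

    probs: dict mapping class -> estimated posterior probability.
    thresholds: dict mapping (top class, runner-up class) -> margin threshold,
        standing in for a threshold matrix (values here are illustrative).
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    margin = probs[top] - probs[runner_up]
    return margin < thresholds.get((top, runner_up), default_threshold)


counter = DecayedClassCounter(forgetting=1.0)
for y in ["a", "a", "a", "b"]:
    counter.update(y)

minority_weight = counter.importance("b")
majority_weight = counter.importance("a")
uncertain = should_request_label({"a": 0.5, "b": 0.4, "c": 0.1}, {("a", "b"): 0.2})
confident = should_request_label({"a": 0.9, "b": 0.05, "c": 0.05}, {("a", "b"): 0.2})
```

With no forgetting, the minority class `b` receives a larger weight than the majority class `a`, and only the low-margin prediction triggers a label request. In a boosting setting, such a weight would scale how strongly each labeled sample influences the weak learners.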

Authors: Li Yanhong (李艳红); Ren Lin (任霖); Wang Suge (王素格); Li Deyu (李德玉)

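Affiliations: School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006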

Keywords: active learning; data stream classification; multi-class imbalance; concept drift

Acta Automatica Sinica (《自动化学报》), 2024, Issue 7

Pages 1389-1401 (13 pages)

Supported by National Natural Science Foundation of China (62076158, 62072294, 41871286) and Shanxi Key Research and Development Program (201903D421041)

DOI: 10.16383/j.aas.c211246
