Journal of Data Acquisition and Processing (数据采集与处理), 2017, Vol. 32, Issue 5: 853-860 (8 pages). DOI: 10.16337/j.1004-9037.2017.05.001
Data Stream Ensemble Classification Algorithm Based on Tri-training
Abstract
Data stream classification is one of the important research tasks in the field of data mining. Most existing data stream classification algorithms require labeled data for training. However, labeled data are scarce in real-world data streams. Labels can be obtained manually, but manual labeling is expensive and time-consuming. Considering that unlabeled data are abundant and rich in information, this paper proposes a data stream ensemble classification algorithm based on Tri-training that uses both labeled and unlabeled data. The proposed algorithm divides the data stream into chunks with a sliding window and trains base classifiers with Tri-training on the first k arriving chunks, each containing labeled and unlabeled data. The classifiers are then iteratively updated by weighted voting until all unlabeled data are labeled. Meanwhile, the (k+1)-th data chunk is predicted by the ensemble of k Tri-training classifiers; the classifier with the higher classification error is discarded, and a new classifier is built on the current data chunk to update the model. Experiments on 10 UCI data sets show that, compared with traditional algorithms, the proposed algorithm significantly improves data stream classification accuracy even when 80% of the data are unlabeled.
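Key words
data stream classification/Tri-training/unlabeled data/ensemble/weighted voting
Classification
Information Technology and Security Science
Cite this article
Hu Xuegang, Ma Liwei, Li Peipei. Data Stream Ensemble Classification Algorithm Based on Tri-training[J]. Journal of Data Acquisition and Processing, 2017, 32(5): 853-860, 8.
Funding
National Key Research and Development Program of China (2016YFC0801406)
National Natural Science Foundation of China (61673152, 61503112)
Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20130111110011)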
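Illustrative sketch
The chunk-based procedure summarized in the abstract can be illustrated with a small, hedged Python sketch. It is not the authors' implementation. The names `NearestCentroid`, `TriTrainer`, `ChunkEnsemble`, and `error_rate` are hypothetical, the base learner is a stand-in, the Tri-training loop is simplified (a fixed number of rounds and no error-rate conditions on pseudo-labels), and the weighting of members by their accuracy on the labeled part of the current chunk is an assumption, since the abstract does not specify these details.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in base learner (an assumption; the abstract names none)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for j, v in enumerate(xi):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist)


class TriTrainer:
    """Simplified Tri-training: three learners pseudo-label data for each other."""

    def __init__(self, rng, max_rounds=5):
        self.rng, self.max_rounds = rng, max_rounds

    def fit(self, X_lab, y_lab, X_unl):
        n = len(X_lab)  # assumes at least one labeled example per chunk
        # Each learner starts from its own bootstrap sample of the labeled data.
        boots = [[self.rng.randrange(n) for _ in range(n)] for _ in range(3)]
        base = [([X_lab[i] for i in b], [y_lab[i] for i in b]) for b in boots]
        self.models = [NearestCentroid().fit(X, y) for X, y in base]
        for _ in range(self.max_rounds):
            preds = [[m.predict_one(x) for x in X_unl] for m in self.models]
            for i in range(3):
                j, k = [t for t in range(3) if t != i]
                # Learner i receives the unlabeled points its two peers agree on.
                extra = [(x, pj) for x, pj, pk in zip(X_unl, preds[j], preds[k]) if pj == pk]
                X = base[i][0] + [x for x, _ in extra]
                y = base[i][1] + [lab for _, lab in extra]
                self.models[i] = NearestCentroid().fit(X, y)
        return self

    def predict_one(self, x):
        # Majority vote of the three learners.
        return Counter(m.predict_one(x) for m in self.models).most_common(1)[0][0]


def error_rate(model, X, y):
    wrong = sum(model.predict_one(x) != t for x, t in zip(X, y))
    return wrong / len(y)


class ChunkEnsemble:
    """Keeps at most k chunk-level classifiers and combines them by weighted vote."""

    def __init__(self, k, seed=0):
        self.k, self.rng = k, random.Random(seed)
        self.members, self.weights = [], []

    def predict_one(self, x):
        # Requires at least one prior call to update().
        votes = Counter()
        for member, w in zip(self.members, self.weights):
            votes[member.predict_one(x)] += w
        return votes.most_common(1)[0][0]

    def update(self, X_lab, y_lab, X_unl):
        new = TriTrainer(self.rng).fit(X_lab, y_lab, X_unl)
        if len(self.members) < self.k:
            self.members.append(new)
        else:
            # Replace the member with the largest error on the current chunk.
            errs = [error_rate(m, X_lab, y_lab) for m in self.members]
            worst = max(range(self.k), key=errs.__getitem__)
            self.members[worst] = new
        # Weight = accuracy on the current chunk's labeled part (an assumption).
        self.weights = [1.0 - error_rate(m, X_lab, y_lab) for m in self.members]
```

A caller would feed each incoming chunk to `update` with its labeled and unlabeled parts, calling `predict_one` on the chunk's instances beforehand if predictions for that chunk are needed. Replacing only the single worst member per chunk keeps the ensemble size fixed at k, which matches the abstract's description at a high level.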