Data Stream Ensemble Classification Algorithm Based on Tri-training

Hu Xuegang (胡学钢), Ma Liwei (马利伟), Li Peipei (李培培)

Journal of Data Acquisition and Processing, 2017, Vol. 32, Issue 5: 853-860 (8 pages). DOI: 10.16337/j.1004-9037.2017.05.001

Author Information

  • Data Mining and Intelligent Computing Laboratory, School of Computer and Information, Hefei University of Technology, Hefei 230009, China

Abstract

Data stream classification is an important research task in data mining. Most existing data stream classification algorithms require labeled data for training. In real applications, however, labeled data in data streams are scarce. Labeled data can be obtained by manual labeling, but this is expensive and time-consuming. Since unlabeled data are abundant and carry useful information, this paper proposes a data stream ensemble classification algorithm based on Tri-training that learns from both labeled and unlabeled data. The algorithm divides the data stream into chunks with a sliding window and trains base classifiers with Tri-training on the labeled and unlabeled data of the first k incoming chunks. The classifiers are then updated iteratively by weighted voting until all unlabeled data are labeled. Meanwhile, the (k+1)-th data chunk is predicted by the ensemble of the k Tri-training classifiers; the classifier with the highest classification error is discarded and replaced by a new classifier built on the current data chunk, which updates the model. Experiments on 10 UCI data sets show that, compared with traditional algorithms, the proposed algorithm significantly improves data stream classification accuracy even when 80% of the data are unlabeled.
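The chunk-based update scheme summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid base learner, the names `NearestCentroid`, `TriTraining` and `ChunkEnsemble`, the accuracy-based member weights, the initial weight of 1.0 for a new member, and the fixed cap on pseudo-labeling rounds are all hypothetical choices. The Tri-training step is a simplified variant that omits the error-rate checks of the original Tri-training algorithm and uses a plain majority vote inside each member, while members are combined by weighted voting.

```python
import random
from collections import Counter, defaultdict


class NearestCentroid:
    """Hypothetical stand-in base learner; the paper's base learner may differ."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for d, v in enumerate(x):
                acc[d] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        # Nearest centroid by squared Euclidean distance.
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def bootstrap(X, y, rng):
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


class TriTraining:
    """Simplified Tri-training (assumption): no error-rate checks."""

    def __init__(self, X_lab, y_lab, X_unlab, rng, max_rounds=10):
        # Three diverse initial classifiers, each fit on a bootstrap sample.
        # Requires at least one labeled point in the chunk.
        self.models = [NearestCentroid().fit(*bootstrap(X_lab, y_lab, rng))
                       for _ in range(3)]
        prev = None
        for _ in range(max_rounds):
            # For classifier i, pseudo-label the points on which the other two agree.
            extra = []
            for i in range(3):
                j, k = (m for m in range(3) if m != i)
                pseudo = []
                for x in X_unlab:
                    a, b = self.models[j].predict(x), self.models[k].predict(x)
                    if a == b:
                        pseudo.append((x, a))
                extra.append(pseudo)
            if extra == prev:  # pseudo-labels have stopped changing
                break
            prev = extra
            # Retrain each classifier on the labeled data plus its pseudo-labeled points.
            for i in range(3):
                X_i = list(X_lab) + [x for x, _ in extra[i]]
                y_i = list(y_lab) + [c for _, c in extra[i]]
                self.models[i] = NearestCentroid().fit(X_i, y_i)

    def predict(self, x):
        # Plain majority vote of the three classifiers (assumption).
        return Counter(m.predict(x) for m in self.models).most_common(1)[0][0]


class ChunkEnsemble:
    """Hypothetical ensemble of at most k Tri-training members with weights."""

    def __init__(self, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.members = []  # list of [model, weight]

    def predict(self, x):
        # Weighted vote across members (weighting scheme is an assumption).
        votes = defaultdict(float)
        for model, weight in self.members:
            votes[model.predict(x)] += weight
        return max(votes, key=votes.get)

    def process_chunk(self, X_lab, y_lab, X_unlab):
        # 1) Re-weight existing members by accuracy on the new chunk's labeled part.
        if X_lab:
            for member in self.members:
                hits = sum(member[0].predict(x) == c for x, c in zip(X_lab, y_lab))
                member[1] = hits / len(X_lab)
        # 2) If the ensemble is full, drop the member with the highest error.
        if len(self.members) == self.k:
            worst = min(range(len(self.members)), key=lambda i: self.members[i][1])
            del self.members[worst]
        # 3) Train a new Tri-training member on the current chunk (initial weight 1.0).
        self.members.append([TriTraining(X_lab, y_lab, X_unlab, self.rng), 1.0])
```

To mirror the (k+1)-th chunk step, one would call `predict` on each new chunk's points before passing that chunk to `process_chunk`. Measuring error on the labeled part of the newest chunk is only one reading of "the classifier with the highest classification error"; the paper may define the error and the voting weights differently.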

Keywords

data stream classification/Tri-training/unlabeled data/ensemble/weighted voting

Subject Category

Information Technology and Security Science

How to Cite

Hu Xuegang, Ma Liwei, Li Peipei. Data stream ensemble classification algorithm based on Tri-training[J]. Journal of Data Acquisition and Processing, 2017, 32(5): 853-860.

Funding

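National Key Research and Development Program of China (2016YFC0801406)

National Natural Science Foundation of China (61673152, 61503112)

Specialized Research Fund for the Doctoral Program of Higher Education (doctoral supervisor category), Ministry of Education of China (20130111110011)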

Journal of Data Acquisition and Processing

Indexing: OA, Peking University Core Journal, CSCD, CSTPCD

ISSN 1004-9037
