Application Research of Computers (计算机应用研究), 2018, Vol. 35, Issue 4: 992-995, 1000, 5. DOI: 10.3969/j.issn.1001-3695.2018.04.007
Data streams classification approach based on distance and sampling
Abstract
Data stream classification is widely used in sensor networks, network monitoring, and other real-world applications. However, class imbalance and missing labels in data streams make classification considerably harder. This paper therefore proposes an ensemble classification method based on distance evaluation and sampling for incompletely labeled data streams with imbalanced class distributions. The method first calculates the distance between the unlabeled data and the center point of the labeled data chunks in order to partition the instances into positive and negative. Second, to balance the class distribution of the current data chunk, the chunk is reconstructed by over-sampling positive instances and under-sampling negative instances, and the reconstructed chunk is then used to build an ensemble classification model. Experiments on simulated incompletely labeled data streams with class imbalance show that, compared with classical algorithms of the same kind, the proposed method improves classification accuracy while reducing the influence of the imbalanced class distribution.
Key words
classification/ensemble learning/class imbalance/label missing
Classification
Information Technology and Security Science
Cite this article
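Hu Xuegang, He Junhong, Li Peipei. Data streams classification approach based on distance and sampling [J]. Application Research of Computers, 2018, 35(4): 992-995, 1000, 5.
Funding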
National Key Research and Development Program of China (2016YFC0801406)
Young Scientists Fund of the National Natural Science Foundation of China (61503112)
National Natural Science Foundation of China (61673152)
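
Illustrative sketch
The two steps summarized in the abstract (distance-based partitioning of unlabeled instances, then over-sampling positives and under-sampling negatives) might look roughly like the Python sketch below. It is not the authors' implementation. The abstract does not give the exact procedure, so everything here is an assumption: the function names `label_by_nearest_center` and `rebalance`, the use of Euclidean distance to per-class centroids, random sampling, the equal-count target, and the convention that 1 marks the positive (minority) class. Both classes are assumed to be present among the labeled instances, and the ensemble-building step is not shown.

```python
import numpy as np

def label_by_nearest_center(X_labeled, y_labeled, X_unlabeled):
    """Assign each unlabeled row the class (1 = positive, 0 = negative)
    whose labeled centroid is closer in Euclidean distance (assumed metric)."""
    pos_center = X_labeled[y_labeled == 1].mean(axis=0)
    neg_center = X_labeled[y_labeled == 0].mean(axis=0)
    d_pos = np.linalg.norm(X_unlabeled - pos_center, axis=1)
    d_neg = np.linalg.norm(X_unlabeled - neg_center, axis=1)
    return (d_pos < d_neg).astype(int)

def rebalance(X, y, rng):
    """Over-sample positives (with replacement) and under-sample negatives
    (without replacement) so both classes end up with the same count.
    Assumes positives are the minority class and both classes are non-empty."""
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    target = (len(pos_idx) + len(neg_idx)) // 2  # assumed per-class target size
    pos_pick = rng.choice(pos_idx, size=target, replace=True)
    neg_pick = rng.choice(neg_idx, size=target, replace=False)
    idx = np.concatenate([pos_pick, neg_pick])
    return X[idx], y[idx]

# Toy data chunk: three labeled instances and four unlabeled ones.
rng = np.random.default_rng(0)
X_lab = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]])
y_lab = np.array([0, 0, 1])
X_unl = np.array([[0.1, 0.3], [4.8, 5.1], [0.4, 0.2], [0.3, 0.0]])

# Step 1: partition the unlabeled instances by distance to the class centers.
y_unl = label_by_nearest_center(X_lab, y_lab, X_unl)

# Step 2: rebuild the chunk with a balanced class distribution.
X_all = np.vstack([X_lab, X_unl])
y_all = np.concatenate([y_lab, y_unl])
X_bal, y_bal = rebalance(X_all, y_all, rng)
```

In this sketch the rebuilt chunk `(X_bal, y_bal)` would then serve as training data for one member of the ensemble; which base learner to use and how members are weighted is left open here.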