Journal of Data Acquisition and Processing, 2017, Vol. 32, Issue 5: 997-1004 (8 pages). DOI: 10.16337/j.1004-9037.2017.05.017
Outlier Detection Based on Clustering and KDE Hypothesis Testing
Abstract
Outlier detection is a core problem in data mining and is widely used in industrial production. An accurate and efficient outlier detection method can reflect the condition of an industrial system in a timely manner, providing a reference for the relevant personnel. Traditional outlier detection algorithms cannot efficiently detect outliers in data that have complicated change patterns, small change ranges, and the characteristics of streaming data. This paper proposes a new outlier detection method. First, the data are clustered into several categories, and the data within each category share common characteristics. We therefore assume that the data in the same category follow the same distribution, which is simpler to fit than the distribution of the whole dataset; in this way the original complex distribution is factored into several simple distributions. Second, kernel density estimation (KDE) hypothesis testing is applied to these simpler distributions to detect outliers. Experiments on the UCI dataset and on real industrial data show that the proposed method is more efficient than traditional methods.
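The abstract describes the method only at a high level. The Python sketch below illustrates the two-stage idea under several assumptions that are not taken from the paper: the data are scalar, k-means stands in for the clustering step, each cluster is fitted with a Gaussian-kernel KDE using the normal-reference bandwidth h = 1.06·σ·n^(-1/5), and the null distribution of the test statistic (the KDE density at a point) is built from the leave-one-out densities of the cluster's own members. The names `kmeans_1d`, `bandwidth`, `kde`, and `ClusterKDEDetector` are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, stdev


def kmeans_1d(data, k, iters=100):
    """Lloyd's k-means on scalar data, initialised at evenly spaced quantiles."""
    xs = sorted(data)
    centers = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            # Assignment step: each point joins its nearest center.
            groups[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        # Update step: move each center to its group mean (keep it if the group is empty).
        new_centers = [mean(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def bandwidth(xs):
    """Normal-reference rule of thumb h = 1.06 * sigma * n^(-1/5) (an assumed choice)."""
    return 1.06 * max(stdev(xs), 1e-9) * len(xs) ** -0.2


def kde(x, sample, h):
    """Gaussian-kernel density estimate of `sample` evaluated at x."""
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return total / (len(sample) * h * math.sqrt(2 * math.pi))


class ClusterKDEDetector:
    """Hypothetical detector: k-means clusters, then a per-cluster KDE density test."""

    def __init__(self, k=2, alpha=0.05):
        self.k, self.alpha = k, alpha

    def fit(self, data):
        centers = kmeans_1d(data, self.k)
        groups = [[] for _ in range(self.k)]
        for x in data:
            groups[min(range(self.k), key=lambda c: abs(x - centers[c]))].append(x)
        self.models = []
        for center, g in zip(centers, groups):
            if len(g) < 3:  # too few points to build a leave-one-out null distribution
                continue
            h = bandwidth(g)
            # Null distribution of the statistic: leave-one-out densities of the members.
            null = sorted(kde(g[i], g[:i] + g[i + 1:], h) for i in range(len(g)))
            self.models.append((center, g, h, null))
        return self

    def p_value(self, x):
        # H0: x comes from the distribution of its nearest cluster.
        _, g, h, null = min(self.models, key=lambda m: abs(x - m[0]))
        d = kde(x, g, h)
        # Low density is evidence against H0: count null densities <= d.
        return (bisect_right(null, d) + 1) / (len(null) + 1)

    def is_outlier(self, x):
        return self.p_value(x) < self.alpha
```

Fitting once on reference data and then calling `is_outlier` on each new value loosely matches the streaming setting mentioned in the abstract. Because each cluster gets its own KDE, a value lying between two well-separated clusters receives a near-zero density and is flagged, even though it sits inside the overall data range; the significance level `alpha` and the leave-one-out null are illustrative choices, and the paper's own test statistic and thresholds may differ.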
Key words
outlier detection/clustering/hypothesis testing/kernel density estimation
Classification
Information Technology and Security Science
Cite this article
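Zhou Chunlei, Tian Pinzhuo, Yang Chenchen, Wang Hao. Outlier detection based on clustering and KDE hypothesis testing[J]. Journal of Data Acquisition and Processing, 2017, 32(5): 997-1004.
Funding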
Supported by the National Natural Science Foundation of China (61503178).
Supported by the Natural Science Foundation of Jiangsu Province (BK20150587).