Computer Technology and Development (计算机技术与发展), 2024, Vol. 34, Issue 9: 138-146, 9. DOI: 10.20165/j.cnki.ISSN1673-629X.2024.0137
Adaptive Distance-Based Outlier Detection Algorithm (基于自适应距离的离群点检测算法)
Abstract
Nearest-neighbour-based outlier detection methods identify outliers from the neighbours surrounding each data object, but they are highly sensitive to the threshold parameter and mostly perform well only when the data follow a single distribution. To address the difficulty of detecting outliers in data with diverse distributions and the sensitivity to threshold parameters, an adaptive distance-based outlier detection algorithm is proposed. First, the contribution factor of each data attribute is adjusted dynamically so that key attributes have more influence on outlier detection, which more accurately reflects the correlation between key attributes and outliers. Second, the distance between data objects is calculated by jointly considering the attribute contribution factors and the density, so as to better capture both the positional relationship between data objects and the density distribution characteristics. Finally, to reduce the influence of the threshold parameter, the neighbourhood size is increased gradually, and the changes in each data object's adaptive distance are summed and accumulated as its outlier score. Experiments on synthetic and public datasets show that the proposed algorithm achieves higher detection accuracy.
Keywords
data mining/outlier detection/attribute contribution factor/density distribution/adaptive distance
Classification
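Information Technology and Security Science
Cite this article
CAO Xia (曹霞), ZHENG Aiyu (郑爱宇), HAO Jing (郝静). Adaptive Distance Based Outlier Detection Algorithm [J]. Computer Technology and Development, 2024, 34(9): 138-146, 9.
Funding
National Natural Science Foundation of China (U1931209)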
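
The abstract outlines three steps (attribute contribution factors, a density-aware distance, and accumulation over growing neighbourhood sizes) but gives no formulas. The following is a minimal Python sketch of that general idea, not the authors' algorithm. Every formula in it is an assumption made for illustration: `contribution_factors` weights attributes by their share of total spread, the density term is a LOF-style ratio of mean neighbour distances, and the score simply sums that ratio over a range of `k` rather than summing the changes the paper describes. The names `contribution_factors`, `weighted_distance`, `outlier_scores`, `k_min`, and `k_max` are hypothetical.

```python
import math
from statistics import pstdev


def contribution_factors(data):
    """ASSUMPTION: weight each attribute by its share of the total spread.
    The paper adjusts these factors dynamically; its rule is not in the abstract."""
    dims = len(data[0])
    spreads = [pstdev([row[j] for row in data]) for j in range(dims)]
    total = sum(spreads)
    if total == 0:
        return [1.0 / dims] * dims  # all attributes constant: fall back to equal weights
    return [s / total for s in spreads]


def weighted_distance(a, b, weights):
    """Euclidean distance with per-attribute weights."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def outlier_scores(data, k_min=2, k_max=4):
    """Accumulate a density-scaled neighbour distance for k = k_min..k_max.
    ASSUMPTION: the density scaling and the accumulation rule are stand-ins."""
    n = len(data)
    if not 1 <= k_min <= k_max < n:
        raise ValueError("need 1 <= k_min <= k_max < len(data)")
    weights = contribution_factors(data)
    # neighbours[i]: (distance, index) pairs for every other point, nearest first
    neighbours = [
        sorted(
            (weighted_distance(data[i], data[j], weights), j)
            for j in range(n) if j != i
        )
        for i in range(n)
    ]

    def mean_knn(i, k):
        # Mean weighted distance from point i to its k nearest neighbours
        return sum(d for d, _ in neighbours[i][:k]) / k

    scores = [0.0] * n
    for k in range(k_min, k_max + 1):
        sparseness = [mean_knn(i, k) for i in range(n)]
        for i in range(n):
            # Average sparseness of i's k nearest neighbours (local density reference)
            local = sum(sparseness[j] for _, j in neighbours[i][:k]) / k
            scores[i] += sparseness[i] / (local + 1e-12)  # epsilon guards duplicates
    return scores


# Five points in a tight cluster plus one isolated point at index 5
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
scores = outlier_scores(points)
print(scores.index(max(scores)))  # the isolated point receives the largest score
```

Because the score is summed over several neighbourhood sizes, no single value of `k` acts as a hard threshold, which is the motivation the abstract gives for the accumulation step.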