Generalized isolation forest anomaly detection algorithm based on expert feedback

Tags: OA, CSTPCD

The isolation forest algorithm cannot detect local anomalies that are parallel to the coordinate axes, and its tree structure cannot be updated dynamically. To address these problems, this paper proposes a generalized isolation forest anomaly detection algorithm based on expert feedback. First, the algorithm projects the data onto a sampled unit normal vector, selects a split point within the projected range to divide the data space, and repeats these operations to construct a generalized isolation tree. Second, it assigns a weight to the leaf nodes of every tree in the generalized isolation forest, so that each data point's anomaly score accounts for both the number of subspace partitions and the number of samples in the subspace. Finally, it computes a weighted anomaly score for each data point, submits the points with the highest scores to an expert for batch labeling, and updates the leaf-node weights according to the labels, thereby adjusting the generalized isolation trees dynamically. Experimental results show that on 7 datasets the number of true anomalies labeled by the expert is higher than for other tree-based anomaly detection algorithms, and that across 12 datasets the average precision is 38.952%, 49.144% and 49.144% higher than that of isolation forest, extended isolation forest and generalized isolation forest, respectively.
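The three steps in the abstract (oblique splits, weighted leaves, batch expert feedback) can be illustrated with a small NumPy sketch. It is one possible reading, not the authors' implementation: the split rule follows the extended-isolation-forest idea of projecting onto a random unit vector and cutting inside the projected range, but the abstract does not say how leaf weights enter the score or how labels change them. The class `FeedbackForest`, the per-tree term `weight * 2^(-(depth + c(size)) / c(psi))`, the multiplicative `exp(±lr)` update and the helper `active_loop` are hypothetical choices, and the "expert" is simulated with ground-truth labels.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful BST search on n points (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


class Node:
    def __init__(self, normal=None, split=None, left=None, right=None, size=0, depth=0):
        self.normal, self.split = normal, split  # internal node: x @ normal <= split goes left
        self.left, self.right = left, right
        self.size, self.depth = size, depth      # leaf: number of samples that reached it, its depth
        self.weight = 1.0                        # leaf weight, changed only by feedback


def build_tree(X, depth, max_depth, rng):
    """Oblique split: project onto a random unit vector, cut uniformly inside the projected range."""
    n, d = X.shape
    if depth >= max_depth or n <= 1:
        return Node(size=n, depth=depth)
    normal = rng.normal(size=d)
    normal /= np.linalg.norm(normal)             # unit normal vector of the cutting hyperplane
    proj = X @ normal
    lo, hi = proj.min(), proj.max()
    if lo == hi:                                 # points coincide along this direction: stop
        return Node(size=n, depth=depth)
    split = rng.uniform(lo, hi)
    mask = proj <= split
    return Node(normal, split,
                build_tree(X[mask], depth + 1, max_depth, rng),
                build_tree(X[~mask], depth + 1, max_depth, rng))


def leaf_of(node, x):
    """Follow the hyperplane tests down to the leaf that contains x."""
    while node.normal is not None:
        node = node.left if x @ node.normal <= node.split else node.right
    return node


class FeedbackForest:  # hypothetical name, for illustration only
    def __init__(self, n_trees=100, psi=256, seed=0):
        self.n_trees, self.psi = n_trees, psi
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        psi = min(self.psi, len(X))              # subsample size per tree
        if psi < 2:
            raise ValueError("need at least 2 samples")
        self.norm = c(psi)
        max_depth = int(np.ceil(np.log2(psi)))   # usual iForest height limit
        self.trees = [build_tree(X[self.rng.choice(len(X), psi, replace=False)],
                                 0, max_depth, self.rng) for _ in range(self.n_trees)]
        return self

    def _tree_term(self, tree, x):
        # Assumed per-tree term: depth = partition count, c(size) = samples left in the leaf.
        leaf = leaf_of(tree, x)
        return leaf.weight * 2.0 ** (-(leaf.depth + c(leaf.size)) / self.norm)

    def score(self, X):
        """Weighted anomaly score: mean of the per-tree terms (higher = more anomalous)."""
        return np.array([np.mean([self._tree_term(t, x) for t in self.trees]) for x in X])

    def feedback(self, X, labels, lr=0.5):
        """Hypothetical update: boost leaves holding confirmed anomalies, damp leaves of normals."""
        for x, y in zip(X, labels):
            for t in self.trees:
                leaf_of(t, x).weight *= np.exp(lr if y == 1 else -lr)


def active_loop(X, y_true, rounds=5, batch=5, n_trees=50, seed=0):
    """Score, send the top unlabeled points to the (simulated) expert, update weights, repeat."""
    forest = FeedbackForest(n_trees=n_trees, psi=128, seed=seed).fit(X)
    labeled = np.zeros(len(X), dtype=bool)
    found = 0
    for _ in range(rounds):
        s = forest.score(X)
        s[labeled] = -np.inf                     # never query the same point twice
        query = np.argsort(-s)[:batch]
        labeled[query] = True
        found += int(y_true[query].sum())        # expert simulated by ground-truth labels
        forest.feedback(X[query], y_true[query])
    return found, forest


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(size=(300, 2)), rng.uniform(-6, 6, size=(15, 2))])
    y = np.r_[np.zeros(300, int), np.ones(15, int)]
    found, _ = active_loop(X, y)
    print("true anomalies among 25 queries:", found)
```

Because every leaf that contains a confirmed anomaly is scaled by the same factor, that point's score rises by exactly `exp(lr)`, and other points sharing some of those leaves rise partially with it. This is how a few labels can reshape the ranking without rebuilding any tree, which matches the abstract's idea of adjusting the forest through leaf weights rather than through new splits.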

Zhu Chengyong (祝诚勇); Huang Pengxiang (黄鹏翔); Li Limin (李理敏)

College of Electrical and Electronic Engineering, Wenzhou University, Wenzhou, Zhejiang 325035, China

Subject: Computer Science and Automation

Keywords: anomaly detection; isolation forest; dynamic update; expert feedback

Application Research of Computers (《计算机应用研究》), 2024, No. 1, pp. 88-93 (6 pages)

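Funding: General Program of the National Natural Science Foundation of China (61972288); Scientific Research Project of the Zhejiang Provincial Department of Education (Y202146796); Wenzhou Major Science and Technology Innovation Project (ZG2021029)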

DOI: 10.19734/j.issn.1001-3695.2023.05.0182

评论