Density peak clustering algorithm optimized with local standard deviation (OA; Peking University Core Journal; CSTPCD)
Clustering by fast search and find of density peaks (DPC) is a density-based clustering algorithm that can discover clusters of arbitrary shape and dimensionality, and it is regarded as a milestone among clustering algorithms. However, its definition of a point's local density is ill-suited to datasets that contain both dense and sparse clusters: it cannot reliably identify the centers of dense and sparse clusters at the same time, nor recover those clusters afterwards. In addition, its one-step assignment strategy means that once a point is assigned to the wrong cluster, subsequent points are misassigned as well, producing a "domino effect". To address these problems, this paper redefines the local density of a point based on the local standard deviation and replaces the one-step assignment with a two-step assignment strategy, yielding the ESDTS-DPC algorithm. Experimental comparisons with the original DPC, its variants KNN-DPC, FKNN-DPC and DPC-CE, and the classic density-based clustering algorithm DBSCAN show that ESDTS-DPC achieves better clustering accuracy.
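For readers unfamiliar with the baseline, the following is a minimal sketch of the original DPC procedure that the abstract criticizes, not of ESDTS-DPC, whose density definition and two-step assignment are not given in this record. It assumes the commonly used Gaussian-kernel form of the local density; the function name `dpc` and the parameters `dc` (cutoff distance) and `n_clusters` are illustrative choices rather than names from the paper.

```python
import numpy as np

def dpc(X, dc, n_clusters):
    """Baseline DPC sketch: kernel density, delta distance, one-step assignment."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density with a Gaussian kernel (assumed variant);
    # subtract 1 to drop each point's own contribution exp(0).
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density

    delta = np.zeros(n)
    parent = np.full(n, -1)  # nearest neighbor of higher density
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor: use its largest distance.
            delta[i] = d[i].max()
            continue
        denser = order[:rank]
        j = denser[np.argmin(d[i, denser])]
        delta[i] = d[i, j]
        parent[i] = j

    # Centers are the points with the largest rho * delta;
    # the densest point is forced to be a center.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # One-step assignment: each remaining point inherits the label of its
    # nearest denser neighbor, so one wrong parent propagates downstream.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta

# Two well-separated toy groups (made-up data for illustration only).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
              [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]], dtype=float)
labels, rho, delta = dpc(X, dc=1.0, n_clusters=2)
```

The final loop is where the "domino effect" arises: every non-center point copies the label of its nearest denser neighbor, so a single misassigned point passes its error on to every point that depends on it.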
Xie Juanying (谢娟英); Zhang Wenjie (张文杰)
School of Computer Science, Shaanxi Normal University, Xi'an 710119, Shaanxi, China
Computer Science and Automation
density peak clustering; standard deviation; local density; assignment strategy; clustering
Journal of Shaanxi Normal University (Natural Science Edition), 2024 (003)
47-62 / 16
National Natural Science Foundation of China (62076159, 61673251, 12031010); Fundamental Research Funds for the Central Universities (GK202105003)