

Three-way Decision-based Density Peak Clustering Algorithm with Clustering Centers Preselection



The CFSFDP (clustering by fast search and find of density peaks) algorithm cannot select cluster centers automatically, which leaves the choice of centers uncertain. To address this problem, we incorporate three-way decision theory and propose a three-way decision-based density peak clustering algorithm with clustering centers preselection (TDPC). First, the statistical characteristics of the two parameters, density and distance, are used to divide the data objects into a core region, a boundary region, and a trivial region: qualified cluster centers are assigned to the core region, while suspected cluster centers that are difficult to judge are placed in the boundary region. Then, the defined k-reachable region and a discriminant criterion are used to analyze the suspected cluster centers and select the actual ones. In this way, the proposed algorithm solves the problem of automatically determining cluster centers in density peak clustering. TDPC is evaluated on two synthetic datasets and four public UCI (University of California, Irvine) datasets and compared with the CFSFDP algorithm and the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm. On the silhouette coefficient, DB (Davies-Bouldin) index, adjusted mutual information, adjusted Rand index, FM (Fowlkes-Mallows) index, homogeneity, and completeness, TDPC achieves the best result or comes close to the best-performing algorithm. These results indicate that the overall clustering performance of TDPC is better than that of the compared algorithms and demonstrate its feasibility and effectiveness for clustering.
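The first stage described above (computing density and distance, then splitting the data into core, boundary, and trivial regions) can be sketched in Python as follows. The local density ρ (cutoff kernel: number of other points closer than d_c) and the distance δ (distance to the nearest point of higher density) follow the standard CFSFDP definitions. The mean-plus-multiple-of-standard-deviation thresholds, the parameters `alpha` and `beta`, the index-based tie-breaking, and all function names are hypothetical choices for illustration, not the paper's exact statistics. The k-reachable region and discriminant criterion that TDPC applies to the boundary region are not reproduced here.

```python
import math
from statistics import mean, pstdev


def local_density(points, dc):
    """Cutoff-kernel local density: rho_i = number of other points closer than dc."""
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < dc:
                rho[i] += 1
    return rho


def delta_distance(points, rho):
    """delta_i = distance to the nearest point of higher density.

    Ties in rho are broken by index (an assumption), so exactly one point has no
    "higher" neighbour; as in CFSFDP, it receives its maximum distance to any point.
    """
    n = len(points)
    delta = [0.0] * n
    for i in range(n):
        higher = [
            math.dist(points[i], points[j])
            for j in range(n)
            if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)
        ]
        if higher:
            delta[i] = min(higher)
        else:
            delta[i] = max(
                (math.dist(points[i], points[j]) for j in range(n) if j != i),
                default=0.0,
            )
    return delta


def three_way_partition(rho, delta, alpha=2.0, beta=1.0):
    """Hypothetical statistical thresholds (not the paper's exact rule).

    core:     rho >= mean(rho) and delta >= mean(delta) + alpha * std(delta)
    boundary: not core, but delta >= mean(delta) + beta * std(delta)
    trivial:  everything else
    """
    mu_rho = mean(rho)
    mu_d, sd_d = mean(delta), pstdev(delta)
    core, boundary, trivial = [], [], []
    for i, (r, d) in enumerate(zip(rho, delta)):
        if r >= mu_rho and d >= mu_d + alpha * sd_d:
            core.append(i)
        elif d >= mu_d + beta * sd_d:
            boundary.append(i)
        else:
            trivial.append(i)
    return core, boundary, trivial


if __name__ == "__main__":
    # Two small, well-separated "plus"-shaped clusters centred at (0, 0) and (5, 5).
    pts = [(0, 0), (0.1, 0), (0, 0.1), (-0.1, 0), (0, -0.1),
           (5, 5), (5.1, 5), (5, 5.1), (4.9, 5), (5, 4.9)]
    rho = local_density(pts, dc=0.12)
    delta = delta_distance(pts, rho)
    core, boundary, trivial = three_way_partition(rho, delta)
    print("core:", core, "boundary:", boundary)  # core: [0] boundary: [5]
```

On this toy input the center of the first cluster lands in the core region, while the center of the second, whose δ falls just below the 2σ threshold, lands in the boundary region. That is the kind of hard-to-judge candidate the second stage of TDPC is meant to resolve.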

Luo Shuwen; Wan Renxia; Miao Duoqian

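General Education Center, Quanzhou University of Information Engineering, Quanzhou, Fujian 362000 || School of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021; School of Mathematics and Information Science, North Minzu University, Yinchuan, Ningxia 750021; College of Electronics and Information Engineering, Tongji University, Shanghai 201804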

Section: Computer and Automation


Keywords: clustering algorithm; clustering center; boundary region; three-way clustering; density clustering; k-reachable region

Journal of Shanxi University (Natural Science Edition), 2024, (1)

Research on Clustering of Uncertain Wireless Sensor Network Data Streams and Its Application to Aerosol Monitoring in the Yinchuan Area

Pages 30-39 (10 pages)

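Funding: National Natural Science Foundation of China (61662001); Fundamental Research Funds for the Central Universities (FWNX04); Natural Science Foundation of Ningxia (2021AAC03203)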

DOI: 10.13451/j.sxu.ns.2023140
