A Fast-Search Density Peak Clustering Algorithm Based on Cluster Boundaries

Journal of Nanjing University (Natural Science), 2017, Vol. 53, Issue (2): 368-377, 10. DOI: 10.13232/j.cnki.jnju.2017.02.019

An improved clustering algorithm by fast search and find of density peaks based on boundary samples

Jia Peiling¹, Fan Jiancong¹, Peng Yanjun²

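Author information

1. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590
2. Shandong Province Key Laboratory of Wisdom Mine Information Technology, Qingdao, 266590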

Abstract

Clustering is one of the most important research topics in the data mining community because data are complex and unlabeled, and many techniques have been devoted to data clustering algorithms. The paper "Clustering by fast search and find of density peaks" (DPC), published in Science, proposed a density-based clustering method. Compared with other clustering algorithms, DPC uses fewer parameters yet obtains better clustering results. However, when a single cluster contains multiple density peaks, its results are unsatisfactory. To address this, a boundary-partition-based DPC algorithm, B-DPC, is proposed. B-DPC improves the standard DPC in two respects: a criterion for cleaning noisy data, and a two-round clustering process. The new criterion for judging whether a data instance is noise is defined by calculating the distances among all data instances. A data instance is regarded as noise if the distances between it and all instances in the noisy dataset are less than a predetermined threshold. Such noisy instances are first removed from the dataset, and B-DPC then carries out the two-round process. In the first round, the standard DPC is applied to choose candidate cluster centers, from which initial clusters are obtained and the decision graph is built. In the second round, similar clusters are merged so that the number of clusters is closer to the actual number; the merging is based on the boundary data instances, the count of these boundary instances, and the ratio of the boundary instances to the neighboring clusters. To test B-DPC, several classical artificial datasets and real-world datasets are used in the experiments, and several well-performing clustering algorithms, such as DPC, DBSCAN, and K-means, are used for comparison. Experimental results show that B-DPC effectively solves the multiple-density-peaks problem and also discovers clusters with arbitrary shapes.
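
The first round described in the abstract relies on the standard DPC procedure. The following is a minimal sketch of that procedure only, assuming Euclidean distance, a cutoff-kernel density with a user-supplied cutoff `dc`, and a user-chosen number of centers `n_centers`; none of these choices are stated in the abstract, and the function name `dpc_first_round` is hypothetical. The noise-cleaning criterion and the boundary-based merging of B-DPC are not reproduced here, because the abstract does not give enough detail to implement them.

```python
import numpy as np

def dpc_first_round(X, dc, n_centers):
    """Sketch of standard DPC: density rho, distance delta, centers, labels."""
    n = len(X)
    # Pairwise Euclidean distances (assumption: Euclidean metric).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # rho_i: number of other points closer than dc (cutoff kernel).
    rho = (d < dc).sum(axis=1) - 1
    # Visit points from highest to lowest density; ties keep index order.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Global density peak: delta is its largest distance to any point.
            delta[i] = d[i].max()
            continue
        higher = order[:rank]  # points visited earlier (denser, or tied)
        j = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, j]
        nearest_higher[i] = j
    # Decision-graph score gamma = rho * delta; the global peak is always kept.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    # Each remaining point inherits the label of its nearest denser neighbor.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, centers, labels
```

Plotting `delta` against `rho` gives the decision graph mentioned in the abstract. Points with both values large are the candidate centers that a second round could then merge.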

Key words

density peaks/cluster centers/noise cleaning/clustering

Classification

Information Technology and Security Science

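Cite this article

Jia Peiling, Fan Jiancong, Peng Yanjun. An improved clustering algorithm by fast search and find of density peaks based on boundary samples [J]. Journal of Nanjing University (Natural Science), 2017, 53(2): 368-377, 10.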

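Funding

National Natural Science Foundation of China (61203305, 61433012); Key Research and Development Program of Shandong Province (2016GSF120012); Natural Science Foundation of Shandong Province (ZR2015FM013); "Taishan Scholar" Climbing Program of Shandong Province (61203305, 61433012)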

Journal of Nanjing University (Natural Science)

OA, CSCD, CSTPCD

ISSN: 0469-5097
