南京大学学报(自然科学版)2018,Vol.54Issue(1):48-55,8.DOI:10.13232/j.cnki.jnju.2018.01.006
基于分位数半径的动态K-means算法
Dynamic K-means algorithm based on quantile radius
摘要
Abstract
There are obvious deficiencies in the well-known K-means algorithm:one is that it is sensitive to initial values,and the other is that the number of clusters needs to be given artificially.Many related works have improved the performance of K-means,but most of them only focus on single aspect rather than both on speed and identification of the number of clusters.The Hierarchical K-means algorithm(H K-means)was proposed to generate a center set C by carrying out K-means repeatedly with a constant k .After that,it generates the initial cluster centers of K-means by executing Hierarchical clustering algorithm on the center set C.This paper focuses on the geometric features of clusters,proposing a dynamic K-means algorithm based on quantile radius (QRD K-means ).This algorithm is on the basis of Hierarchical K-means and repeats K-means with varying k values.After executing K-means multiple times,a series of cluster centers are obtained to be stored in set C.Moreover,QRD K-means addi-tionally a new concept called quantile radius,which is a certain quantile of distance between the center and the sample points which belong to the corresponding cluster.Then it merges the centers in set C into a new cluster by comparing the pair-wise distance of centers with the sum of the corresponding quantile radiuses.In this way,a clustering result of centers can be obtained adaptively.At last,QRD K-means uses the centers of these new clusters as the initial centers of K-means to get the finial clustering result.The change in k value makes the center set C cover more areas than the previous Hierarchical K-means and can lead to more representative initial cluster centers.The way to get initial cluster centers by means of quantile radius is also faster.In the simulation experiments,QRD K-means was compared with Hierarchical K-means,DP-means and X-means both on the time consumption and accuracy.Results show that QRD K-means method is faster and more efficient.关键词
K-means/类的数目/分位数半径/动态K-meansKey words
K-means/number of clusters/quantile radius/dynamic K-means分类
信息技术与安全科学引用本文复制引用
程明畅,刘友波,张程嘉,马铁丰..基于分位数半径的动态K-means算法[J].南京大学学报(自然科学版),2018,54(1):48-55,8.基金项目
国家自然科学基金(11471264,11401148,11571282,51437003) (11471264,11401148,11571282,51437003)