Journal of Nanjing University (Natural Sciences), 2016, Vol. 52, Issue 6: 1090-1096, 7. DOI: 10.13232/j.cnki.jnju.2016.06.012
An ensemble algorithm of spectral clustering based on sampling
Abstract
Spectral clustering is an important family of clustering algorithms. It clusters the sample data using the eigenvectors of a similarity matrix computed from the sample data set. However, when spectral clustering is applied to large-scale data sets, the computational complexity and running time increase markedly because of the large-scale eigendecomposition involved. Sampling can effectively reduce the time consumed by spectral clustering, but the relationship between data subsets drawn by simple random sampling is too weak, so the subsets usually cannot reflect the distribution characteristics of the sample data set accurately. Motivated by this, and targeting the computational characteristics of spectral clustering, a new sampling strategy different from simple random sampling is designed and applied repeatedly to generate multiple data subsets. Because these subsets are both related to and different from one another, they reflect the distribution characteristics of the sample data set more accurately. Each data subset is then clustered by the NJW algorithm (the most classical spectral clustering algorithm, proposed by Ng A Y, Jordan M I and Weiss Y), and each clustering result is mapped to the whole sample data set according to the nearest neighbor principle, generating a number of component clusterings that exhibit both relevance and diversity. Finally, the clustering results over the whole sample data set are integrated to obtain the final unified clustering partition. Experimental results show that applying the proposed sampling method to spectral clustering is effective compared with the traditional NJW algorithm and efficient compared with the spectral clustering ensemble algorithm based on simple sampling.
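The pipeline described in the abstract (sample subsets, cluster each with NJW, map back by nearest neighbor, integrate) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the proposed sampling strategy, so simple random sampling is used here as a stand-in; the integration step uses a co-association matrix, which is one common consensus technique and is an assumption; and all function names and parameters (`sampled_spectral_ensemble`, `sigma`, `n_subsets`, `subset_size`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd k-means with farthest-point initialisation."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_from_affinity(A, k, rng):
    """NJW steps applied to a given affinity matrix A (zero diagonal)."""
    d = A.sum(axis=1) + 1e-12
    L = A / np.sqrt(np.outer(d, d))            # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    Y = vecs[:, -k:]                           # k largest eigenvectors
    Y = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-12)
    return kmeans(Y, k, rng)

def njw(X, k, sigma, rng):
    """NJW spectral clustering with a Gaussian similarity matrix."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return spectral_from_affinity(A, k, rng)

def sampled_spectral_ensemble(X, k, n_subsets=10, subset_size=60,
                              sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for _ in range(n_subsets):
        # ASSUMPTION: simple random sampling stands in for the paper's
        # sampling strategy, which the abstract does not describe.
        idx = rng.choice(n, size=subset_size, replace=False)
        sub_labels = njw(X[idx], k, sigma, rng)
        # Nearest-neighbor mapping: each point takes the label of its
        # closest sampled point.
        dist = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1)
        full = sub_labels[np.argmin(dist, axis=1)]
        co += (full[:, None] == full[None, :])
    co /= n_subsets
    # ASSUMPTION: consensus via the co-association matrix, treated as an
    # affinity matrix and partitioned with the same spectral steps.
    np.fill_diagonal(co, 0.0)
    return spectral_from_affinity(co, k, rng)
```

Only the subset-level eigendecompositions scale with `subset_size` rather than with the full data size; note that the co-association consensus in this sketch still builds an n-by-n matrix, so it illustrates the structure of the method rather than its efficiency.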
Keywords
sampling/spectral clustering/clustering ensemble/similarity matrix/validity index
Classification
Information Technology and Security Science
Cite this article
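Meng Na, Liang Jiye, Pang Tianjie. An ensemble algorithm of spectral clustering based on sampling [J]. Journal of Nanjing University (Natural Sciences), 2016, 52(6): 1090-1096, 7.
Funding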
National Natural Science Foundation of China (61273294); Research Project for Returned Overseas Scholars of Shanxi Province (2013-101)