Journal of Southeast University (Natural Science Edition), 2011, Vol. 41, Issue 3: 505-508 (4 pages). DOI: 10.3969/j.issn.1001-0505.2011.03.014
Parallel k-means algorithm based on constrained information
Abstract
To obtain the desired clustering results on a distributed data set, a parallel k-means algorithm based on constrained information is presented. Because parallel k-means can cluster horizontally distributed data sets effectively, its objective function is modified to yield a constrained parallel k-means algorithm. The constrained information supplied by site users is introduced into the distributed clustering process in the form of chunklets, which biases the algorithm's search. In theory, the algorithm guarantees that each unconstrained sample is assigned to its closest cluster center, and that the constrained samples in each chunklet are assigned to the cluster center with the smallest average distance to them. Experimental results show that the algorithm effectively improves the clustering precision of parallel k-means, and that its clustering results on a distributed data set are equivalent to those of the constrained k-means algorithm run on a centralized data set.
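The scheme outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the modified objective function, so the sketch assumes squared Euclidean distance, treats each chunklet as a group of samples that must share one cluster, and simulates the sites sequentially in one process. The names `local_stats` and `constrained_parallel_kmeans` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Site = Tuple[Sequence[Point], Sequence[Sequence[Point]]]  # (unconstrained points, chunklets)


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers: Sequence[Point], group: Sequence[Point]) -> int:
    """Index of the center with the smallest total squared distance to the group.

    For a group of fixed size, minimizing the total is the same as minimizing
    the average, which is the criterion the abstract states for chunklets.
    """
    return min(range(len(centers)),
               key=lambda j: sum(sq_dist(p, centers[j]) for p in group))


def local_stats(points, chunklets, centers):
    """Run at one site: per-cluster coordinate sums and sample counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    # An unconstrained point is a group of one; a chunklet is assigned as a whole.
    groups = [[p] for p in points] + [list(c) for c in chunklets]
    for group in groups:
        j = nearest(centers, group)
        for p in group:
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
    return sums, counts


def constrained_parallel_kmeans(sites: Sequence[Site],
                                centers: Sequence[Point],
                                iters: int = 20) -> List[Point]:
    """Coordinator loop: only sums and counts leave each site, never raw samples."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        # In a real deployment each call below would run on its own site.
        stats = [local_stats(pts, chs, centers) for pts, chs in sites]
        new_centers = []
        for j, old in enumerate(centers):
            n = sum(cnt[j] for _, cnt in stats)
            if n == 0:
                new_centers.append(old)  # keep an empty cluster's center unchanged
                continue
            new_centers.append(
                tuple(sum(s[j][d] for s, _ in stats) / n for d in range(len(old))))
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    # Two sites; site A holds one chunklet {2, 7} whose members must stay together.
    site_a = ([(0.0,), (1.0,)], [[(2.0,), (7.0,)]])
    site_b = ([(9.0,), (10.0,)], [])
    print(constrained_parallel_kmeans([site_a, site_b], [(0.0,), (10.0,)]))
    # prints [(2.5,), (9.5,)]
```

In this toy run the sample at 7 would join the right-hand cluster under plain k-means, but the chunklet pulls it into the left-hand one. Pooling all samples at a single site gives the same centers, which mirrors the equivalence with the centralized constrained k-means that the abstract reports.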
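Key words
k-means / parallel k-means / constrained clustering / constrained parallel k-means
Classification
Information Technology and Security Science
Cite this article
YU Yuecheng (於跃成), WANG Jiandong (王建东), ZHENG Guansheng (郑关胜), CHEN Bin (陈斌). Parallel k-means algorithm based on constrained information [J]. Journal of Southeast University (Natural Science Edition), 2011, 41(3): 505-508.
Funding
Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2006AA12A106) and the National Natural Science Foundation of China (No. 60903130).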