Journal of Nanjing University (Natural Science), 2017, Vol. 53, Issue (3): 525-536, 12. DOI: 10.13232/j.cnki.jnju.2017.03.017
Reliability-based regularized weighted soft k-means algorithm for subspace clustering
Abstract
Subspace clustering methods have been widely employed in fields that involve clustering high-dimensional data and have attracted increasing attention. Subspace clustering is a clustering analysis technique that incorporates feature selection: it achieves better performance by selecting a subset of salient features and clustering a low-dimensional representation of the high-dimensional data. In many practical applications, soft clustering is known to provide more meaningful partitions of complex data than hard clustering. In this paper, we extend k-means clustering and present a novel reliability-based regularized weighted soft k-means clustering algorithm (RRWSKM). The method calculates the contribution of each dimension to each cluster and finds a different subset of salient dimensions for each cluster. Furthermore, by tuning the model parameters it can accurately identify the underlying data patterns and performs well. This is achieved by adding a dimension-weight entropy term and a partition entropy term to the objective function as regularizers, which avoid overfitting and encourage more dimensions to contribute to identifying the clusters. In addition, a data reliability measure is used to preserve the reliability of the dimension weights and to determine the initial dimension weights, which greatly improves the performance and robustness of the proposed algorithm. Because the optimization problem of RRWSKM is non-convex, it is solved with iterative update formulas. Experiments on real-world data sets were conducted to verify the new algorithm. The results show that the proposed method yields low-dimensional representations of high-dimensional data, handles high-dimensional data well, and achieves better clustering performance than other subspace clustering methods.
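The abstract describes the objective only at a high level. As a hedged illustration of the general idea, the sketch below implements a generic entropy-regularized weighted soft k-means: memberships u and per-cluster dimension weights w are each penalized by an entropy term, and the problem is solved by alternating closed-form updates obtained from Lagrange multipliers. It is not the authors' RRWSKM. The reliability-based weight initialization is replaced by uniform weights, and the farthest-point center initialization, the function name `weighted_soft_kmeans`, and the parameter names `gamma` (partition-entropy strength) and `lam` (weight-entropy strength) are all hypothetical choices made for this example.

```python
import math


def _normalized_exp_neg(values, temp):
    """Return exp(-a / temp) for each a, normalized to sum to 1.

    Subtracting the minimum first avoids underflow when all values are large.
    """
    m = min(values)
    e = [math.exp(-(a - m) / temp) for a in values]
    s = sum(e)
    return [x / s for x in e]


def _sq_dist(a, b):
    """Unweighted squared Euclidean distance (used only for initialization)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def weighted_soft_kmeans(X, k, gamma=1.0, lam=1.0, n_iter=30):
    """Generic entropy-regularized weighted soft k-means (NOT the paper's RRWSKM).

    Alternately minimizes, over memberships u, centers v and weights w,
      J = sum_i sum_c u[i][c] * sum_j w[c][j] * (X[i][j] - v[c][j])**2
          + gamma * sum_i sum_c u[i][c] * ln u[i][c]    (partition entropy)
          + lam   * sum_c sum_j w[c][j] * ln w[c][j]    (dimension-weight entropy)
    subject to sum_c u[i][c] = 1 for each i and sum_j w[c][j] = 1 for each c.
    """
    n, d = len(X), len(X[0])
    # Hypothetical choice: deterministic farthest-point initialization of centers.
    v = [list(map(float, X[0]))]
    while len(v) < k:
        far = max(X, key=lambda x: min(_sq_dist(x, c) for c in v))
        v.append(list(map(float, far)))
    # Uniform initial weights stand in for the paper's reliability-based ones.
    w = [[1.0 / d] * d for _ in range(k)]
    u = []
    for _ in range(n_iter):
        # 1) Memberships: u[i][c] ∝ exp(-D_ic / gamma), D_ic = weighted squared distance.
        u = [
            _normalized_exp_neg(
                [sum(w[c][j] * (x[j] - v[c][j]) ** 2 for j in range(d)) for c in range(k)],
                gamma,
            )
            for x in X
        ]
        # 2) Centers: membership-weighted means (guard against an empty cluster).
        for c in range(k):
            mass = max(sum(u[i][c] for i in range(n)), 1e-12)
            v[c] = [sum(u[i][c] * X[i][j] for i in range(n)) / mass for j in range(d)]
        # 3) Weights: w[c][j] ∝ exp(-E_cj / lam), E_cj = weighted dispersion of dimension j.
        for c in range(k):
            disp = [sum(u[i][c] * (X[i][j] - v[c][j]) ** 2 for i in range(n)) for j in range(d)]
            w[c] = _normalized_exp_neg(disp, lam)
    return u, v, w
```

For example, with `X = [[0, 0], [0, 2], [0, -2], [10, 0], [10, 2], [10, -2]]` and `k=2`, dimension 1 varies inside each cluster while dimension 0 does not, so each cluster's weight on dimension 0 should end up close to 1. Larger `gamma` makes the memberships softer, and larger `lam` spreads the weights more evenly across dimensions, which is the role the entropy regularizers play in encouraging more dimensions to contribute.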
Key words
soft k-means clustering / cluster-specific dimension weights / maximum entropy / high-dimensional data / reliability measure
Classification
Information Technology and Security Science
Cite this article
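Li Xinyu, Xu Guiyun, Ren Shijin, Yang Maoyun. Reliability-based regularized weighted soft k-means algorithm for subspace clustering[J]. Journal of Nanjing University (Natural Science), 2017, 53(3): 525-536, 12.
Funding
National Natural Science Foundation of China (60974056)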