Reliability-based regularized weighted soft k-means algorithm for subspace clustering

Li Xinyu, Xu Guiyun, Ren Shijin, Yang Maoyun

Journal of Nanjing University (Natural Science), 2017, Vol. 53, Issue 3: 525-536, 12. DOI: 10.13232/j.cnki.jnju.2017.03.017

Reliability-based regularized weighted soft k-means algorithm for subspace clustering

Li Xinyu (1), Xu Guiyun (1), Ren Shijin (2), Yang Maoyun (1)

Author information

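  • 1. School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116
  • 2. School of Computer Science, Xuzhou Normal University, Xuzhou 221116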

Abstract

Subspace clustering methods are widely used for clustering high-dimensional data and have attracted growing attention. Subspace clustering combines cluster analysis with feature selection: it selects a subset of salient features and clusters the data in the resulting low-dimensional representation, which can improve performance. In many practical applications, soft clustering is known to give a more meaningful partition of complex data than hard clustering. In this paper, we extend k-means clustering and present a novel reliability-based regularized weighted soft k-means clustering algorithm (RRWSKM). The method calculates the contribution of each dimension to each cluster and finds the different subsets of salient dimensions relevant to different clusters. Furthermore, by tuning the model parameters, it can identify the exact data patterns and exhibit good performance. These properties are achieved by incorporating dimension-weight entropy and partition entropy terms into the objective function as regularizers, which avoids overfitting and encourages more dimensions to contribute to identifying the clusters. In addition, a data reliability measure is used to retain the reliability of the dimension weights and to determine the initial dimension weights, which greatly enhances the performance and robustness of the proposed algorithm. Since the RRWSKM optimization problem is non-convex, the solution is obtained by solving the problem with iterative update formulas. Experiments on real-world data sets are conducted to verify the algorithm. The results show that the proposed method yields low-dimensional representations of high-dimensional data, achieves better clustering performance than other subspace clustering methods, and handles high-dimensional data well.
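
As a rough illustration of the kind of iterative update scheme the abstract describes, the following Python/NumPy sketch alternates closed-form updates for a generic entropy-regularized weighted soft k-means objective. It is not the authors' RRWSKM: the objective in the comments, the function name `entropy_weighted_soft_kmeans`, the parameters `gamma` and `eta`, and the uniform initial weights are all assumptions made here for illustration, and the reliability measure is left out because the abstract does not define it.

```python
import numpy as np


def _softmax_rows(a):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def entropy_weighted_soft_kmeans(X, k, gamma=1.0, eta=1.0, n_iter=50, seed=0):
    """Hypothetical sketch; NOT the RRWSKM algorithm from the paper.

    Assumed objective (minimized over U, V, W):
        sum_i sum_c u_ic * sum_d w_cd * (x_id - v_cd)^2
        + gamma * sum_c sum_d w_cd * log(w_cd)    # dimension-weight entropy term
        + eta   * sum_i sum_c u_ic * log(u_ic)    # partition entropy term
    subject to each row of U and each row of W summing to 1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: centers start at k distinct random samples, weights uniform.
    V = X[rng.choice(n, size=k, replace=False)].astype(float)
    W = np.full((k, d), 1.0 / d)

    for _ in range(n_iter):
        # Soft memberships: u_ic proportional to exp(-weighted distance / eta).
        diff2 = (X[:, None, :] - V[None, :, :]) ** 2      # shape (n, k, d)
        D = np.einsum("nkd,kd->nk", diff2, W)             # shape (n, k)
        U = _softmax_rows(-D / eta)

        # Cluster centers: membership-weighted means of the samples.
        V = (U.T @ X) / (U.sum(axis=0)[:, None] + 1e-12)

        # Dimension weights: w_cd proportional to exp(-dispersion / gamma),
        # so dimensions with small within-cluster spread get larger weight.
        diff2 = (X[:, None, :] - V[None, :, :]) ** 2
        S = np.einsum("nk,nkd->kd", U, diff2)             # shape (k, d)
        W = _softmax_rows(-S / gamma)

    return U, V, W
```

In this sketch, a larger `gamma` spreads weight over more dimensions and a larger `eta` makes the partition softer, which mirrors the role the abstract assigns to the two entropy terms; suitable values depend on the scale of the data.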

Key words

soft k-means clustering/cluster-specific dimension weights/maximum entropy/high-dimensional data/reliability measure

Classification

Information technology and security science

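Cite this article

Li Xinyu, Xu Guiyun, Ren Shijin, Yang Maoyun. Reliability-based regularized weighted soft k-means algorithm for subspace clustering [J]. Journal of Nanjing University (Natural Science), 2017, 53(3): 525-536, 12.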

Funding

National Natural Science Foundation of China (60974056)

Journal of Nanjing University (Natural Science)

OA / CSCD / CSTPCD

ISSN 0469-5097
