Clustering ensemble algorithm based on random subspace with core

严丽宇 魏巍 郭鑫垚 崔军彪

南京大学学报(自然科学版), 2017, Vol. 53, Issue 6: 1033-1042, 10. DOI: 10.13232/j.cnki.jnju.2017.06.005

Clustering ensemble algorithm based on random subspace with core

严丽宇 1, 魏巍 1, 郭鑫垚 2, 崔军彪 1

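Author information

  • 1. School of Computer and Information Technology, Shanxi University, Taiyuan, 030006
  • 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan, 030006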

Abstract

Cluster analysis has a wide range of applications and is an important part of data mining. Clustering algorithms now face large-scale, high-dimensional data, but traditional clustering algorithms are not effective on the sparse data that arise in high-dimensional settings. Subspace clustering, which targets clustering problems in high-dimensional data, is a recently emerged and important branch of cluster analysis. On the one hand, as an extension of traditional clustering, subspace clustering plays a vital role in clustering high-dimensional data effectively. On the other hand, a clustering ensemble integrates many clustering results of the original data set into a partition that better reflects the inherent structure of the data, which improves clustering quality considerably. A random subspace-based clustering ensemble algorithm generates subspaces by randomly sampling attributes and then combines the base clustering results obtained in these attribute subspaces into the ensemble clustering result. In this process, some random subspaces may contain few important attributes, which ultimately leads to a poor ensemble clustering result. To address this problem, we propose a core-containing random subspace generation strategy: we first select a set of important attributes, according to their complement mutual information values in rough set theory, as the core of every attribute subspace, and then combine the core with attributes sampled randomly from the remaining attributes to construct a random subspace with core. This strategy not only preserves diversity among subspaces but also strengthens each subspace's ability to represent the complete information of the data, which contributes to a better clustering ensemble result. Experiments on data sets from UCI (University of California, Irvine) show that the clustering ensemble based on random subspaces with core outperforms the one based on completely random subspaces on the majority of the data sets.
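
The subspace generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `core_random_subspaces`, its parameters, and the example scores are hypothetical, and the importance scores are taken as given (the paper derives them from complement mutual information in rough set theory, which is not reproduced here). The sketch only shows the idea that every subspace shares a fixed core of top-ranked attributes, while the remaining attributes are sampled at random.

```python
import random


def core_random_subspaces(scores, core_size, n_random, n_subspaces, seed=0):
    """Build attribute subspaces that all share a fixed core (illustrative only).

    scores: one importance value per attribute (higher = more important).
    How the scores are computed is an assumption left outside this sketch.
    """
    n_attrs = len(scores)
    if core_size + n_random > n_attrs:
        raise ValueError("core_size + n_random exceeds the number of attributes")

    # Rank attribute indices by importance, most important first.
    ranked = sorted(range(n_attrs), key=lambda j: scores[j], reverse=True)
    core = sorted(ranked[:core_size])
    rest = ranked[core_size:]

    rng = random.Random(seed)
    subspaces = []
    for _ in range(n_subspaces):
        # Diversity comes from the randomly sampled non-core attributes.
        extra = rng.sample(rest, n_random)
        subspaces.append(core + sorted(extra))
    return core, subspaces


# Hypothetical importance scores for six attributes.
scores = [0.10, 0.80, 0.30, 0.90, 0.20, 0.05]
core, subspaces = core_random_subspaces(
    scores, core_size=2, n_random=2, n_subspaces=3
)
```

Each returned subspace is a list of attribute indices. A base clustering would then be run on the data restricted to each subspace, and the resulting partitions combined by whatever consensus method the ensemble uses; the abstract does not specify that step, so it is left out here.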

Key words

subspace clustering / clustering ensemble / rough set / complement mutual information

Classification

Information technology and security science

Cite this article

严丽宇, 魏巍, 郭鑫垚, 崔军彪. Clustering ensemble algorithm based on random subspace with core [J]. 南京大学学报(自然科学版), 2017, 53(6): 1033-1042, 10.

Funding

National Natural Science Foundation of China (61303008, 61202018); Natural Science Foundation of Shanxi Province (2013021018-1)

南京大学学报(自然科学版)

OA / CSCD / CSTPCD

ISSN 0469-5097
