Computer Engineering and Applications, Issue (23): 131-135, 202, 6. DOI: 10.3778/j.issn.1002-8331.1301-0121
Weighted Subspace Clustering Algorithm for High-Dimensional Categorical Data
Abstract
Subspace clustering is an effective strategy for clustering high-dimensional data. Its guiding principle is to cluster the data in as small a subspace as possible while preserving as much of the original data information as possible. Building on a study of existing soft subspace clustering methods, this paper proposes a new subspace search algorithm. The algorithm combines cluster size with information entropy to define a new scheme for assigning weights to subspace dimensions, and then uses the feature vectors of the cluster subspaces to measure the similarity between two clusters. The data are clustered with the agglomerative strategy of hierarchical clustering, which overcomes the shortcomings of using either information entropy or a traditional similarity measure alone. Experiments on three typical categorical data sets, Zoo, Votes and Soybean, show that, compared with other algorithms, the proposed algorithm not only improves clustering accuracy but is also highly stable.
Keywords
high-dimensional data / clustering / subspace / information entropy / hierarchical clustering
Classification
Information Technology and Security Science
Cite this article
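Sun Haojun, Shan Guanghui, Gao Yulong, Yuan Ting, Wu Yunxia. Weighted subspace clustering algorithm for high-dimensional categorical data[J]. Computer Engineering and Applications, 2014, (23): 131-135, 202, 6.
Funding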
National Natural Science Foundation of China (No. 61170130).
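The abstract describes the algorithm only in outline. It lists four ingredients: dimension weights built from cluster size and information entropy, a per-cluster subspace feature vector, a similarity between clusters, and agglomerative merging. It does not give the formulas for any of them. The Python sketch below is one hypothetical reading of those steps and is not the paper's method. The following choices are all assumptions made for illustration:

- the weight 1 − H_j / log|C|, where H_j is the entropy of attribute j inside cluster C, normalised by the log of the cluster size;
- the per-attribute value-frequency profile used as the "feature vector";
- the weighted histogram-intersection similarity;
- the function names `dimension_weights`, `similarity` and `agglomerative`.

```python
import math
from collections import Counter
from itertools import combinations


def attribute_profile(rows):
    """Relative frequency of each value, per attribute, inside one cluster."""
    n = len(rows)
    return [
        {value: count / n for value, count in Counter(row[j] for row in rows).items()}
        for j in range(len(rows[0]))
    ]


def entropy(dist):
    """Shannon entropy (natural log) of a value-frequency dict."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def dimension_weights(rows, eps=1e-3):
    """Hypothetical weighting: low within-cluster entropy -> high weight.

    Entropy is divided by log(cluster size), the largest entropy a cluster of
    that size can reach (log 2 for singletons, to avoid dividing by zero), so
    cluster size enters the weight. eps keeps every weight positive; the
    weights are normalised to sum to 1.
    """
    h_max = math.log(max(len(rows), 2))
    raw = [1.0 - entropy(dist) / h_max + eps for dist in attribute_profile(rows)]
    total = sum(raw)
    return [r / total for r in raw]


def similarity(rows_a, rows_b):
    """Weighted overlap of the two clusters' per-attribute value distributions."""
    pa, pb = attribute_profile(rows_a), attribute_profile(rows_b)
    wa, wb = dimension_weights(rows_a), dimension_weights(rows_b)
    score = 0.0
    for j in range(len(pa)):
        # histogram intersection of attribute j, in [0, 1]
        overlap = sum(min(p, pb[j].get(v, 0.0)) for v, p in pa[j].items())
        score += 0.5 * (wa[j] + wb[j]) * overlap
    return score


def agglomerative(data, k):
    """Start from singletons and merge the most similar pair until k clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(
                [data[i] for i in clusters[ij[0]]],
                [data[i] for i in clusters[ij[1]]],
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy categorical data: two obvious groups with no shared values.
toy = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
    ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r"),
]
print(agglomerative(toy, 2))  # prints [[0, 1, 2], [3, 4, 5]]
```

In this sketch, an attribute whose values are spread evenly within a cluster gets a weight close to zero. In the toy data, the third attribute of the cluster {0, 1} is an example. Attributes that are constant within a cluster dominate the similarity score instead. The merge loop recomputes every pairwise similarity at each step, so it is only practical for small data sets.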