基于数据抽样的自动k-means聚类算法

罗军锋 (Luo Junfeng)¹, 洪丹丹 (Hong Dandan)¹

现代电子技术 (Modern Electronics Technique), 2014, Issue 8: 19-21, 3.

Automatic k-means clustering algorithm based on data sampling

Author information

1. Information Center, Xi'an Jiaotong University, Xi'an, Shaanxi 710049

Abstract

To address two problems of the traditional k-means algorithm, namely that the value of k must be supplied as input and that ultra-large-scale data sets must be clustered, this paper builds on previous studies in two ways: information entropy is incorporated into the distance calculation, and a data sampling method is adopted, in which optimal samples are extracted from the ultra-large-scale data set and clustered first. Validity indexes are then evaluated on the clustering of the sample data to obtain the k value required by the algorithm. Finally, the information-entropy distance formula is used to cluster the full ultra-large data set. Experiments show that the algorithm overcomes the traditional k-means algorithm's need for k as input and can automatically obtain the k value for clustering ultra-large data sets without degrading the quality of the preceding data clustering.
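The abstract describes a pipeline (sample, weight the distance by information entropy, choose k with a validity index on the sample, then cluster everything) without giving the exact formulas. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' method: plain random sampling stands in for "optimal sample extraction", the common entropy weight method (per-feature weights derived from Shannon entropy) stands in for the paper's entropy distance, and the Davies-Bouldin index stands in for the unnamed validity index. All function names (`entropy_weights`, `weighted_dist`, `auto_kmeans`, and so on) are hypothetical.

```python
import math
import random

def entropy_weights(points):
    """Entropy weight method (ASSUMED stand-in for the paper's entropy distance)."""
    n, dim = len(points), len(points[0])  # needs n >= 2
    divergence = []
    for col in zip(*points):
        lo = min(col)
        shifted = [v - lo + 1e-12 for v in col]  # make every value positive
        total = sum(shifted)
        e = -sum((v / total) * math.log(v / total) for v in shifted) / math.log(n)
        divergence.append(max(0.0, 1.0 - e))  # low entropy -> high weight
    s = sum(divergence)
    return [d / s for d in divergence] if s > 0 else [1.0 / dim] * dim

def weighted_dist(a, b, w):
    """Entropy-weighted Euclidean distance."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))

def assign(points, centers, w):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda j: weighted_dist(p, centers[j], w))
            for p in points]

def kmeans_pp_init(points, k, w, rng):
    """k-means++ seeding under the weighted distance."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(weighted_dist(p, c, w) ** 2 for c in centers) for p in points]
        r, acc = rng.uniform(0, sum(d2)), 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc >= r:
                centers.append(list(p))
                break
        else:  # guard against floating-point round-off
            centers.append(list(points[-1]))
    return centers

def kmeans(points, k, w, rng, iters=50, restarts=5):
    """Lloyd's k-means; keeps the restart with the lowest weighted SSE."""
    best = None
    for _ in range(restarts):
        centers = kmeans_pp_init(points, k, w, rng)
        for _ in range(iters):
            labels = assign(points, centers, w)
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # an empty cluster keeps its previous centre
                new_centers.append([sum(c) / len(members) for c in zip(*members)]
                                   if members else centers[j])
            if new_centers == centers:
                break
            centers = new_centers
        labels = assign(points, centers, w)
        sse = sum(weighted_dist(p, centers[lab], w) ** 2 for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, centers, labels)
    return best[1], best[2]

def davies_bouldin(points, centers, labels, w):
    """Davies-Bouldin index, lower is better (ASSUMED validity index)."""
    k = len(centers)
    scatter = []
    for j in range(k):
        ds = [weighted_dist(p, centers[j], w) for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(ds) / len(ds) if ds else 0.0)
    worst = []
    for i in range(k):
        ratios = []
        for j in range(k):
            if i != j:
                d = weighted_dist(centers[i], centers[j], w)
                ratios.append(float("inf") if d == 0 else (scatter[i] + scatter[j]) / d)
        worst.append(max(ratios))
    return sum(worst) / k

def auto_kmeans(points, sample_size, k_max, seed=0):
    """Pick k on a sample by the validity index, then cluster all points with it."""
    rng = random.Random(seed)
    # ASSUMPTION: simple random sampling replaces the paper's optimal-sample extraction
    sample = rng.sample(points, min(sample_size, len(points)))
    w = entropy_weights(sample)
    best_k, best_score = 2, float("inf")
    for k in range(2, k_max + 1):
        centers, labels = kmeans(sample, k, w, rng)
        score = davies_bouldin(sample, centers, labels, w)
        if score < best_score:
            best_k, best_score = k, score
    centers, labels = kmeans(points, best_k, w, rng)
    return best_k, centers, labels
```

For instance, `auto_kmeans(data, sample_size=1000, k_max=10)` on a large list of numeric tuples runs the k = 2..10 search on 1000 sampled points only, so the cost of choosing k does not grow with the full data set; only the final clustering pass touches every point. This mirrors the abstract's motivation for sampling, under the substitutions listed above.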

Key words

k-means algorithm/information entropy/optimal sample extraction/validity index

Classification

Information Technology and Security Science

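Cite this article

罗军锋, 洪丹丹. 基于数据抽样的自动k-means聚类算法[J]. 现代电子技术, 2014, (8): 19-21, 3.
(Luo Junfeng, Hong Dandan. Automatic k-means clustering algorithm based on data sampling[J]. Modern Electronics Technique, 2014, (8): 19-21, 3.)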

现代电子技术 (Modern Electronics Technique)

OA / Peking University Core Journal (北大核心) / CSTPCD

ISSN: 1004-373X