
Journal of Shenyang Aerospace University, 2024, Vol. 41, Issue (3): 71-84, 14. DOI: 10.3969/j.issn.2095-1248.2024.03.010

Visualized determination mode for clustering quantity of high-dimensional data

He Xuansen¹, He Fan², Fan Yueping³, Chen Hongjun³

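Author information

1. School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363; School of Information Science and Engineering, Hunan University, Changsha 410082
2. School of Management and Economics, Beijing Institute of Technology, Beijing 100081
3. School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou 511363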

Abstract

To address two weaknesses of the classical K-means clustering algorithm, namely that users must know the number of clusters in advance and that the clustering results are sensitive to initialization, a comprehensive scheme is proposed that improves the random initial partitioning of K-means and determines the number of clusters visually. First, the data are standardized so that they follow a normal distribution, and the most important features are extracted by principal component analysis to reduce the dimensionality of the high-dimensional data. Then, farthest centroid selection and the min-max distance rule are used to modify the random initialization of K-means, which avoids empty clusters and ensures data separability. On this basis, the statistical empirical rule is used to estimate the range of the number of clusters, and the optimal number of clusters is assessed by searching for the elbow of the sum-of-squared-error curve within this range. Finally, the clustering quality is evaluated by calculating and comparing the silhouette coefficients of each cluster, which ultimately determines the inherent number of clusters in the data. The simulation results show that the proposed scheme not only determines the potential number of clusters in the data visually, but also provides an effective method for high-dimensional data analysis in the era of big data.
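As a rough illustration of two steps named in the abstract (a farthest-point initialization and a sum-of-squared-error curve for the elbow search), the sketch below may help. It is not the authors' implementation: the abstract does not specify their exact farthest centroid selection or min-max distance rule, so the generic farthest-first (maximin) heuristic is substituted here, and the function names `maximin_init`, `kmeans`, and `sse_curve` are hypothetical. Standardization, PCA, the empirical-rule range, and silhouette analysis are omitted.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k, seed=0):
    """Farthest-first seeding (an assumed stand-in for the paper's rule).

    The first center is drawn at random; each later center is the point
    whose distance to its nearest already-chosen center is largest.
    Assumes `points` contains at least k distinct points.
    """
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # nearest[i] = squared distance from point i to its closest chosen center
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, sq_dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means started from maximin seeds; returns centers, labels, SSE."""
    centers = maximin_init(points, k, seed)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep old center if a cluster empties
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)  # labels consistent with final centers
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def sse_curve(points, k_values, seed=0):
    """SSE for each candidate k; the 'elbow' is read off this curve."""
    return {k: kmeans(points, k, seed=seed)[2] for k in k_values}


if __name__ == "__main__":
    # Three small, well-separated toy blobs (illustrative data only).
    offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
    data = [(cx + dx, cy + dy)
            for cx, cy in [(0, 0), (10, 0), (0, 10)]
            for dx, dy in offsets]
    for k, sse in sse_curve(data, range(1, 6)).items():
        print(k, round(sse, 3))
```

On data with clear cluster structure, the printed SSE values would be expected to drop sharply up to the true number of clusters and flatten afterwards; that bend is what an elbow search looks for. In practice the candidate range for k would come from the empirical rule mentioned in the abstract rather than a fixed `range(1, 6)`.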

Key words

K-means clustering algorithm/principal component analysis/farthest centroid selection/min-max distance rule/statistical empirical rule/elbow method/silhouette analysis

Classification

Information technology and security science

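Cite this article

He Xuansen, He Fan, Fan Yueping, Chen Hongjun. Visualized determination mode for clustering quantity of high-dimensional data[J]. Journal of Shenyang Aerospace University, 2024, 41(3): 71-84, 14.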

Funding

Key Field Special Project of Guangdong Provincial Universities (Project No. 2021ZDZX1035)

Journal of Shenyang Aerospace University

ISSN: 2095-1248
