Computer Engineering & Science, 2019, Vol. 41, Issue 2, pp. 214-223 (10 pages). DOI: 10.3969/j.issn.1007-130X.2019.02.004
Selecting distance metrics for incremental clustering algorithm of high dimensional data
SHAO Jun-jian 1, WANG Shi-tong 1
Author information
- 1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
Abstract
The choice of distance metric function has a significant effect on clustering results. For large-scale, high-dimensional datasets, an incremental fuzzy clustering algorithm is used to study the selection of distance metrics. Because the single-pass fuzzy c-means (SpFCM) algorithm divides a large-scale dataset into small samples and clusters them incrementally, batch by batch, it can obtain good clustering results within limited computer memory. Different distance metric functions are incorporated into the traditional SpFCM algorithm to measure the similarity between samples, so that the effect of each metric on SpFCM can be examined. Four distance metrics, namely the Euclidean metric, the cosine metric, the correlation distance metric, and the extended Jaccard similarity metric, are used to compute distances on several large-scale, high-dimensional datasets. Experimental results show that the latter three distance metrics greatly improve clustering performance. Of these, the correlation distance metric gives the best clustering results, while the cosine metric and the extended Jaccard similarity metric give moderate results.
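As an illustration of how a distance metric can be swapped into SpFCM, here is a minimal NumPy sketch, not the authors' implementation. It assumes the textbook forms of the four metrics (Euclidean norm, 1 - cosine similarity, 1 - Pearson correlation, and 1 - extended Jaccard similarity x·y / (|x|² + |y|² - x·y)), a fuzzifier m = 2, a weighted-mean prototype update for every metric (exact only for the Euclidean case, a heuristic for the others), and a farthest-first initialisation. The function names `euclidean`, `cosine`, `correlation`, `ext_jaccard`, `weighted_fcm` and `spfcm` are hypothetical. Rows are assumed non-zero and non-constant so that the cosine and correlation denominators are defined.

```python
import numpy as np


# Each metric returns a (c x n) array of distances between centres V (c x d) and points X (n x d).
def euclidean(V, X):
    return np.sqrt(((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))


def cosine(V, X):
    # 1 - cosine similarity
    den = np.linalg.norm(V, axis=1)[:, None] * np.linalg.norm(X, axis=1)[None, :]
    return 1.0 - (V @ X.T) / den


def correlation(V, X):
    # 1 - Pearson correlation = cosine distance after centring each vector on its own mean
    return cosine(V - V.mean(axis=1, keepdims=True), X - X.mean(axis=1, keepdims=True))


def ext_jaccard(V, X):
    # 1 - x.y / (|x|^2 + |y|^2 - x.y)  (assumed distance form of the extended Jaccard similarity)
    dot = V @ X.T
    sq_v = (V ** 2).sum(axis=1)[:, None]
    sq_x = (X ** 2).sum(axis=1)[None, :]
    return 1.0 - dot / (sq_v + sq_x - dot)


def weighted_fcm(X, w, V, dist, m=2.0, iters=100, tol=1e-6):
    """Weighted fuzzy c-means started from centres V; w holds one weight per row of X."""
    for _ in range(iters):
        D2 = np.maximum(dist(V, X), 0.0) ** 2 + 1e-12   # squared distances, kept strictly positive
        inv = D2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)       # memberships: each column sums to 1
        Wm = (U ** m) * w[None, :]
        V_new = (Wm @ X) / Wm.sum(axis=1, keepdims=True)  # weighted-mean centre update (assumption)
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return V, U


def spfcm(X, c, chunk_size, dist, m=2.0):
    """Single-pass FCM: cluster chunk by chunk, carrying weighted centres into the next chunk."""
    V, w_v = None, None
    for start in range(0, len(X), chunk_size):
        chunk = X[start:start + chunk_size]
        w = np.ones(len(chunk))
        if V is None:
            # farthest-first initialisation on the first chunk (hypothetical choice)
            idx = [0]
            for _ in range(c - 1):
                idx.append(int(dist(chunk[idx], chunk).min(axis=0).argmax()))
            V = chunk[idx]
        else:
            chunk = np.vstack([chunk, V])        # previous centres join the chunk as weighted points
            w = np.concatenate([w, w_v])
        V, U = weighted_fcm(chunk, w, V, dist, m)
        w_v = (U * w[None, :]).sum(axis=1)       # centre weight = total weighted membership it received
    return V
```

Only one chunk plus c centres is held in memory at a time, which is the property the abstract attributes to SpFCM. After the final pass, a hard label for each point can be read off with `dist(V, X).argmin(axis=0)`; the metric is just a function argument, so comparing the four metrics means calling `spfcm` four times with the same data.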
Key words
high dimensional data/SpFCM algorithm/distance metric/incremental fuzzy clustering algorithm/correlation coefficient distance metric
Classification
Information Technology and Security Science
Cite this article
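SHAO Jun-jian, WANG Shi-tong. Selecting distance metrics for incremental clustering algorithm of high dimensional data[J]. Computer Engineering & Science, 2019, 41(2): 214-223.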