Computer Applications and Software (计算机应用与软件), 2017, Vol. 34, Issue 5: 43-47, 53, 6. DOI: 10.3969/j.issn.1000-386x.2017.05.008
Research on a Fast Clustering K-means Algorithm for Large-Scale Data (面向大规模数据快速聚类K-means算法的研究)
Abstract
To further improve the efficiency of the K-means clustering algorithm on large-scale data, this paper proposes a parallel clustering method built on the MapReduce computational model. The method uses a hash function to draw a sample from the data and then applies the PAM (Partitioning Around Medoids) algorithm to that sample to obtain the initial cluster centers. The sample drawn by the hash function can reflect the statistical characteristics of the full data set, and obtaining the initial centers with PAM reduces the traditional algorithm's dependence on the choice of initial centers. Experimental results show that the proposed algorithm improves both clustering quality and efficiency and is suitable for cluster analysis of large-scale data.
Key words
Large-scale data / Clustering algorithm / MapReduce / Hash sampling / PAM algorithm
Classification
Information technology and security science
Cite this article
Guo Zhanyuan, Lin Tao. Research on a fast clustering K-means algorithm for large-scale data [J]. Computer Applications and Software, 2017, 34(5): 43-47, 53, 6.
Funding
Tianjin Science and Technology Support Program, Major Special Project for Science and Technology Services (14ZCDZGX00818).
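
Illustrative sketch

The pipeline outlined in the abstract (hash-based sampling, PAM on the sample to obtain initial centers, then K-means over the full data) can be illustrated with a minimal single-machine Python sketch. It is not the authors' implementation: the MD5 bucket rule in `hash_sample`, the `rate_denominator` parameter, the random start inside `pam`, the fallback for very small samples, and all function names are assumptions made for illustration. The assignment and averaging steps mirror a map phase and a reduce phase but run sequentially here rather than on a MapReduce cluster.

```python
import hashlib
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hash_sample(points, rate_denominator=5):
    """Keep a point when the hash of its record lands in bucket 0.

    Assumed sampling rule: roughly 1 / rate_denominator of the data is kept.
    """
    sample = []
    for p in points:
        digest = hashlib.md5(repr(p).encode("utf-8")).hexdigest()
        if int(digest, 16) % rate_denominator == 0:
            sample.append(p)
    return sample


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k, seed=0):
    """Basic PAM: swap medoids with non-medoids while the cost decreases."""
    medoids = random.Random(seed).sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def kmeans(points, centers, max_iter=100):
    """Lloyd iterations starting from the supplied initial centers."""
    clusters = []
    for _ in range(max_iter):
        # "Map" step: assign every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # "Reduce" step: each center becomes the mean of its members.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def fast_kmeans(points, k, rate_denominator=5):
    """Hash sample -> PAM initial centers -> K-means on all points."""
    sample = hash_sample(points, rate_denominator)
    if len(sample) < k:  # assumed fallback when the sample is too small
        sample = list(points)
    return kmeans(points, pam(sample, k))
```

Calling `fast_kmeans(points, k)` with a list of numeric tuples returns the final centers and the clusters assigned to them. Running PAM only on the sample keeps its quadratic swap search cheap, while the K-means pass still sees every point.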