Computer Applications and Software, 2017, Vol. 34, Issue 5: 43-47, 53, 6. DOI: 10.3969/j.issn.1000-386x.2017.05.008

Research on Fast Clustering K-means Algorithm for Large-Scale Data

Guo Zhanyuan¹, Lin Tao¹

Author information

1. School of Computer Science and Software, Hebei University of Technology, Tianjin 300401

Abstract

To further improve the efficiency of the K-means clustering algorithm on large-scale data, a parallel clustering method based on the MapReduce computational model is proposed. The method uses a Hash function to extract samples and then obtains the initial centers with the Pam algorithm. The samples extracted by the Hash function can fully reflect the statistical characteristics of the data, and obtaining the initial cluster centers with the Pam algorithm reduces the dependence of traditional clustering algorithms on the choice of initial centers. The experimental results show that the proposed algorithm effectively improves clustering quality and efficiency, and is suitable for cluster analysis of large-scale data.
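
The pipeline described in the abstract (hash-based sampling, Pam on the sample to pick initial centers, then K-means over the full data) can be illustrated with a minimal single-machine sketch. The abstract does not specify the Hash function, the sampling rate, or how the MapReduce jobs are organized, so everything below is an assumption: the function names `hash_sample`, `pam`, and `kmeans` are hypothetical, MD5 over the point's coordinates stands in for the unspecified Hash function, Pam is a plain swap-based variant, and the map and reduce steps are emulated in memory rather than run on a cluster.

```python
import hashlib
import math
from collections import defaultdict


def hash_sample(points, keep_one_in=10):
    """Keep a point when the MD5 hash of its coordinates lands in bucket 0.

    Assumption: the paper's actual Hash function and sampling rate are unknown.
    """
    sample = []
    for p in points:
        digest = hashlib.md5(repr(p).encode("utf-8")).hexdigest()
        if int(digest, 16) % keep_one_in == 0:
            sample.append(p)
    return sample


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(sample, k):
    """Swap-based PAM: start from the first k sample points and accept any
    medoid/non-medoid swap that lowers the total cost, until none does."""
    medoids = list(sample[:k])
    best = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(sample, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids


def kmeans(points, centers, max_iter=20):
    """Lloyd iterations written as a map step (assign) and a reduce step (average)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Map: emit (index of nearest center, point) pairs.
        groups = defaultdict(list)
        for p in points:
            idx = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[idx].append(p)
        # Reduce: average the points sharing a key; an empty cluster keeps its center.
        new_centers = []
        for j, c in enumerate(centers):
            members = groups.get(j)
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

A typical call chain under these assumptions would be `kmeans(points, pam(hash_sample(points), k))`, where `points` is a list of coordinate tuples and the sample must contain at least `k` points. Running Pam only on the sample keeps its quadratic swap search affordable, which appears to be the motivation for sampling first.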

Key words

Large-scale data/Clustering algorithm/MapReduce/Hash sampling/Pam algorithm

Classification

Information Technology and Security Science

Cite this article

Guo Zhanyuan, Lin Tao. Research on fast clustering K-means algorithm for large-scale data [J]. Computer Applications and Software, 2017, 34(5): 43-47, 53, 6.

Funding

Tianjin Science and Technology Support Program, Major Special Project for Science and Technology Services (14ZCDZGX00818).

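Computer Applications and Software (计算机应用与软件)

Open access; indexed in the Peking University Core Journals list and CSTPCD

ISSN: 1000-386X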
