Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2017, Vol. 37, Issue 4: 113-118, 6. DOI: 10.14132/j.cnki.1673-5439.2017.04.018
Improvement of K-Means algorithm and implementation based on Spark computing model
Xu Pengcheng 1, Wang Cheng 1
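Author information
- 1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China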
Abstract
The K-Means algorithm is a partition-based clustering algorithm valued for its simplicity and efficiency. However, its results depend strongly on the choice of initial centers. Moreover, the number of clusters is not always known in advance, and frequent iterations can overload the server. To address these problems, the original K-Means algorithm is improved by introducing the Canopy algorithm and the minimum maximum distance algorithm. To handle big data, the improved algorithm is implemented on the Spark computing model. Experimental results show that the improved clustering algorithm improves classification stability, accuracy, and convergence speed, and therefore offers performance advantages when processing big data.
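The abstract does not spell out how the initial centers are chosen, so the following is only a minimal sketch of the general max-min distance idea (often called farthest-first seeding), not the authors' exact procedure. The function name `max_min_centers`, the choice of the first point as the starting center, and the sample data are all assumptions made for illustration. In a Spark setting the per-point distance update would presumably be the step that is distributed, but that part is not shown here.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers with the max-min distance rule (hypothetical helper)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Assumption: seed with the first point; other variants start from a
    # random point or from the point closest to the data mean.
    centers = [points[0]]
    # nearest[i] is the distance from points[i] to its closest chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point whose closest center is farthest away.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        # Refresh the closest-center distances against the new center only.
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[idx]))
    return centers


# Illustrative data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0)]
centers = max_min_centers(points, 3)
print(centers)
```

Because each new center is chosen to be far from all existing ones, the seeds tend to land in different clusters, which is one way to reduce the dependence on initial centers that the abstract describes. The number of centers `k` is passed in by hand here; the abstract suggests the Canopy step is what supplies it when it is not known in advance.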
Key words
K-Means / Canopy algorithm / minimum maximum distance algorithm / Spark
Classification
Information technology and security science
Cite this article
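Xu Pengcheng, Wang Cheng. Improvement of K-Means algorithm and implementation based on Spark computing model[J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2017, 37(4): 113-118, 6.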