Improvement of K-Means algorithm and implementation based on Spark computing model

Xu Pengcheng¹, Wang Cheng¹

Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2017, Vol. 37, Issue (4): 113-118, 6. DOI: 10.14132/j.cnki.1673-5439.2017.04.018

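Author information

1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China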

Abstract

K-Means is a partition-based clustering algorithm valued for its simplicity and high efficiency. However, its results depend strongly on the choice of initial centers. Moreover, the number of clusters is often not known in advance, and frequent iterations can overload the server. To address these problems, the original K-Means algorithm is improved by introducing the Canopy algorithm and the minimum maximum distance algorithm. To handle big data, the improved algorithm is implemented on the Spark computing model. Experimental results show that the improved clustering algorithm improves classification stability, accuracy, and convergence speed, and therefore offers performance advantages when dealing with big data.
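
To make the idea of choosing initial centers by a minimum maximum distance rule more concrete, the following is a minimal single-machine sketch in Python. It is not the authors' implementation: the abstract gives no details, so the function names `max_min_centers` and `kmeans`, the threshold ratio `theta`, and the choice of the first point as the first center are all assumptions made for illustration. The Canopy step and the Spark parallelization are omitted.

```python
import math


def max_min_centers(points, theta=0.5):
    """Pick initial centers with the classic max-min distance rule.

    theta is a hypothetical threshold ratio, not a value from the paper.
    """
    centers = [points[0]]  # assumption: start from the first point
    # Second center: the point farthest from the first one.
    second = max(points, key=lambda p: math.dist(p, centers[0]))
    base = math.dist(second, centers[0])
    if base == 0:
        return centers  # all points coincide, one center is enough
    centers.append(second)
    while True:
        # Candidate: the point whose nearest chosen center is farthest away.
        candidate = max(
            points, key=lambda p: min(math.dist(p, c) for c in centers)
        )
        gap = min(math.dist(candidate, c) for c in centers)
        # Stop once no point is far enough from every existing center.
        if gap <= theta * base:
            return centers
        centers.append(candidate)


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            for members, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, clusters
```

In this sketch the number of clusters is not supplied by the caller: it falls out of the stopping rule in `max_min_centers`, which is one way the unknown class count mentioned in the abstract could be handled. Calling `kmeans(points, max_min_centers(points))` on a list of coordinate tuples runs the two steps together.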

Keywords

K-Means / Canopy algorithm / minimum maximum distance algorithm / Spark

Classification

Information technology and security science

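Cite this article

Xu Pengcheng, Wang Cheng. Improvement of K-Means algorithm and implementation based on Spark computing model [J]. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2017, 37(4): 113-118, 6.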

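Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)

Indexed in: OA, Peking University Core Journals, CSTPCD

ISSN: 1673-5439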
