The Parallel Implementation and Application of an Improved K-means Algorithm

李晓瑜, 俞丽颖, 雷航, 唐雪飞

Journal of University of Electronic Science and Technology of China, 2017, Vol. 46, Issue 1: 61-68, 8.

DOI: 10.3969/j.issn.1001-0548.2017.01.010

The Parallel Implementation and Application of an Improved K-means Algorithm

李晓瑜¹, 俞丽颖¹, 雷航¹, 唐雪飞¹

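Author information

1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054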

Abstract

With the growth of massive data, clustering, one of the core problems in big data research, faces increasing challenges such as high computational complexity and insufficient computing resources. This paper proposes an improved parallel K-means algorithm based on Hadoop. The traditional K-means algorithm often converges to a local optimum because its initial centers are chosen at random. To overcome this problem, we introduce the Canopy algorithm to initialize the cluster centers, apply K-means within each canopy, and merge clusters across canopies. The resulting clustering is more stable and requires fewer iterations. In addition, parallel implementation methods and strategies for the improved algorithm are presented, based on the MapReduce distributed computing model. A new text clustering method is also introduced by improving the similarity measure. Experimental results indicate the validity and scalability of the proposed method.
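The idea of seeding K-means with Canopy centers can be illustrated with a minimal single-machine sketch in Python. It is not the authors' Hadoop implementation. The function names `canopies` and `kmeans`, the thresholds `t1` and `t2`, and the choice of the first remaining point as each canopy center are assumptions made for illustration. The sketch also simplifies the method: it runs one global K-means from the canopy centers instead of running K-means inside each canopy and merging clusters afterwards.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopies(points, t1, t2):
    """Build canopies with a loose threshold t1 and a tight threshold t2 (t1 > t2).

    Returns a list of (center, members) pairs. A point may belong to
    several canopies, but only points farther than t2 from every
    existing center can become a new center.
    """
    assert t1 > t2 > 0
    remaining = list(points)
    result = []
    while remaining:
        center = remaining.pop(0)  # assumption: take the first remaining point
        members = [p for p in points if dist(p, center) <= t1]
        result.append((center, members))
        # Points within t2 of this center can no longer start a canopy.
        remaining = [p for p in remaining if dist(p, center) > t2]
    return result


def kmeans(points, centers, max_iter=100):
    """Plain K-means started from the given centers."""
    centers = [tuple(c) for c in centers]
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step (the work a map phase would do per point).
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step (the work a reduce phase would do per center).
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, groups


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    seeds = [center for center, _ in canopies(data, t1=4.0, t2=2.0)]
    final_centers, clusters = kmeans(data, seeds)
    print(final_centers)
```

Because the canopy pass fixes both the number of clusters and their starting positions, repeated runs on the same input give the same result, which is the stability property the abstract refers to. In a MapReduce setting the assignment loop and the averaging step would typically be split between mappers and reducers; that split is only indicated by comments here.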

Key words

canopy algorithm/Hadoop/MapReduce/parallel K-means/text clustering

Classification

Information Technology and Security Science

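Cite this article

李晓瑜, 俞丽颖, 雷航, 唐雪飞. The Parallel Implementation and Application of an Improved K-means Algorithm [J]. Journal of University of Electronic Science and Technology of China, 2017, 46(1): 61-68, 8.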

Funding

National Key Technology R&D Program of China (2012BAH87F03); Fundamental Research Funds for the Central Universities (ZYGX2014J065)

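Journal of University of Electronic Science and Technology of China (open access; indexed in Peking University Core Journals, CSCD, CSTPCD)

ISSN: 1001-0548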
