
Distributed Clustering Algorithm for Network Data Based on MapReduce

Chen Dongming¹, Liu Jian¹, Wang Dongqi¹, Xu Xiaowei²

Computer Engineering, 2013, Vol. 39, Issue 7, pp. 76-82 (7 pages). DOI: 10.3969/j.issn.1000-3428.2013.07.017



Author affiliations

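  • 1. Software College, Northeastern University, Shenyang 110819, China
  • 2. Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR 72204, USA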


Abstract

Because of their high time and space complexity, and because a single physical machine runs out of memory, traditional clustering algorithms usually cannot effectively analyze and process large-scale network data. To solve this problem, this paper proposes a distributed clustering algorithm for network data based on the MapReduce model. It uses MRC theory to design a bounded number of MapReduce rounds, which limits the time spent in the shuffle stage, and it uses merging inside the Map stage to limit network traffic. When intermediate results are merged, only clusters are merged and their internal nodes are not considered, which keeps memory overhead under control. Experiments are conducted on simulated data sets. The results show that as the data size and the cluster scale increase, the proposed algorithm (CAMR) achieves good speedup and scalability.
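The abstract names two mechanisms only at a high level: merging inside the Map stage to cut shuffle traffic, and merging intermediate results at cluster granularity to save memory. The Python sketch below illustrates both ideas in a simulated, in-process MapReduce pipeline. It is not the CAMR algorithm: connected components stand in for the paper's clustering criterion (which the abstract does not specify), and the edge partitioning, the `(mapper_id, root)` cluster labels and the functions `map_phase`, `shuffle`, `reduce_phase` and `merge_clusters` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical types for this sketch: an undirected edge, and a local
# cluster label made of (mapper index, local root node).
Edge = Tuple[int, int]
ClusterId = Tuple[int, int]


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def map_phase(mapper_id: int, edges: List[Edge]) -> Dict[int, ClusterId]:
    """Cluster one edge partition locally (connected components as a stand-in
    criterion) and emit one (node, local cluster) record per distinct node.

    In-mapper merging: a node touched by many edges in this partition is
    emitted once instead of once per edge, which shrinks shuffle traffic."""
    uf = UnionFind()
    for u, v in edges:
        uf.union(u, v)
    return {node: (mapper_id, uf.find(node)) for node in list(uf.parent)}


def shuffle(outputs: List[Dict[int, ClusterId]]) -> Dict[int, List[ClusterId]]:
    """Group map output by node key, as the shuffle stage would."""
    grouped: Dict[int, List[ClusterId]] = defaultdict(list)
    for output in outputs:
        for node, cid in output.items():
            grouped[node].append(cid)
    return grouped


def reduce_phase(grouped: Dict[int, List[ClusterId]]) -> List[Tuple[ClusterId, ClusterId]]:
    """For a node seen by several mappers, link the local clusters containing it."""
    links: List[Tuple[ClusterId, ClusterId]] = []
    for cids in grouped.values():
        for other in cids[1:]:
            links.append((cids[0], other))
    return links


def merge_clusters(outputs: List[Dict[int, ClusterId]],
                   links: List[Tuple[ClusterId, ClusterId]]) -> Dict[int, Hashable]:
    """Merge intermediate results at cluster granularity: the union-find holds
    cluster IDs only, never the nodes inside each cluster. Nodes are touched
    only at the end, to write out their final global label."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    return {node: uf.find(cid) for output in outputs for node, cid in output.items()}


if __name__ == "__main__":
    partitions = [[(1, 2), (2, 3)], [(3, 4), (5, 6)], [(6, 7), (8, 9)]]
    outputs = [map_phase(i, p) for i, p in enumerate(partitions)]
    labels = merge_clusters(outputs, reduce_phase(shuffle(outputs)))
    groups: Dict[Hashable, set] = defaultdict(set)
    for node, root in labels.items():
        groups[root].add(node)
    print(sorted(sorted(g) for g in groups.values()))  # [[1, 2, 3, 4], [5, 6, 7], [8, 9]]
```

In this sketch the pipeline always takes exactly one MapReduce round followed by one driver-side merge, loosely echoing the bounded round count the abstract attributes to MRC theory; the paper's actual round structure is not described here. The memory saving shows up in `merge_clusters`, whose union-find grows with the number of local clusters rather than the number of nodes.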


Keywords

clustering algorithm; distributed clustering; MapReduce programming model; data mining; community structure

Classification

Information Technology and Security Science

Cite this article

Chen Dongming, Liu Jian, Wang Dongqi, Xu Xiaowei. Distributed clustering algorithm for network data based on MapReduce[J]. Computer Engineering, 2013, 39(7): 76-82.

Funding

Natural Science Foundation of Liaoning Province (20102059)

Computer Engineering

OA, CSCD, CSTPCD

ISSN 1000-3428
