Computer Technology and Development (计算机技术与发展), 2013, Issue (2): 60-64, 5. DOI: 10.3969/j.issn.1673-629X.2013.02.015
MapReduce Parallelization Research of a Clustering Algorithm Based on Grid
Abstract
Motivated by the incremental growth of clustering data and inspired by the parallel processing model of cloud computing, this paper studies the MapReduce parallelization of a grid-based clustering algorithm. The algorithm first preprocesses the data with a grid method. Then, under the MapReduce framework, it uses the center of gravity of each grid cell, rather than all the points stored in that cell, as the basic data unit for clustering analysis. Experiments on a Hadoop cluster show that the parallelized algorithm produces the same clustering results as the algorithm did before parallelization, while saving analysis time and reducing computational complexity. The algorithm is therefore suitable for analyzing and mining incremental, massive, high-dimensional data.
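The grid preprocessing step described in the abstract can be sketched as a map/reduce pair. This is a minimal, single-process Python simulation written under stated assumptions, not the authors' implementation: it assumes axis-aligned cells with a uniform side length `cell_size` (a hypothetical parameter), keys each point by the integer coordinates of its cell, and takes each cell's center of gravity to be the mean of its points. Returning the point count alongside each centroid is also an assumption. The clustering step that then runs on the centroids is not shown, because the abstract does not specify it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def map_point(point: Point, cell_size: float) -> Tuple[Cell, Tuple[Point, int]]:
    """Map step: key a point by its grid cell and emit (cell, (coord_sum, count)).

    Hypothetical scheme: uniform cells of side `cell_size`; floor division
    keeps negative coordinates in the correct cell.
    """
    cell = tuple(int(coord // cell_size) for coord in point)
    return cell, (point, 1)


def reduce_cell(cell: Cell, values: Iterable[Tuple[Point, int]]) -> Tuple[Cell, Point, int]:
    """Reduce step: add coordinate sums and counts, then return the cell's
    center of gravity together with the number of points it represents."""
    sums: List[float] = []
    count = 0
    for coord_sum, n in values:
        if not sums:
            sums = [0.0] * len(coord_sum)
        for i, c in enumerate(coord_sum):
            sums[i] += c
        count += n
    centroid = tuple(s / count for s in sums)
    return cell, centroid, count


def grid_centroids(points: Iterable[Point], cell_size: float) -> List[Tuple[Cell, Point, int]]:
    """Simulate map -> shuffle (group by cell key) -> reduce in one process."""
    groups: Dict[Cell, List[Tuple[Point, int]]] = defaultdict(list)
    for p in points:
        cell, value = map_point(p, cell_size)
        groups[cell].append(value)  # shuffle: collect values under their key
    return [reduce_cell(cell, vals) for cell, vals in sorted(groups.items())]


if __name__ == "__main__":
    data = [(0.25, 0.25), (0.75, 0.75), (1.5, 0.5), (1.5, 0.5), (2.5, 2.5)]
    for cell, centroid, weight in grid_centroids(data, cell_size=1.0):
        print(cell, centroid, weight)
```

Each mapped value is a (coordinate sum, count) pair, so the same summation could also run as a Hadoop combiner that pre-aggregates on each mapper before the shuffle. The count kept with each centroid records how many raw points the cell stands for, which a downstream clustering step could use as a weight.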
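Key words
grid/clustering algorithm/data mining/MapReduce parallelization
Classification
Information Technology and Security Science
Cite this article
Zhang Lei, Zhang Gongrang, Zhang Jinguang. MapReduce Parallelization Research of a Clustering Algorithm Based on Grid [J]. Computer Technology and Development, 2013, (2): 60-64, 5.
Funding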
National "863" Program Cloud Manufacturing Theme Project (2011AA040501)
National Natural Science Foundation of China (70871033)
Key Natural Science Project of the Education Department of Anhui Province (KJ2011A006)