Journal of Hunan University (Natural Sciences), Issue (8): 116-124, 9.
Density Based Clustering on Large Scale Spatial Data Using Resilient Distributed Dataset
Abstract
This paper proposes a density-based parallel clustering algorithm for mining the features of large-scale spatial data. The proposed PClusterdp algorithm is based on the cluster-dp algorithm. First, we introduce an RDD partition algorithm based on data object counts to balance the workload across the compute nodes of a computing cluster. Second, we redefine the local density of each data point to suit parallel computation. To remove the need for the original algorithm's decision graph, we also propose a method that automatically determines the center point of each cluster. Finally, we discuss a cluster merging strategy that combines the partially clustered data to generate the final clustering result. We implemented our Resilient Distributed Dataset (RDD) based algorithm on Spark. The experimental results show that the proposed algorithm clusters large-scale spatial data effectively, performs better than traditional density clustering methods, and achieves rapid clustering of massive spatial data.
Keywords
spatial data/clustering algorithm/resilient distributed dataset/Spark
Classification
Information technology and security science
Cite this article
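Li Luming (李璐明), Jiang Xinhua (蒋新华), Liao Lüchao (廖律超). Density Based Clustering on Large Scale Spatial Data Using Resilient Distributed Dataset [J]. Journal of Hunan University (Natural Sciences), 2015, (8): 116-124, 9.
Funding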
National Natural Science Foundation of China (61304199)
Development Fund of the Hunan Provincial Key Laboratory of Special Road Engineering, Changsha University of Science and Technology