Computer Applications and Software (计算机应用与软件), 2018, Vol. 35, Issue 4: 269-275, 280, 8. DOI: 10.3969/j.issn.1000-386x.2018.04.050
INCREMENTAL PARALLELIZATION OF FAST CLUSTERING BASED ON DBSCAN ALGORITHM UNDER LARGE-SCALE DATA SET
Abstract
Spatio-temporal trajectory data mining is an important way to discover the behavior patterns of moving objects. To meet the demand for processing massive trajectory data, an incremental, parallel fast clustering algorithm is proposed. Based on the number of data points, the algorithm divides the spatial grid by bisection and uses a greedy algorithm to restructure the partitions so that the data is divided in a balanced way. Local clustering is then performed on each partition to obtain the sets of candidate clusters for merging. The candidate clusters are indexed with an R*-tree, which is used to decide which clusters should be merged and to merge them. An undirected acyclic graph model of the merged clusters is built, and the data is globally relabeled. Experimental results show that the elastic partitioning effectively reduces noise data and improves the quality of local clustering. The merging strategy based on the R*-tree index structure effectively improves the time efficiency of clustering, achieves good clustering results, and enables online processing of large-scale data.
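The partitioning and relabeling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no pseudocode, so the median split along the wider axis, the union-find relabeling, and the names `bisect_partition`, `relabel`, `max_points`, and `merge_pairs` are all assumptions made for illustration. The greedy restructuring, the local DBSCAN runs, and the R*-tree lookup are omitted; `merge_pairs` stands in for whatever pairs of local clusters the merge test would report.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bisect_partition(points: List[Point], max_points: int) -> List[List[Point]]:
    """Recursively halve a point set by count until each cell holds at most
    max_points points (assumed reading of "bisection by number of points")."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    if len(points) <= max_points:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along the axis with the larger spread to avoid thin slivers.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split keeps the two halves balanced
    return (bisect_partition(ordered[:mid], max_points)
            + bisect_partition(ordered[mid:], max_points))


def relabel(merge_pairs: List[Tuple[str, str]],
            local_labels: List[str]) -> Dict[str, int]:
    """Map local cluster labels to global ids. Labels connected through
    merge_pairs (edges of the undirected merge graph) share one id."""
    parent = {label: label for label in local_labels}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in merge_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    global_ids: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for label in local_labels:
        root = find(label)
        result[label] = global_ids.setdefault(root, len(global_ids))
    return result


# Hypothetical usage: 10 points, cells of at most 3 points each.
cells = bisect_partition([(float(i), float(i % 3)) for i in range(10)], 3)
# Hypothetical local labels of the form "<partition>c<cluster>".
mapping = relabel([("p0c0", "p1c0"), ("p1c0", "p2c1")],
                  ["p0c0", "p0c1", "p1c0", "p2c1"])
```

Splitting by point count rather than by area keeps the per-partition workload roughly equal, which is the stated purpose of the balanced partitioning. Union-find gives every connected component of the merge graph a single global label without building the graph explicitly.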
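Keywords
Big data / DBSCAN / Balanced partitioning / Incremental / Parallelization
Classification
Information technology and security science
Cite this article
Wang Xing, Wu Yi, Jiang Xinhua, Liao Lüchao. Incremental parallelization of fast clustering based on DBSCAN algorithm under large-scale data set [J]. Computer Applications and Software, 2018, 35(4): 269-275, 280, 8.
Funding
National Natural Science Foundation of China (61304199, 41471333)
Fujian Province University Outstanding Young Scientific Research Talents Program (JA14209)
Fujian Provincial Department of Education Project (JA15325)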