Optimized Implementation of the Parallel PFP-Growth Algorithm Based on Spark (基于Spark的PFP-Growth并行算法优化实现)

Modern Electronics Technique (现代电子技术), 2016, Vol. 39, Issue 8: 9-13, 5. DOI: 10.16652/j.issn.1004-373x.2016.08.003

Optimization of parallel FP-Growth algorithm based on Spark

方向 (Fang Xiang)¹, 张功萱 (Zhang Gongxuan)¹

Author information

1. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094

Abstract

The data-compression advantage of the FP-Growth algorithm becomes more pronounced as the data size grows. With the MapReduce framework, the PFP-Growth algorithm can be parallelized on the Hadoop platform. However, when tasks are processed with MapReduce, intermediate results must be written to disk, which lowers the algorithm's efficiency. Therefore, the algorithm was moved to the Spark platform and improved using the idea of balanced grouping, in order to raise the efficiency of association mining. In addition, when a shared prefix is long, the improved algorithm splits it. The improved algorithm, IPFP-Growth, is implemented in Spark in four steps. Experimental results show that the algorithm optimized on Spark outperforms the PFP-Growth algorithm.
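
The abstract does not say how the balanced grouping is computed. As a rough illustration only, the sketch below assumes that an item's support count is a usable proxy for the mining work it generates, and assigns frequent items to groups greedily (heaviest item first, always into the currently lightest group). The helper names `frequent_items`, `balanced_groups`, and `group_loads` are hypothetical, the load estimate is an assumption rather than the authors' method, and the code is plain Python rather than Spark.

```python
import heapq
from collections import Counter


def frequent_items(transactions, min_count):
    """Count each item once per transaction; keep items meeting min_count."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_count}


def balanced_groups(item_load, num_groups):
    """Greedy grouping: heaviest item first, into the lightest group so far.

    Assumption: item_load[item] approximates the mining cost of that item.
    """
    heap = [(0, g) for g in range(num_groups)]  # (total load, group id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load; ties are broken by item name for determinism.
    for item, load in sorted(item_load.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)
        assignment[item] = g
        heapq.heappush(heap, (total + load, g))
    return assignment


def group_loads(item_load, assignment, num_groups):
    """Sum the estimated load assigned to each group."""
    loads = [0] * num_groups
    for item, g in assignment.items():
        loads[g] += item_load[item]
    return loads


# Toy data, invented for this illustration.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c", "d"],
    ["a", "d"],
    ["b", "c"],
    ["a", "e"],
]
freq = frequent_items(transactions, min_count=2)
assignment = balanced_groups(freq, num_groups=2)
loads = group_loads(freq, assignment, num_groups=2)
print(assignment, loads)
```

In a Spark setting, a mapping like `assignment` would typically be broadcast so that each transaction can be routed to the groups of the items it contains; that routing step is not shown here.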

Key words

parallelization/Spark/association mining/PFP-Growth

Classification

Information Technology and Security Science

Cite this article

方向, 张功萱. 基于Spark的PFP-Growth并行算法优化实现[J]. 现代电子技术, 2016, 39(8): 9-13, 5.

Funding

Jiangsu Province 973 Program (BK2011022); Key Program of the National Natural Science Foundation of China ()

Modern Electronics Technique (现代电子技术)

OA / Peking University Core Journal / CSTPCD

ISSN: 1004-373X
