Modern Electronics Technique, 2016, Vol. 39, Issue 8: 9-13, 5. DOI: 10.16652/j.issn.1004-373x.2016.08.003
Optimization of parallel FP-Growth algorithm based on Spark
Abstract
The FP-Growth algorithm's advantage in compressing data becomes more pronounced as the data size grows. Using the MapReduce framework, the PFP-Growth algorithm can be parallelized on the Hadoop platform. However, MapReduce writes intermediate results to disk while processing tasks, which reduces the algorithm's efficiency. This paper therefore improves the algorithm on the Spark platform according to the concept of balanced grouping, in order to raise the efficiency of association mining. In addition, when a long prefix exists, the improved algorithm splits the shared prefix. The improved algorithm, IPFP-Growth, is implemented in Spark in four steps. Experimental results show that the algorithm optimized on Spark outperforms the PFP-Growth algorithm.
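The abstract does not spell out the four Spark steps or the exact grouping criterion, so the following is only a minimal Python sketch, under stated assumptions, of the two PFP-Growth stages that balanced grouping acts on. `build_flist`, `balanced_groups` and `group_dependent_transactions` are hypothetical helpers. The per-item cost estimate log(p + 1) for an item at position p in the F-list is an assumed heuristic, not necessarily the one used by IPFP-Growth. Items are assigned greedily (most expensive first) to the currently least-loaded group. Each transaction is then mapped, as in PFP-Growth, to one group-dependent transaction per group: the longest F-list-ordered prefix that ends in an item of that group. The long-prefix splitting step is not modelled here.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical helpers sketching PFP-style grouping; the log(p + 1) cost
# model in balanced_groups is an ASSUMPTION, not the paper's stated method.


def build_flist(transactions: List[List[str]], min_support: int) -> List[str]:
    """Return frequent items sorted by descending support (ties alphabetical)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [item for item, c in counts.items() if c >= min_support]
    return sorted(frequent, key=lambda item: (-counts[item], item))


def balanced_groups(flist: List[str], num_groups: int) -> Dict[str, int]:
    """Greedily map each F-list item to the currently least-loaded group.

    ASSUMPTION: an item at 1-based F-list position p costs log(p + 1),
    since items later in the F-list tend to have longer conditional bases.
    """
    heap = [(0.0, g) for g in range(num_groups)]  # (estimated load, group id)
    heapq.heapify(heap)
    assignment: Dict[str, int] = {}
    # Place the most expensive (last) items first, as in LPT scheduling.
    for pos in range(len(flist), 0, -1):
        load, g = heapq.heappop(heap)
        assignment[flist[pos - 1]] = g
        heapq.heappush(heap, (load + math.log(pos + 1), g))
    return assignment


def group_dependent_transactions(
    transaction: List[str], flist: List[str], groups: Dict[str, int]
) -> List[Tuple[int, List[str]]]:
    """For each group, emit the longest F-list-ordered prefix of the
    transaction that ends in an item belonging to that group."""
    rank = {item: i for i, item in enumerate(flist)}
    ordered = sorted((i for i in set(transaction) if i in rank), key=lambda i: rank[i])
    emitted = set()
    out: List[Tuple[int, List[str]]] = []
    # Scan right to left so each group receives its longest prefix exactly once.
    for j in range(len(ordered) - 1, -1, -1):
        g = groups[ordered[j]]
        if g not in emitted:
            emitted.add(g)
            out.append((g, ordered[: j + 1]))
    return out


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "c"], ["a", "b", "c", "d"]]
    flist = build_flist(data, min_support=2)
    groups = balanced_groups(flist, num_groups=2)
    print(flist, groups)
    for t in data:
        print(t, "->", group_dependent_transactions(t, flist, groups))
```

On Spark, the mapper would plausibly become the function passed to `RDD.flatMap`, followed by `groupByKey` on the group id so that each group builds its own local FP-tree; that wiring is likewise an assumption rather than the paper's described implementation.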
Keywords
parallelization/Spark/association mining/PFP-Growth
Classification
Information Technology and Security Science
Cite this article
FANG Xiang, ZHANG Gongxuan. Optimization of parallel FP-Growth algorithm based on Spark[J]. Modern Electronics Technique, 2016, 39(8): 9-13, 5.
Funding
Jiangsu Province 973 Program (BK2011022); Key Program of the National Natural Science Foundation of China ()