首页|期刊导航|计算机工程与应用|基于Spark的FP_Growth算法的并行与优化

基于Spark的FP_Growth算法的并行与优化

石陆魁张欣师胜利

计算机工程与应用2018，Vol.54Issue(13)：52-58,110,8.

计算机工程与应用2018，Vol.54Issue(13)：52-58,110,8.DOI:10.3778/j.issn.1002-8331.1705-0114

基于Spark的FP_Growth算法的并行与优化

Parallelization and optimization of FP_Growth algorithm based on Spark

石陆魁 ¹张欣 ¹师胜利²

作者信息

1. 河北工业大学计算机科学与软件学院,天津 300401
2. 河北师范大学信息技术学院,石家庄 050024
折叠

摘要

Abstract

PFP_Growth algorithm is the parallelization of FP_Growth algorithm on the Hadoop platform based on MapReduce. The algorithm does not consider the balance of the load while grouping the transaction set, which causes the time inconsistency of different nodes to accomplish the tasks and even a bigger difference. Thus, it reduces the efficiency of the algorithm. To improve the efficiency of the algorithm, this paper proposes a Spark-based RPFP algorithm, which optimizes PFP_Growth algorithm through balancing the groups and reducing the time complexity. To balance the group, the large load is placed into the group with the smallest total load. The address of the element is fast accessed by adding a Hash table to the head table, which reduces the time complexity. Experimental results show that RPFP algorithm effectively improves the mining efficiency of the frequent itemsets.

关键词

FP_Growth算法/频繁项集挖掘/负载均衡/链头表结构/Spark

Key words

FP_Growth algorithm/frequent itemset mining/load balance/head table/Spark

分类

信息技术与安全科学

引用本文复制引用

石陆魁,张欣,师胜利..基于Spark的FP_Growth算法的并行与优化[J].计算机工程与应用,2018,54(13):52-58,110,8.

基金项目

河北省自然科学基金(No.F2016202144,No.F2017202145). （No.F2016202144,No.F2017202145）

计算机工程与应用

OA北大核心CSCDCSTPCD

ISSN：1002-8331

访问量0

下载量0

段落导航