| 注册
首页|期刊导航|计算机工程与应用|基于Spark的FP_Growth算法的并行与优化

基于Spark的FP_Growth算法的并行与优化

石陆魁 张欣 师胜利

计算机工程与应用2018,Vol.54Issue(13):52-58,110,8.
计算机工程与应用2018,Vol.54Issue(13):52-58,110,8.DOI:10.3778/j.issn.1002-8331.1705-0114

基于Spark的FP_Growth算法的并行与优化

Parallelization and optimization of FP_Growth algorithm based on Spark

石陆魁 1张欣 1师胜利2

作者信息

  • 1. 河北工业大学 计算机科学与软件学院,天津 300401
  • 2. 河北师范大学 信息技术学院,石家庄 050024
  • 折叠

摘要

Abstract

PFP_Growth algorithm is the parallelization of FP_Growth algorithm on the Hadoop platform based on MapReduce. The algorithm does not consider the balance of the load while grouping the transaction set, which causes the time inconsistency of different nodes to accomplish the tasks and even a bigger difference. Thus, it reduces the efficiency of the algorithm. To improve the efficiency of the algorithm, this paper proposes a Spark-based RPFP algorithm, which optimizes PFP_Growth algorithm through balancing the groups and reducing the time complexity. To balance the group, the large load is placed into the group with the smallest total load. The address of the element is fast accessed by adding a Hash table to the head table, which reduces the time complexity. Experimental results show that RPFP algorithm effectively improves the mining efficiency of the frequent itemsets.

关键词

FP_Growth算法/频繁项集挖掘/负载均衡/链头表结构/Spark

Key words

FP_Growth algorithm/frequent itemset mining/load balance/head table/Spark

分类

信息技术与安全科学

引用本文复制引用

石陆魁,张欣,师胜利..基于Spark的FP_Growth算法的并行与优化[J].计算机工程与应用,2018,54(13):52-58,110,8.

基金项目

河北省自然科学基金(No.F2016202144,No.F2017202145). (No.F2016202144,No.F2017202145)

计算机工程与应用

OA北大核心CSCDCSTPCD

1002-8331

访问量0
|
下载量0
段落导航相关论文