
Research on Parallelization of Association Rules Mining Algorithm Based on Spark

许德心, 李玲娟

Computer Technology and Development, 2019, Vol. 29, Issue 3: 30-34, 5.

DOI: 10.3969/j.issn.1673-629X.2019.03.006


Research on Parallelization of Association Rules Mining Algorithm Based on Spark

许德心¹, 李玲娟¹

Author information

1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210023

Abstract

Association rule mining is an important data mining task. It uncovers latent relationships among the items in a data set, and the Apriori algorithm is its classic representative. Spark is a distributed, memory-based big data framework well suited to iterative computation. To improve the efficiency of mining strong association rules, we propose a Spark-based parallelization scheme for the Apriori algorithm. The scheme uses the distributed architecture and cluster scheduling mechanism of the Spark platform to distribute the transaction data set across multiple worker nodes. Each worker node invokes transformation operations to obtain its local candidate itemsets and their support counts, and keeps them in memory. The local candidate itemsets are then aggregated to generate the global candidate itemsets and the global frequent itemsets. This process is iterated until no next-level candidate set exists. Performance experiments show that the Spark-based parallel Apriori algorithm can effectively find the frequent itemsets in large data sets and extract strong association rules, with high accuracy and timeliness.
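The iteration described in the abstract can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: it does not use Spark, the partitions are simulated as ordinary lists, and the names `local_counts`, `next_candidates`, `parallel_apriori`, and `PARTITIONS` are hypothetical. The per-partition counting stands in for a transformation executed on each worker node, and the summation stands in for the aggregation into global support counts.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, candidates):
    """Count, within one partition, how many transactions contain each candidate."""
    counts = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def next_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori property).

    Brute-force enumeration, chosen for brevity rather than speed.
    """
    items = sorted({item for itemset in frequent for item in itemset})
    candidates = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            candidates.add(frozenset(combo))
    return candidates


def parallel_apriori(partitions, min_support_count):
    """Return {frequent itemset: global support count} over all partitions."""
    # Level 1 candidates: every single item seen in any partition.
    candidates = {frozenset([item]) for p in partitions for t in p for item in t}
    frequent_all = {}
    k = 1
    while candidates:  # stop when no next-level candidate set exists
        # "Map" step: each (simulated) node counts its local candidates.
        partials = [local_counts(p, candidates) for p in partitions]
        # "Reduce" step: sum local counts into global support counts.
        total = Counter()
        for partial in partials:
            total.update(partial)
        frequent = {c: n for c, n in total.items() if n >= min_support_count}
        frequent_all.update(frequent)
        k += 1
        candidates = next_candidates(set(frequent), k)
    return frequent_all


# Hypothetical toy data: two partitions of two transactions each.
PARTITIONS = [
    [{"bread", "milk"}, {"bread", "butter", "milk"}],
    [{"butter", "milk"}, {"bread", "butter"}],
]
```

Calling `parallel_apriori(PARTITIONS, 2)` on this toy data keeps the three single items and the three pairs, and drops the three-item set because only one transaction contains it. In an actual Spark job the list comprehension over partitions would presumably be replaced by transformations on a distributed data set, which this sketch does not attempt to reproduce.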

Key words

Apriori/association rules/parallelization/Spark/recommendation algorithm/frequent itemsets/mining

Classification

Information Technology and Security Science

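Cite this article

许德心, 李玲娟. Research on Parallelization of Association Rules Mining Algorithm Based on Spark[J]. Computer Technology and Development, 2019, 29(3): 30-34, 5.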

Funding

National Natural Science Foundation of China (61302158, 61571238)

Computer Technology and Development

OA; CSTPCD

ISSN: 1673-629X
