Application Research of Computers, 2017, Vol. 34, Issue (8): 2274-2277 (4 pages). DOI: 10.3969/j.issn.1001-3695.2017.08.007
IABS: Parallel Improved Apriori Algorithm Based on Spark
Abstract
The Apriori algorithm is one of the most classical algorithms in association rule mining, and its core problem is the generation of frequent itemsets. First, the classical Apriori algorithm must scan the entire transaction database several times and must generate candidate itemsets. This paper addresses both problems by transforming the storage structure and eliminating the candidate itemset generation step. Second, with the arrival of the big data era, data volumes grow daily and the classical Apriori algorithm faces severe challenges. Building on the improved Apriori algorithm and the Spark platform, this paper proposes the IABS algorithm, which makes full use of Spark features such as in-memory computation and resilient distributed datasets (RDDs). Compared with existing similar algorithms, experiments validate the sizeup and node scalability of IABS. IABS also achieves an average performance improvement of 23.88% across various benchmarks. In particular, the performance improvement becomes more pronounced as the data volume grows.
Keywords
Apriori algorithm/frequent itemset/storage structure transformation/Spark/in-memory computation
Classification
Information Technology and Security Science
Cite this article
闫梦洁, 罗军, 刘建英, 侯传旺. IABS: an improved Apriori algorithm based on Spark[J]. Application Research of Computers, 2017, 34(8): 2274-2277, 4.
Funding
National High Technology Research and Development Program of China ("863" Program) (2014AA01A302)
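The abstract says IABS avoids repeated database scans by transforming the storage structure, but it does not specify that structure. The following is a hedged illustration only. It assumes a vertical layout that maps each item to the set of IDs of the transactions containing it. This is a well-known Eclat-style technique, swapped in here, and it is not confirmed as the structure the IABS authors use. The database is scanned once. After that, the support of every larger itemset comes from intersecting tid-sets rather than from rescanning the data. Unlike the paper's claim, this sketch still joins k-itemsets to form (k+1)-itemsets, so it illustrates only the scan-avoidance idea. The function `vertical_apriori` and the sample transactions are hypothetical.

```python
from itertools import combinations


def vertical_apriori(transactions, min_support):
    """Mine frequent itemsets using a vertical (item -> tid-set) layout.

    Hypothetical helper: an Eclat-style illustration of "storage structure
    transformation", not the IABS authors' exact algorithm.
    """
    # One pass over the horizontal database builds item -> {transaction ids}.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)

    # Frequent 1-itemsets, stored as sorted tuples so joins are deterministic.
    level = {(item,): tids for item, tids in tidsets.items()
             if len(tids) >= min_support}
    frequent = {itemset: len(tids) for itemset, tids in level.items()}

    while level:
        next_level = {}
        # Sorted keys guarantee a < b, so the joined itemset stays sorted.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue  # only join k-itemsets sharing their first k-1 items
            # Support is the size of the tid-set intersection: no rescan needed.
            tids = level[a] & level[b]
            if len(tids) >= min_support:
                next_level[a + (b[-1],)] = tids
        frequent.update((itemset, len(tids)) for itemset, tids in next_level.items())
        level = next_level
    return frequent


# Hypothetical sample data: five market-basket transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

result = vertical_apriori(transactions, min_support=3)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

On Spark, the single-pass tid-set construction could plausibly be written as follows. Index the transactions with `zipWithIndex`, `flatMap` each one into (item, tid) pairs, and group them with `groupByKey`. The resulting tid-sets can then be cached in memory across iterations. This mapping is an assumption for illustration, not a description of how IABS is implemented.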