Computer Applications and Software, 2016, Vol. 33, Issue (5): 35-39, 5. DOI: 10.3969/j.issn.1000-386x.2016.05.010
Research on a Load-Balanced Frequent Itemset Mining Algorithm Based on Hadoop
Abstract
Frequent itemset mining (FIM) is an important component of association rule mining algorithms. However, the classical Apriori and FP-Growth algorithms run into memory and computational-performance bottlenecks when processing massive data. Based on the Hadoop cloud computing platform, we propose HBFP, a frequent itemset mining algorithm suited to big data processing, and design a data partitioning scheme based on suffix-pattern conversion together with a balanced task grouping scheme. These give each node local access to the data its computation depends on, realize a parallel mining method in which nodes work independently of one another, and ensure global load balancing. Experimental results indicate that HBFP distributes the computational load uniformly across the computation nodes and runs the FP-Growth mining processes in parallel and independently of one another. Efficiency rose by about 12%, and the algorithm's overall stability and efficiency improved as well.
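The abstract does not give HBFP's exact partitioning or grouping rules. As a rough illustration of the two ideas it names, the Python sketch below assumes a partitioning in the style of Parallel FP-Growth (PFP), where the item that ends each emitted transaction prefix decides which group receives it, and a greedy longest-processing-time grouping driven by an assumed cost model (the summed prefix lengths per item). Every function name, the cost model, and the toy data are hypothetical; this is not the authors' implementation.

```python
from collections import Counter
import heapq

# All function names below are hypothetical; they are not taken from the HBFP paper.


def build_flist(transactions, min_support):
    """Return frequent items sorted by descending support (ties broken by name)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda ic: (-ic[1], ic[0]))
    return [item for item, _ in frequent]


def sort_transaction(t, rank):
    """Drop infrequent items and order the rest by F-list rank."""
    return sorted((i for i in set(t) if i in rank), key=rank.__getitem__)


def estimate_costs(sorted_txs):
    """Assumed cost model: for every occurrence of an item, add the length of
    the prefix that ends at it (a proxy for its conditional pattern base)."""
    cost = Counter()
    for t in sorted_txs:
        for pos, item in enumerate(t):
            cost[item] += pos + 1
    return cost


def balanced_groups(flist, rank, cost, num_groups):
    """Greedy longest-processing-time grouping: take items from heaviest to
    lightest and put each one into the group with the smallest total cost."""
    heap = [(0, g) for g in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    group_of = {}
    for item in sorted(flist, key=lambda i: (-cost[i], rank[i])):
        load, g = heapq.heappop(heap)
        group_of[item] = g
        heapq.heappush(heap, (load + cost[item], g))
    return group_of


def group_dependent_transactions(sorted_txs, group_of):
    """PFP-style partitioning: scan each sorted transaction from its last item
    backwards and send the prefix ending at that item to the item's group,
    at most once per group, so each group can mine its items locally."""
    shards = {}
    for t in sorted_txs:
        emitted = set()
        for j in range(len(t) - 1, -1, -1):
            g = group_of[t[j]]
            if g not in emitted:
                emitted.add(g)
                shards.setdefault(g, []).append(t[: j + 1])
    return shards


# Toy data set (illustrative only, not from the paper's experiments).
transactions = [
    ["a", "b", "c", "d"],
    ["a", "b", "d"],
    ["b", "c", "e"],
    ["a", "c", "d", "e"],
    ["a", "b", "c"],
]
flist = build_flist(transactions, min_support=2)
rank = {item: r for r, item in enumerate(flist)}
sorted_txs = [sort_transaction(t, rank) for t in transactions]
cost = estimate_costs(sorted_txs)
group_of = balanced_groups(flist, rank, cost, num_groups=2)
shards = group_dependent_transactions(sorted_txs, group_of)
loads = {g: sum(cost[i] for i in flist if group_of[i] == g) for g in range(2)}
print(loads)  # {0: 21, 1: 17}
for g, shard in sorted(shards.items()):
    print(g, shard)
```

Each shard carries every transaction prefix needed to mine the items of its group, so groups could run FP-Growth without exchanging data. On Hadoop, the partitioning step would plausibly be the map phase keyed by group id, with each reducer mining one or more groups. Greedy assignment does not guarantee an optimal split, but it is a cheap, common heuristic for balancing uneven task sizes.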
Keywords
Frequent itemset mining/FP-Growth/Hadoop/Parallel computing
Classification
Information Technology and Security Science
Cite this article
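ZHU Wenfei, QI Jiandong, HONG Jianke. Research on load balanced frequent itemsets mining algorithm based on Hadoop[J]. Computer Applications and Software, 2016, 33(5): 35-39, 5.
Funding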
Key Project of the State Forestry Administration (2013-05); National Key Technology Support Program of the 12th Five-Year Plan (2011BAH10B04).