Computer Technology and Development (计算机技术与发展), Issue (10): 80-83, 87, 5. DOI: 10.3969/j.issn.1673-629X.2015.10.017
Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree (基于二叉树的并行频繁项集挖掘算法)
Abstract
With the advent of the big data era, demands on data processing speed and data utilization have risen. Count Distribution (CD) and Data Distribution (DD) are classical parallel algorithms for mining frequent itemsets, but they require large storage space and incur communication overhead during mining, so their mining efficiency is not ideal. This paper proposes a parallel frequent itemset mining algorithm based on a binary tree that takes advantage of the parallelism of MapReduce. First, all subsets of a fixed size in the database are found by traversing the binary tree. Second, the number of occurrences of each subset is counted and compared with a threshold set in advance. If the occurrence count of a subset exceeds the threshold, that subset is a frequent itemset. Comparison and analysis of the experimental results show that the proposed algorithm needs only one MapReduce pass to complete the mining, making full use of the cluster's parallelism. It does not need to mine frequent itemsets iteratively, and its performance is superior to that of the CD and DD algorithms; in other words, it achieves higher mining efficiency.
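The two steps in the abstract (enumerate fixed-size subsets by traversing a binary tree, then count and threshold them in a single pass) can be illustrated with a minimal sketch. The abstract does not give the paper's actual tree construction or MapReduce job, so everything here is an assumption: the tree is taken to be an include/exclude decision tree over each transaction's items, the map and reduce phases are simulated in one Python process, and the names `k_subsets`, `map_phase`, `reduce_phase`, and `mine` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def k_subsets(items: list[str], k: int) -> Iterator[tuple[str, ...]]:
    """Yield every size-k subset of `items` by walking an include/exclude binary tree.

    Assumption: each tree level decides one item (left = include, right = exclude).
    """
    def walk(depth: int, chosen: tuple[str, ...]) -> Iterator[tuple[str, ...]]:
        if len(chosen) == k:  # the subset is complete, emit it
            yield chosen
            return
        if len(items) - depth < k - len(chosen):  # too few items left, prune
            return
        yield from walk(depth + 1, chosen + (items[depth],))  # include items[depth]
        yield from walk(depth + 1, chosen)                    # exclude items[depth]

    yield from walk(0, ())


def map_phase(transaction: Iterable[str], k: int) -> Iterator[tuple[tuple[str, ...], int]]:
    """Simulated mapper: emit (subset, 1) for each size-k subset of one transaction."""
    items = sorted(set(transaction))  # canonical order so equal subsets share a key
    for subset in k_subsets(items, k):
        yield subset, 1


def reduce_phase(pairs: Iterable[tuple[tuple[str, ...], int]]) -> Counter:
    """Simulated reducer: sum the emitted counts per subset."""
    counts: Counter = Counter()
    for subset, n in pairs:
        counts[subset] += n
    return counts


def mine(transactions: Iterable[Iterable[str]], k: int, threshold: int) -> dict[tuple[str, ...], int]:
    """Return size-k itemsets whose count is strictly greater than `threshold`."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    counts = reduce_phase(pairs)
    return {s: c for s, c in counts.items() if c > threshold}


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c", "d"]]
    print(mine(db, k=2, threshold=1))
```

Because every transaction is mapped independently and the counts are merged only once, the sketch needs no iteration over candidate sizes, which mirrors the single-pass property claimed in the abstract. The strict comparison `c > threshold` follows the abstract's wording ("more than the threshold value").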
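Keywords
frequent itemset mining/MapReduce/parallel computing/binary-tree
Classification
Information Technology and Security Science
Cite this article
Chen Jing (陈静), Zheng Yan (郑彦). Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree [J]. Computer Technology and Development, 2015, (10): 80-83, 87, 5.
Funding
National "973" Key Basic Research Development Program of China (2006AA01Z201)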