Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree (基于二叉树的并行频繁项集挖掘算法)

Chen Jing (陈静), Zheng Yan (郑彦)

Computer Technology and Development (计算机技术与发展), Issue 10: 80-83, 87 (5 pages).

DOI: 10.3969/j.issn.1673-629X.2015.10.017

Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree

Chen Jing (陈静)¹, Zheng Yan (郑彦)¹

Author information

1. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003

Abstract

With the advent of the big data era, demands on data processing speed and data utilization have grown. In frequent itemset mining, Count Distribution (CD) and Data Distribution (DD) are the classical parallel algorithms, but because they require large storage space and incur heavy communication overhead during mining, their efficiency is not ideal. This paper proposes a parallel frequent itemset mining algorithm based on a binary tree that takes advantage of the parallelism of MapReduce. First, all subsets of a fixed size are found in the database by traversing the binary tree. Second, the number of occurrences of each subset is counted and compared with a threshold set in advance. If a subset occurs more often than the threshold, it is one of the requested frequent itemsets. Comparison and analysis of the experimental results show that the proposed algorithm needs only one MapReduce pass to complete the mining and makes full use of the parallelism of the cluster. It does not need to mine frequent itemsets iteratively, and it outperforms the CD and DD algorithms; in other words, it has higher mining efficiency.
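The two steps described in the abstract can be illustrated with a minimal single-process sketch in Python. This is not the authors' implementation: the abstract does not specify the layout of the binary tree, so the sketch assumes an include/exclude decision tree over the sorted items of each transaction, and it simulates the map and reduce phases in memory rather than on a real MapReduce cluster. All function names (`k_subsets`, `map_phase`, `reduce_phase`, `mine`) are hypothetical.

```python
from collections import Counter


def k_subsets(items, k):
    """Yield every k-item subset of `items` by walking a binary decision tree.

    Assumption: at depth d the left branch includes items[d] and the right
    branch excludes it; the abstract does not describe the tree in detail.
    """
    items = sorted(set(items))

    def walk(depth, chosen):
        if len(chosen) == k:
            yield tuple(chosen)
            return
        # Prune branches that cannot reach k items any more.
        if len(items) - depth < k - len(chosen):
            return
        yield from walk(depth + 1, chosen + [items[depth]])  # include branch
        yield from walk(depth + 1, chosen)                   # exclude branch

    yield from walk(0, [])


def map_phase(transaction, k):
    """Emit (subset, 1) for every k-item subset of one transaction."""
    for subset in k_subsets(transaction, k):
        yield subset, 1


def reduce_phase(pairs, min_count):
    """Sum the counts per subset and keep those occurring more than min_count times."""
    counts = Counter()
    for subset, value in pairs:
        counts[subset] += value
    return {s: c for s, c in counts.items() if c > min_count}


def mine(transactions, k, min_count):
    """Single pass: map every transaction, then reduce once (no iteration over k)."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    return reduce_phase(pairs, min_count)


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c", "a"]]
    print(mine(db, k=2, min_count=2))
```

Because every transaction is expanded independently in the map step, the work can be split across nodes without exchanging candidate sets between passes, which is the property the abstract contrasts with the CD and DD algorithms. The trade-off in this sketch is that the number of emitted subsets grows combinatorially with transaction length and `k`.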

Keywords

frequent itemset mining / MapReduce / parallel computing / binary-tree

Classification

Information Technology and Security Science

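Cite this article

Chen Jing, Zheng Yan. Parallel Algorithm of Frequent Itemset Mining Based on Binary-tree [J]. Computer Technology and Development, 2015, (10): 80-83, 87.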

Funding

National Key Basic Research Program of China ("973" Program) (2006AA01Z201)

Computer Technology and Development (计算机技术与发展)

ISSN: 1673-629X
