Journal of Nanjing Normal University (Natural Science Edition), 2016, Vol. 39, Issue (4): 19-24, 6. DOI: 10.3969/j.issn.1001-4616.2016.04.005
A New Parallel Algorithm for Mining Frequent Item Sets Based on FP_Growth
Abstract
Frequent item set mining is used to discover association rules between items. To obtain frequent item sets from big data efficiently, this paper proposes a new parallel algorithm for mining frequent item sets based on FP_Growth, named NPFP_Growth (New Parallel algorithm based on FP_Growth). Built on the parallel computing model Map/Reduce and the distributed storage system HDFS, the algorithm improves the storage structure of the local frequent pattern tree and creates such a tree on each node; the longest global frequent item sets are then mined from each branch of the tree. Finally, the support of item sets that do not meet the global minimum support is computed and sent to the corresponding computing node for counting. The parallel mining algorithm NPFP_Growth is implemented, and the experimental results show that it has high computing efficiency and good scalability.
Key words
frequent item sets; association rule; FP_Growth; Hadoop; Map/Reduce
Classification
Information Technology and Security Science
Cite this article
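Sun Hongyan, Ji Genlin. A New Parallel Algorithm for Mining Frequent Item Sets Based on FP_Growth [J]. Journal of Nanjing Normal University (Natural Science Edition), 2016, 39(4): 19-24, 6.
Funding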
National Natural Science Foundation of China (41471371).