Computer Technology and Development (计算机技术与发展), 2017, Vol. 27, Issue (7): 29-33, 5. DOI: 10.3969/j.issn.1673-629X.2017.07.007
Design and Implementation of the DPFP-Stream Algorithm for Stream Data
Realization and Implementation of Distributed Parallel Mining of Frequent Patterns for Data Streams
Abstract
Finding frequent patterns in a continuous stream of transactions is critical for many applications, such as retail market data analysis, network monitoring, web usage mining and stock market prediction. Although numerous frequent pattern mining algorithms have been developed over the past decade, new solutions for stream data are still required, because a data stream is a continuous, unbounded and ordered sequence of data elements generated at a rapid rate. As a result, the knowledge embedded in a data stream is likely to change over time. Extracting frequent patterns at multiple time granularities and monitoring their gradual changes can therefore enhance the analysis of online data streams. Building on the efficient FP-tree structure and the ideas of tilted-time windows and MapReduce, the DPFP-stream algorithm is proposed and implemented on Storm. It uses Kafka as its data source and stores intermediate results in Redis. Extensive experiments show that the proposed algorithm is highly time-efficient when finding recent frequent patterns in a high-speed data stream. Applied to real-time computing, the algorithm can not only process high-speed streams but also monitor changes in frequent patterns by means of tilted-time windows.
Keywords
DPFP-stream/MapReduce/Storm/Redis
Classification
Information Technology and Security Science
Cite this article
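Sun Dujing (孙杜靖), Li Lingjuan (李玲娟), Ma Ke (马可). Design and Implementation of the DPFP-Stream Algorithm for Stream Data [J]. Computer Technology and Development, 2017, 27(7): 29-33, 5.
Funding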
National Natural Science Foundation of China (61302158, 61571238)
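
Illustrative sketch
The abstract names tilted-time windows as the mechanism for tracking frequent patterns at multiple time granularities. The following is a minimal Python sketch of one common logarithmic variant of that idea, not the paper's implementation: recent batches are kept at fine granularity and older ones are merged into coarser windows, so the number of stored counts grows only logarithmically with the number of batches. The class name `TiltedTimeWindow`, its methods, and the two-slots-per-level rule are assumptions made for illustration; the record above does not describe the paper's Storm, Kafka and Redis wiring in enough detail to reproduce it.

```python
class TiltedTimeWindow:
    """Per-pattern support counts kept at logarithmically coarser granularity.

    Hypothetical illustration: level i holds at most two counts,
    each covering 2**i consecutive batches, newest first.
    """

    def __init__(self):
        self.levels = []

    def add_batch(self, count):
        """Record the pattern's support count in the newest batch."""
        carry = count
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append([])
            slots = self.levels[level]
            slots.insert(0, carry)  # newest count goes first
            if len(slots) > 2:
                # Level overflow: merge the two oldest counts into one
                # coarser window and push it to the next level.
                oldest = slots.pop()
                second_oldest = slots.pop()
                carry = oldest + second_oldest
            else:
                carry = None
            level += 1

    def windows(self):
        """Return (batches_covered, count) pairs, newest window first."""
        return [(2 ** i, c) for i, slots in enumerate(self.levels) for c in slots]

    def total(self):
        """Total support over every batch seen so far."""
        return sum(sum(slots) for slots in self.levels)


if __name__ == "__main__":
    w = TiltedTimeWindow()
    for batch_count in [3, 1, 4, 1, 5, 9, 2, 6]:
        w.add_batch(batch_count)
    print(w.windows())
    print(w.total())
```

Merging only ever adds counts together, so the total support is preserved while the history of older batches is compressed. In a streaming deployment each tracked pattern would hold one such structure, and where that state lives (for example in an external store) is a separate design choice not shown here.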