Computer Engineering and Applications, 2016, Vol. 52, Issue (19): 72-77, 93 (7 pages). DOI: 10.3778/j.issn.1002-8331.1501-0238
Research on an Algorithm for Mining Maximal Frequent Itemsets over Uncertain Data Streams
Mining maximum frequent itemsets over uncertain data streams
Abstract
For large databases, the set of frequent itemsets is huge and highly redundant; mining maximal frequent itemsets is more suitable because the output is much smaller. However, traditional algorithms for mining maximal frequent itemsets assume that the data are precise. Mining frequent itemsets from uncertain data streams is much more complicated than mining them from precise streams, and to date no research has addressed mining maximal frequent itemsets over uncertain data streams. To address this shortcoming, this paper proposes TUFSMax, an algorithm for mining maximal frequent itemsets. TUFSMax adopts a decay window model, finds frequent itemsets by computing their expected supports, and introduces a new method of marking tree nodes. With this method, TUFSMax avoids superset checking while mining all maximal frequent itemsets, which saves checking time. Experimental results show that the proposed algorithm is efficient in both time and space.
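The notions of decayed expected support and maximality used above can be illustrated with a small sketch. It assumes an exponential decay model in which the newest transaction has weight 1 and each older transaction is multiplied by a factor `decay`, that items within a transaction occur independently, and that an itemset is frequent when its decayed expected support reaches `min_sup` times the total decayed weight. These modeling choices and the names `decayed_expected_support` and `maximal_frequent_itemsets` are hypothetical and are not taken from the paper. The sketch also uses a plain level-wise (Apriori-style) enumeration followed by an explicit superset check, which is exactly the check TUFSMax is described as avoiding through tree-node marking, so it is not an implementation of TUFSMax.

```python
from itertools import combinations
from math import prod


def decayed_expected_support(stream, itemset, decay):
    """Expected support of `itemset` under a hypothetical exponential decay model.

    stream: list of dicts mapping item -> existence probability, oldest first.
    The newest transaction has weight 1; each step back in time multiplies the
    weight by `decay`. Items are assumed independent within a transaction.
    """
    total = 0.0
    for age, tx in enumerate(reversed(stream)):
        weight = decay ** age
        # P(itemset is contained in tx) = product of item probabilities (0 if an item is absent)
        p = prod(tx.get(item, 0.0) for item in itemset)
        total += weight * p
    return total


def maximal_frequent_itemsets(stream, min_sup, decay):
    """Naive level-wise mining followed by an explicit superset check (not TUFSMax)."""
    items = sorted({item for tx in stream for item in tx})
    total_weight = sum(decay ** k for k in range(len(stream)))
    threshold = min_sup * total_weight

    frequent = []
    level = [frozenset([item]) for item in items]
    while level:
        # Expected support is anti-monotone, so only frequent k-itemsets are extended
        kept = [s for s in level if decayed_expected_support(stream, s, decay) >= threshold]
        frequent.extend(kept)
        # Join frequent k-itemsets that differ in one item into (k+1)-itemset candidates
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1})

    # An itemset is maximal if no other frequent itemset is a proper superset of it
    return [s for s in frequent if not any(s < t for t in frequent)]


if __name__ == "__main__":
    stream = [
        {"a": 0.9, "b": 0.8},
        {"a": 0.7, "c": 0.5},
        {"a": 0.6, "b": 0.9, "c": 0.4},
    ]
    result = maximal_frequent_itemsets(stream, min_sup=0.4, decay=0.5)
    print([sorted(s) for s in result])  # prints [['a', 'b']]
```

In this toy stream, {a}, {b} and {a, b} are frequent while {c} falls below the threshold, so only {a, b} is reported as maximal. The final list comprehension compares every frequent itemset against every other one; that quadratic superset check is the cost a marking scheme on the mining tree would aim to remove.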
Key words
uncertain data stream / maximal frequent itemsets / superset checking
Classification
Information Technology and Security Science
Cite this article
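Liu Huiting, Hou Mingli, Zhao Peng, Yao Sheng. Research on an algorithm for mining maximal frequent itemsets over uncertain data streams[J]. Computer Engineering and Applications, 2016, 52(19): 72-77, 93.
Funding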
National Natural Science Foundation of China (No. 61202227); Natural Science Foundation of Anhui Province (No. 1408085MF122).