面向大图的Top-Rank-K频繁模式挖掘算法OA北大核心CSTPCD
Top-Rank-K frequent pattern mining algorithm for large graphs
频繁模式挖掘(Frequent Pattern Mining,FPM)在社交分析中扮演重要角色,能从海量社交数据中挖掘用户行为的模式和规律,为社交网络的研究提供新的认识和决策支持.然而,对于一个FPM任务,设置一个合适的支持度阈值不容易;此外,FPM作为一个NP-hard问题,不存在多项式时间的算法.针对上述问题,提出一种无须用户设置初始支持度阈值的算法ItrMiner.该算法使用一种新的兴趣度指标对模式进行评估,综合考虑模式的大小和支持度,挖掘Top-Rank-K频繁模式.同时,为了解决去除初始支持度阈值后在算法剪枝阶段遇到的困难,提出基于树模式优先识别的策略和模式扩展约束策略,减少非必要候选模式的生成.在真实图和人工合成图数据集上进行了广泛的实验,证明ItrMiner在执行效率和可扩展性方面表现出色,尤其在稠密的数据集上,其时间开销仅为基线算法Top-K Graph Miner的13.2%.另外,提出的模式扩展约束策略的有效性较高,和无扩展约束优化的ItrMinernopt算法相比,效率提升最高可达9.5倍.
Frequent Pattern Mining(FPM)plays a crucial role in social analytics,which mines patterns and regularities about users'behaviour from vast amounts of social data,thereby provides new insights and decision support for research on social networks.However,it is not easy to set an appropriate support threshold for an FPM task.Moreover,as an NP-hard problem,FPM does not exist a polynomial-time algorithm.To address these problems,an algorithm that does not require an initial support threshold,denoted by ItrMiner,is proposed.In ItrMiner,a new metric which considers both size and support is proposed for measuring the interestingness of a pattern,and this metric is utilized to mine the Top-Rank-K frequent patterns.Meanwhile,to overcome the difficulty caused by the lack of initial support threshold,a pattern-recognition-based strategy and a pattern expansion strategy are proposed,which reduces excessive redundant candidate patterns.Extensive experiments on real and synthetic graphs show that ItrMiner performs well in terms of efficiency and scalability,especially on dense datasets,with a time overhead of only 13.2%of the baseline algorithm Top-K Graph Miner.Furthermore,the pattern expansion strategy is quite effective,which performs up to 9.5 times faster than the counterpart of ItrMinernopt without optimization.
邹杰军;王欣;石俊豪;兰卓;方宇;张翀;谢文波;沈玲珍
西南石油大学计算机科学学院,成都,610500
计算机与自动化
频繁模式挖掘社交分析支持度阈值兴趣度
frequent pattern miningsocial analyticssupport thresholdinterestingness
《南京大学学报(自然科学版)》 2024 (001)
38-52 / 15
国家自然科学基金(62172102),四川省科技创新人才基金(2022JDRC0009)
评论