南京大学学报:自然科学版2011,Vol.47Issue(5):532-543,12.
基于对gSpan改进的有向频繁子图挖掘算法
Digraph frequent subgraph mining based on gSpan
摘要
Abstract
With graph data generated from various sources,meaningful pattern mining on this kind of data set becomes more and more urgent,especially along with the development of life science,yielding a considerable amount of directed graphs.However,existed related algorithms are all designed for undirected graphs,like Frequent Subgraph Discovery(FSG),Apriori-based Graph Mining(AGM),Fast Frequent Subgraph Mining(FFSM) and so on.Except for FFSM,there is not such an algorithm specially designed for directed graphs.Meanwhile,FFSM algorithm itself also needs some modifications ahead of utilizing on directed graphs.In the paper,we propose a new algorithm called DFSS after the research based on some pattern mining algorithms of graphs,which detects frequent substructures directly in directed graphs.DFSS constructs a new data model to indicate each graph and maps each one into a unique minimum array code.Two new concepts,link degrees and level degree are presented in the paper for the special operation of directed graph pattern mining.We also analyze the efficiency of FFSM and DFSS.The time complexity of DFSS is O(n3·2n),which improves O(m4/n2) times compared to FFSM,where n is the number of frequent edges and m is the number of frequent vertices in the graph data set.Experiments showed that DFSS achieves orders of magnitude speedup in comparison with FFSM,and demonstrate theoretical analysis mentioned above.In addition,the algorithm proposed in this paper plays the role as the first to directly operate on the directed graph data set,which will surely have vital significance for future related works.关键词
有向图挖掘/gSpan/频繁子图/适用性扩展Key words
directed graph/digraph frequent subgraph mining based on gSpan/new data model/complexity分类
信息技术与安全科学引用本文复制引用
周溜溜,业宁..基于对gSpan改进的有向频繁子图挖掘算法[J].南京大学学报:自然科学版,2011,47(5):532-543,12.基金项目
国家自然科学基金 ()
江苏省自然科学基金 ()
江苏省青蓝工程学术带头人项目 ()