ABSTRACT
Relational graphs are widely used to model large-scale networks such as biological networks and social networks. In such graphs, connectivity is critical for identifying highly associated groups and clusters. In this paper, we investigate the problem of mining closed frequent graphs with connectivity constraints in massive relational graphs, where each graph has around 10K nodes and 1M edges. We adopt the concept of edge connectivity and apply results from graph theory to speed up the mining process. Two approaches are developed to handle different mining requests: CloseCut, a pattern-growth approach, and Splat, a pattern-reduction approach. We have applied these methods to biological datasets and found the discovered patterns interesting.
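To make the notion of edge connectivity concrete, the following is a minimal sketch, not the paper's CloseCut or Splat procedure. It computes the edge connectivity of a small undirected graph (the minimum number of edges whose removal disconnects it) with a Stoer-Wagner style minimum cut. The function names `edge_connectivity` and `satisfies_connectivity` are hypothetical, and reading a connectivity constraint as a lower bound k on edge connectivity is an assumption made for illustration.

```python
def edge_connectivity(n, edges):
    """Edge connectivity of an undirected graph on vertices 0..n-1.

    Illustrative Stoer-Wagner style minimum cut; parallel edges are counted.
    """
    if n < 2:
        return 0
    # w[u][v] holds the number of edges between u and v (self-loops ignored).
    w = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u != v:
            w[u][v] += 1
            w[v][u] += 1

    active = list(range(n))
    best = float("inf")
    while len(active) > 1:
        # One minimum-cut phase: repeatedly add the most tightly connected vertex.
        weights = {v: 0 for v in active}
        remaining = list(active)
        order = []
        while remaining:
            nxt = max(remaining, key=lambda v: weights[v])
            remaining.remove(nxt)
            order.append(nxt)
            for v in remaining:
                weights[v] += w[nxt][v]
        s, t = order[-2], order[-1]
        # weights[t] is the cut separating t from the rest in this phase.
        best = min(best, weights[t])
        # Merge t into s and drop t from the graph.
        for v in active:
            if v != s and v != t:
                w[s][v] += w[t][v]
                w[v][s] = w[s][v]
        active.remove(t)
    return best


def satisfies_connectivity(n, edges, k):
    """Hypothetical check: does the graph have edge connectivity at least k?"""
    return edge_connectivity(n, edges) >= k


# Two triangles joined by a single bridge edge (2, 3).
bridge_graph = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Complete graph on four vertices.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

In this sketch, the bridged triangles would fail a constraint of k = 2 because removing the single bridge disconnects them, whereas the complete graph on four vertices would pass it.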
Index Terms
- Mining closed relational graphs with connectivity constraints