DOI: 10.1145/1081870.1081908

Mining closed relational graphs with connectivity constraints

Published: 21 August 2005

ABSTRACT

Relational graphs are widely used to model large-scale networks such as biological networks and social networks. In this kind of graph, connectivity is critical for identifying highly associated groups and clusters. In this paper, we investigate the problem of mining closed frequent graphs with connectivity constraints in massive relational graphs, where each graph has around 10K nodes and 1M edges. We adopt the concept of edge connectivity and apply results from graph theory to speed up the mining process. Two approaches are developed to handle different mining requests: CloseCut, a pattern-growth approach, and Splat, a pattern-reduction approach. We have applied these methods to biological datasets and found the discovered patterns interesting.
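
To make the notion of edge connectivity concrete, the following is a minimal sketch of how a connectivity constraint could be checked on a single candidate pattern. It is not the authors' CloseCut or Splat implementation. It assumes an undirected, unweighted graph given as a node list and an edge list, and computes the global minimum cut with the Stoer-Wagner procedure. The function names `edge_connectivity` and `satisfies_connectivity` are hypothetical and chosen only for illustration.

```python
def edge_connectivity(nodes, edges):
    """Size of a global minimum edge cut of an undirected graph (Stoer-Wagner)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0
    # Adjacency weights; parallel edges accumulate, self-loops are ignored.
    w = {u: {} for u in nodes}
    for u, v in edges:
        if u == v:
            continue
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1

    best = float("inf")
    active = set(nodes)
    while len(active) > 1:
        # Minimum-cut phase: repeatedly add the most tightly attached vertex.
        start = next(iter(active))
        order = [start]
        attach = {v: w[start].get(v, 0) for v in active if v != start}
        cut_of_phase = 0
        while attach:
            nxt = max(attach, key=attach.get)
            cut_of_phase = attach.pop(nxt)
            order.append(nxt)
            for v, wt in w[nxt].items():
                if v in attach:
                    attach[v] += wt
        s, t = order[-2], order[-1]
        best = min(best, cut_of_phase)

        # Merge the last vertex t into s and drop t from the graph.
        for v, wt in w[t].items():
            if v == s:
                continue
            w[s][v] = w[s].get(v, 0) + wt
            w[v][s] = w[v].get(s, 0) + wt
        for v in list(w[t]):
            del w[v][t]
        del w[t]
        active.remove(t)
    return best


def satisfies_connectivity(nodes, edges, k):
    """True if at least k edges must be removed to disconnect the graph."""
    return edge_connectivity(nodes, edges) >= k


# Two triangles joined by a single bridge edge: removing ("c", "d") disconnects them.
bridge_nodes = ["a", "b", "c", "d", "e", "f"]
bridge_edges = [("a", "b"), ("b", "c"), ("a", "c"),
                ("d", "e"), ("e", "f"), ("d", "f"),
                ("c", "d")]
print(satisfies_connectivity(bridge_nodes, bridge_edges, 2))
```

Under these assumptions, a pattern passes a constraint of `k` only if no set of fewer than `k` edges separates it, which is why a single bridge edge is enough to fail the check for `k = 2`.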


    • Published in

      KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
      August 2005
      844 pages
      ISBN: 159593135X
      DOI: 10.1145/1081870

      Copyright © 2005 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
