DOI: 10.1145/2063576.2063835
research-article

CP-index: on the efficient indexing of large graphs

Published: 24 October 2011

ABSTRACT

Graph search, i.e., finding all graphs in a database D that contain a query graph q, is a classical primitive in many graph database applications. Although this topic has been studied extensively, the recent emergence of large information networks poses new challenges. Most traditional graph search schemes rely on feature-based indexing, but the index construction step, which often involves frequent subgraph mining, becomes a bottleneck for large graphs because of its high computational complexity. Several methods, such as summarizing the database graphs, have been proposed to relieve this mining bottleneck, yet the frequent subgraphs they generate remain unsatisfactory as indexing features: for large graphs the feature set is generally deficient and also contains many redundant features. Furthermore, because the graphs are large, a small feature is easily contained in many of them, which severely limits its selectivity and pruning power. Motivated by these issues, in this paper we propose CP-Index (Contact Preservation), a novel index for large graphs. To overcome the low-selectivity problem, we gain additional pruning power by leveraging each feature's location information in the database graphs. Specifically, we examine how features touch one another in the query and check whether this contact pattern is preserved in the target graphs. Then, to address the deficiency and redundancy of the feature set, we introduce new feature generation and selection methods, including dual feature generation and size-increasing bootstrapping feature selection, to complete the design. Experimental results show that CP-Index is much more effective at indexing large graphs.
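The two filtering ideas in the abstract (classic feature-based pruning, then a contact-preservation check) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes a hypothetical in-memory index that maps each feature to, per database graph, the vertex sets of all of that feature's occurrences, and it assumes two feature occurrences "touch" when they share at least one vertex. The names `feature_filter`, `in_contact`, and `contact_filter` and the toy data are illustrative only.

```python
from itertools import product

# Hypothetical index layout (an assumption, not the paper's structure):
# index[feature][graph_id] -> list of ALL occurrences of that feature in the
# graph, each stored as the frozenset of vertex ids the occurrence covers.


def feature_filter(index, all_graphs, query_features):
    """Classic feature-based filtering: keep only graphs that contain every
    indexed feature found in the query."""
    candidates = set(all_graphs)
    for f in query_features:
        if f in index:  # unindexed features give no pruning information
            candidates &= set(index[f])
    return candidates


def in_contact(occ_a, occ_b):
    """Assumed contact rule: two occurrences touch if they share a vertex."""
    return not occ_a.isdisjoint(occ_b)


def contact_filter(index, candidates, query_contacts):
    """Keep a graph only if every pair of features that touch in the query
    also has some touching pair of occurrences in that graph."""
    survivors = set()
    for g in candidates:
        preserved = True
        for f1, f2 in query_contacts:
            if f1 not in index or f2 not in index:
                continue  # cannot check pairs involving unindexed features
            occs1 = index[f1].get(g, [])
            occs2 = index[f2].get(g, [])
            if not any(in_contact(a, b) for a, b in product(occs1, occs2)):
                preserved = False  # contact pattern broken: prune g
                break
        if preserved:
            survivors.add(g)
    return survivors


# Toy usage: the query contains features A and B, and they touch in q.
index = {
    "A": {"G1": [frozenset({1, 2})], "G2": [frozenset({1, 2})]},
    "B": {"G1": [frozenset({2, 3})], "G2": [frozenset({5, 6})],
          "G3": [frozenset({0, 1})]},
}
candidates = feature_filter(index, {"G1", "G2", "G3"}, ["A", "B"])
survivors = contact_filter(index, candidates, [("A", "B")])
# candidates == {"G1", "G2"}; survivors == {"G1"}
```

In this toy example, G3 is pruned because it lacks feature A, and G2 is pruned because it contains both features but their occurrences never touch. Provided the index stores every occurrence, neither pass discards a true answer: an embedding of q maps two touching occurrences in q to two touching occurrences in the target graph. The surviving candidates would still need a subgraph isomorphism test in a verification step, which is not shown.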


Published in
CIKM '11: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
October 2011, 2712 pages
ISBN: 9781450307178
DOI: 10.1145/2063576

        Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 1,861 of 8,427 submissions (22%)
