Efficient structure similarity searches: a partition-based approach

  • Regular Paper
The VLDB Journal

Abstract

Graphs are widely used to model complex data in many applications, such as bioinformatics, chemistry, social networks, and pattern recognition. A fundamental and critical query primitive is to efficiently search for similar structures in a large collection of graphs. This article mainly studies threshold-based graph similarity search with edit distance constraints. Existing solutions to the problem use fixed-size overlapping substructures to generate candidates, and thus become susceptible to large vertex degrees and distance thresholds. In this article, we present a partition-based approach to tackle the problem. By dividing data graphs into variable-size non-overlapping partitions, the edit distance constraint is converted to a graph containment constraint for candidate generation. We develop efficient query processing algorithms based on this novel paradigm. Moreover, candidate-pruning techniques and an improved graph edit distance verification algorithm are developed to boost the performance. In addition, a cost-aware graph partitioning method is devised to optimize the index. Extending the partition-based filtering paradigm, we present a solution to the top-\(k\) graph similarity search problem, where tailored filtering, look-ahead, and computation-sharing strategies are exploited. Using both public real-life and synthetic datasets, extensive experiments demonstrate that our approaches significantly outperform the baseline and its alternatives.
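
As a rough illustration of the filtering idea described in the abstract, the following Python sketch assumes a pigeonhole-style setup: a data graph is split into \(\tau + 1\) non-overlapping partitions, so that at most \(\tau\) edit operations leave at least one partition intact, and that partition must then be contained in the query. The \(\tau + 1\) count, the dictionary-based graph representation, the helper names `contains` and `is_candidate`, and the brute-force containment test are all assumptions made for illustration. Edges that cross partitions are simply dropped and edges are unlabeled, which is not necessarily how the article handles them.

```python
from itertools import permutations


def contains(part, query):
    """Return True if `part` is subgraph-isomorphic to `query`.

    Graphs are dicts with "labels" (vertex -> label) and "edges"
    (list of vertex pairs). Hypothetical representation, not the article's.
    """
    p_nodes = list(part["labels"])
    q_nodes = list(query["labels"])
    if len(p_nodes) > len(q_nodes):
        return False
    q_edges = {frozenset(e) for e in query["edges"]}
    # Brute force over injective vertex mappings; only viable for tiny graphs.
    for image in permutations(q_nodes, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if any(part["labels"][v] != query["labels"][m[v]] for v in p_nodes):
            continue
        if all(frozenset((m[u], m[v])) in q_edges for u, v in part["edges"]):
            return True
    return False


def is_candidate(partitions, query):
    """Keep a data graph if at least one of its partitions is contained in the query."""
    return any(contains(p, query) for p in partitions)


# Data graph C-C-O-N split into tau + 1 = 2 partitions (tau = 1 assumed).
parts = [
    {"labels": {1: "C", 2: "C"}, "edges": [(1, 2)]},
    {"labels": {3: "O", 4: "N"}, "edges": [(3, 4)]},
]
path = [("a", "b"), ("b", "c"), ("c", "d")]
q1 = {"labels": {"a": "C", "b": "C", "c": "O", "d": "O"}, "edges": path}
q2 = {"labels": {"a": "C", "b": "S", "c": "O", "d": "S"}, "edges": path}

print(is_candidate(parts, q1))  # True: the C-C partition occurs in q1
print(is_candidate(parts, q2))  # False: neither partition occurs in q2
```

A graph that passes this filter is only a candidate; its edit distance to the query still has to be verified exactly.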

Notes

  1. A detailed discussion is provided in Part A of the supplementary material to this article.

  2. A partition can be either connected or disconnected.

  3. For example, a partition contained by the query for Pars, or a \(q\)-gram appearing in the query’s \(q\)-gram multiset for \(\kappa \)-AT.

  4. The special case of \(\tau = 1\) is NP-hard by a polynomial-time reduction from the partition problem, which decides whether a given multiset of numbers can be partitioned into two subsets whose element sums are equal.

  5. http://dtp.nci.nih.gov/docs/aids/aids_data.html.

  6. http://www.iam.unibe.ch/fki/databases/iam-graph-database/download-the-iam-graph-database.

  7. http://www.cs.washington.edu/research/xmldatasets/.

  8. http://www.cse.ust.hk/graphgen/.

  9. This RAM configuration is to accommodate the A\(^*\)-based verification algorithm, which needs to maintain a large number of partial mappings in a priority queue.

  10. Some of the experiments were manually terminated after running for 24 h; in these cases, the 24 h duration is reported instead.

  11. From here onwards, this method is adopted for summarizing experiment results unless otherwise specified; figures of the same results on a logarithmic scale can be found in Part C of the supplementary material to this article.

  12. The original algorithm does not come with exact verification [7].

  13. There is a large body of literature dedicated to subgraph similarity search based on MCS, e.g., [12, 19, 26].
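
Note 4 refers to the partition problem. As a minimal sketch of what that decision problem asks, the Python function below checks whether a multiset of non-negative integers can be split into two subsets with equal sums. The function name `can_partition` and the restriction to non-negative integers are assumptions made for illustration; this is a standard pseudo-polynomial subset-sum check, not the reduction used in the article, and its running time depends on the magnitude of the numbers, so it does not contradict NP-hardness.

```python
def can_partition(nums):
    """Decide whether `nums` (non-negative integers) splits into two equal-sum subsets."""
    total = sum(nums)
    if total % 2:
        # An odd total can never be divided into two equal integer sums.
        return False
    target = total // 2
    reachable = {0}  # subset sums achievable with the numbers seen so far
    for x in nums:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable


print(can_partition([3, 1, 1, 2, 2, 1]))  # True, e.g. {3, 2} and {1, 1, 2, 1}
print(can_partition([1, 5]))              # False
```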

References

  1. Bi, F., Chang, L., Lin, X., Qin, L., Zhang, W.: Efficient subgraph matching by postponing Cartesian products. In: SIGMOD Conference, pp. 1199–1214 (2016)

  2. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. PRL 1(4), 245–253 (1983)

  3. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. IJPRAI 18(3), 265–298 (2004)

  4. Fankhauser, S., Riesen, K., Bunke, H.: Speeding up graph edit distance computation through fast bipartite matching. In: GbRPR, pp. 102–111 (2011)

  5. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness, 1st edn. W. H. Freeman, San Francisco (1979)

  6. Gouda, K., Arafa, M., Calders, T.: Bfst_ed: a novel upper bound computation framework for the graph edit distance. In: SISAP, pp. 3–19 (2016)

  7. Gouda, K., Hassaan, M.: CSI_GED: an efficient approach for graph edit similarity computation. In: ICDE, pp. 265–276 (2016)

  8. Gupta, M., Gao, J., Yan, X., Cam, H., Han, J.: Top-k interesting subgraph discovery in information networks. In: ICDE, pp. 820–831 (2014)

  9. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann, Los Altos (2011)

  10. Han, W.-S., Lee, J., Lee, J.-H.: Turbo\(_{\text{iso}}\): towards ultrafast and robust subgraph isomorphism search in large graph databases. In: SIGMOD Conference, pp. 337–348 (2013)

  11. He, H., Singh, A.K.: Closure-Tree: an index structure for graph queries. In: ICDE, p. 38 (2006)

  12. Jin, C., Bhowmick, S.S., Choi, B., Zhou, S.: PRAGUE: towards blending practical visual subgraph query formulation and query processing. In: ICDE, pp. 222–233 (2012)

  13. Marín, R.M., Aguirre, N.F., Daza, E.E.: Graph theoretical similarity approach to compare molecular electrostatic potentials. J. Chem. Inf. Model. 48(1), 109–118 (2008)

  14. Ranu, S., Hoang, M.X., Singh, A.K.: Answering top-\(k\) representative queries on graph databases. In: SIGMOD Conference, pp. 1163–1174 (2014)

  15. Raveaux, R., Burie, J.-C., Ogier, J.-M.: A graph matching method and a graph matching distance based on subgraph assignments. PRL 31(5), 394–406 (2010)

  16. Ren, X., Wang, J.: Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs. PVLDB 8(5), 617–628 (2015)

  17. Riesen, K., Fankhauser, S., Bunke, H.: Speeding up graph edit distance computation with a bipartite heuristic. In: MLG (2007)

  18. Sanfeliu, A., Fu, K.-S.: A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cyber. 13(3), 353–362 (1983)

  19. Shang, H., Lin, X., Zhang, Y., Yu, J.X., Wang, W.: Connected substructure similarity search. In: SIGMOD Conference, pp. 903–914 (2010)

  20. Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008)

  21. Ullmann, J.R.: Bit-vector algorithms for binary constraint satisfaction and subgraph isomorphism. ACM J. Exp. Algorithmics 15, 1–6 (2010)

  22. Ullmann, J.R.: Degree reduction in labeled graph retrieval. ACM J. Exp. Algorithmics 20, 1–3 (2015)

  23. Wang, G., Wang, B., Yang, X., Yu, G.: Efficiently indexing large sparse graphs for similarity search. IEEE Trans. Knowl. Data Eng. 24(3), 440–451 (2012)

  24. Wang, X., Ding, X., Tung, A.K.H., Ying, S., Jin, H.: An efficient graph indexing method. In: ICDE, pp. 210–221 (2012)

  25. Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD Conference, pp. 335–346 (2004)

  26. Yan, X., Yu, P.S., Han, J.: Substructure similarity search in graph databases. In: SIGMOD Conference, pp. 766–777 (2005)

  27. Yang, S., Han, F., Wu, Y., Yan, X.: Fast top-k search in knowledge graphs. In: ICDE (to appear) (2016)

  28. Yang, Z., Fu, A.W., Liu, R.: Diversified top-\(k\) subgraph querying in a large graph. In: SIGMOD Conference, pp. 1167–1182 (2016)

  29. Zeng, Z., Tung, A.K.H., Wang, J., Feng, J., Zhou, L.: Comparing stars: on approximating graph edit distance. PVLDB 2(1), 25–36 (2009)

  30. Zhang, K., Wang, J.T.-L., Shasha, D.: On the editing distance between undirected acyclic graphs and related problems. In: CPM, pp. 395–407 (1995)

  31. Zhang, S., Yang, J., Jin, W.: SAPPER: subgraph indexing and approximate matching in large graphs. PVLDB 3(1), 1185–1194 (2010)

  32. Zhao, X., Xiao, C., Lin, X., Liu, Q., Zhang, W.: A partition-based approach to structure similarity search. PVLDB 7(3), 169–180 (2013)

  33. Zhao, X., Xiao, C., Lin, X., Wang, W., Ishikawa, Y.: Efficient processing of graph similarity queries with edit distance constraints. VLDB J. 22(6), 727–752 (2013)

  34. Zheng, W., Zou, L., Lian, X., Wang, D., Zhao, D.: Efficient graph similarity search over large graph databases. IEEE Trans. Knowl. Data Eng. 27(4), 964–978 (2015)

  35. Zhu, Y., Qin, L., Yu, J.X., Cheng, H.: Finding top-\(k\) similar graphs in graph databases. In: EDBT, pp. 456–467 (2012)

  36. Zhu, Y., Yu, J.X., Qin, L.: Leveraging graph dimensions in online graph search. PVLDB 8(1), 85–96 (2014)

Acknowledgements

Funding was provided by the Japan Society for the Promotion of Science (Grant No. 16H01722), the National Natural Science Foundation of China (Grant Nos. 61402494, 71690233), the Natural Science Foundation of Hunan Province (Grant No. 2015JJ4009), and the Australian Research Council (Grant Nos. DP150103071 and DP150102728).

Author information

Corresponding author

Correspondence to Chuan Xiao.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 146 KB)

About this article

Cite this article

Zhao, X., Xiao, C., Lin, X. et al. Efficient structure similarity searches: a partition-based approach. The VLDB Journal 27, 53–78 (2018). https://doi.org/10.1007/s00778-017-0487-0

  • DOI: https://doi.org/10.1007/s00778-017-0487-0
