Skip to main content
Log in

Toward continuous pattern detection over evolving large graph with snapshot isolation

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

This paper studies continuous pattern detection over large evolving graphs, which plays an important role in monitoring-related applications. The problem is challenging due to the large size and dynamic updates of graphs, the massive search space of pattern detection and inconsistent query results on dynamic graphs. This paper first introduces a snapshot isolation requirement, which ensures that the query results come from a consistent graph snapshot instead of a mixture of partial evolving graphs. Second, we propose an SSD (single sink directed acyclic graph) plan friendly to vertex-centric-distributed graph processing frameworks. SSD plan can guide the message transformation and transfer among graph vertices, and determine the satisfaction of the pattern on graph vertices for the sink vertex. Third, we devise strategies for major steps in the SSD evaluation, including the location of valid messages to achieve snapshot isolation, AO-List to determine the satisfaction of transition rule over dynamic graph, and message-on-change policy to reduce outgoing messages. The experiments on billion-edge graphs using Giraph, an open source implementation of Pregel, illustrate the efficiency and effectiveness of our method.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Similar content being viewed by others

Notes

  1. http://www.statista.com/statistics.

References

  1. Apache Giraph. http://incubator.apache.org/giraph/

  2. Blanas, S., Patel, J.M., Ercegovac, V., Rao, J., Shekita, E.J., Tian, Y.: A comparison of join algorithms for log processing in mapreduce. In: SIGMOD, pp. 975–986 (2010)

  3. Boldi, P., Santini, M., Vigna, S.: A large time-aware graph. SIGIR Forum 42(2), 33–38 (2008)

    Article  Google Scholar 

  4. Bröcheler, M., Pugliese, A., Subrahmanian, V.S.: Cosi: cloud oriented subgraph identification in massive social networks. In: ASONAM, pp. 248–255 (2010)

  5. Cheng, R., Hong, J., Kyrola, A., Miao, Y., Weng, X., Wu, M., Yang, F., Zhou, L., Zhao, F., Chen, E.: Kineograph: taking the pulse of a fast-changing and connected world. In: EuroSys, pp. 85–98 (2012)

  6. Choudhury, S., Holder, L.B., Chin, G. Jr., Agarwal, K., Feo, J.: A selectivity based approach to continuous pattern detection in streaming graphs. In: EDBT, pp. 157–168 (2015)

  7. Diao, Y., Fischer, P.M., Franklin, M.J., To, R.: Yfilter: efficient and scalable filtering of xml documents. In: ICDE, pp. 341–342 (2002)

  8. Fan, W., Li, J., Luo, J., Tan, Z., Wang, X., Wu, Y.: Incremental graph pattern matching. In: SIGMOD, pp. 925–936 (2011)

  9. Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractable to polynomial time. PVLDB 3(1), 264–275 (2010)

    Google Scholar 

  10. Gao, J., Zhou, C., Zhou, J., Yu, J.X.: Continuous pattern detection over billion-edge graph using distributed framework. In: ICDE, pp. 556–567 (2014)

  11. Green, T.J., Miklau, G., Onizuka, M., Suciu, D.: Processing xml streams with deterministic automata. In: ICDT, pp. 173–189 (2003)

  12. Han, M., Daudjee, K., Khaled Ammar, M., Özsu, T., Wang, X., Jin, T.: An experimental comparison of pregel-like graph processing systems. PVLDB 7(12), 1047–1058 (2014)

    Google Scholar 

  13. Khan, A., Li, N., Yan, X., Guan, Z., Chakraborty, S., Tao, S.: Neighborhood based fast graph search in large networks. In: SIGMOD, pp. 901–912 (2011)

  14. Khurana, U., Deshpande, A.: Efficient snapshot retrieval over historical graph data. In: ICDE, pp. 997–1008 (2013)

  15. Kwak, H., Lee, C., Park, H., Moon, S.B.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600 (2010)

  16. Lee, J., Han, W.-S., Kasperovics, R., Lee, J.-H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)

    Google Scholar 

  17. Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)

    Google Scholar 

  18. Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW, pp. 949–958 (2012)

  19. Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)

  20. McCune, R.R., Weninger, T., Madey, G.R.: Thinking like a vertex: a survey of vertex-centric frameworks for distributed graph processing. CoRR, abs/1507.04405 (2015)

  21. Mondal., J., Deshpande, A.: Managing large dynamic graphs efficiently. In: SIGMOD, pp. 145–156 (2012)

  22. Pugliese, A., Bröcheler, M., Subrahmanian, V.S., Ovelgönne, M.: Efficient multiview maintenance under insertion in huge social networks. TWEB 8(2), 10 (2014)

  23. Salihoglu, S., Widom, J.: Optimizing graph algorithms on pregel-like systems. PVLDB 7(7), 577–588 (2014)

    Google Scholar 

  24. Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008)

    Google Scholar 

  25. Shao, B., Wang, H., Xiao, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)

  26. Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012)

    Google Scholar 

  27. Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. PVLDB 7(3), 193–204 (2013)

    Google Scholar 

  28. Tian, Y., Patel, J.M.: Tale: a tool for approximate large graph matching. In: ICDE, pp. 963–972 (2008)

  29. Ullmann, J.R.: An algorithm for subgraph isomorphism. J. ACM 23, 31–42 (1976)

    Article  MathSciNet  Google Scholar 

  30. Wang, C., Chen, L.: Continuous subgraph pattern search over graph streams. In: ICDE, pp. 393–404 (2009)

  31. Wang, X., Ding, X., Tung, A.K.H., Ying, S., Jin, H.: An efficient graph indexing method. In: ICDE, pp. 210–221 (2012)

  32. Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD, pp. 335–346 (2004)

  33. Zhao, P., Han, J.: On graph query optimization in large networks. PVLDB 3(1), 340–351 (2010)

    Google Scholar 

  34. Zhou, C., Gao, J., Sun, B., Yu, J.X.: Mocgraph: scalable distributed graph processing using message online computing. PVLDB 8(4), 377–388 (2014)

    Google Scholar 

Download references

Acknowledgments

This work was partially supported by NSFC under Grant Nos. 61272156 and 61572040, Research Grants Council of the Hong Kong SAR, China Nos. 14209314 and 418512.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jun Gao.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gao, J., Zhou, C. & Yu, J.X. Toward continuous pattern detection over evolving large graph with snapshot isolation. The VLDB Journal 25, 269–290 (2016). https://doi.org/10.1007/s00778-015-0416-z

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-015-0416-z

Keywords

Navigation