Abstract
This paper studies continuous pattern detection over large evolving graphs, which plays an important role in monitoring-related applications. The problem is challenging due to the large size and dynamic updates of graphs, the massive search space of pattern detection and inconsistent query results on dynamic graphs. This paper first introduces a snapshot isolation requirement, which ensures that the query results come from a consistent graph snapshot instead of a mixture of partial evolving graphs. Second, we propose an SSD (single sink directed acyclic graph) plan friendly to vertex-centric-distributed graph processing frameworks. SSD plan can guide the message transformation and transfer among graph vertices, and determine the satisfaction of the pattern on graph vertices for the sink vertex. Third, we devise strategies for major steps in the SSD evaluation, including the location of valid messages to achieve snapshot isolation, AO-List to determine the satisfaction of transition rule over dynamic graph, and message-on-change policy to reduce outgoing messages. The experiments on billion-edge graphs using Giraph, an open source implementation of Pregel, illustrate the efficiency and effectiveness of our method.
Similar content being viewed by others
References
Apache Giraph. http://incubator.apache.org/giraph/
Blanas, S., Patel, J.M., Ercegovac, V., Rao, J., Shekita, E.J., Tian, Y.: A comparison of join algorithms for log processing in mapreduce. In: SIGMOD, pp. 975–986 (2010)
Boldi, P., Santini, M., Vigna, S.: A large time-aware graph. SIGIR Forum 42(2), 33–38 (2008)
Bröcheler, M., Pugliese, A., Subrahmanian, V.S.: Cosi: cloud oriented subgraph identification in massive social networks. In: ASONAM, pp. 248–255 (2010)
Cheng, R., Hong, J., Kyrola, A., Miao, Y., Weng, X., Wu, M., Yang, F., Zhou, L., Zhao, F., Chen, E.: Kineograph: taking the pulse of a fast-changing and connected world. In: EuroSys, pp. 85–98 (2012)
Choudhury, S., Holder, L.B., Chin, G. Jr., Agarwal, K., Feo, J.: A selectivity based approach to continuous pattern detection in streaming graphs. In: EDBT, pp. 157–168 (2015)
Diao, Y., Fischer, P.M., Franklin, M.J., To, R.: Yfilter: efficient and scalable filtering of xml documents. In: ICDE, pp. 341–342 (2002)
Fan, W., Li, J., Luo, J., Tan, Z., Wang, X., Wu, Y.: Incremental graph pattern matching. In: SIGMOD, pp. 925–936 (2011)
Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractable to polynomial time. PVLDB 3(1), 264–275 (2010)
Gao, J., Zhou, C., Zhou, J., Yu, J.X.: Continuous pattern detection over billion-edge graph using distributed framework. In: ICDE, pp. 556–567 (2014)
Green, T.J., Miklau, G., Onizuka, M., Suciu, D.: Processing xml streams with deterministic automata. In: ICDT, pp. 173–189 (2003)
Han, M., Daudjee, K., Khaled Ammar, M., Özsu, T., Wang, X., Jin, T.: An experimental comparison of pregel-like graph processing systems. PVLDB 7(12), 1047–1058 (2014)
Khan, A., Li, N., Yan, X., Guan, Z., Chakraborty, S., Tao, S.: Neighborhood based fast graph search in large networks. In: SIGMOD, pp. 901–912 (2011)
Khurana, U., Deshpande, A.: Efficient snapshot retrieval over historical graph data. In: ICDE, pp. 997–1008 (2013)
Kwak, H., Lee, C., Park, H., Moon, S.B.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600 (2010)
Lee, J., Han, W.-S., Kasperovics, R., Lee, J.-H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. PVLDB 6(2), 133–144 (2012)
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)
Ma, S., Cao, Y., Huai, J., Wo, T.: Distributed graph pattern matching. In: WWW, pp. 949–958 (2012)
Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)
McCune, R.R., Weninger, T., Madey, G.R.: Thinking like a vertex: a survey of vertex-centric frameworks for distributed graph processing. CoRR, abs/1507.04405 (2015)
Mondal., J., Deshpande, A.: Managing large dynamic graphs efficiently. In: SIGMOD, pp. 145–156 (2012)
Pugliese, A., Bröcheler, M., Subrahmanian, V.S., Ovelgönne, M.: Efficient multiview maintenance under insertion in huge social networks. TWEB 8(2), 10 (2014)
Salihoglu, S., Widom, J.: Optimizing graph algorithms on pregel-like systems. PVLDB 7(7), 577–588 (2014)
Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB 1(1), 364–375 (2008)
Shao, B., Wang, H., Xiao, Y.: Trinity: a distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)
Sun, Z., Wang, H., Wang, H., Shao, B., Li, J.: Efficient subgraph matching on billion node graphs. PVLDB 5(9), 788–799 (2012)
Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. PVLDB 7(3), 193–204 (2013)
Tian, Y., Patel, J.M.: Tale: a tool for approximate large graph matching. In: ICDE, pp. 963–972 (2008)
Ullmann, J.R.: An algorithm for subgraph isomorphism. J. ACM 23, 31–42 (1976)
Wang, C., Chen, L.: Continuous subgraph pattern search over graph streams. In: ICDE, pp. 393–404 (2009)
Wang, X., Ding, X., Tung, A.K.H., Ying, S., Jin, H.: An efficient graph indexing method. In: ICDE, pp. 210–221 (2012)
Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD, pp. 335–346 (2004)
Zhao, P., Han, J.: On graph query optimization in large networks. PVLDB 3(1), 340–351 (2010)
Zhou, C., Gao, J., Sun, B., Yu, J.X.: Mocgraph: scalable distributed graph processing using message online computing. PVLDB 8(4), 377–388 (2014)
Acknowledgments
This work was partially supported by NSFC under Grant Nos. 61272156 and 61572040, Research Grants Council of the Hong Kong SAR, China Nos. 14209314 and 418512.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Gao, J., Zhou, C. & Yu, J.X. Toward continuous pattern detection over evolving large graph with snapshot isolation. The VLDB Journal 25, 269–290 (2016). https://doi.org/10.1007/s00778-015-0416-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-015-0416-z