ABSTRACT
Continuous processing of a streaming graph maintains an approximate result of the iterative computation on a recent version of the graph. Upon a user query, the accurate result on the current graph can be computed quickly by feeding the approximate result into the iterative computation, a form of incremental computation that corrects the (small amount of) error in the approximation. Although this approach is effective for growing graphs, it is generally not applicable when edge deletions are present: existing approximations can lead to either incorrect results (e.g., monotonic computations terminate at an incorrect minimum or maximum) or poor performance (e.g., starting from the approximation, convergence takes longer than performing the computation from scratch).
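To make the incorrect-result case concrete, the following is a minimal sketch that assumes single-source shortest paths (SSSP) as the monotonic computation, where values may only decrease. The three-vertex graph, the `sssp` helper, and its `init` parameter are hypothetical names chosen for illustration; they are not taken from the paper.

```python
import math

def sssp(num_vertices, edges, source, init=None):
    """Bellman-Ford-style relaxation. `init` optionally seeds the distances
    with a previous (approximate) result; values can only decrease."""
    dist = list(init) if init is not None else [math.inf] * num_vertices
    dist[source] = 0
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist

# Hypothetical graph: (source vertex, target vertex, weight).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 5)]
approx = sssp(3, edges, source=0)

# Delete edge 1 -> 2, then resume from the stale approximation.
edges_after = [e for e in edges if e != (1, 2, 1)]
stale = sssp(3, edges_after, source=0, init=approx)
exact = sssp(3, edges_after, source=0)

print(stale)  # [0, 1, 2]: vertex 2 keeps a distance that no longer exists
print(exact)  # [0, 1, 5]: result of recomputing from scratch
```

Because relaxation never raises a value, the stale distance of vertex 2 is never corrected, so the computation converges to a wrong minimum.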
This paper presents KickStarter, a runtime technique that trims the approximate values for the subset of vertices impacted by the deleted edges. The trimmed approximation is both safe and profitable, enabling the computation to produce correct results and converge quickly. KickStarter works for a class of monotonic graph algorithms and can be readily incorporated into any existing streaming graph system. Our experiments with four streaming algorithms on five large graphs demonstrate that trimming not only produces correct results but also accelerates these algorithms by 8.5x to 23.7x.
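The abstract does not spell out how the impacted vertices are identified, so the sketch below is only one conservative way to illustrate trimming, not KickStarter's actual algorithm. It assumes SSSP, assumes each vertex records a `parent` (the neighbor that last set its value), and resets every vertex whose value transitively depended on the deleted edge. All names (`relax`, `trim`, `parent`) are hypothetical.

```python
import math

def relax(dist, parent, edges):
    """Monotonic relaxation that also records which neighbor set each value."""
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u
                changed = True

def trim(dist, parent, deleted_edge):
    """Reset every vertex whose value depended on the deleted edge.
    Assumption: resetting to infinity is a safe (if pessimistic) trimmed value."""
    u, v, _ = deleted_edge
    if parent[v] != u:
        return set()  # the deleted edge did not determine any value
    trimmed, frontier = {v}, [v]
    while frontier:
        x = frontier.pop()
        for y in range(len(dist)):
            if parent[y] == x and y not in trimmed:
                trimmed.add(y)
                frontier.append(y)
    for x in trimmed:
        dist[x] = math.inf
        parent[x] = None
    return trimmed

# Hypothetical graph: (source vertex, target vertex, weight); vertex 0 is the source.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 5), (2, 3, 1)]
dist = [0, math.inf, math.inf, math.inf]
parent = [None] * 4
relax(dist, parent, edges)  # dist is now [0, 1, 2, 3]

deleted = (1, 2, 1)
edges_after = [e for e in edges if e != deleted]
trimmed = trim(dist, parent, deleted)  # vertices 2 and 3 are reset
relax(dist, parent, edges_after)       # resume from the trimmed approximation
print(sorted(trimmed), dist)
```

Vertices whose values did not depend on the deleted edge keep their old values, so the resumed computation only redoes work for the trimmed subset. Resetting to infinity is the bluntest safe choice; a tighter trimmed value would preserve more of the earlier work.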
KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations
Published in ASPLOS '17.