DOI: 10.1145/3037697.3037748
research-article
Public Access

KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations

Published: 04 April 2017

ABSTRACT

Continuous processing of a streaming graph maintains an approximate result of the iterative computation on a recent version of the graph. Upon a user query, the accurate result on the current graph can be computed quickly by feeding the approximate results to the iterative computation: a form of incremental computation that corrects the (small amount of) error in the approximate result. Although this approach is effective for growing graphs, it is generally not applicable when edge deletions are present, because existing approximations can lead either to incorrect results (e.g., monotonic computations terminate at incorrect minima/maxima) or to poor performance (e.g., with approximations, convergence takes longer than performing the computation from scratch).
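To make the deletion problem concrete, here is a minimal sketch that uses single-source shortest paths (SSSP) as a typical monotonic computation, with Bellman-Ford-style min-relaxation standing in for the iterative computation. The graph, its weights, and the function names `relax` and `from_scratch` are hypothetical illustrations, not taken from the paper. Reusing the previous distances as the approximation works after an edge addition, but after deleting an edge that lies on a shortest path, min-relaxation can never raise a stale distance, so the computation stops at an incorrect minimum.

```python
import math

def relax(edges, source, dist):
    """Iterate min-relaxation to a fixpoint, starting from `dist`.

    `edges` maps (u, v) -> non-negative weight. SSSP is monotonic:
    relaxation only ever lowers a vertex's distance.
    """
    dist = dict(dist)
    dist[source] = 0
    changed = True
    while changed:
        changed = False
        for (u, v), w in edges.items():
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist

def from_scratch(vertices, edges, source):
    """Full recomputation: every vertex starts at infinity."""
    return relax(edges, source, {v: math.inf for v in vertices})

vertices = ["A", "B", "C", "D"]
edges = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5, ("C", "D"): 1}
approx = from_scratch(vertices, edges, "A")  # result on the older snapshot

# Edge addition: feeding the old values in converges to the same answer
# as recomputing from scratch.
grown = {**edges, ("A", "D"): 1}
assert relax(grown, "A", approx) == from_scratch(vertices, grown, "A")

# Edge deletion: B->C was on C's shortest path. Min-relaxation cannot
# raise C's stale distance, so the result stays at a wrong minimum.
shrunk = {e: w for e, w in edges.items() if e != ("B", "C")}
stale = relax(shrunk, "A", approx)
correct = from_scratch(vertices, shrunk, "A")
print(stale["C"], correct["C"])  # 2 5
```

The stale run reports a distance of 2 for C even though the only remaining path to C has length 5; D inherits the same error.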

This paper presents KickStarter, a runtime technique that can trim the approximate values of the subset of vertices impacted by the deleted edges. The trimmed approximation is both safe and profitable: it enables the computation to produce correct results and to converge quickly. KickStarter works for a class of monotonic graph algorithms and can be readily incorporated into any existing streaming graph system. Our experiments with four streaming algorithms on five large graphs demonstrate that trimming not only produces correct results but also accelerates these algorithms by 8.5x to 23.7x.
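The trimming idea can be illustrated with single-source shortest paths (SSSP). This is a simplified sketch, not KickStarter's actual algorithm: it assumes positive edge weights, uses a single parent pointer per vertex as a hypothetical stand-in for dependence tracking, and the names `relax` and `trim` are invented for illustration. A vertex is treated as impacted if its value was derived through a deleted edge, directly or via its parent chain. Each impacted vertex is reset to the best value available through non-impacted in-neighbors (or infinity), which never underestimates the true distance, so rerunning the relaxation should converge to the from-scratch result.

```python
import math

def relax(edges, source, dist, parent):
    """Min-relaxation to a fixpoint. parent[v] records the in-neighbor
    that last lowered dist[v] (a simple form of dependence tracking)."""
    dist, parent = dict(dist), dict(parent)
    dist[source], parent[source] = 0, None
    changed = True
    while changed:
        changed = False
        for (u, v), w in edges.items():
            if dist[u] + w < dist[v]:
                dist[v], parent[v] = dist[u] + w, u
                changed = True
    return dist, parent

def trim(edges, deleted, dist, parent):
    """Reset values that depend, directly or transitively, on a deleted
    edge. `edges` is the graph after the deletions were applied."""
    impacted = {v for (u, v) in deleted if parent.get(v) == u}
    grew = True
    while grew:  # propagate impact down the parent (dependence) tree
        grew = False
        for v, p in parent.items():
            if p in impacted and v not in impacted:
                impacted.add(v)
                grew = True
    dist, parent = dict(dist), dict(parent)
    for v in impacted:
        # Safe value: best estimate through non-impacted in-neighbors only.
        cands = [(dist[u] + w, u) for (u, x), w in edges.items()
                 if x == v and u not in impacted]
        dist[v], parent[v] = min(cands, default=(math.inf, None))
    return dist, parent, impacted

vertices = ["A", "B", "C", "D"]
edges = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 5, ("C", "D"): 1}
start = ({v: math.inf for v in vertices}, {v: None for v in vertices})
dist, parent = relax(edges, "A", *start)

deleted = [("B", "C")]
shrunk = {e: w for e, w in edges.items() if e not in deleted}
tdist, tparent, impacted = trim(shrunk, deleted, dist, parent)
final, _ = relax(shrunk, "A", tdist, tparent)
scratch, _ = relax(shrunk, "A", *start)
print(sorted(impacted), final["C"], final["D"])  # ['C', 'D'] 5 6
```

Only C and D are reset; A and B keep their previous values, which is where savings over full recomputation would come from on a large graph.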

    Published in

      ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
      April 2017
      856 pages
      ISBN: 9781450344654
      DOI: 10.1145/3037697

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      ASPLOS '17 Paper Acceptance Rate: 53 of 320 submissions, 17%
      Overall Acceptance Rate: 535 of 2,713 submissions, 20%