research-article

Fast Distributed Algorithms for Connectivity and MST in Large Graphs

Published: 13 June 2018

Abstract

Motivated by the increasing need to understand the algorithmic foundations of distributed large-scale graph computations, we study a number of fundamental graph problems in a message-passing model for distributed computing where k≥2 machines jointly perform computations on graphs with n nodes (typically, n>k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation.
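To make the input model concrete, the Python sketch below simulates one common form of random partitioning. It assumes the random vertex partition variant associated with the k-machine model of Klauck et al. [24]: each vertex is placed on a machine chosen independently and uniformly at random, and that machine also stores every edge incident to the vertex. The function name `random_vertex_partition` and its parameters are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
# Illustrative simulation of a random vertex partition; all names are hypothetical.
import random
from collections import Counter

def random_vertex_partition(n, edges, k, seed=0):
    """Place each of the n vertices on a machine chosen independently and
    uniformly at random from range(k). The machine hosting a vertex also
    stores every edge incident to it, so an edge whose endpoints sit on
    different machines is stored on both of them."""
    rng = random.Random(seed)
    home = [rng.randrange(k) for _ in range(n)]   # home[v] = machine holding v
    local_edges = [set() for _ in range(k)]       # edges known to each machine
    for u, v in edges:
        local_edges[home[u]].add((u, v))
        local_edges[home[v]].add((u, v))
    return home, local_edges

if __name__ == "__main__":
    n, k = 1000, 8
    ring = [(i, (i + 1) % n) for i in range(n)]   # a simple cycle as toy input
    home, local_edges = random_vertex_partition(n, ring, k, seed=42)
    loads = Counter(home)
    print("vertices per machine:", [loads[m] for m in range(k)])
    print("edges per machine:   ", [len(s) for s in local_edges])
```

With n = 1000 and k = 8, each machine hosts n/k = 125 vertices in expectation. Because the placement is uniform and independent, no machine is systematically overloaded, which is part of what makes random partitioning convenient to analyze.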

Our main result is an (almost) optimal distributed randomized algorithm for graph connectivity. Our algorithm runs in Õ(n/k²) rounds (Õ notation hides a polylog(n) factor and an additive polylog(n) term). This improves over the best previously known bound of Õ(n/k) [Klauck et al., SODA 2015] and is optimal (up to a polylogarithmic factor) in light of an existing lower bound of Ω̃(n/k²). Our improved algorithm uses several techniques, including linear graph sketching, that prove useful in the design of efficient distributed graph algorithms. Using the connectivity algorithm as a building block, we then present fast randomized algorithms for computing minimum spanning trees, (approximate) min-cuts, and many graph verification problems. All these algorithms take Õ(n/k²) rounds and are optimal up to polylogarithmic factors. We also show an almost matching lower bound of Ω̃(n/k²) rounds for many graph verification problems by leveraging lower bounds in random-partition communication complexity.
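The abstract mentions linear graph sketching. The Python sketch below illustrates the core idea behind sketch-based connectivity using an XOR-based variant, loosely in the spirit of the linear measurements of Ahn, Guha, and McGregor [1] and the XOR-of-edge-IDs trick of King, Kutten, and Thorup [23]. It is not the paper's construction. Each vertex summarizes its incident edges in a small linear sketch. XOR-ing the sketches of all vertices in a component cancels every internal edge, so the combined sketch encodes only edges leaving the component, and a subsampled cell that holds exactly one such edge reveals it. Repeating Borůvka-style merges [6] then yields the components. All function names, the SHA-256-based hashing, and the parameters `reps` and `levels` are illustrative assumptions. The code runs centrally and does not simulate the k machines or count rounds.

```python
# Illustrative, centralized sketch; names, hashing, and parameters are hypothetical.
import hashlib
from collections import defaultdict

def h64(x, salt):
    """64-bit hash derived from SHA-256 (a stand-in for the limited-independence
    hash functions assumed in theoretical analyses)."""
    digest = hashlib.sha256(f"{salt}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def encode(u, v, n):
    """Unique nonzero integer ID for the undirected edge {u, v}."""
    a, b = min(u, v), max(u, v)
    return a * n + b + 1

def decode(eid, n):
    return divmod(eid - 1, n)

def vertex_sketch(v, neighbors, n, salt, reps, levels):
    """Cell [r][j] holds (XOR of edge IDs, XOR of edge fingerprints) over the
    edges incident to v that survive level-j subsampling in repetition r."""
    sk = [[[0, 0] for _ in range(levels)] for _ in range(reps)]
    for w in neighbors:
        eid = encode(v, w, n)
        fp = h64(eid, f"{salt}:fp")
        for r in range(reps):
            hv = h64(eid, f"{salt}:rep{r}")
            for j in range(levels):
                if hv % (1 << j):  # nested levels: survives level j w.p. about 2^-j
                    break
                sk[r][j][0] ^= eid
                sk[r][j][1] ^= fp
    return sk

def combine(a, b):
    """Sketches are linear over GF(2): XOR cell by cell. An edge with both
    endpoints in the merged set is added twice and therefore cancels."""
    return [[[x[0] ^ y[0], x[1] ^ y[1]] for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def sample_outgoing(sk, n, salt):
    """Return an edge leaving the sketched vertex set, or None if no cell
    looks 1-sparse (a matching fingerprint certifies a single edge w.h.p.)."""
    for rep in sk:
        for xid, xfp in rep:
            if xid != 0 and xfp == h64(xid, f"{salt}:fp"):
                return decode(xid, n)
    return None

def connected_components(n, edges, reps=64, seed="demo"):
    """Borůvka-style merging driven only by sketches. Returns, for each vertex,
    the smallest vertex ID in its component."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    levels = max(1, (n * n).bit_length())  # some level thins any cut to about one edge
    label = list(range(n))
    phase = 0
    while True:
        salt = f"{seed}:phase{phase}"      # fresh hash functions every phase
        comp = {}
        for v in range(n):
            sk = vertex_sketch(v, adj[v], n, salt, reps, levels)
            c = label[v]
            comp[c] = combine(comp[c], sk) if c in comp else sk
        found = []
        for sk in comp.values():
            e = sample_outgoing(sk, n, salt)
            if e is not None:
                found.append(e)
        if not found:                      # no component reported an outgoing edge
            return label
        parent = {c: c for c in comp}      # union-find over the current labels
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for u, v in found:
            ru, rv = find(label[u]), find(label[v])
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)  # smaller label becomes the root
        label = [find(c) for c in label]
        phase += 1
```

For example, `connected_components(10, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (4, 5), (5, 6), (8, 9)])` labels vertices 0 to 3 with 0, vertices 4 to 6 with 4, vertex 7 with 7, and vertices 8 and 9 with 8 (with high probability over the hashing). The property doing the work is linearity: a component's sketch is just the XOR of its vertices' sketches, so partial sketches held in different places can be combined without anyone knowing which edges are internal to the component.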

References

  1. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Analyzing graph structure via linear measurements. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 459--467.
  2. Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: Sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM Symposium on Principles of Database Systems (PODS). 5--14.
  3. Noga Alon, László Babai, and Alon Itai. 1986. A fast and simple randomized parallel algorithm for the maximal independent set problem. Journal of Algorithms 7, 4 (1986), 567--583.
  4. Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. 2012. Space-efficient local computation algorithms. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1132--1139.
  5. Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, and Sriram V. Pemmaraju. 2018. Near-optimal clustering in the k-machine model. In Proceedings of the 19th International Conference on Distributed Computing and Networking (ICDCN).
  6. Otakar Borůvka. 1926. O jistém problému minimálním (about a certain minimal problem). Práce Mor. Přírodověd. Spol. v Brně III 3 (1926).
  7. Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. 2015. Algebraic methods in the congested clique. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC). 143--152.
  8. Jen-Yeu Chen and Gopal Pandurangan. 2012. Almost-optimal gossip-based aggregate computation. SIAM Journal on Computing 41, 3 (2012), 455--483.
  9. Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One trillion edges: Graph processing at Facebook-scale. Proceedings of the VLDB Endowment 8, 12 (2015), 1804--1815.
  10. Fan Chung and Olivia Simpson. 2015. Distributed algorithms for finding local clusters using heat kernel Pagerank. In Proceedings of the 12th Workshop on Algorithms and Models for the Web-graph (WAW). 77--189.
  11. Graham Cormode and Donatella Firmani. 2014. A unifying framework for ℓ0-sampling algorithms. Distributed and Parallel Databases 32, 3 (2014), 315--335.
  12. Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. 2012. Distributed verification and hardness of distributed approximation. SIAM Journal on Computing 41, 5 (2012), 1235--1265.
  13. Andrew Drucker, Fabian Kuhn, and Rotem Oshman. 2014. On the power of the congested clique model. In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC). 367--376.
  14. Michael Elkin, Hartmut Klauck, Danupon Nanongkai, and Gopal Pandurangan. 2014. Can quantum communication speed up distributed computation? In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC). 166--175.
  15. Robert G. Gallager, Pierre A. Humblet, and Philip M. Spira. 1983. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Languages and Systems 5, 1 (1983), 66--77.
  16. Mohsen Ghaffari and Fabian Kuhn. 2013. Distributed minimum cut approximation. In Proceedings of the 27th International Symposium on Distributed Computing (DISC). 1--15.
  17. Mohsen Ghaffari and Merav Parter. 2016. MST in log-star rounds of congested clique. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (PODC). 19--28.
  18. James W. Hegeman, Gopal Pandurangan, Sriram V. Pemmaraju, Vivek B. Sardeshmukh, and Michele Scquizzato. 2015. Toward optimal bounds in the congested clique: Graph connectivity and MST. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC). 91--100.
  19. Hossein Jowhari, Mert Saglam, and Gábor Tardos. 2011. Tight bounds for samplers, finding duplicates in streams, and related problems. In Proceedings of the 30th ACM Symposium on Principles of Database Systems (PODS). 49--58.
  20. David R. Karger. 1994. Random sampling in cut, flow, and network design problems. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC). 648--657.
  21. David R. Karger, Philip N. Klein, and Robert E. Tarjan. 1995. A randomized linear-time algorithm to find minimum spanning trees. Journal of the ACM 42, 2 (1995), 321--328.
  22. Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 938--948.
  23. Valerie King, Shay Kutten, and Mikkel Thorup. 2015. Construction and impromptu repair of an MST in a distributed network with o(m) communication. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing (PODC). 71--80.
  24. Hartmut Klauck, Danupon Nanongkai, Gopal Pandurangan, and Peter Robinson. 2015. Distributed computation of large-scale graph problems. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 391--410.
  25. Eyal Kushilevitz and Noam Nisan. 1997. Communication Complexity. Cambridge University Press.
  26. Shay Kutten, Gopal Pandurangan, David Peleg, Peter Robinson, and Amitabh Trehan. 2015. Sublinear bounds for randomized leader election. Theoretical Computer Science 561 (2015), 134--143.
  27. Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. 2011. Filtering: A method for solving graph problems in MapReduce. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 85--94.
  28. Christoph Lenzen. 2013. Optimal deterministic routing and sorting on the congested clique. In Proceedings of the 32nd ACM Symposium on Principles of Distributed Computing (PODC). 42--50.
  29. Christoph Lenzen and Roger Wattenhofer. 2016. Tight bounds for parallel randomized load balancing. Distributed Computing 29, 2 (2016), 127--142.
  30. Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press.
  31. Zvi Lotker, Boaz Patt-Shamir, Elan Pavlov, and David Peleg. 2005. Minimum-weight spanning tree construction in O(log log n) communication rounds. SIAM Journal on Computing 35, 1 (2005), 120--131.
  32. Nancy A. Lynch. 1996. Distributed Algorithms. Morgan Kaufmann Publishers Inc.
  33. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM International Conference on Management of Data (SIGMOD). 135--146.
  34. Andrew McGregor. 2014. Graph stream algorithms: A survey. SIGMOD Record 43, 1 (2014), 9--20.
  35. Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press.
  36. Danupon Nanongkai. 2014. Distributed approximation algorithms for weighted shortest paths. In Proceedings of the 46th ACM Symposium on Theory of Computing (STOC). 565--573.
  37. Danupon Nanongkai, Atish Das Sarma, and Gopal Pandurangan. 2011. A tight unconditional lower bound on distributed random walk computation. In Proceedings of the 30th Annual ACM Symposium on Principles of Distributed Computing (PODC). 257--266.
  38. Rotem Oshman. 2014. Communication complexity lower bounds in distributed message-passing. In Proceedings of the 21st International Colloquium on Structural Information and Communication Complexity (SIROCCO). 14--17.
  39. Gopal Pandurangan, David Peleg, and Michele Scquizzato. 2016. Message lower bounds via efficient network synchronization. In Proceedings of the 23rd International Colloquium on Structural Information and Communication Complexity (SIROCCO). 75--91.
  40. Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2016. Fast distributed algorithms for connectivity and MST in large graphs. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 429--438.
  41. Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2017. A time- and message-optimal distributed algorithm for minimum spanning trees. In Proceedings of the 49th Annual ACM Symposium on the Theory of Computing (STOC). 743--756.
  42. Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2018. On the distributed complexity of large-scale graph computations. In Proceedings of the 30th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). To appear.
  43. David Peleg. 2000. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics.
  44. Sriram V. Pemmaraju and Vivek B. Sardeshmukh. 2016. Super-fast MST algorithms in the congested clique using o(m) messages. In Proceedings of the 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). 47:1--47:15.
  45. Judy Qiu, Shantenu Jha, Andre Luckow, and Geoffrey C. Fox. 2014. Towards HPC-ABDS: An initial high-performance big data stack. Retrieved from http://grids.ucs.indiana.edu/ptliupages/publications/nist-hpc-abds.pdf.
  46. Isabelle Stanton. 2014. Streaming balanced graph partitioning algorithms for random graphs. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1287--1301.
  47. Ramakrishna Thurimella. 1997. Sub-linear distributed algorithms for sparse certificates and biconnected components. Journal of Algorithms 23, 1 (1997), 160--179.
  48. Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From “think like a vertex” to “think like a graph.” PVLDB 7, 3 (2013), 193--204.
  49. Leslie G. Valiant. 1982. A scheme for fast parallel communication. SIAM Journal on Computing 11, 2 (1982), 350--361.
  50. Leslie G. Valiant. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8 (1990), 103--111.
  51. Sergei Vassilvitskii. 2015. Models for Parallel Computation (A Hitchhikers’ Guide to Massively Parallel Universes). Retrieved from http://grigory.us/blog/massively-parallel-universes/.
  52. David P. Woodruff and Qin Zhang. 2017. When distributed computation is communication expensive. Distributed Computing 30, 5 (2017), 309--323.


      • Published in

        ACM Transactions on Parallel Computing, Volume 5, Issue 1
        Special Issue on SPAA 2016
        March 2018
        140 pages
        ISSN:2329-4949
        EISSN:2329-4957
        DOI:10.1145/3232649

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 13 June 2018
        • Accepted: 1 January 2018
        • Revised: 1 October 2017
        • Received: 1 November 2016

        Qualifiers

        • research-article
        • Research
        • Refereed
