Abstract
Motivated by the increasing need to understand the algorithmic foundations of distributed large-scale graph computations, we study a number of fundamental graph problems in a message-passing model for distributed computing where k≥2 machines jointly perform computations on graphs with n nodes (typically, n>k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation.
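As a rough illustration of the input model, the sketch below places each node on one of the k machines uniformly at random and gives that machine the node's incident edges. This is only one plausible reading of "randomly partitioned" (a vertex partition); the function name `random_vertex_partition` and the integer node labels 0..n-1 are assumptions made for the example, not notation from the paper.

```python
import random
from collections import defaultdict


def random_vertex_partition(edges, num_nodes, k, seed=None):
    """Hypothetical helper: assign every node to a uniformly random machine.

    Returns (home, local_edges), where home[v] is the machine hosting node v
    and local_edges[m] lists the edges with at least one endpoint on machine m.
    """
    rng = random.Random(seed)
    # Each node independently picks one of the k machines.
    home = [rng.randrange(k) for _ in range(num_nodes)]

    local_edges = defaultdict(list)
    for u, v in edges:
        # A machine hosting a node also knows that node's incident edges.
        local_edges[home[u]].append((u, v))
        if home[v] != home[u]:
            local_edges[home[v]].append((u, v))
    return home, dict(local_edges)


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    home, local = random_vertex_partition(path, num_nodes=4, k=3, seed=7)
    print(home)
    print(local)
```

An edge whose endpoints land on different machines is known to both of them, which is why any algorithm in this model has to communicate to learn how the locally stored pieces connect.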
Our main result is an (almost) optimal distributed randomized algorithm for graph connectivity. Our algorithm runs in Õ(n/k²) rounds (the Õ notation hides a polylog(n) factor and an additive polylog(n) term). This improves over the best previously known bound of Õ(n/k) [Klauck et al., SODA 2015] and is optimal (up to a polylogarithmic factor) in light of an existing lower bound of Ω̃(n/k²). Our improved algorithm uses several techniques, including linear graph sketching, that prove useful in the design of efficient distributed graph algorithms. Using the connectivity algorithm as a building block, we then present fast randomized algorithms for computing minimum spanning trees, (approximate) min-cuts, and many graph verification problems. All these algorithms take Õ(n/k²) rounds and are optimal up to polylogarithmic factors. We also show an almost matching lower bound of Ω̃(n/k²) rounds for many graph verification problems by leveraging lower bounds in random-partition communication complexity.
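To get a feel for the gap between the two bounds, the following sketch evaluates n/k and n/k² for a sample instance. It deliberately ignores the polylog(n) factors and additive terms hidden by the Õ notation, so the numbers are only indicative; the function name `leading_terms` and the sample values of n and k are assumptions chosen for illustration.

```python
def leading_terms(n, k):
    """Return (n/k, n/k^2), the leading terms of the previous and new round
    bounds, with all polylogarithmic factors dropped."""
    return n / k, n / k**2


if __name__ == "__main__":
    n, k = 10**6, 100  # hypothetical instance: one million nodes, 100 machines
    previous, improved = leading_terms(n, k)
    # The ratio of the two leading terms is exactly k.
    print(previous, improved, previous / improved)
```

Under this simplification the improvement is a factor of k, so adding machines helps quadratically rather than linearly in the leading term.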
- Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Analyzing graph structure via linear measurements. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 459--467.
- Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. 2012. Graph sketches: Sparsification, spanners, and subgraphs. In Proceedings of the 31st ACM Symposium on Principles of Database Systems (PODS). 5--14.
- Noga Alon, László Babai, and Alon Itai. 1986. A fast and simple randomized parallel algorithm for the maximal independent set problem. Journal of Algorithms 7, 4 (1986), 567--583.
- Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. 2012. Space-efficient local computation algorithms. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1132--1139.
- Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, and Sriram V. Pemmaraju. 2018. Near-optimal clustering in the k-machine model. In Proceedings of the 19th International Conference on Distributed Computing and Networking (ICDCN).
- Otakar Borůvka. 1926. O jistém problému minimálním (about a certain minimal problem). Práce Mor. Prírodoved. Spol. v Brne III 3 (1926).
- Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. 2015. Algebraic methods in the congested clique. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC). 143--152.
- Jen-Yeu Chen and Gopal Pandurangan. 2012. Almost-optimal gossip-based aggregate computation. SIAM Journal on Computing 41, 3 (2012), 455--483.
- Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, and Sambavi Muthukrishnan. 2015. One trillion edges: Graph processing at facebook-scale. Proceedings of the VLDB Endowment 8, 12 (2015), 1804--1815.
- Fan Chung and Olivia Simpson. 2015. Distributed algorithms for finding local clusters using heat kernel Pagerank. In Proceedings of the 12th Workshop on Algorithms and Models for the Web-graph (WAW). 77--189.
- Graham Cormode and Donatella Firmani. 2014. A unifying framework for ℓ0-sampling algorithms. Distributed and Parallel Databases 32, 3 (2014), 315--335.
- Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. 2012. Distributed verification and hardness of distributed approximation. SIAM Journal on Computing 41, 5 (2012), 1235--1265.
- Andrew Drucker, Fabian Kuhn, and Rotem Oshman. 2014. On the power of the congested clique model. In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC). 367--376.
- Michael Elkin, Hartmut Klauck, Danupon Nanongkai, and Gopal Pandurangan. 2014. Can quantum communication speed up distributed computation? In Proceedings of the 33rd ACM Symposium on Principles of Distributed Computing (PODC). 166--175.
- Robert G. Gallager, Pierre A. Humblet, and Philip M. Spira. 1983. A distributed algorithm for minimum-weight spanning trees. ACM Transactions on Programming Languages and Systems 5, 1 (1983), 66--77.
- Mohsen Ghaffari and Fabian Kuhn. 2013. Distributed minimum cut approximation. In Proceedings of the 27th International Symposium on Distributed Computing (DISC). 1--15.
- Mohsen Ghaffari and Merav Parter. 2016. MST in log-star rounds of congested clique. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (PODC). 19--28.
- James W. Hegeman, Gopal Pandurangan, Sriram V. Pemmaraju, Vivek B. Sardeshmukh, and Michele Scquizzato. 2015. Toward optimal bounds in the congested clique: Graph connectivity and MST. In Proceedings of the 34th ACM Symposium on Principles of Distributed Computing (PODC). 91--100.
- Hossein Jowhari, Mert Saglam, and Gábor Tardos. 2011. Tight bounds for samplers, finding duplicates in streams, and related problems. In Proceedings of the 30th ACM Symposium on Principles of Database Systems (PODS). 49--58.
- David R. Karger. 1994. Random sampling in cut, flow, and network design problems. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC). 648--657.
- David R. Karger, Philip N. Klein, and Robert E. Tarjan. 1995. A randomized linear-time algorithm to find minimum spanning trees. Journal of the ACM 42, 2 (1995), 321--328.
- Howard J. Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 938--948.
- Valerie King, Shay Kutten, and Mikkel Thorup. 2015. Construction and impromptu repair of an MST in a distributed network with o(m) communication. In Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing (PODC). 71--80.
- Hartmut Klauck, Danupon Nanongkai, Gopal Pandurangan, and Peter Robinson. 2015. Distributed computation of large-scale graph problems. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 391--410.
- Eyal Kushilevitz and Noam Nisan. 1997. Communication Complexity. Cambridge University Press.
- Shay Kutten, Gopal Pandurangan, David Peleg, Peter Robinson, and Amitabh Trehan. 2015. Sublinear bounds for randomized leader election. Theoretical Computer Science 561 (2015), 134--143.
- Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. 2011. Filtering: A method for solving graph problems in MapReduce. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 85--94.
- Christoph Lenzen. 2013. Optimal deterministic routing and sorting on the congested clique. In Proceedings of the 32nd ACM Symposium on Principles of Distributed Computing (PODC). 42--50.
- Christoph Lenzen and Roger Wattenhofer. 2016. Tight bounds for parallel randomized load balancing. Distributed Computing 29, 2 (2016), 127--142.
- Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press.
- Zvi Lotker, Boaz Patt-Shamir, Elan Pavlov, and David Peleg. 2005. Minimum-weight spanning tree construction in O(log log n) communication rounds. SIAM Journal on Computing 35, 1 (2005), 120--131.
- Nancy A. Lynch. 1996. Distributed Algorithms. Morgan Kaufmann Publishers Inc.
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM International Conference on Management of Data (SIGMOD). 135--146.
- Andrew McGregor. 2014. Graph stream algorithms: A survey. SIGMOD Record 43, 1 (2014), 9--20.
- Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press.
- Danupon Nanongkai. 2014. Distributed approximation algorithms for weighted shortest paths. In Proceedings of the 46th ACM Symposium on Theory of Computing (STOC). 565--573.
- Danupon Nanongkai, Atish Das Sarma, and Gopal Pandurangan. 2011. A tight unconditional lower bound on distributed random walk computation. In Proceedings of the 30th Annual ACM Symposium on Principles of Distributed Computing (PODC). 257--266.
- Rotem Oshman. 2014. Communication complexity lower bounds in distributed message-passing. In Proceedings of the 21st International Colloquium on Structural Information and Communication Complexity (SIROCCO). 14--17.
- Gopal Pandurangan, David Peleg, and Michele Scquizzato. 2016. Message lower bounds via efficient network synchronization. In Proceedings of the 23rd International Colloquium on Structural Information and Communication Complexity (SIROCCO). 75--91.
- Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2016. Fast distributed algorithms for connectivity and MST in large graphs. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 429--438.
- Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2017. A time- and message-optimal distributed algorithm for minimum spanning trees. In Proceedings of the 49th Annual ACM Symposium on the Theory of Computing (STOC). 743--756.
- Gopal Pandurangan, Peter Robinson, and Michele Scquizzato. 2018. On the distributed complexity of large-scale graph computations. In Proceedings of the 30th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). To appear.
- David Peleg. 2000. Distributed Computing: A Locality-Sensitive Approach. Society for Industrial and Applied Mathematics.
- Sriram V. Pemmaraju and Vivek B. Sardeshmukh. 2016. Super-fast MST algorithms in the congested clique using o(m) messages. In Proceedings of the 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). 47:1--47:15.
- Judy Qiu, Shantenu Jha, Andre Luckow, and Geoffrey C. Fox. 2014. Towards HPC-ABDS: An initial high-performance big data stack. Retrieved from http://grids.ucs.indiana.edu/ptliupages/publications/nist-hpc-abds.pdf.
- Isabelle Stanton. 2014. Streaming balanced graph partitioning algorithms for random graphs. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 1287--1301.
- Ramakrishna Thurimella. 1997. Sub-linear distributed algorithms for sparse certificates and biconnected components. Journal of Algorithms 23, 1 (1997), 160--179.
- Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From “think like a vertex” to “think like a graph.” PVLDB 7, 3 (2013), 193--204.
- Leslie G. Valiant. 1982. A scheme for fast parallel communication. SIAM Journal on Computing 11, 2 (1982), 350--361.
- Leslie G. Valiant. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8 (1990), 103--111.
- Sergei Vassilvitskii. 2015. Models for Parallel Computation (A Hitchhikers’ Guide to Massively Parallel Universes). Retrieved from http://grigory.us/blog/massively-parallel-universes/.
- David P. Woodruff and Qin Zhang. 2017. When distributed computation is communication expensive. Distributed Computing 30, 5 (2017), 309--323.
Index Terms
- Fast Distributed Algorithms for Connectivity and MST in Large Graphs