
Optimizing Distributed Protocols with Query Rewrites

Published: 26 March 2024

Abstract

Distributed protocols such as 2PC and Paxos lie at the core of many systems in the cloud, but standard implementations do not scale. New scalable distributed protocols are developed through careful analysis and rewrites, but this process is ad hoc and error-prone. This paper presents an approach for scaling any distributed protocol by applying rule-driven rewrites, borrowing from query optimization. Distributed protocol rewrites entail a new burden: reasoning about spatiotemporal correctness. We leverage order-insensitivity and data dependency analysis to systematically identify correct coordination-free scaling opportunities. We apply this analysis to create preconditions and mechanisms for coordination-free decoupling and partitioning, two fundamental vertical and horizontal scaling techniques. Manual rule-driven applications of decoupling and partitioning improve the throughput of 2PC by 5× and Paxos by 3×, and match state-of-the-art throughput in recent work. These results point the way toward automated optimizers for distributed protocols based on correct-by-construction rewrite rules.
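As a rough illustration of why order-insensitivity permits coordination-free partitioning, consider the following minimal sketch. It is not the paper's formalism or implementation: the vote-collection step, the function names `committed` and `partitioned_committed`, and the routing of votes by transaction id are all assumptions made for this example. The sketch models a 2PC-style coordinator that decides a transaction can commit once every participant has voted yes, and then checks that sharding the votes by transaction id yields the same decisions for every arrival order.

```python
from collections import defaultdict
from itertools import permutations


def committed(votes, num_participants):
    """Return the transaction ids for which every participant voted yes.

    `votes` is an iterable of (txn_id, participant) pairs. The result depends
    only on the set of votes received, not on the order they arrive in.
    """
    yes = defaultdict(set)
    for txn, participant in votes:
        yes[txn].add(participant)
    return {txn for txn, voters in yes.items() if len(voters) == num_participants}


def partitioned_committed(votes, num_participants, num_partitions):
    """Hypothetical partitioned variant: route each vote by transaction id.

    All votes for one transaction land on the same partition, so each
    partition can decide independently and the results are simply unioned.
    """
    shards = [[] for _ in range(num_partitions)]
    for txn, participant in votes:
        shards[txn % num_partitions].append((txn, participant))
    result = set()
    for shard in shards:
        result |= committed(shard, num_participants)
    return result


# Two participants ("a" and "b"); transaction 3 is missing a vote from "a".
votes = [(1, "a"), (2, "a"), (1, "b"), (3, "b"), (2, "b")]

single_node = committed(votes, num_participants=2)

# Every arrival order, partitioned three ways, gives the single-node answer.
all_orders_agree = all(
    partitioned_committed(list(order), 2, 3) == single_node
    for order in permutations(votes)
)
```

The partitioning key matters here: the decision for a transaction depends only on votes that share its transaction id, so routing on that id keeps each data dependency local to one partition. Routing on the participant instead would split a transaction's votes across partitions and would require coordination to combine them.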
