skip to main content
10.1145/3514221.3517885acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open Access

Balsa: Learning a Query Optimizer Without Expert Demonstrations

Authors Info & Claims
Published:11 June 2022Publication History

ABSTRACT

Query optimizers are a performance-critical component in every database system. Due to their complexity, optimizers take experts months to write and years to refine. In this work, we demonstrate for the first time that learning to optimize queries without learning from an expert optimizer is both possible and efficient. We present Balsa, a query optimizer built by deep reinforcement learning. Balsa first learns basic knowledge from a simple, environment-agnostic simulator, followed by safe learning in real execution. On the Join Order Benchmark, Balsa matches the performance of two expert query optimizers, both open-source and commercial, with two hours of learning, and outperforms them by up to 2.8× in workload runtime after a few more hours. Balsa thus opens the possibility of automatically learning to optimize in future compute environments where expert-designed optimizers do not exist.

References

  1. Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, et al. 2019. Learning to optimize halide with tree search and random programs. ACM Transactions on Graphics (TOG), Vol. 38, 4 (2019), 1--12.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, et al. 2019. Solving rubik's cube with a robot hand. arXiv preprint arXiv:1910.07113 (2019).Google ScholarGoogle Scholar
  3. Thomas Anthony, Zheng Tian, and David Barber. 2017. Thinking Fast and Slow with Deep Learning and Tree Search. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach, California, USA) (NIPS'17). 5366--5376.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Sophie Cluet and Guido Moerkotte. 1995. On the complexity of generating optimal left-deep processing trees with cross products. In International Conference on Database Theory. Springer, 54--67.Google ScholarGoogle ScholarCross RefCross Ref
  5. PostgreSQL developers. [n.,d.]. Commit history of the PostgreSQL optimizer. https://github.com/postgres/postgres/commits/master/src/backend/optimizer/. [Online; accessed February, 2021].Google ScholarGoogle Scholar
  6. NTT OSS Center DBMS Development and Support Team. 2020. pg_hint_plan. https://github.com/ossc-db/pg_hint_plan.Google ScholarGoogle Scholar
  7. Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, and Surajit Chaudhuri. 2019. Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment, Vol. 12, 9 (2019), 1044--1057.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Benjamin Hilprecht, Carsten Binnig, and Uwe Röhm. 2020 a. Learning a Partitioning Advisor for Cloud Databases. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (Portland, OR, USA) (SIGMOD '20). Association for Computing Machinery, New York, NY, USA, 143--157.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Benjamin Hilprecht, Andreas Schmidt, Moritz Kulessa, Alejandro Molina, Kristian Kersting, and Carsten Binnig. 2020 b. DeepDB: Learn from Data, not from Queries! Proceedings of the VLDB Endowment, Vol. 13, 7 (2020), 992--1005.Google ScholarGoogle Scholar
  10. Andy Kimball. 2018. How We Built a Cost-Based SQL Optimizer. https://www.cockroachlabs.com/blog/building-cost-based-sql-optimizer/. [Online; accessed December, 2020].Google ScholarGoogle Scholar
  11. Andreas Kipf, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter A. Boncz, and Alfons Kemper. 2019. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR 2019, 9th Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 13--16, 2019, Online Proceedings.Google ScholarGoogle Scholar
  12. Sanjay Krishnan, Zongheng Yang, Ken Goldberg, Joseph M. Hellerstein, and Ion Stoica. 2018. Learning to Optimize Join Queries With Deep Reinforcement Learning. CoRR, Vol. abs/1808.03196 (2018). arxiv: 1808.03196 http://arxiv.org/abs/1808.03196Google ScholarGoogle Scholar
  13. Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How good are query optimizers, really? Proceedings of the VLDB Endowment, Vol. 9, 3 (2015), 204--215.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Viktor Leis, Bernhard Radke, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2018. Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal (2018), 1--26.Google ScholarGoogle Scholar
  15. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, and Tim Kraska. 2021. Bao: Making Learned Query Optimization Practical. In Proceedings of the 2021 International Conference on Management of Data (Virtual Event, China) (SIGMOD/PODS '21). Association for Computing Machinery, New York, NY, USA, 1275--1288.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. Proc. VLDB Endow., Vol. 12, 11 (July 2019), 1705--1718.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I Jordan, et al. 2018. Ray: A distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 561--577.Google ScholarGoogle Scholar
  18. Arjun Narayan. 2020. Materialize: Roadmap to Building a Streaming Database on Timely Dataflow. https://materialize.com/blog-roadmap/. [Online; accessed December, 2020].Google ScholarGoogle Scholar
  19. Devin Petersohn, Stephen Macke, Doris Xin, William Ma, Doris Lee, Xiangxi Mo, Joseph E. Gonzalez, Joseph M. Hellerstein, Anthony D. Joseph, and Aditya Parameswaran. 2020. Towards Scalable Dataframe Systems. Proc. VLDB Endow., Vol. 13, 12 (July 2020), 2033--2046.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Stanislas Polu and Ilya Sutskever. 2020. Generative language modeling for automated theorem proving. arXiv preprint arXiv:2009.03393 (2020).Google ScholarGoogle Scholar
  21. Anthony G Read. 2006. DeWitt clauses: Can we protect purchasers without hurting Microsoft. Rev. Litig., Vol. 25 (2006), 387.Google ScholarGoogle Scholar
  22. Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. 2020. Mastering atari, go, chess and shogi by planning with a learned model. Nature, Vol. 588, 7839 (2020), 604--609.Google ScholarGoogle Scholar
  23. P. Griffiths Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. 1979. Access Path Selection in a Relational Database Management System. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data (Boston, Massachusetts) (SIGMOD '79). Association for Computing Machinery, New York, NY, USA, 23--34.Google ScholarGoogle Scholar
  24. Suraj Shetiya, Saravanan Thirumuruganathan, Nick Koudas, and Gautam Das. 2020. Astrid: accurate selectivity estimation for string predicates using deep learning. Proceedings of the VLDB Endowment, Vol. 14, 4 (2020), 471--484.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529, 7587 (01 Jan 2016), 484--489.Google ScholarGoogle Scholar
  26. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. 2017. Mastering the game of Go without human knowledge. Nature, Vol. 550, 7676 (01 Oct 2017), 354--359.Google ScholarGoogle Scholar
  27. Ji Sun and Guoliang Li. 2019. An end-to-end learning-based cost estimator. Proceedings of the VLDB Endowment, Vol. 13, 3 (2019), 307--319.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction .MIT press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 23--30.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, and Joseph Antonakakis. 2019. SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning. In Proceedings of the 2019 International Conference on Management of Data (SIGMOD '19). ACM, New York, NY, USA, 1153--1170.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008.Google ScholarGoogle Scholar
  32. Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, Vol. 575, 7782 (2019), 350--354.Google ScholarGoogle Scholar
  33. Florian Waas and Arjan Pellenkoft. 2000. Join order selection (good enough is easy). In British National Conference on Databases. Springer, 51--67.Google ScholarGoogle ScholarCross RefCross Ref
  34. Xiaoying Wang, Changbo Qu, Weiyuan Wu, Jiannan Wang, and Qingqing Zhou. 2020. Are We Ready For Learned Cardinality Estimation? arXiv preprint arXiv:2012.06743 (2020).Google ScholarGoogle Scholar
  35. Eric W Weisstein. [n.,d.]. Mode. MathWorld--A Wolfram Web Resource. https://mathworld.wolfram.com/Mode.html.Google ScholarGoogle Scholar
  36. Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, and Sriram Rao. 2018. Towards a learning optimizer for shared clouds. Proceedings of the VLDB Endowment, Vol. 12, 3 (2018), 210--222.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Zongheng Yang, Amog Kamsetty, Sifei Luan, Eric Liang, Yan Duan, Xi Chen, and Ion Stoica. 2020. NeuroCard: One Cardinality Estimator for All Tables. Proc. VLDB Endow., Vol. 14, 1 (Sept. 2020), 61--73.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. Proc. VLDB Endow., Vol. 13, 3 (Nov. 2019), 279--292.Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
    June 2022
    2597 pages
    ISBN:9781450392495
    DOI:10.1145/3514221

    Copyright © 2022 Owner/Author

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 11 June 2022

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate785of4,003submissions,20%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader