Top-Down Synthesis for Library Learning

Authors:
Matthew Bowers, Massachusetts Institute of Technology, USA (ORCID 0000-0001-8450-7033)
Theo X. Olausson, Massachusetts Institute of Technology, USA (ORCID 0000-0001-6653-2227)
Lionel Wong, Massachusetts Institute of Technology, USA (ORCID 0000-0001-8814-7629)
Gabriel Grand, Massachusetts Institute of Technology, USA (ORCID 0000-0003-1920-0021)
Joshua B. Tenenbaum, Massachusetts Institute of Technology, USA (ORCID 0000-0002-1925-2035)
Kevin Ellis, Cornell University, USA (ORCID 0000-0001-6586-0632)
Armando Solar-Lezama, Massachusetts Institute of Technology, USA (ORCID 0000-0001-7604-8252)

Proceedings of the ACM on Programming Languages, Volume 7, Issue POPL, Article No. 41, pp. 1182–1213. https://doi.org/10.1145/3571234

Published: 11 January 2023

Abstract

This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain-specific language (DSL). The algorithm builds abstractions directly from the initial DSL primitives, using syntactic pattern matching of intermediate abstractions to prune the search space and to guide the search towards abstractions that maximally capture shared structure in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3-4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch’s scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches, and we show empirically that it is robust to terminating the search procedure early, which further allows it to scale to challenging datasets by means of early stopping.
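To make "compressivity" and "syntactic pattern matching" concrete, the following is a minimal Python sketch, not Stitch's implementation or API. It assumes programs are nested tuples of strings, that holes in a candidate abstraction are strings beginning with `?`, that size is counted in leaf symbols, and that the abstraction body is charged once against the rewritten corpus. The names `size`, `match`, `rewrite`, and `f` are hypothetical, and the search over candidate abstractions described in the abstract is not modeled here.

```python
# Illustrative sketch only: scores ONE hand-written candidate abstraction.
# It does not perform the top-down search that the abstract describes.

def size(term):
    """Count leaf symbols in a nested-tuple term (an assumed cost metric)."""
    if isinstance(term, tuple):
        return sum(size(t) for t in term)
    return 1

def match(pattern, term, bindings=None):
    """Syntactically match pattern against term; return hole bindings or None."""
    if bindings is None:
        bindings = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # A repeated hole must bind to the same subterm each time.
            return bindings if bindings[pattern] == term else None
        extended = dict(bindings)
        extended[pattern] = term
        return extended
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None

def rewrite(term, name, pattern, holes):
    """Replace each match of pattern (outermost first) with a call to name."""
    found = match(pattern, term)
    if found is not None:
        args = tuple(rewrite(found[h], name, pattern, holes) for h in holes)
        return (name,) + args
    if isinstance(term, tuple):
        return tuple(rewrite(t, name, pattern, holes) for t in term)
    return term

corpus = [
    ("+", ("*", "x", "x"), "1"),
    ("+", ("*", "y", "y"), "1"),
    ("neg", ("+", ("*", "z", "z"), "1")),
]
pattern = ("+", ("*", "?a", "?a"), "1")  # hypothetical abstraction f(?a)

rewritten = [rewrite(p, "f", pattern, ["?a"]) for p in corpus]
before = sum(size(p) for p in corpus)
after = sum(size(p) for p in rewritten) + size(pattern)  # charge the body once
print(f"before={before} after={after} ratio={before / after:.2f}")
```

Under these assumptions the corpus shrinks from 16 symbols to 12, including the cost of the abstraction itself. A library learner would compare many candidate patterns by a score of this kind and keep the one that compresses the corpus most.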

Supplemental Material

popl23main-p278-p-archive.zip (1.5 MB): supplement for the paper Top-Down Synthesis for Library Learning (POPL 2023).
- stitch_appendix.pdf: appendix for the paper.
- stitch.v: Coq proof of the completeness of LambdaUnify from the paper, proven with CoqIDE 8.15.2.

Index Terms

Top-Down Synthesis for Library Learning
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming
2. Theory of computation
  1. Design and analysis of algorithms
    1. Algorithm design techniques
      1. Branch-and-bound

Published in
Proceedings of the ACM on Programming Languages Volume 7, Issue POPL
January 2023
2196 pages
EISSN:2475-1421
DOI:10.1145/3554308
Copyright © 2023 Owner/Author
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher
Association for Computing Machinery
New York, NY, United States

Badges
- Artifacts Available / v1.1
- Artifacts Evaluated & Reusable / v1.1
Author Tags
Abstraction Learning
Library Learning
Program Synthesis
Qualifiers
- research-article