skip to main content

Top-Down Synthesis for Library Learning

Published:11 January 2023Publication History
Skip Abstract Section

Abstract

This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain specific language (DSL). The algorithm builds abstractions directly from initial DSL primitives, using syntactic pattern matching of intermediate abstractions to intelligently prune the search space and guide the algorithm towards abstractions that maximally capture shared structures in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3-4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch’s scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches and show empirically that it is robust to terminating the search procedure early—further allowing it to scale to challenging datasets by means of early stopping.

Skip Supplemental Material Section

Supplemental Material

References

  1. Martin Abadi, Luca Cardelli, P-L Curien, and J-J Lévy. 1989. Explicit substitutions. In Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 31–46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Miltiadis Allamanis, Earl T Barr, Premkumar Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Computing Surveys (CSUR), 51, 4 (2018), 1–37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Miltiadis Allamanis and Charles Sutton. 2014. Mining idioms from source code. In Proceedings of the 22nd acm sigsoft international symposium on foundations of software engineering. 472–483. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Matej Balog, Alexander L Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2016. Deepcoder: Learning to write programs. arXiv preprint arXiv:1611.01989. Google ScholarGoogle Scholar
  5. Matthew Bowers, Olausson, Wong, Grand, Tenenbaum, Ellis, and Solar-Lezama. 2022. Artifact for "Top-Down Synthesis For Library Learning". https://doi.org/10.5281/zenodo.7151663 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Rod M Burstall and John Darlington. 1977. A transformation system for developing recursive programs. Journal of the ACM (JACM), 24, 1 (1977), 44–67. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. David Cao, Rose Kunkel, Chandrakana Nandi, Max Willsey, Zachary Tatlock, and Nadia Polikarpova. 2023. babble: Learning Better Abstractions with E-Graphs and Anti-Unification. Proc. ACM Program. Lang., https://doi.org/10.1145/3571207 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, and Isil Dillig. 2020. Multi-modal synthesis of regular expressions. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alastair F. Donaldson and Emina Torlak (Eds.). ACM, 487–502. https://doi.org/10.1145/3385412.3385988 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Xinyun Chen, Chang Liu, and Dawn Song. 2018. Execution-guided neural program synthesis. In International Conference on Learning Representations. Google ScholarGoogle Scholar
  10. Adam Chlipala, Benjamin Delaware, Samuel Duchovni, Jason Gross, Clément Pit-Claudel, Sorawit Suriyakarn, Peng Wang, and Katherine Ye. 2017. The end of history? Using a proof assistant to replace language design with library design. In 2nd Summit on Advances in Programming Languages (SNAPL 2017). Google ScholarGoogle Scholar
  11. Geoffrey Chu and Peter J. Stuckey. 2015. Dominance breaking constraints. Constraints An Int. J., 20, 2 (2015), 155–182. https://doi.org/10.1007/s10601-014-9173-7 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Andrew Cropper. 2019. Playgol: Learning programs through play. arXiv preprint arXiv:1904.08993. Google ScholarGoogle Scholar
  13. Nicolaas Govert de Bruijn. 1972. Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem. In Indagationes Mathematicae (Proceedings). 75, 381–392. Google ScholarGoogle ScholarCross RefCross Ref
  14. Eyal Dechter, Jonathan Malmaud, Ryan Prescott Adams, and Joshua B Tenenbaum. 2013. Bootstrap learning via modular concept discovery. In Proceedings of the International Joint Conference on Artificial Intelligence. Google ScholarGoogle Scholar
  15. Gilles Dowek, Thérèse Hardin, and Claude Kirchner. 1995. Higher-Order Unification via Explicit Substitutions. In Proceedings of the Tenth Annual Symposium on Logic in Computer Science, D. Kozen (Ed.). IEEE Computer Society Press, San Diego, California. 366–374. Google ScholarGoogle ScholarCross RefCross Ref
  16. Gilles Dowek, Thérèse Hardin, Claude Kirchner, and Frank Pfenning. 1996. Unification via Explicit Substitutions: The Case of Higher-Order Patterns. In Proceedings of the Joint International Conference and Symposium on Logic Programming, M. Maher (Ed.). MIT Press, Bonn, Germany. 259–273. Google ScholarGoogle Scholar
  17. Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, and Joshua B Tenenbaum. 2020. Dreamcoder: Growing generalizable, interpretable knowledge with wake-sleep bayesian program learning. arXiv preprint arXiv:2006.08381. Google ScholarGoogle Scholar
  18. Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Luc Cary, Armando Solar-Lezama, and Joshua B Tenenbaum. 2021. Dreamcoder: Bootstrapping inductive program synthesis with wake-sleep library learning. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. 835–850. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Kevin M Ellis, Lucas E Morales, Mathias Sablé-Meyer, Armando Solar Lezama, and Joshua B Tenenbaum. 2018. Library learning for neurally-guided bayesian program induction. Google ScholarGoogle Scholar
  20. Matthias Felleisen and Robert Hieb. 1992. The Revised Report on the Syntactic Theories of Sequential Control and State. Theor. Comput. Sci., 103, 2 (1992), sep, 235–271. issn:0304-3975 https://doi.org/10.1016/0304-3975(92)90014-7 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. John K Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples. ACM SIGPLAN Notices, 50, 6 (2015), 229–239. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Yaroslav Ganin, Tejas Kulkarni, Igor Babuschkin, SM Ali Eslami, and Oriol Vinyals. 2018. Synthesizing programs for images using reinforced adversarial learning. In International Conference on Machine Learning. 1666–1675. Google ScholarGoogle Scholar
  23. Sumit Gulwani, José Hernández-Orallo, Emanuel Kitzelmann, Stephen H Muggleton, Ute Schmid, and Benjamin Zorn. 2015. Inductive programming meets the real world. Commun. ACM, 58, 11 (2015), 90–99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. 1968. A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Trans. Syst. Sci. Cybern., 4, 2 (1968), 100–107. https://doi.org/10.1109/TSSC.1968.300136 Google ScholarGoogle ScholarCross RefCross Ref
  25. Robert John Henderson. 2013. Cumulative learning in the lambda calculus. Google ScholarGoogle Scholar
  26. Gérard Huet. 1975. A Unification Algorithm for Typed λ -Calculus. Theoretical Computer Science, 1 (1975), 27–57. Google ScholarGoogle ScholarCross RefCross Ref
  27. Irvin Hwang, Andreas Stuhlmüller, and Noah D Goodman. 2011. Inducing probabilistic programs by Bayesian program merging. arXiv preprint arXiv:1110.5667. Google ScholarGoogle Scholar
  28. Toshihide Ibaraki. 1977. The Power of Dominance Relations in Branch-and-Bound Algorithms. J. ACM, 24, 2 (1977), 264–279. https://doi.org/10.1145/322003.322010 Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Srinivasan Iyer, Alvin Cheung, and Luke Zettlemoyer. 2019. Learning programmatic idioms for scalable semantic parsing. arXiv preprint arXiv:1904.09086. Google ScholarGoogle Scholar
  30. Thomas Johnsson. 1985. Lambda Lifting: Treansforming Programs to Recursive Equations. In Functional Programming Languages and Computer Architecture, FPCA 1985, Nancy, France, September 16-19, 1985, Proceedings, Jean-Pierre Jouannaud (Ed.) (Lecture Notes in Computer Science, Vol. 201). Springer, 190–203. https://doi.org/10.1007/3-540-15975-4_37 Google ScholarGoogle ScholarCross RefCross Ref
  31. R Kenny Jones, David Charatan, Paul Guerrero, Niloy J Mitra, and Daniel Ritchie. 2021. ShapeMOD: macro operation discovery for 3D shape programs. ACM Transactions on Graphics (TOG), 40, 4 (2021), 1–16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Manos Koukoutos, Mukund Raghothaman, Etienne Kneuss, and Viktor Kuncak. 2017. On repair with probabilistic attribute grammars. arXiv preprint arXiv:1707.04148. Google ScholarGoogle Scholar
  33. Michihiro Kuramochi and George Karypis. 2001. Frequent Subgraph Discovery. In Proceedings of the 2001 IEEE International Conference on Data Mining, 29 November - 2 December 2001, San Jose, California, USA, Nick Cercone, Tsau Young Lin, and Xindong Wu (Eds.). IEEE Computer Society, 313–320. https://doi.org/10.1109/ICDM.2001.989534 Google ScholarGoogle ScholarCross RefCross Ref
  34. Michihiro Kuramochi and George Karypis. 2004. Finding Frequent Patterns in a Large Sparse Graph. In Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22-24, 2004, Michael W. Berry, Umeshwar Dayal, Chandrika Kamath, and David B. Skillicorn (Eds.). SIAM, 345–356. https://doi.org/10.1137/1.9781611972740.32 Google ScholarGoogle ScholarCross RefCross Ref
  35. A. H. Land and A. G. Doig. 1960. An Automatic Method of Solving Discrete Programming Problems. Econometrica, 28, 3 (1960), 497–520. issn:00129682, 14680262 http://www.jstor.org/stable/1910129 Google ScholarGoogle ScholarCross RefCross Ref
  36. Tessa Lau, Steven A Wolfman, Pedro Domingos, and Daniel S Weld. 2003. Programming by demonstration using version space algebra. Machine Learning, 53, 1 (2003), 111–156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Miguel Lázaro-Gredilla, Dianhuan Lin, J Swaroop Guntupalli, and Dileep George. 2019. Beyond imitation: Zero-shot task transfer on robots by learning concepts as cognitive programs. Science Robotics, 4, 26 (2019), eaav3150. Google ScholarGoogle Scholar
  38. Mina Lee, Sunbeom So, and Hakjoo Oh. 2016. Synthesizing regular expressions from examples for introductory automata assignments. In Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2016, Amsterdam, The Netherlands, October 31 - November 1, 2016, Bernd Fischer and Ina Schaefer (Eds.). ACM, 70–80. https://doi.org/10.1145/2993236.2993244 Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Percy Liang, Michael I Jordan, and Dan Klein. 2010. Learning programs: A hierarchical Bayesian approach. In Proceedings of the 27th International Conference on Machine Learning (ICML-10). 639–646. Google ScholarGoogle Scholar
  40. Dianhuan Lin, Eyal Dechter, Kevin Ellis, Joshua B Tenenbaum, and Stephen H Muggleton. 2014. Bias reformulation for one-shot function induction. Google ScholarGoogle Scholar
  41. Zohar Manna and Richard Waldinger. 1980. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems (TOPLAS), 2, 1 (1980), 90–121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Dale Miller. 1991. A Logic Programming Language with Lambda-Abstraction, Function Variables, and Simple Unification. Journal of Logic and Computation, 1, 4 (1991), 497–536. Google ScholarGoogle ScholarCross RefCross Ref
  43. Dale Miller. 1992. Unification under a Mixed Prefix. Journal of Symbolic Computation, 14 (1992), 321–358. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. 2002. Network Motifs: Simple Building Blocks of Complex Networks. Science, 298, 5594 (2002), 824–827. https://doi.org/10.1126/science.298.5594.824 arxiv:https://www.science.org/doi/pdf/10.1126/science.298.5594.824. Google ScholarGoogle ScholarCross RefCross Ref
  45. Tom M Mitchell. 1977. Version spaces: A candidate elimination approach to rule learning. In Proceedings of the 5th international joint conference on Artificial intelligence-Volume 1. 305–310. Google ScholarGoogle Scholar
  46. David R. Morrison, Sheldon H. Jacobson, Jason J. Sauppe, and Edward C. Sewell. 2016. Branch-and-bound algorithms: A survey of recent advances in searching, branching, and pruning. Discret. Optim., 19 (2016), 79–102. https://doi.org/10.1016/j.disopt.2016.01.005 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Maxwell Nye, Yewen Pu, Matthew Bowers, Jacob Andreas, Joshua B Tenenbaum, and Armando Solar-Lezama. 2021. Representing Partial Programs with Blended Abstract Semantics. In International Conference on Learning Representations. Google ScholarGoogle Scholar
  48. Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program synthesis from polymorphic refinement types. ACM SIGPLAN Notices, 51, 6 (2016), 522–538. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Oleksandr Polozov and Sumit Gulwani. 2015. Flashmeta: A framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. 107–126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Falk Schreiber and Henning Schwöbbermeyer. 2005. Frequency Concepts and Pattern Detection for the Analysis of Motifs in Networks. Trans. Comp. Sys. Biology, 3 (2005), 89–104. https://doi.org/10.1007/11599128_7 Google ScholarGoogle ScholarCross RefCross Ref
  51. Ameesh Shah, Eric Zhan, Jennifer Sun, Abhinav Verma, Yisong Yue, and Swarat Chaudhuri. 2020. Learning Differentiable Programs with Admissible Neural Heuristics. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). 33, Curran Associates, Inc., 4940–4952. https://proceedings.neurips.cc/paper/2020/file/342285bb2a8cadef22f667eeb6a63732-Paper.pdf Google ScholarGoogle Scholar
  52. Claude Elwood Shannon. 1948. A mathematical theory of communication. The Bell system technical journal, 27, 3 (1948), 379–423. Google ScholarGoogle Scholar
  53. Eui Chul Shin, Miltiadis Allamanis, Marc Brockschmidt, and Alex Polozov. 2019. Program synthesis and semantic parsing with learned code idioms. Advances in Neural Information Processing Systems, 32 (2019). Google ScholarGoogle Scholar
  54. Thoralf Skolem. 1920. Logisch-kombinatorische Untersuchungen über die Erfüllbarkeit oder Bewiesbarkeit mathematischer Sätze nebst einem Theorem über dichte Mengen. Google ScholarGoogle Scholar
  55. Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, and Pavel Panchekha. 2021. Egg: Fast and Extensible Equality Saturation. Proc. ACM Program. Lang., 5, POPL (2021), Article 23, jan, 29 pages. https://doi.org/10.1145/3434304 Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Catherine Wong, Kevin M Ellis, Joshua Tenenbaum, and Jacob Andreas. 2021. Leveraging language to learn program abstractions and search heuristics. In International Conference on Machine Learning. 11193–11204. Google ScholarGoogle Scholar
  57. Catherine Wong, William McCarthy, Gabriel Grand, Jacob Andreas, Joshua B Tenenbaum, Robert Hawkins, and Judy Fan. 2022. Identifying concept libraries from language about object structure. In CogSci. To appear. Google ScholarGoogle Scholar

Index Terms

  1. Top-Down Synthesis for Library Learning

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader