
Automated discovery of algorithms from data

Abstract

To automate the discovery of new scientific and engineering principles, artificial intelligence must distill explicit rules from experimental data. This has proven difficult because existing methods typically search through the enormous space of possible functions. Here we introduce deep distilling, a machine learning method that does not perform searches but instead learns from data using symbolic essence neural networks and then losslessly condenses the network parameters into a concise algorithm written in computer code. This distilled code, which can contain loops and nested logic, is equivalent to the neural network but is human-comprehensible and orders-of-magnitude more compact. On arithmetic, vision and optimization tasks, the distilled code is capable of out-of-distribution systematic generalization to solve cases orders-of-magnitude larger and more complex than the training data. The distilled algorithms can sometimes outperform human-designed algorithms, demonstrating that deep distilling is able to discover generalizable principles complementary to human expertise.
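To make the abstract's pipeline concrete, here is a minimal conceptual sketch of the deep-distilling idea, not the authors' implementation: a local update rule is "learned" from input/output examples of a cellular automaton (a plain lookup table standing in for a trained symbolic essence neural network), the learned parameters are then losslessly condensed into explicit, human-readable source code, and the distilled rule is applied to inputs far larger than any training example. All function names here (`learn_rule`, `distill`, `step`) are hypothetical.

```python
from itertools import product

# Ground-truth dynamics (elementary cellular automaton rule 110), standing in
# for the source of "experimental data"; the learner never reads this table.
RULE_110 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}

def true_step(state):
    """One update of the ground-truth automaton (zero-padded boundaries)."""
    pad = [0] + state + [0]
    return [RULE_110[(pad[i - 1], pad[i], pad[i + 1])]
            for i in range(1, len(pad) - 1)]

def learn_rule(examples):
    """Fit a truth table over 3-cell neighbourhoods from (state, next_state)
    pairs; stands in for training a symbolic essence neural network (ENN)."""
    table = {}
    for state, nxt in examples:
        for i, out in enumerate(nxt):
            left = state[i - 1] if i > 0 else 0
            right = state[i + 1] if i < len(state) - 1 else 0
            table[(left, state[i], right)] = out
    return table

def distill(table):
    """Losslessly condense the learned table into explicit source code:
    a Boolean expression in disjunctive normal form, analogous to emitting
    loops and nested logic as a concise program."""
    on = [key for key, val in table.items() if val == 1]
    terms = [" and ".join(f"{'' if bit else 'not '}{var}"
                          for var, bit in zip("lcr", key)) for key in on]
    src = "lambda l, c, r: " + (" or ".join(f"({t})" for t in terms) or "False")
    return eval(src), src  # eval() plays the role of compiling the emitted code

def step(state, rule):
    """Apply a distilled rule to a state of arbitrary length."""
    pad = [0] + state + [0]
    return [int(rule(pad[i - 1], pad[i], pad[i + 1]))
            for i in range(1, len(pad) - 1)]

# "Train" on every length-5 state, then generalize out of distribution
# to inputs far longer than anything seen during training.
examples = [(list(s), true_step(list(s))) for s in product([0, 1], repeat=5)]
rule, src = distill(learn_rule(examples))
```

Because the distilled expression is exact rather than approximate, it reproduces the ground-truth dynamics on states of any length, which is the sense in which the paper's distilled code generalizes systematically beyond the training distribution.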


Fig. 1: Deep distilling automatically writes computer code.
Fig. 2: Deep distilling the rules of cellular automata.
Fig. 3: Deep distilling learns generalizable algorithms written as code.
Fig. 4: Distilled algorithms can outperform human-designed algorithms.


Data availability

The datasets used in this work are included with the code. Source Data are provided with this paper.

Code availability

The code used to distill ENNs has been deposited at Code Ocean (ref. 31) and at https://github.com/pauljblazek/deepdistilling.

References

  1. Arrieta, A. B. et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020).


  2. Gunning, D. & Aha, D. DARPA’s explainable artificial intelligence (XAI) program. AI Magazine 40, 44–58 (2019).


  3. Marcus, G. Deep learning: a critical appraisal. Preprint at http://arxiv.org/abs/1801.00631 (2018).

  4. Chen, M. et al. Evaluating large language models trained on code. Preprint at http://arxiv.org/abs/2107.03374 (2021).

  5. Austin, J. et al. Program synthesis with large language models. Preprint at http://arxiv.org/abs/2108.07732 (2021).

  6. Li, Y. et al. Competition-level code generation with AlphaCode. Science 378, 1092–1097 (2022).


  7. Zelikman, E. et al. Parsel: algorithmic reasoning with language models by composing decompositions. Preprint at http://arxiv.org/abs/2212.10561 (2023).

  8. Romera-Paredes, B. et al. Mathematical discoveries from program search with large language models. Nature 625, 468–475 (2023).


  9. Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).


  10. Gulwani, S. Automating string processing in spreadsheets using input–output examples. In POPL ’11: Proc. 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages Vol. 46, 317–330 (ACM, 2011).

  11. Gulwani, S. et al. Inductive programming meets the real world. Commun. ACM 58, 90–99 (2015).


  12. Raedt, L. D. et al. (eds) Approaches and Applications of Inductive Programming (Dagstuhl Seminar 19202) (Dagstuhl, 2019).

  13. Kitzelmann, E. Inductive programming: a survey of program synthesis techniques. In Approaches and Applications of Inductive Programming (eds Schmid, U. et al.) 50–73 (Springer, 2010).

  14. Balog, M., Gaunt, A. L., Brockschmidt, M., Nowozin, S. & Tarlow, D. DeepCoder: learning to write programs. In 5th Int. Conf. Learn. Represent. (2017).

  15. Polozov, O. & Gulwani, S. FlashMeta: a framework for inductive program synthesis. In Proc. 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications 107–126 (ACM, 2015).

  16. Blazek, P. J. & Lin, M. M. Explainable neural networks that simulate reasoning. Nat. Comput. Sci. 1, 607–618 (2021).


  17. Kautz, H. A. The third AI summer: AAAI Robert S. Engelmore memorial lecture. AI Magazine 43, 105–125 (2022).


  18. Besold, T. R. et al. in Neuro-Symbolic Artificial Intelligence: The State of the Art (eds Hitzler, P. & Sarker, M. K.) Ch. 1 (IOS Press, 2022).

  19. McCulloch, W. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115–133 (1943).


  20. Mitchell, M. in Non-standard Computation: Molecular Computation, Cellular Automata, Evolutionary Algorithms, Quantum Computers (eds Gramß, T. et al.) Ch. 4 (Wiley, 2005); https://doi.org/10.1002/3527602968.ch4

  21. Wolfram, S. Statistical mechanics of cellular automata. Rev. Modern Phys. 55, 601–644 (1983).


  22. Lake, B. M. & Baroni, M. Human-like systematic generalization through a meta-learning neural network. Nature 623, 115–121 (2023).


  23. Fodor, J. A. & Pylyshyn, Z. W. Connectionism and cognitive architecture: a critical analysis. Cognition 28, 3–71 (1988).


  24. Gardner, M. Mathematical games: the fantastic combinations of John Conway’s new solitaire game life. Scientific American 223, 120–123 (1970).


  25. Rendell, P. A universal Turing machine in Conway’s Game of Life. In 2011 International Conference on High Performance Computing Simulation 764–772 (IEEE, 2011).

  26. Karp, R. Reducibility among combinatorial problems. In Proc. Complexity of Computer Computations Vol. 40, 85–103 (Springer, 1972).

  27. Poloczek, M., Schnitger, G., Williamson, D. & Zuylen, A. Greedy algorithms for the maximum satisfiability problem: simple algorithms and inapproximability bounds. SIAM J. Comput. 46, 1029–1061 (2017).


  28. Mukhopadhyay, P. & Chaudhuri, B. B. A survey of Hough transform. Pattern Recognition 48, 993–1010 (2015).


  29. Adams, G. S., Converse, B. A., Hales, A. H. & Klotz, L. E. People systematically overlook subtractive changes. Nature 592, 258–261 (2021).


  30. McCluskey, E. J. Minimization of Boolean functions. Bell Syst. Tech. J. 35, 1417–1444 (1956).


  31. Blazek, P. J. & Lin, M. M. Deep distilling: automated algorithm discovery using explainable deep learning. Code Ocean https://doi.org/10.24433/CO.6047170.v1 (2024).

Download references

Acknowledgements

This work was supported by the UTSW High Risk/High Impact grant.

Author information

Authors and Affiliations

Authors

Contributions

P.J.B. and M.M.L. conceptualized the work, wrote the paper, and performed the visualizations and methodology. M.M.L. acquired funding. P.J.B. and K.V. performed investigations and co-wrote the software.

Corresponding author

Correspondence to Milo M. Lin.

Ethics declarations

Competing interests

P.J.B. and M.M.L. are co-authors on international patent applications related to ENNs (PCT/US2021/019470) and to deep distilling (PCT/US2022/040885). K.V. declares no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Joseph Bakarji and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4, discussion and examples of output code generated by the method described.

Reporting Summary

Source data

Source Data Fig. 2

Source data files (.csv).

Source Data Fig. 4

Source data files (.csv).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Blazek, P.J., Venkatesh, K. & Lin, M.M. Automated discovery of algorithms from data. Nat Comput Sci 4, 110–118 (2024). https://doi.org/10.1038/s43588-024-00593-9

