ABSTRACT
When choosing between competing symbolic models for a data set, a human will naturally prefer the "simpler" expression or the one which more closely resembles equations previously seen in a similar context. This suggests a non-uniform prior on functions, which is, however, rarely considered within a symbolic regression (SR) framework. In this paper we develop methods to incorporate detailed prior information on both functions and their parameters into SR. Our prior on the structure of a function is based on a n-gram language model, which is sensitive to the arrangement of operators relative to one another in addition to the frequency of occurrence of each operator. We also develop a formalism based on the Fractional Bayes Factor to treat numerical parameter priors in such a way that models may be fairly compared though the Bayesian evidence, and explicitly compare Bayesian, Minimum Description Length and heuristic methods for model selection. We demonstrate the performance of our priors relative to literature standards on benchmarks and a real-world dataset from the field of cosmology.
Supplemental Material
Available for Download
Supplemental material.
- Deaglan J. Bartlett, Harry Desmond, and Pedro G. Ferreira. 2022. Exhaustive Symbolic Regression. arXiv e-prints, Article arXiv:2211.11461 (Nov. 2022), arXiv:2211.11461 pages. arXiv:2211.11461 [astro-ph.CO] Google ScholarCross Ref
- Sören Becker, Michal Klein, Alexander Neitz, Giambattista Parascandolo, and Niki Kilbertus. 2022. Discovering ordinary differential equations that govern time-series. arXiv e-prints, Article arXiv:2211.02830 (Nov. 2022), arXiv:2211.02830 pages. arXiv:2211.02830 [cs.LG] Google ScholarCross Ref
- James O. Berger and Luis R. Pericchi. 1996. The Intrinsic Bayes Factor for Model Selection and Prediction. J. Amer. Statist. Assoc. 91, 433 (1996), 109--122. arXiv:https://www.tandfonline.com/doi/pdf/10.1080/01621459.1996.10476668 Google ScholarCross Ref
- Jose M. Bernardo. 1979. Reference Posterior Distributions for Bayesian Inference. Journal of the Royal Statistical Society: Series B (Methodological) 41, 2 (1979), 113--128. arXiv:https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.2517-6161.1979.tb01066.x Google ScholarCross Ref
- Luca Biggio, Tommaso Bendinelli, Alexander Neitz, Aurelien Lucchi, and Giambattista Parascandolo. 2021. Neural Symbolic Regression that scales. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 936--945. https://proceedings.mlr.press/v139/biggio21a.htmlGoogle Scholar
- G. F. Bomarito, P. E. Leser, N. C. M. Strauss, K. M. Garbrecht, and J. D. Hochhalter. 2022. Bayesian Model Selection for Reducing Bloat and Overfitting in Genetic Programming for Symbolic Regression. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (Boston, Massachusetts) (GECCO '22). Association for Computing Machinery, New York, NY, USA, 526--529. Google ScholarDigital Library
- G. F. Bomarito, P. E. Leser, N. C. M. Strauss, K. M. Garbrecht, and J. D. Hochhalter. 2023. Automated learning of interpretable models with quantified uncertainty. Computer Methods in Applied Mechanics and Engineering 403 (Jan. 2023), 115732. arXiv:2205.01626 [cs.NE] Google ScholarCross Ref
- Jure Brence, Ljupčo Todorovski, and Sašo Džeroski. 2021. Probabilistic grammars for equation discovery. Knowledge-Based Systems 224 (2021), 107077. Google ScholarCross Ref
- Per A Brodtkorb and John D'Errico. 2015. numdifftools 0.9.11. https://github.com/pbrod/numdifftools.Google Scholar
- Dillon Brout et al. 2022. The Pantheon+ Analysis: Cosmological Constraints. arXiv e-prints, Article arXiv:2202.04077 (Feb. 2022), arXiv:2202.04077 pages. arXiv:2202.04077 [astro-ph.CO]Google Scholar
- Qi Chen, Bing Xue, and Mengjie Zhang. 2022. Rademacher Complexity for Enhancing the Generalization of Genetic Programming for Symbolic Regression. IEEE Transactions on Cybernetics 52, 4 (2022), 2382--2395. Google Scholar
- J.H. Conway, N.J.A. Sloane, E. Bannai, R.E. Borcherds, J. Leech, S.P. Norton, A.M. Odlyzko, R.A. Parker, L. Queen, and B.B. Venkov. 2013. Sphere Packings, Lattices and Groups. Springer New York. https://books.google.fr/books?id=hoTjBwAAQBAJGoogle Scholar
- T. M. Cover and J. A. Thomas. 1991. Elements of Information Theory (2nd ed.). Wiley.Google Scholar
- Miles Cranmer. 2020. PySR: Fast & Parallelized Symbolic Regression in Python/Julia. Google ScholarCross Ref
- Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. 2020. Discovering Symbolic Models from Deep Learning with Inductive Biases. NeurIPS 2020 (2020). arXiv:2006.11287 [cs.LG]Google Scholar
- Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, and François Charton. 2022. Deep Symbolic Regression for Recurrent Sequences. arXiv e-prints, Article arXiv:2201.04600 (Jan. 2022), arXiv:2201.04600 pages. arXiv:2201.04600 [cs.LG] Google ScholarCross Ref
- E. David. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley.Google Scholar
- R.P. Feynman, R.B. Leighton, and M. Sands. 2015. The Feynman Lectures on Physics, Vol. I: The New Millennium Edition: Mainly Mechanics, Radiation, and Heat. Number v. 1. Basic Books. https://books.google.fr/books?id=d76DBQAAQBAJGoogle Scholar
- R.P. Feynman, R.B. Leighton, M.L. Sands, and M.A. Gottlieb. 2006. The Feynman Lectures on Physics. Number v. 2 in The Feynman Lectures on Physics. Pearson/Addison-Wesley. https://books.google.fr/books?id=AbruAAAAMAAJGoogle Scholar
- R.P. Feynman, R.B. Leighton, M.L. Sands, and M.A. Gottlieb. 2006. The Feynman Lectures on Physics: Quantum mechanics. Pearson/Addison-Wesley. https://books.google.fr/books?id=_6XvAAAAMAAJGoogle Scholar
- William A. Gale and Geoffrey Sampson. 1995. Good-turing frequency estimation without tears. Journal of Quantitative Linguistics 2, 3 (1995), 217--237. arXiv:https://doi.org/10.1080/09296179508590051 Google ScholarCross Ref
- H. Goldstein, C.P. Poole, and J.L. Safko. 2002. Classical Mechanics. Addison Wesley. https://books.google.fr/books?id=tJCuQgAACAAJGoogle Scholar
- I. J. GOOD. 1953. THE POPULATION FREQUENCIES OF SPECIES AND THE ESTIMATION OF POPULATION PARAMETERS. Biometrika 40, 3--4 (12 1953), 237--264. arXiv:https://academic.oup.com/biomet/article-pdf/40/3-4/237/492571/40-3-4-237.pdf Google ScholarCross Ref
- P. Grunwald. 2007. The Minimum Description Length Principle. MIT Press.Google ScholarDigital Library
- Peter Grünwald and Teemu Roos. 2019. Minimum Description Length Revisited. arXiv e-prints, Article arXiv:1908.08484 (Aug. 2019), arXiv:1908.08484 pages. arXiv:1908.08484 [stat.ME]Google Scholar
- Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, Jordi Pallarès, and Marta Sales-Pardo. 2020. A Bayesian machine scientist to aid in the solution of challenging scientific problems. Science Advances 6, 5 (2020), eaav6971. arXiv:https://www.science.org/doi/pdf/10.1126/sciadv.aav6971 Google ScholarCross Ref
- R. Haupt and S. Haupt. 2004. Practical genetic algorithms (2nd ed.). Wyley.Google Scholar
- Hitoshi Iba, Hugo De Garis, and Taisuke Sato. 1994. Genetic programming using a minimum description length principle. Advances in genetic programming 1 (1994), 265--284.Google Scholar
- John David Jackson. 1999. Classical electrodynamics (3rd ed. ed.). Wiley, New York, NY. http://cdsweb.cern.ch/record/490457Google Scholar
- Harold Jeffreys. 1946. An Invariant Form for the Prior Probability in Estimation Problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 186, 1007 (1946), 453--461. http://www.jstor.org/stable/97883Google ScholarCross Ref
- Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, and François Charton. 2022. End-to-end symbolic regression with transformers. arXiv e-prints, Article arXiv:2204.10532 (April 2022), arXiv:2204.10532 pages. arXiv:2204.10532 [cs.LG] Google ScholarCross Ref
- Lukas Kammerer, Gabriel Kronberger, Bogdan Burlacu, Stephan M. Winkler, Michael Kommenda, and Michael Affenzeller. 2021. Symbolic Regression by Exhaustive Search: Reducing the Search Space Using Syntactical Constraints and Efficient Semantic Structure Deduplication. arXiv e-prints, Article arXiv:2109.13895 (Sept. 2021), arXiv:2109.13895 pages. arXiv:2109.13895 [cs.LG]Google Scholar
- Slava M. Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. Acoust. Speech Signal Process. 35 (1987), 400--401.Google ScholarCross Ref
- Michael Kommenda, Andreas Beham, Michael Affenzeller, and Gabriel Kronberger. 2021. Complexity Measures for Multi-objective Symbolic Regression. arXiv e-prints, Article arXiv:2109.00238 (Sept. 2021), arXiv:2109.00238 pages. arXiv:2109.00238 [cs.LG]Google Scholar
- Michael Korns. 2011. Accuracy in Symbolic Regression. 129--151. Google ScholarCross Ref
- Jiří Kubalík, Erik Derner, and Robert Babuška. 2023. Neural Networks for Symbolic Regression. arXiv e-prints, Article arXiv:2302.00773 (Feb. 2023), arXiv:2302.00773 pages. arXiv:2302.00773 [cs.NE] Google ScholarCross Ref
- Guillaume Lample and François Charton. 2019. Deep Learning for Symbolic Mathematics. arXiv e-prints, Article arXiv:1912.01412 (Dec. 2019), arXiv:1912.01412 pages. arXiv:1912.01412 [cs.SC] Google ScholarCross Ref
- Aaron D. Lanterman. 2001. Schwarz, Wallace, and Rissanen: Intertwining Themes in Theories of Model Selection. International Statistical Review 69, 2 (2001), 185--212. arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1751-5823.2001.tb00456.x Google ScholarCross Ref
- Pablo Lemos, Niall Jeffrey, Miles Cranmer, Shirley Ho, and Peter Battaglia. 2022. Rediscovering orbital mechanics with machine learning. arXiv e-prints, Article arXiv:2202.02306 (Feb. 2022), arXiv:2202.02306 pages. arXiv:2202.02306 [astroph.EP]Google Scholar
- F. B. Lempers. 1971. Posterior probabilities of alternative linear models. Some theoretical considerations and empirical experiments. [By] F. B. Lempers. Rotterdam University Press [Rotterdam]. 118 p. pages.Google Scholar
- Christopher D. Manning and Hinrich Schutze. 1999. Foundations of statistical natural language processing. MIT Press, Cambridge, Mass.Google ScholarDigital Library
- Trent McConaghy. 2011. FFX: Fast, Scalable, Deterministic Symbolic Regression Technology. Springer New York, New York, NY, 235--260. Google ScholarCross Ref
- James McDermott, David R. White, Sean Luke, Luca Manzoni, Mauro Castelli, Leonardo Vanneschi, Wojciech Jaśkowski, Krzysztof Krawiec, Robin Harper, Kenneth De Jong, and Una May O'Reilly. 2012. Genetic programming needs better benchmarks. In GECCO'12 - Proceedings of the 14th International Conference on Genetic and Evolutionary Computation (GECCO'12 - Proceedings of the 14th International Conference on Genetic and Evolutionary Computation). 791--798. 14th International Conference on Genetic and Evolutionary Computation, GECCO'12 ; Conference date: 07-07-2012 Through 11-07-2012. Google ScholarDigital Library
- Aaron Meurer et al. 2017. SymPy: symbolic computing in Python. PeerJ Computer Science 3 (Jan. 2017), e103. Google ScholarCross Ref
- Anthony O'Hagan. 1995. Fractional Bayes Factors for Model Comparison. Journal of the Royal Statistical Society. Series B (Methodological) 57, 1 (1995), 99--138. http://www.jstor.org/stable/2346088Google ScholarCross Ref
- Adam G. Riess, Wenlong Yuan, Lucas M. Macri, Dan Scolnic, Dillon Brout, Stefano Casertano, David O. Jones, Yukei Murakami, Gagandeep S. Anand, Louise Breuval, Thomas G. Brink, Alexei V. Filippenko, Samantha Hoffmann, Saurabh W. Jha, W. D'arcy Kenworthy, John Mackenty, Benjamin E. Stahl, and WeiKang Zheng. 2022. A Comprehensive Measurement of the Local Value of the Hubble Constant with 1 km s−1 Mpc−1 Uncertainty from the Hubble Space Telescope and the SH0ES Team. ApJ 934, 1, Article L7 (July 2022), L7 pages. arXiv:2112.04510 [astro-ph.CO] Google ScholarCross Ref
- J. Rissanen. 1978. Modeling by shortest data description. Automatica 14, 5 (1978), 465--471. Google ScholarDigital Library
- Jorma Rissanen. 1983. A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics 11, 2 (1983), 416 -- 431. Google ScholarCross Ref
- Daniel Rivero and Enrique Fernandez-Blanco. 2019. A New Deterministic Technique for Symbolic Regression. arXiv e-prints, Article arXiv:1908.06754 (Aug. 2019), arXiv:1908.06754 pages. arXiv:1908.06754 [cs.LG]Google Scholar
- M.D. Schwartz. 2014. Quantum Field Theory and the Standard Model. Cambridge University Press. https://books.google.fr/books?id=HbdEAgAAQBAJGoogle Scholar
- Gideon Schwarz. 1978. Estimating the Dimension of a Model. Annals of Statistics 6, 2 (July 1978), 461--464.Google ScholarCross Ref
- Dan Scolnic, Dillon Brout, Anthony Carr, Adam G. Riess, Tamara M. Davis, Arianna Dwomoh, David O. Jones, Noor Ali, Pranav Charvu, Rebecca Chen, Erik R. Peterson, Brodie Popovic, Benjamin M. Rose, Charlotte Wood, Peter J. Brown, Ken Chambers, David A. Coulter, Kyle G. Dettman, Georgios Dimitriadis, Alexei V. Filippenko, Ryan J. Foley, Saurabh W. Jha, Charles D. Kilpatrick, Robert P. Kirshner, Yen-Chen Pan, Armin Rest, Cesar Rojas-Bravo, Matthew R. Siebert, Benjamin E. Stahl, and WeiKang Zheng. 2021. The Pantheon+ Analysis: The Full Dataset and Light-Curve Release. arXiv e-prints, Article arXiv:2112.03863 (Dec. 2021), arXiv:2112.03863 pages. arXiv:2112.03863 [astro-ph.CO]Google Scholar
- Parshin Shojaee, Kazem Meidani, Amir Barati Farimani, and Chandan K. Reddy. 2023. Transformer-based Planning for Symbolic Regression. arXiv e-prints, Article arXiv:2303.06833 (March 2023), arXiv:2303.06833 pages. arXiv:2303.06833 [cs.LG] Google ScholarCross Ref
- Guido F. Smits and Mark Kotanchek. 2005. Pareto-Front Exploitation in Symbolic Regression. Springer US, Boston, MA, 283--299. Google ScholarCross Ref
- R.S. Sutton and A.G. Barto. 2018. Reinforcement Learning, second edition: An Introduction. MIT Press. https://books.google.co.uk/books?id=sWV0DwAAQBAJGoogle Scholar
- J. Takeuchi. 1997. Characterization of the Bayes estimator and the MDL estimator for exponential families. IEEE Transactions on Information Theory 43, 4 (1997), 1165--1174. Google ScholarDigital Library
- A. M. Turing. 1950. I.---COMPUTING MACHINERY AND INTELLIGENCE. Mind LIX, 236 (10 1950), 433--460. Google ScholarCross Ref
- Silviu-Marian Udrescu, Andrew Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, and Max Tegmark. 2020. AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. arXiv e-prints, Article arXiv:2006.10782 (June 2020), arXiv:2006.10782 pages. arXiv:2006.10782 [cs.LG]Google Scholar
- Silviu-Marian Udrescu and Max Tegmark. 2020. AI Feynman: A physics-inspired method for symbolic regression. Science Advances 6, 16 (April 2020), eaay2631. arXiv:1905.11481 [physics.comp-ph] Google ScholarCross Ref
- Nguyen Quang Uy, Nguyen Xuan Hoai, Michael O'Neill, Robert I. McKay, and Edgar Galván López. 2011. Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evolvable Mach. 12, 2 (2011), 91--119. Google ScholarDigital Library
- Mojtaba Valipour, Bowen You, Maysum Panju, and Ali Ghodsi. 2021. SymbolicGPT: A Generative Transformer Model for Symbolic Regression. arXiv e-prints, Article arXiv:2106.14131 (June 2021), arXiv:2106.14131 pages. arXiv:2106.14131 [cs.LG] Google ScholarCross Ref
- Martin Vastl, Jonáš Kulhánek, Jiří Kubalík, Erik Derner, and Robert Babuška. 2022. SymFormer: End-to-end symbolic regression using transformer-based architecture. arXiv e-prints, Article arXiv:2205.15764 (May 2022), arXiv:2205.15764 pages. arXiv:2205.15764 [cs.LG] Google ScholarCross Ref
- Ekaterina J. Vladislavleva, Guido F. Smits, and Dick den Hertog. 2009. Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming. IEEE Transactions on Evolutionary Computation 13, 2 (2009), 333--349. Google ScholarDigital Library
- C. S. Wallace and P. R. Freeman. 1987. Estimation and Inference by Compact Coding. Journal of the Royal Statistical Society. Series B (Methodological) 49, 3 (1987), 240--265. http://www.jstor.org/stable/2985992Google ScholarCross Ref
- C. S. Wallace and P. R. Freeman. 1992. Single-Factor Analysis by Minimum Message Length Estimation. Journal of the Royal Statistical Society: Series B (Methodological) 54, 1 (1992), 195--209. arXiv:https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/j.2517-6161.1992.tb01874.x Google ScholarCross Ref
- Steven Weinberg. 1972. Gravitation and Cosmology: Principles and Applications of the General Theory of Relativity. John Wiley and Sons, New York.Google Scholar
- Matthias Werner, Andrej Junginger, Philipp Hennig, and Georg Martius. 2022. Uncertainty in equation learning. In GECCO '22: Genetic and Evolutionary Computation Conference, Companion Volume, Boston, Massachusetts, USA, July 9 -- 13, 2022, Jonathan E. Fieldsend and Markus Wagner (Eds.). ACM, 2298--2305. Google ScholarDigital Library
- Tony Worm and Kenneth Chiu. 2013. Prioritized Grammar Enumeration: Symbolic Regression by Dynamic Programming. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation (Amsterdam, The Netherlands) (GECCO '13). Association for Computing Machinery, New York, NY, USA, 1021--1028. Google ScholarDigital Library
Index Terms
- Priors for symbolic regression
Recommendations
Conjugate priors and variable selection for Bayesian quantile regression
Bayesian variable selection in quantile regression models is often a difficult task due to the computational challenges and non-availability of conjugate prior distributions. These challenges are rarely addressed via either penalized likelihood function ...
Learning locally minimax optimal Bayesian networks
We consider the problem of learning Bayesian network models in a non-informative setting, where the only available information is a set of observational data, and no background knowledge is available. The problem can be divided into two different ...
Variational Bayesian multinomial probit regression with Gaussian process priors
It is well known in the statistics literature that augmenting binary and polychotomous response models with gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation ...
Comments