Abstract
Probabilistic programming languages allow programmers to write down conditional probability distributions that represent statistical and machine learning models as programs that use observe statements. These programs are run by accumulating likelihood at each observe statement and using the accumulated likelihood to steer random choices and weigh results in inference algorithms such as importance sampling or MCMC. We argue that naive likelihood accumulation does not give desirable semantics and leads to paradoxes when an observe statement is used to condition on an event of measure zero, particularly when the observe statement is executed conditionally on random data. We show that the paradoxes disappear if we explicitly model measure-zero events as limits of positive-measure events, and that we can execute this type of probabilistic program by accumulating infinitesimal probabilities rather than probability densities. Our extension makes probabilistic programming languages a better-behaved and more expressive executable notation for probability distributions, by allowing the programmer to be explicit about which limit is intended when conditioning on an event of measure zero.
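The kind of paradox the abstract alludes to can be made concrete with a small simulation. The sketch below is illustrative only, not the paper's artifact, and all names (`normal_pdf`, `branch_posterior`) are ours: a fair coin chooses which of two branches executes an observe statement on the same continuous data point, expressed in different units, and naive density accumulation then lets the choice of units shift the posterior probability of the coin.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def branch_posterior(scale, trials=200_000):
    """Toy model with a conditionally executed observe statement.

    A fair coin picks one of two measurement devices; only the chosen
    branch observes the continuous data point, and naive inference
    multiplies the trace weight by the density at the observed value.
    `scale` rescales the units used by the second branch (1.0 = same
    units as the first branch; 100.0 = centimeters instead of meters).
    Returns the importance-sampling estimate of P(coin = heads | data).
    """
    weight_sum = {True: 0.0, False: 0.0}
    for _ in range(trials):
        heads = random.random() < 0.5                 # random choice: which device
        if heads:
            w = normal_pdf(1.8, 1.7, 0.1)             # observe the data in meters
        else:
            # the same physical event, rescaled: density shrinks by 1/scale
            w = normal_pdf(1.8 * scale, 1.7 * scale, 0.1 * scale)
        weight_sum[heads] += w                        # naive likelihood accumulation
    return weight_sum[True] / (weight_sum[True] + weight_sum[False])

random.seed(0)
print(branch_posterior(1.0))    # close to 0.5: both branches use the same units
print(branch_posterior(100.0))  # close to 1.0: the rescaled branch's density is 100x smaller
```

Because a density expressed in centimeters is 100 times smaller than the same density expressed in meters, the branch that measures in the finer unit is penalized, even though both branches condition on the same physical event. Modeling the measure-zero observation explicitly as a limit of positive-measure events, as the paper proposes, removes this dependence on the choice of units.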
Index Terms
- Paradoxes of probabilistic programming: and how to condition on events of measure zero with infinitesimal probabilities