DOI: 10.1145/1143844.1143963
Article

Probabilistic inference for solving discrete and continuous state Markov Decision Processes

Published: 25 June 2006

ABSTRACT

Inference in Markov Decision Processes has recently received interest as a means to infer the goals underlying observed actions, for policy recognition, and also as a tool to compute policies. A particularly interesting aspect of the approach is that any existing inference technique for DBNs becomes available for answering behavioral questions, including those on continuous, factorial, or hierarchical state representations. Here we present an Expectation Maximization algorithm for computing optimal policies. Unlike previous approaches, we can show that this actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time. The algorithm is generic in that any inference technique can be utilized in the E-step. We demonstrate this for exact inference on a discrete maze and Gaussian belief state propagation in continuous stochastic optimal control problems.
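
For a flavor of how the E- and M-steps interact, below is a minimal Python sketch of the planning-as-inference idea on a toy tabular MDP. The E-step evaluates the current stochastic policy (for a discrete MDP this reduces to computing Q under the policy), and the M-step reweights each state's action distribution by those expected returns. The chain-maze MDP, all variable names, and the restriction to non-negative rewards are illustrative assumptions for this sketch; the paper's general formulation allows arbitrary inference routines in the E-step rather than this tabular fixed-point iteration.

import numpy as np

# Toy 1-D chain "maze" with a rewarding goal state. All names and numbers here
# are illustrative assumptions, not taken from the paper.
n_states, n_actions, gamma = 5, 2, 0.95
goal = n_states - 1

# Transition model P[a, s, s']: action 0 steps left, action 1 steps right.
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, n_states - 1)] = 1.0

# Non-negative rewards, so the reward can play the role of a binary "observation".
R = np.zeros((n_states, n_actions))
R[goal, :] = 1.0

pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform initial policy

for _ in range(50):
    # E-step: evaluate the current policy. For a tabular MDP the backward
    # messages of the time-mixture model reduce to the fixed point of Q under pi.
    Q = np.zeros((n_states, n_actions))
    for _ in range(200):
        V = (pi * Q).sum(axis=1)                      # V(s) = sum_a pi(a|s) Q(s,a)
        Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Bellman backup under pi
    # M-step: reweight each state's action distribution by its expected return,
    # pi_new(a|s) proportional to pi_old(a|s) * Q(s,a).
    pi = pi * Q
    pi /= pi.sum(axis=1, keepdims=True)

print(np.round(pi, 3))  # the policy concentrates on the action that moves toward the goal

Under these assumptions the multiplicative update acts like a soft policy-improvement step and, on the deterministic chain above, converges to always moving toward the goal; the discrete-maze and Gaussian belief-state experiments described in the abstract instead plug the corresponding exact or approximate inference into the E-step.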



            • Published in

              ICML '06: Proceedings of the 23rd international conference on Machine learning
              June 2006
              1154 pages
              ISBN: 1595933832
              DOI: 10.1145/1143844

              Copyright © 2006 ACM


              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 25 June 2006


              Qualifiers

              • Article

              Acceptance Rates

              ICML '06 paper acceptance rate: 140 of 548 submissions, 26%
              Overall acceptance rate: 140 of 548 submissions, 26%
