On the Computational Complexity of Stochastic Controller Optimization in POMDPs

Published: 01 November 2012

Abstract

We show that the problem of finding an optimal stochastic blind controller in a Markov decision process is NP-hard. The corresponding decision problem is NP-hard, in PSPACE, and sqrt-sum-hard; hence placing it in NP would imply breakthroughs in long-standing open problems in computer science. Our result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is convex and admits efficient global solutions.
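To make the statement concrete, here is one standard way to formalize blind-controller optimization in a discounted MDP; the notation below is ours, not taken from the article. Write the MDP as (S, A, {P_a}, {r_a}, mu, gamma), where P_a is the row-stochastic transition matrix of action a, r_a its reward vector, mu the initial-state distribution, and gamma in (0,1) the discount factor. A blind controller draws every action from a single fixed distribution pi over A, independent of the state, and the optimization problem admits the closed form

\[
\max_{\pi \in \Delta(A)} \; V(\pi) = \mu^{\top} \bigl(I - \gamma P_{\pi}\bigr)^{-1} r_{\pi},
\qquad
P_{\pi} = \sum_{a \in A} \pi(a)\, P_a,
\quad
r_{\pi} = \sum_{a \in A} \pi(a)\, r_a.
\]

Because of the matrix inverse, V(pi) is a rational and generally nonconvex function of the simplex variables pi(a), which is consistent with the hardness results stated above.

A minimal numerical sketch of this formula, on a made-up 2-state, 2-action MDP (all numbers are hypothetical, chosen only for illustration):

import numpy as np

gamma = 0.9                                    # discount factor
P = np.array([[[0.8, 0.2], [0.3, 0.7]],        # P[a] = transition matrix of action a
              [[0.1, 0.9], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],                      # r[a] = reward vector of action a
              [0.0, 1.0]])
mu = np.array([0.5, 0.5])                      # initial-state distribution

def blind_value(pi):
    """Expected discounted return of the state-independent action distribution pi."""
    P_pi = np.tensordot(pi, P, axes=1)         # mixture transition matrix sum_a pi[a] P[a]
    r_pi = np.tensordot(pi, r, axes=1)         # mixture reward vector sum_a pi[a] r[a]
    v = np.linalg.solve(np.eye(len(mu)) - gamma * P_pi, r_pi)
    return mu @ v

# Sweep pi over the one-dimensional simplex of a 2-action controller.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, blind_value(np.array([p, 1.0 - p])))

Sweeping pi this way on toy instances can already exhibit nonconvexity of V in pi, illustrating why local search over blind controllers can get stuck and why global optimization is hard in general.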

