On the Computational Complexity of Stochastic Controller Optimization in POMDPs

Abstract
We show that the problem of finding an optimal stochastic blind controller in a Markov decision process is NP-hard. The corresponding decision problem is NP-hard, in PSPACE, and sqrt-sum-hard; hence placing it in NP would imply breakthroughs in long-standing open problems in computer science. Our result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is convex and admits efficient global solutions.
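Although optimizing a blind controller is NP-hard, evaluating a fixed one is easy: a state-independent action distribution π induces an ordinary Markov chain, whose discounted value is a single linear solve. The following NumPy sketch illustrates this evaluation step; the 2-state, 2-action MDP data, the discount factor, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def blind_policy_value(P, r, pi, gamma=0.95):
    """Discounted value of a blind (state-independent) stochastic policy.

    P:  (A, S, S) transition matrices, P[a, s, s'] = Pr(s' | s, a)
    r:  (A, S) expected immediate rewards
    pi: (A,) action distribution, used identically in every state
    """
    P_pi = np.tensordot(pi, P, axes=1)   # (S, S) transition matrix under pi
    r_pi = pi @ r                        # (S,) reward vector under pi
    S = P.shape[1]
    # Solve the Bellman equation V = r_pi + gamma * P_pi @ V
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# Hypothetical 2-state, 2-action MDP (numbers are illustrative only)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
v = blind_policy_value(P, r, np.array([0.3, 0.7]))
```

The hardness result concerns the outer search over π: the value is a ratio of polynomials in the entries of π, so maximizing it is a non-convex polynomial program even though each evaluation is cheap.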