On the Computational Complexity of Stochastic Controller Optimization in POMDPs

Published: 01 November 2012

Abstract

We show that the problem of finding an optimal stochastic blind controller in a Markov decision process is NP-hard. The corresponding decision problem is NP-hard, in PSPACE, and sqrt-sum-hard; hence placing it in NP would imply breakthroughs in long-standing open problems in computer science. Our result establishes that the more general problem of stochastic controller optimization in POMDPs is also NP-hard. Nonetheless, we outline a special case that is convex and admits efficient global solutions.
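To make the statement concrete, here is one standard way to formalize blind-controller optimization in a discounted MDP; the notation below is ours, not taken from the article. Write the MDP as (S, A, {P_a}, {r_a}, mu, gamma), where P_a is the row-stochastic transition matrix of action a, r_a its reward vector, mu the initial-state distribution, and gamma in (0,1) the discount factor. A blind controller draws every action from a single fixed distribution pi over A, independent of the state, and the optimization problem admits the closed form

\[
\max_{\pi \in \Delta(A)} \; V(\pi) = \mu^{\top} \bigl(I - \gamma P_{\pi}\bigr)^{-1} r_{\pi},
\qquad
P_{\pi} = \sum_{a \in A} \pi(a)\, P_a,
\quad
r_{\pi} = \sum_{a \in A} \pi(a)\, r_a.
\]

Because of the matrix inverse, V(pi) is a rational and generally nonconvex function of the simplex variables pi(a), which is consistent with the hardness results stated above.

A minimal numerical sketch of this formula, on a made-up 2-state, 2-action MDP (all numbers are hypothetical, chosen only for illustration):

import numpy as np

gamma = 0.9                                    # discount factor
P = np.array([[[0.8, 0.2], [0.3, 0.7]],        # P[a] = transition matrix of action a
              [[0.1, 0.9], [0.6, 0.4]]])
r = np.array([[1.0, 0.0],                      # r[a] = reward vector of action a
              [0.0, 1.0]])
mu = np.array([0.5, 0.5])                      # initial-state distribution

def blind_value(pi):
    """Expected discounted return of the state-independent action distribution pi."""
    P_pi = np.tensordot(pi, P, axes=1)         # mixture transition matrix sum_a pi[a] P[a]
    r_pi = np.tensordot(pi, r, axes=1)         # mixture reward vector sum_a pi[a] r[a]
    v = np.linalg.solve(np.eye(len(mu)) - gamma * P_pi, r_pi)
    return mu @ v

# Sweep pi over the one-dimensional simplex of a 2-action controller.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, blind_value(np.array([p, 1.0 - p])))

Sweeping pi this way on toy instances can already exhibit nonconvexity of V in pi, illustrating why local search over blind controllers can get stuck and why global optimization is hard in general.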

