United We Stand: Population Based Methods for Solving Unknown POMDPs

Welsh, Noel; Wyatt, Jeremy

doi:10.1007/978-3-540-89722-4_19


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5323)

Abstract

Solving large unknown POMDPs is an open research problem. Policy search is an attractive solution method because it scales with the size of the policy, which is typically much simpler than the environment. We present a global search algorithm capable of finding good policies for POMDPs that are substantially larger than those in previously reported results. Our algorithm is general; we show that it can be used with, and improves the performance of, existing local search techniques such as gradient ascent. Sharing information between the members of the population is the key to our algorithm, and we show that it results in better performance than equivalent parallel searches that do not share information. Unlike previous work, our algorithm does not require the size of the policy to be known in advance.
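The abstract does not describe how the population shares information, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes a toy, multimodal `score` function standing in for a policy's estimated return, a simple hill-climbing `local_step` standing in for a local search technique such as gradient ascent, and a hypothetical sharing rule in which the worst member restarts from a copy of the best. All names and parameters are assumptions made for illustration.

```python
import math
import random


def score(policy):
    """Toy stand-in for a policy's estimated return (multimodal on purpose)."""
    return sum(math.sin(3.0 * x) - 0.1 * (x - 2.0) ** 2 for x in policy)


def local_step(policy, rng, step=0.1):
    """Hill-climbing step: keep a random perturbation only if it scores higher."""
    candidate = [x + rng.gauss(0.0, step) for x in policy]
    return candidate if score(candidate) > score(policy) else policy


def population_search(pop_size=8, dim=3, rounds=50, local_steps=10,
                      share=True, seed=0):
    """Run several local searches; optionally share the best member each round."""
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    initial_best = max(score(p) for p in population)
    for _ in range(rounds):
        # Each member improves independently with the local search.
        for i in range(pop_size):
            for _ in range(local_steps):
                population[i] = local_step(population[i], rng)
        if share:
            # Assumed sharing rule: the worst member restarts from the best.
            population.sort(key=score)
            population[0] = list(population[-1])
    best = max(population, key=score)
    return best, score(best), initial_best


if __name__ == "__main__":
    for share in (True, False):
        _, final, initial = population_search(share=share)
        print(f"share={share}: initial best {initial:.3f}, final best {final:.3f}")
```

Calling `population_search(share=False)` gives the baseline of equivalent parallel searches that do not share information, which is the comparison the abstract refers to. On this toy objective the outcome depends on the seed and parameters and says nothing about the results reported in the paper.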



Author information

Authors and Affiliations

School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK
Noel Welsh & Jeremy Wyatt


Editor information

Editors and Affiliations

INRIA Lille-Nord Europe, 59650, Villeneuve d’Ascq, France
Sertan Girgin
INRIA, LIFL, CNRS, Université de Lille, Villeneuve d’Ascq, France
Manuel Loth, Rémi Munos, Philippe Preux & Daniil Ryabko


Cite this paper

Welsh, N., Wyatt, J. (2008). United We Stand: Population Based Methods for Solving Unknown POMDPs. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_19


DOI: https://doi.org/10.1007/978-3-540-89722-4_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89721-7
Online ISBN: 978-3-540-89722-4
eBook Packages: Computer Science, Computer Science (R0)
