Abstract
The Factored Markov Decision Process (FMDP) framework is a standard representation for sequential decision problems under uncertainty in which the state is represented as a collection of random variables. Factored Reinforcement Learning (FRL) is a model-based reinforcement learning approach to FMDPs in which the transition and reward functions of the problem are learned. In this paper, we show how to model, in a theoretically well-founded way, problems where some combinations of state variable values cannot occur, giving rise to impossible states. Furthermore, we propose a new heuristic that treats states that have not been seen so far as impossible. We derive an algorithm whose performance improvement over the standard approach is illustrated through benchmark experiments.
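The heuristic described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, only an illustration of the core idea under simplifying assumptions: a tabular model-based learner over a hypothetical factored state space of two binary variables, which records which states have been observed and treats every never-seen combination of variable values as impossible when queried.

```python
# Minimal sketch (not the paper's algorithm): treat unseen states as
# impossible in a tabular model-based learner over a factored state space.
from itertools import product

# Hypothetical factored state space: two binary state variables.
VARIABLES = [(0, 1), (0, 1)]
ALL_STATES = set(product(*VARIABLES))


class UnseenAsImpossibleModel:
    def __init__(self):
        self.seen = set()        # states observed in experience so far
        self.transitions = {}    # (state, action) -> {next_state: count}

    def observe(self, s, a, s_next):
        """Record one experienced transition (s, a) -> s_next."""
        self.seen.update([s, s_next])
        counts = self.transitions.setdefault((s, a), {})
        counts[s_next] = counts.get(s_next, 0) + 1

    def successors(self, s, a):
        """Empirical successor distribution for (s, a)."""
        counts = self.transitions.get((s, a), {})
        total = sum(counts.values())
        return {s2: c / total for s2, c in counts.items()} if total else {}

    def impossible_states(self):
        """Heuristic: every state never seen so far is deemed impossible."""
        return ALL_STATES - self.seen


model = UnseenAsImpossibleModel()
model.observe((0, 0), "a", (0, 1))
# Only (0, 0) and (0, 1) have been seen; the other two combinations of
# variable values are treated as impossible.
```

A planner built on such a model would simply skip the states returned by `impossible_states()`, shrinking the effective state space it has to reason over.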
© 2009 Springer-Verlag Berlin Heidelberg
Kozlova, O., Sigaud, O., Wuillemin, PH., Meyer, C. (2009). Considering Unseen States as Impossible in Factored Reinforcement Learning. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_64
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8