
Safe Exploration Techniques for Reinforcement Learning – An Overview

  • Conference paper
Modelling and Simulation for Autonomous Systems (MESAS 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8906)

Abstract

We survey different approaches to safety in (semi-)autonomous robotics. In particular, we focus on how to achieve safe behavior of a robot that is requested to explore unknown states. The presented methods are studied from the viewpoint of reinforcement learning, a partially supervised machine learning method. To collect training data for this algorithm, the robot must explore its state space freely, which can lead to dangerous situations. The role of safe exploration is to provide a framework that allows exploration while preserving safety. The examined methods range from simple algorithms to sophisticated methods based on previous experience or state prediction. Our overview also addresses the question of how to define safety in real-world applications (absolute safety is evidently unachievable in a continuous and stochastic real world). In the conclusion we suggest several directions that are worth researching more thoroughly.
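To make the abstract's central idea concrete, the following is a minimal, hypothetical sketch (not the paper's method): a tabular Q-learning agent on a small grid whose ε-greedy exploration is filtered through a hand-written safety predicate, so the agent still explores freely among approved actions but never enters a designated hazard state. The grid world, the is_safe rule, and every name in the snippet are illustrative assumptions.

```python
import random

# Hypothetical sketch of "safe exploration": a tabular Q-learning agent
# whose epsilon-greedy action selection is restricted to actions approved
# by a hand-specified safety predicate. The grid world, hazard set, and
# all names are illustrative assumptions, not the paper's method.

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GRID = 5
HAZARDS = {(2, 2), (3, 1)}   # states the robot must never enter
GOAL = (4, 4)

def is_safe(state, action):
    """Prior safety knowledge: reject moves off the grid or into a hazard."""
    nxt = (state[0] + action[0], state[1] + action[1])
    return 0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID and nxt not in HAZARDS

def step(state, action):
    """Assumed environment dynamics: deterministic moves, sparse reward."""
    nxt = (state[0] + action[0], state[1] + action[1])
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

Q = {}                        # (state, action) -> estimated value
ALPHA, GAMMA, EPSILON = 0.5, 0.95, 0.2

def choose_action(state):
    """Epsilon-greedy selection over the *safe* subset of actions only."""
    safe = [a for a in ACTIONS if is_safe(state, a)]
    if random.random() < EPSILON:
        return random.choice(safe)        # explore, but never unsafely
    return max(safe, key=lambda a: Q.get((state, a), 0.0))

for episode in range(500):
    state, done = (0, 0), False
    while not done:
        action = choose_action(state)
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(Q.get((nxt, a), 0.0) for a in ACTIONS)
        td_target = reward + GAMMA * best_next
        Q[(state, action)] = Q.get((state, action), 0.0) + ALPHA * (
            td_target - Q.get((state, action), 0.0))
        state = nxt

print("Greedy value at start:", max(Q.get(((0, 0), a), 0.0) for a in ACTIONS))
```

The methods surveyed in the paper differ mainly in where such a safety filter comes from, ranging from simple hand-specified rules to safety knowledge derived from previous experience or state prediction.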






Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pecka, M., Svoboda, T. (2014). Safe Exploration Techniques for Reinforcement Learning – An Overview. In: Hodicky, J. (eds) Modelling and Simulation for Autonomous Systems. MESAS 2014. Lecture Notes in Computer Science, vol 8906. Springer, Cham. https://doi.org/10.1007/978-3-319-13823-7_31


  • DOI: https://doi.org/10.1007/978-3-319-13823-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13822-0

  • Online ISBN: 978-3-319-13823-7

  • eBook Packages: Computer Science, Computer Science (R0)
