Abstract
We survey approaches to safety in (semi)autonomous robotics, focusing in particular on how a robot can behave safely when it is asked to explore unknown states. The presented methods are examined from the viewpoint of reinforcement learning, a machine learning paradigm situated between supervised and unsupervised learning. To collect training data for such an algorithm, the robot must explore its state space freely, which can lead to dangerous situations. The role of safe exploration is to provide a framework that permits exploration while preserving safety. The examined methods range from simple algorithms to sophisticated techniques based on previous experience or state prediction. We also address how to define safety in real-world applications, since absolute safety is unachievable in a continuous and stochastic real world. In conclusion, we suggest several directions that merit more thorough research.
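To make the safe-exploration idea concrete, the following is a minimal, hypothetical sketch (not taken from the paper): tabular Q-learning on a toy one-dimensional corridor in which the ε-greedy explorer is restricted to a safe action set, i.e. actions whose successor state is known to be catastrophic are masked out before exploration. The environment, the safety predicate, and all names are assumptions made purely for illustration.

```python
import random

# Toy 1-D corridor: states 0..5; state 0 is a "cliff" (unsafe), state 5 is the goal.
# Actions: -1 (move left), +1 (move right).
ACTIONS = (-1, +1)
UNSAFE, GOAL, N_STATES = 0, 5, 6

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == UNSAFE:
        return nxt, -100.0, True   # catastrophic outcome
    if nxt == GOAL:
        return nxt, +10.0, True
    return nxt, -1.0, False        # small step cost encourages reaching the goal

def is_safe(state, action):
    """Safety predicate: forbid actions whose successor is a known-unsafe state."""
    return min(max(state + action, 0), N_STATES - 1) != UNSAFE

def safe_epsilon_greedy(q, state, eps, rng):
    """Standard epsilon-greedy choice, but restricted to the safe action set."""
    allowed = [a for a in ACTIONS if is_safe(state, a)] or list(ACTIONS)
    if rng.random() < eps:
        return rng.choice(allowed)   # explore, but only within the safe set
    return max(allowed, key=lambda a: q[(state, a)])

def train(episodes=300, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 2, False
        while not done:
            a = safe_epsilon_greedy(q, s, eps, rng)
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

Because the unsafe action is masked out, the agent never visits the cliff state during training; its Q-value for the forbidden action stays at its initial value. This "hard mask" is the simplest of the approaches surveyed here; more sophisticated methods replace the fixed predicate with learned or predictive safety estimates.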
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Pecka, M., Svoboda, T. (2014). Safe Exploration Techniques for Reinforcement Learning – An Overview. In: Hodicky, J. (eds) Modelling and Simulation for Autonomous Systems. MESAS 2014. Lecture Notes in Computer Science, vol 8906. Springer, Cham. https://doi.org/10.1007/978-3-319-13823-7_31
DOI: https://doi.org/10.1007/978-3-319-13823-7_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13822-0
Online ISBN: 978-3-319-13823-7
eBook Packages: Computer Science (R0)