
Safe Exploration Techniques for Reinforcement Learning – An Overview

  • Conference paper
Modelling and Simulation for Autonomous Systems (MESAS 2014)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 8906)

Abstract

We survey different approaches to safety in (semi-)autonomous robotics. In particular, we focus on how to achieve safe behavior of a robot that is requested to explore unknown states. The presented methods are studied from the viewpoint of reinforcement learning, a partially supervised machine learning method. To collect training data for this algorithm, the robot must explore its state space freely, which can lead to dangerous situations. The role of safe exploration is to provide a framework that allows exploration while preserving safety. The examined methods range from simple algorithms to sophisticated methods based on previous experience or state prediction. Our overview also addresses the question of how to define safety in real-world applications (absolute safety is evidently unachievable in a continuous and stochastic real world). In the conclusion we suggest several directions that are worth researching more thoroughly.
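To make the abstract's central idea concrete, the following is a minimal, hypothetical sketch (not the paper's method): a tabular Q-learning agent on a small grid whose ε-greedy exploration is filtered through a hand-written safety predicate, so the agent still explores freely among approved actions but never enters a designated hazard state. The grid world, the is_safe rule, and every name in the snippet are illustrative assumptions.

```python
import random

# Hypothetical sketch of "safe exploration": a tabular Q-learning agent
# whose epsilon-greedy action selection is restricted to actions approved
# by a hand-specified safety predicate. The grid world, hazard set, and
# all names are illustrative assumptions, not the paper's method.

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GRID = 5
HAZARDS = {(2, 2), (3, 1)}   # states the robot must never enter
GOAL = (4, 4)

def is_safe(state, action):
    """Prior safety knowledge: reject moves off the grid or into a hazard."""
    nxt = (state[0] + action[0], state[1] + action[1])
    return 0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID and nxt not in HAZARDS

def step(state, action):
    """Assumed environment dynamics: deterministic moves, sparse reward."""
    nxt = (state[0] + action[0], state[1] + action[1])
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

Q = {}                        # (state, action) -> estimated value
ALPHA, GAMMA, EPSILON = 0.5, 0.95, 0.2

def choose_action(state):
    """Epsilon-greedy selection over the *safe* subset of actions only."""
    safe = [a for a in ACTIONS if is_safe(state, a)]
    if random.random() < EPSILON:
        return random.choice(safe)        # explore, but never unsafely
    return max(safe, key=lambda a: Q.get((state, a), 0.0))

for episode in range(500):
    state, done = (0, 0), False
    while not done:
        action = choose_action(state)
        nxt, reward, done = step(state, action)
        best_next = 0.0 if done else max(Q.get((nxt, a), 0.0) for a in ACTIONS)
        td_target = reward + GAMMA * best_next
        Q[(state, action)] = Q.get((state, action), 0.0) + ALPHA * (
            td_target - Q.get((state, action), 0.0))
        state = nxt

print("Greedy value at start:", max(Q.get(((0, 0), a), 0.0) for a in ACTIONS))
```

The methods surveyed in the paper differ mainly in where such a safety filter comes from, ranging from simple hand-specified rules to safety knowledge derived from previous experience or state prediction.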






Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pecka, M., Svoboda, T. (2014). Safe Exploration Techniques for Reinforcement Learning – An Overview. In: Hodicky, J. (eds) Modelling and Simulation for Autonomous Systems. MESAS 2014. Lecture Notes in Computer Science, vol 8906. Springer, Cham. https://doi.org/10.1007/978-3-319-13823-7_31


  • DOI: https://doi.org/10.1007/978-3-319-13823-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13822-0

  • Online ISBN: 978-3-319-13823-7

  • eBook Packages: Computer Science, Computer Science (R0)
