
Multigrid Reinforcement Learning with Reward Shaping

  • Conference paper
Artificial Neural Networks - ICANN 2008 (ICANN 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5163)

Abstract

Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential that is used to shape the reward given to the learning agent. In this paper we propose a way to solve this problem in reinforcement learning with state space discretisation. In particular, we show that the potential function can be learned online, in parallel with the actual reinforcement learning process. If the Q-function is learned for states determined by a given grid, a V-function for states at a lower resolution can be learned in parallel and used to approximate the potential for the ground-level learning. The novel algorithm is presented and experimentally evaluated.
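To make the mechanism concrete, here is a minimal sketch of the approach the abstract describes, under stated assumptions: tabular Q-learning on a fine grid, with the potential Phi realised as a V-function learned online by TD(0) on a coarser grid and applied through the standard potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s). The grid sizes, learning rates, and the env interface (reset() returning a fine-grid cell, step(a) returning (next_state, reward, done)) are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

FINE, COARSE = 20, 5            # fine grid for Q, coarse grid for the potential
N_ACTIONS = 4
GAMMA, ALPHA_Q, ALPHA_V = 0.99, 0.1, 0.1
EPSILON = 0.1

Q = np.zeros((FINE, FINE, N_ACTIONS))   # ground-level action-value function
V = np.zeros((COARSE, COARSE))          # coarse V-function used as potential Phi

def coarse(s):
    """Map a fine-grid cell (x, y) to its enclosing coarse-grid cell."""
    factor = FINE // COARSE
    return (s[0] // factor, s[1] // factor)

def phi(s):
    """Potential of a ground state = value of its coarse abstraction."""
    return V[coarse(s)]

def run_episode(env, rng):
    # env is a hypothetical grid-world interface, assumed for illustration
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection on the ground-level Q-function
        if rng.random() < EPSILON:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(Q[s]))
        s2, r, done = env.step(a)

        # learn the potential in parallel: TD(0) on the coarse grid
        v_target = r + (0.0 if done else GAMMA * V[coarse(s2)])
        V[coarse(s)] += ALPHA_V * (v_target - V[coarse(s)])

        # potential-based shaping reward, with Phi(terminal) taken as 0
        F = (0.0 if done else GAMMA * phi(s2)) - phi(s)

        # standard Q-learning update on the shaped reward r + F
        q_target = r + F + (0.0 if done else GAMMA * np.max(Q[s2]))
        Q[s][a] += ALPHA_Q * (q_target - Q[s][a])

        s = s2
```

Since F is a difference of potentials, the shaping term has the standard potential-based form; the point illustrated here, as in the paper, is that the potential itself is estimated online from the same experience rather than supplied as prior knowledge.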

Author information

Authors: Marek Grześ and Daniel Kudenko

Editor information

Véra Kůrková, Roman Neruda, Jan Koutník

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grześ, M., Kudenko, D. (2008). Multigrid Reinforcement Learning with Reward Shaping. In: Kůrková, V., Neruda, R., Koutník, J. (eds) Artificial Neural Networks - ICANN 2008. ICANN 2008. Lecture Notes in Computer Science, vol 5163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87536-9_37

  • DOI: https://doi.org/10.1007/978-3-540-87536-9_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87535-2

  • Online ISBN: 978-3-540-87536-9

  • eBook Packages: Computer Science, Computer Science (R0)
