Abstract
Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question of how to compute the potential used to shape the reward given to the learning agent remains open. In this paper we propose a way to solve this problem in reinforcement learning with state space discretisation. In particular, we show that the potential function can be learned online, in parallel with the actual reinforcement learning process. If the Q-function is learned for states determined by a given grid, a V-function for states at a lower resolution can be learned in parallel and used to approximate the potential for ground-level learning. The novel algorithm is presented and experimentally evaluated.
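The idea in the abstract can be illustrated with a minimal sketch: Q-learning on a fine state grid, with the shaping potential Φ(s) taken from a V-function learned in parallel on a coarser aggregation of the same states, and the shaped reward computed as r + γΦ(s') − Φ(s) in the style of Ng et al. The toy chain environment, the aggregation factor, and all names (`phi`, `v_coarse`, etc.) are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical toy task: a 1D chain of fine-grained states; the agent starts
# at state 0 and receives reward 1 on reaching the rightmost state.
N_FINE = 8           # fine grid resolution (ground-level learning)
COARSE_FACTOR = 2    # each coarse state aggregates 2 fine states
GAMMA = 0.99
ALPHA = 0.1
ACTIONS = (1, -1)    # move right / left along the chain

q = {(s, a): 0.0 for s in range(N_FINE) for a in ACTIONS}
v_coarse = [0.0] * (N_FINE // COARSE_FACTOR)

def phi(s):
    """Potential of a fine state = value of its enclosing coarse state."""
    return v_coarse[s // COARSE_FACTOR]

def step(s, a):
    s2 = max(0, min(N_FINE - 1, s + a))
    return s2, (1.0 if s2 == N_FINE - 1 else 0.0)

def episode(eps=0.3):
    s = 0
    for _ in range(100):
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda b: q[(s, b)])
        s2, r = step(s, a)
        done = s2 == N_FINE - 1
        # Potential-based shaping: F(s, s') = gamma * phi(s') - phi(s),
        # with the terminal potential taken to be zero.
        shaped = r + (0.0 if done else GAMMA * phi(s2)) - phi(s)
        target = shaped + (0.0 if done else GAMMA * max(q[(s2, b)] for b in ACTIONS))
        q[(s, a)] += ALPHA * (target - q[(s, a)])
        # In parallel: TD(0) update of the coarse V-function on the raw reward.
        c, c2 = s // COARSE_FACTOR, s2 // COARSE_FACTOR
        v_target = r + (0.0 if done else GAMMA * v_coarse[c2])
        v_coarse[c] += ALPHA * (v_target - v_coarse[c])
        if done:
            break
        s = s2

random.seed(0)
for _ in range(500):
    episode()
# The coarse V-function now increases toward the goal end of the chain,
# so the potential guides the ground-level learner in the right direction.
```

The same structure carries over to a genuine grid discretisation of a continuous state space: the coarse V-function is cheap to learn because it has far fewer states, and its values serve as the potential for the fine-grained learner.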
© 2008 Springer-Verlag Berlin Heidelberg
Grześ, M., Kudenko, D. (2008). Multigrid Reinforcement Learning with Reward Shaping. In: Kůrková, V., Neruda, R., Koutník, J. (eds) Artificial Neural Networks - ICANN 2008. ICANN 2008. Lecture Notes in Computer Science, vol 5163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87536-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87535-2
Online ISBN: 978-3-540-87536-9