Abstract
Option discovery and skill acquisition frameworks are integral to hierarchically organized reinforcement learning agents. However, such techniques often yield a large number of options or skills, many of which carry redundant information and can therefore be represented more succinctly. Removing this redundancy can reduce the required computation while also improving performance on a target task. To compress a set of option policies, we seek a policy basis that accurately captures the set of all options. In this work, we propose Option Encoder, an auto-encoder-based framework with suitably constrained weights that discovers a collection of basis policies. The policy basis can be used as a proxy for the original set of skills in a suitable hierarchically organized framework. We demonstrate the efficacy of our method on a collection of grid-worlds, evaluating the obtained policy basis on downstream tasks, and present qualitative results on the DeepMind Lab task.
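The idea of reconstructing many option policies from a small policy basis can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the weights are constrained, so the sketch assumes that each option is rebuilt as a convex combination of basis policies (softmax-normalized mixing weights) and that reconstruction quality is measured with a KL divergence. The names `reconstruct_options`, `reconstruction_loss`, and the sizes `K`, `M`, `S`, `A` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct_options(basis_logits, mixing_logits):
    """Rebuild K option policies from M basis policies.

    basis_logits:  (M, S, A) unnormalized basis policies (assumed parameterization)
    mixing_logits: (K, M) unnormalized mixing weights (assumed constraint: softmax)
    returns:       (K, S, A) reconstructed option policies
    """
    basis = softmax(basis_logits, axis=-1)     # each basis policy is a distribution over actions
    weights = softmax(mixing_logits, axis=-1)  # non-negative weights that sum to 1 per option
    # A convex combination of distributions is itself a valid distribution.
    return np.einsum("km,msa->ksa", weights, basis)

def reconstruction_loss(options, recon, eps=1e-8):
    # Mean KL(options || recon) over options and states (assumed objective).
    kl = np.sum(options * (np.log(options + eps) - np.log(recon + eps)), axis=-1)
    return kl.mean()

rng = np.random.default_rng(0)
K, M, S, A = 6, 2, 5, 4                      # options, basis size, states, actions
options = softmax(rng.normal(size=(K, S, A)))  # stand-in for learned option policies
basis_logits = rng.normal(size=(M, S, A))
mixing_logits = rng.normal(size=(K, M))

recon = reconstruct_options(basis_logits, mixing_logits)
loss = reconstruction_loss(options, recon)
```

In a full system, `basis_logits` and `mixing_logits` would be trained to minimize `loss`; the sketch shows only the forward pass and why the constraint keeps every reconstructed option a valid policy.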
A. Manoharan and R. Ramesh—The two authors contributed equally.
R. Ramesh—Work done primarily while at the Indian Institute of Technology Madras.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Manoharan, A., Ramesh, R., Ravindran, B. (2021). Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12458. Springer, Cham. https://doi.org/10.1007/978-3-030-67661-2_30
DOI: https://doi.org/10.1007/978-3-030-67661-2_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67660-5
Online ISBN: 978-3-030-67661-2
eBook Packages: Computer Science (R0)