
Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning

Conference paper

Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12458)


Abstract

Option discovery and skill acquisition frameworks are integral to the functioning of a hierarchically organized reinforcement learning agent. However, such techniques often yield a large number of options or skills, which can be represented more succinctly by filtering out redundant information. Such a reduction can decrease the required computation while also improving performance on a target task. To compress an array of option policies, we attempt to find a policy basis that accurately captures the set of all options. In this work, we propose the Option Encoder, an auto-encoder based framework with intelligently constrained weights, that helps discover a collection of basis policies. The policy basis can be used as a proxy for the original set of skills in a suitable hierarchically organized framework. We demonstrate the efficacy of our method on a collection of grid-worlds by evaluating the obtained policy basis on downstream tasks, and present qualitative results on a DeepMind Lab task.
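
To make the compression scheme concrete, here is a minimal sketch in PyTorch. It assumes each option policy is given as an |S| x |A| matrix of action probabilities and reads the "intelligently constrained weights" as softmax-normalized mixing weights, so the hidden layer holds K basis policies and every reconstruction is a convex combination of them (hence a valid action distribution). The class name, tensor shapes, and MSE objective are illustrative assumptions, not the paper's exact architecture or loss.

```python
# Hypothetical sketch of an Option Encoder-style auto-encoder.
# Assumptions (not from the paper's text): N option policies are stacked as an
# (N, |S|, |A|) tensor of action probabilities, and both layers use
# softmax-normalized mixing weights so the K hidden "units" are themselves
# valid policies (convex combinations of valid policies stay valid).
import torch
import torch.nn as nn
import torch.nn.functional as F


class OptionEncoderSketch(nn.Module):
    def __init__(self, n_options: int, n_basis: int):
        super().__init__()
        # Encoder logits: mix the N input policies into K basis policies.
        self.enc_logits = nn.Parameter(torch.randn(n_basis, n_options))
        # Decoder logits: mix the K basis policies back into N reconstructions.
        self.dec_logits = nn.Parameter(torch.randn(n_options, n_basis))

    def forward(self, policies: torch.Tensor):
        # policies: (N, |S|, |A|) stack of option action-probability matrices.
        enc_w = F.softmax(self.enc_logits, dim=1)  # each row sums to 1
        dec_w = F.softmax(self.dec_logits, dim=1)  # each row sums to 1
        # Convex combinations keep every (|S|, |A|) slice a valid policy.
        basis = torch.einsum("kn,nsa->ksa", enc_w, policies)
        recon = torch.einsum("nk,ksa->nsa", dec_w, basis)
        return basis, recon


# Usage: compress 16 stand-in option policies on a toy 25-state, 4-action
# grid-world into a 4-policy basis by minimising reconstruction error.
if __name__ == "__main__":
    torch.manual_seed(0)
    options = F.softmax(torch.randn(16, 25, 4), dim=-1)
    model = OptionEncoderSketch(n_options=16, n_basis=4)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(500):
        basis, recon = model(options)
        loss = F.mse_loss(recon, options)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Note the role of the constraint: an unconstrained auto-encoder could reconstruct the options just as well, but its hidden units need not be valid action distributions; restricting both layers to convex combinations is what lets the hidden layer be read off directly as a policy basis.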

A. Manoharan and R. Ramesh: The two authors contributed equally.

R. Ramesh: Work done primarily while at the Indian Institute of Technology Madras.



Author information

Correspondence to Arjun Manoharan or Rahul Ramesh.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Manoharan, A., Ramesh, R., Ravindran, B. (2021). Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12458. Springer, Cham. https://doi.org/10.1007/978-3-030-67661-2_30


  • DOI: https://doi.org/10.1007/978-3-030-67661-2_30


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67660-5

  • Online ISBN: 978-3-030-67661-2

  • eBook Packages: Computer Science, Computer Science (R0)
