
Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning

Conference paper

Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12458)


Abstract

Option discovery and skill acquisition frameworks are integral to the functioning of a hierarchically organized reinforcement learning agent. However, such techniques often yield a large number of options or skills, which can be represented more succinctly by filtering out redundant information. Such a reduction can decrease the required computation while also improving performance on a target task. To compress an array of option policies, we attempt to find a policy basis that accurately captures the set of all options. In this work, we propose the Option Encoder, an auto-encoder based framework with intelligently constrained weights, that helps discover a collection of basis policies. The policy basis can be used as a proxy for the original set of skills in a suitable hierarchically organized framework. We demonstrate the efficacy of our method on a collection of grid-worlds by evaluating the obtained policy basis on downstream tasks, and present qualitative results on a DeepMind Lab task.
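
To make the compression scheme concrete, here is a minimal sketch in PyTorch. It assumes each option policy is given as an |S| x |A| matrix of action probabilities and reads the "intelligently constrained weights" as softmax-normalized mixing weights, so the hidden layer holds K basis policies and every reconstruction is a convex combination of them (hence a valid action distribution). The class name, tensor shapes, and MSE objective are illustrative assumptions, not the paper's exact architecture or loss.

```python
# Hypothetical sketch of an Option Encoder-style auto-encoder.
# Assumptions (not from the paper's text): N option policies are stacked as an
# (N, |S|, |A|) tensor of action probabilities, and both layers use
# softmax-normalized mixing weights so the K hidden "units" are themselves
# valid policies (convex combinations of valid policies stay valid).
import torch
import torch.nn as nn
import torch.nn.functional as F


class OptionEncoderSketch(nn.Module):
    def __init__(self, n_options: int, n_basis: int):
        super().__init__()
        # Encoder logits: mix the N input policies into K basis policies.
        self.enc_logits = nn.Parameter(torch.randn(n_basis, n_options))
        # Decoder logits: mix the K basis policies back into N reconstructions.
        self.dec_logits = nn.Parameter(torch.randn(n_options, n_basis))

    def forward(self, policies: torch.Tensor):
        # policies: (N, |S|, |A|) stack of option action-probability matrices.
        enc_w = F.softmax(self.enc_logits, dim=1)  # each row sums to 1
        dec_w = F.softmax(self.dec_logits, dim=1)  # each row sums to 1
        # Convex combinations keep every (|S|, |A|) slice a valid policy.
        basis = torch.einsum("kn,nsa->ksa", enc_w, policies)
        recon = torch.einsum("nk,ksa->nsa", dec_w, basis)
        return basis, recon


# Usage: compress 16 stand-in option policies on a toy 25-state, 4-action
# grid-world into a 4-policy basis by minimising reconstruction error.
if __name__ == "__main__":
    torch.manual_seed(0)
    options = F.softmax(torch.randn(16, 25, 4), dim=-1)
    model = OptionEncoderSketch(n_options=16, n_basis=4)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for step in range(500):
        basis, recon = model(options)
        loss = F.mse_loss(recon, options)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Note the role of the constraint: an unconstrained auto-encoder could reconstruct the options just as well, but its hidden units need not be valid action distributions; restricting both layers to convex combinations is what lets the hidden layer be read off directly as a policy basis.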

A. Manoharan and R. Ramesh: The two authors contributed equally.

R. Ramesh: Work done primarily while at the Indian Institute of Technology Madras.



Author information

Correspondence to Arjun Manoharan or Rahul Ramesh.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Manoharan, A., Ramesh, R., Ravindran, B. (2021). Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12458. Springer, Cham. https://doi.org/10.1007/978-3-030-67661-2_30


  • DOI: https://doi.org/10.1007/978-3-030-67661-2_30


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67660-5

  • Online ISBN: 978-3-030-67661-2

  • eBook Packages: Computer Science, Computer Science (R0)
