
Reusability and Transferability of Macro Actions for Reinforcement Learning

Published: 05 April 2022

Abstract

Conventional reinforcement learning (RL) typically selects a primitive action at each timestep. By using a suitable macro action, defined as a sequence of primitive actions, an RL agent can bypass intermediate states to reach a farther state and thereby facilitate its learning procedure. The question we investigate is what beneficial properties such macro actions possess. In this article, we unveil two properties of macro actions: reusability and transferability. Reusability means that a macro action derived with one RL method can be reused to train agents with another RL method, while transferability indicates that a macro action can be utilized to train agents in similar environments with different reward settings. In our experiments, we first derive macro actions alongside RL methods, and then provide a set of analyses that reveal the reusability and transferability of the derived macro actions.
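To make the notion of a macro action concrete, the sketch below shows one common way to expose fixed sequences of primitive actions as additional discrete actions, so that any RL method can train with them. This is not the authors' implementation; the wrapper class, the Gymnasium-style environment, and the example macro sequences are illustrative assumptions.

```python
# A minimal sketch (not the article's method) of adding macro actions to a
# discrete-action, Gymnasium-style environment. Macro sequences are placeholders.
import gymnasium as gym


class MacroActionWrapper(gym.Wrapper):
    """Extends a discrete action space with macro actions, i.e., fixed
    sequences of primitive actions executed in a single agent decision."""

    def __init__(self, env, macros):
        super().__init__(env)
        self.n_primitive = env.action_space.n
        self.macros = macros  # list of primitive-action sequences
        self.action_space = gym.spaces.Discrete(self.n_primitive + len(macros))

    def step(self, action):
        if action < self.n_primitive:
            return self.env.step(action)  # ordinary primitive action
        # Roll out the chosen macro, accumulating reward over its primitive steps.
        total_reward = 0.0
        for primitive in self.macros[action - self.n_primitive]:
            obs, reward, terminated, truncated, info = self.env.step(primitive)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info


# Example usage: the same macros could be reused when training with a different
# RL algorithm, or transferred to a similar environment with different rewards.
env = MacroActionWrapper(gym.make("CartPole-v1"), macros=[[0, 0, 1], [1, 1, 0]])
```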

Published in

ACM Transactions on Evolutionary Learning and Optimization, Volume 2, Issue 1 (March 2022), 106 pages
EISSN: 2688-3007
DOI: 10.1145/3485164

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

            Publication History

            • Published: 5 April 2022
            • Online AM: 24 February 2022
            • Accepted: 1 January 2022
            • Revised: 1 November 2021
            • Received: 1 May 2021
