
Reusability and Transferability of Macro Actions for Reinforcement Learning

Published: 05 April 2022

Abstract

Conventional reinforcement learning (RL) typically selects a primitive action at each timestep. By using a suitable macro action, defined as a sequence of primitive actions, an RL agent can bypass intermediate states to reach a farther state and thereby facilitate its learning procedure. The question we investigate is what beneficial properties such macro actions possess. In this article, we unveil two properties of macro actions: reusability and transferability. Reusability means that a macro action derived with one RL method can be reused to train agents with another RL method, while transferability indicates that a macro action can be utilized to train agents in similar environments with different reward settings. In our experiments, we first derive macro actions alongside RL methods, and then provide a set of analyses that reveal the reusability and transferability of the derived macro actions.
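To make the notion of a macro action concrete, the sketch below shows one common way to expose fixed sequences of primitive actions as additional discrete actions, so that any RL method can train with them. This is not the authors' implementation; the wrapper class, the Gymnasium-style environment, and the example macro sequences are illustrative assumptions.

```python
# A minimal sketch (not the article's method) of adding macro actions to a
# discrete-action, Gymnasium-style environment. Macro sequences are placeholders.
import gymnasium as gym


class MacroActionWrapper(gym.Wrapper):
    """Extends a discrete action space with macro actions, i.e., fixed
    sequences of primitive actions executed in a single agent decision."""

    def __init__(self, env, macros):
        super().__init__(env)
        self.n_primitive = env.action_space.n
        self.macros = macros  # list of primitive-action sequences
        self.action_space = gym.spaces.Discrete(self.n_primitive + len(macros))

    def step(self, action):
        if action < self.n_primitive:
            return self.env.step(action)  # ordinary primitive action
        # Roll out the chosen macro, accumulating reward over its primitive steps.
        total_reward = 0.0
        for primitive in self.macros[action - self.n_primitive]:
            obs, reward, terminated, truncated, info = self.env.step(primitive)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info


# Example usage: the same macros could be reused when training with a different
# RL algorithm, or transferred to a similar environment with different rewards.
env = MacroActionWrapper(gym.make("CartPole-v1"), macros=[[0, 0, 1], [1, 1, 0]])
```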

Published in

ACM Transactions on Evolutionary Learning and Optimization, Volume 2, Issue 1 (March 2022), 106 pages
EISSN: 2688-3007
DOI: 10.1145/3485164

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

            Publication History

            • Published: 5 April 2022
            • Online AM: 24 February 2022
            • Accepted: 1 January 2022
            • Revised: 1 November 2021
            • Received: 1 May 2021
