Abstract
Conventional reinforcement learning (RL) typically selects a primitive action at each timestep. By using a proper macro action, defined as a sequence of primitive actions, an RL agent can instead bypass intermediate states, reach a farther state directly, and thereby accelerate its learning procedure. In this article, we investigate the beneficial properties that macro actions may possess, and unveil two in particular: reusability and transferability. Reusability means that a macro action derived with one RL method can be reused by another RL method for training; transferability means that a macro action can be utilized to train agents in similar environments with different reward settings. In our experiments, we first derive macro actions alongside RL methods, and then provide a set of analyses revealing the reusability and transferability of the derived macro actions.
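To make the notion of a macro action concrete, the sketch below executes a fixed sequence of primitive actions atomically in a toy Gym-style environment. This is a minimal illustration only: `ToyEnv`, the macro name `dash_right`, and the reward values are hypothetical stand-ins, not the environments or macros used in the article.

```python
class ToyEnv:
    """Hypothetical 1-D corridor: state is a position, goal at position 5."""
    def __init__(self):
        self.state = 0

    def step(self, action):              # primitive action: -1 or +1
        self.state += action
        done = self.state >= 5
        reward = 1.0 if done else -0.1   # small per-step penalty
        return self.state, reward, done


class MacroActionAgent:
    """Executes a macro action (a list of primitive actions) as one unit."""
    def __init__(self, env, macros):
        self.env = env
        self.macros = macros             # e.g. {"dash_right": [+1, +1, +1]}

    def run_macro(self, name):
        total_reward, done, state = 0.0, False, self.env.state
        for primitive in self.macros[name]:
            state, reward, done = self.env.step(primitive)
            total_reward += reward
            if done:                     # stop early if the episode ends
                break
        return state, total_reward, done


env = ToyEnv()
agent = MacroActionAgent(env, {"dash_right": [1, 1, 1]})
state, reward, done = agent.run_macro("dash_right")
print(state, done)  # one macro decision covers three primitive timesteps
```

From the agent's perspective, one macro decision advances the environment by several primitive timesteps, which is exactly how a macro action lets it skip intermediate states.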
Index Terms
- Reusability and Transferability of Macro Actions for Reinforcement Learning