Abstract
Reinforcement learning agents need a reward signal to learn successful policies. When this signal is sparse or the corresponding gradient is deceptive, such agents need a dedicated mechanism to efficiently explore their search space without relying on the reward. Searching for a large diversity of behaviors and using Motion Planning (MP) algorithms are two options in this context. In this paper, we build on the common roots between these two options to investigate the properties of two diversity search algorithms: Novelty Search and the Goal Exploration Process. These algorithms look for diversity in an outcome space, or behavioral space, which is generally hand-designed to represent what matters for a given task. The relation to MP algorithms reveals that the smoothness, or lack of smoothness, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency. In particular, we show empirically that, if the mapping is smooth enough, i.e. if two close policies in the parameter space lead to similar outcomes, then diversity algorithms tend to inherit the exploration properties of MP algorithms. By contrast, if it is not, diversity algorithms lose the properties of their MP counterparts and their performance strongly depends on heuristics such as filtering mechanisms.
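To make the selection-expansion view concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of a novelty-search-style loop. The 2-D `outcome` mapping, the mutation scale `sigma`, and the k-nearest-neighbor novelty score are all hypothetical choices made for illustration; the toy mapping is deliberately smooth, so nearby parameters yield nearby outcomes, which is the regime in which diversity search behaves like a motion planner.

```python
import math
import random

def outcome(theta):
    """Toy smooth mapping from 2-D policy parameters to a 2-D outcome.

    Nearby parameters yield nearby outcomes; this smoothness assumption is
    what lets the selection-expansion loop below spread through outcome space.
    """
    x, y = theta
    return (math.sin(3 * x) + x, math.cos(3 * y) + y)

def novelty(o, archive, k=5):
    """Mean Euclidean distance from outcome o to its k nearest archived outcomes."""
    dists = sorted(math.dist(o, a) for (_, a) in archive)
    return sum(dists[:k]) / min(k, len(dists))

def novelty_search(iterations=200, sigma=0.2, seed=0):
    rng = random.Random(seed)
    theta0 = (0.0, 0.0)
    archive = [(theta0, outcome(theta0))]  # (parameters, outcome) pairs
    for _ in range(iterations):
        # Selection: pick the archived policy whose outcome is most novel,
        # analogous to node selection in sampling-based motion planning.
        parent, _ = max(archive, key=lambda entry: novelty(entry[1], archive))
        # Expansion: perturb its parameters with Gaussian noise,
        # analogous to extending the tree from the selected node.
        child = tuple(p + rng.gauss(0.0, sigma) for p in parent)
        archive.append((child, outcome(child)))
    return archive
```

If `outcome` were replaced by a highly non-smooth mapping, the expansion step would no longer produce outcomes near the selected node, and the loop would lose this planner-like spreading behavior.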
Notes
1. Also called behavioral space in the literature.
2. Additional details are available at https://arxiv.org/pdf/2104.04768.pdf.
3. The non-rectangular shape of \(\mathcal {O}\) in the 3D ballistic throw environment makes some cells of the expansion grid unreachable, which explains why GEP eventually covers only about 60% of \(\mathcal {O}\).
Acknowledgements
This work was partially supported by the French National Research Agency (ANR), Project ANR-18-CE33-0005 HUSKI.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Chenu, A., Perrin-Gilbert, N., Doncieux, S., Sigaud, O. (2021). Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_46
DOI: https://doi.org/10.1007/978-3-030-86380-7_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86379-1
Online ISBN: 978-3-030-86380-7
eBook Packages: Computer Science, Computer Science (R0)