Open Access article, licensed under CC BY-NC-ND 3.0. Published by De Gruyter, May 10, 2013.

Adaptive Exploration through Covariance Matrix Adaptation Enables Developmental Motor Learning

Freek Stulp and Pierre-Yves Oudeyer

Abstract

The “Policy Improvement with Path Integrals” (PI2) algorithm [25] and the “Covariance Matrix Adaptation Evolution Strategy” (CMA-ES) [8] are considered state-of-the-art in direct reinforcement learning and stochastic optimization, respectively. We have recently shown that incorporating covariance matrix adaptation into PI2, which yields the PI2-CMA algorithm, enables adaptive exploration by continually and autonomously reconsidering the exploration/exploitation trade-off. In this article, we provide an overview of our recent work on covariance matrix adaptation for direct reinforcement learning [22–24], highlight its relevance to developmental robotics, and conduct further experiments to analyze the results. We investigate two complementary phenomena from developmental robotics. First, we demonstrate PI2-CMA’s ability to adapt to slowly or abruptly changing tasks through its continual and adaptive exploration, an important component of life-long skill learning in dynamic environments. Second, we show how, on a reaching task, PI2-CMA successively releases degrees of freedom from proximal to more distal limbs as learning progresses. A similar effect is observed in human development, where it is known as ‘proximodistal maturation’.
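The key algorithmic idea, incorporating covariance matrix adaptation into PI2, is developed in full in [24]; the sketch below is only a minimal illustration of that idea, not the article's implementation. It assumes a simplified episodic setting in which each roll-out is scored by a black-box cost function; the function name pi2_cma_sketch, the quadratic cost in the usage example, and parameters such as n_rollouts and h are our own illustrative choices. Each update samples exploratory policy parameters from a Gaussian, converts roll-out costs into soft-max weights, and uses reward-weighted averaging to update both the mean (the policy parameters) and the covariance (the exploration magnitude).

```python
import numpy as np

def pi2_cma_sketch(cost, theta_init, n_updates=100, n_rollouts=10, h=10.0):
    """Illustrative PI2-style policy improvement with covariance matrix
    adaptation: reward-weighted averaging updates both the mean and the
    covariance of a Gaussian search distribution over policy parameters."""
    theta = np.asarray(theta_init, dtype=float)
    Sigma = np.eye(theta.size)  # initial (isotropic) exploration covariance
    for _ in range(n_updates):
        # Sample one exploratory parameter vector per roll-out.
        samples = np.random.multivariate_normal(theta, Sigma, size=n_rollouts)
        costs = np.array([cost(s) for s in samples])
        # Map costs to normalized soft-max weights: low cost, high weight.
        c = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
        P = np.exp(-h * c)
        P /= P.sum()
        # Perturbations relative to the current mean.
        diff = samples - theta
        # Covariance matrix adaptation: re-estimate the exploration
        # covariance from the reward-weighted perturbations (the small
        # diagonal term keeps Sigma positive definite).
        Sigma = (P[:, None] * diff).T @ diff + 1e-8 * np.eye(theta.size)
        # Reward-weighted averaging of the mean: theta + sum_k P_k * diff_k.
        theta = theta + P @ diff
    return theta, Sigma

# Toy usage: minimize a quadratic cost over a 2-D parameter vector.
theta_opt, _ = pi2_cma_sketch(lambda th: float(np.sum(th ** 2)), [5.0, -3.0])
print(theta_opt)  # should approach the optimum at the origin
```

Because the covariance is re-estimated from the same reward-weighted samples, exploration automatically shrinks along directions where the cost is already low and persists where improvement is still possible; this autonomous tuning of the exploration/exploitation trade-off is the mechanism behind both the continual adaptation and the emergent proximodistal freeing of degrees of freedom discussed above.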

References

[1] L. Arnold, A. Auger, N. Hansen, and Y. Ollivier. Information-geometric optimization algorithms: A unifying picture via invariance principles. Technical report, INRIA Saclay, 2011.

[2] A. Baranes and P.-Y. Oudeyer. The interaction of maturational constraints and intrinsic motivations in active motor development. In IEEE International Conference on Development and Learning, 2011. doi:10.1109/DEVLRN.2011.6037315

[3] N. E. Berthier, R. K. Clifton, D. D. McCall, and D. J. Robin. Proximodistal structure of early reaching in human infants. Experimental Brain Research, 1999. doi:10.1007/s002210050795

[4] L. Berthouze and M. Lungarella. Motor skill acquisition under environmental perturbations: On the necessity of alternate freezing and freeing degrees of freedom. Adaptive Behavior, 12(1):47–63, 2004.

[5] Josh C. Bongard. Morphological change in machines accelerates the evolution of robust behavior. Proceedings of the National Academy of Sciences of the United States of America (PNAS), January 2010.

[6] Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213–231, March 2003.

[7] T. Glasmachers, T. Schaul, S. Yi, D. Wierstra, and J. Schmidhuber. Exponential natural evolution strategies. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 393–400. ACM, 2010. doi:10.1145/1830483.1830557

[8] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001. doi:10.1162/106365601750190398

[9] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002.

[10] Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209–232, 2002. doi:10.1023/A:1017984413808

[11] J. Konczak, M. Borutta, H. Topka, and J. Dichgans. The development of goal-directed reaching in infants: hand trajectory formation and joint torque control. Experimental Brain Research, 1995. doi:10.1007/BF00241365

[12] A. Miyamae, Y. Nagata, I. Ono, and S. Kobayashi. Natural policy gradient methods with parameter-based exploration for control tasks. Advances in Neural Information Processing Systems, 2:437–441, 2010.

[13] Y. Nagai, M. Asada, and K. Hosoda. Learning for joint attention helped by functional development. Advanced Robotics, 20(10), 2006. doi:10.1163/156855306778522497

[14] Jan Peters and Stefan Schaal. Natural actor-critic. Neurocomputing, 71(7-9):1180–1190, 2008. doi:10.1016/j.neucom.2007.11.026

[15] R. Ros and N. Hansen. A simple modification in CMA-ES achieving linear time and space complexity. In Proceedings of Parallel Problem Solving from Nature (PPSN), pages 296–305, 2008. doi:10.1007/978-3-540-87700-4_30

[16] Thomas Rückstieß, Frank Sehnke, Tom Schaul, Daan Wierstra, Yi Sun, and Jürgen Schmidhuber. Exploring parameter space in reinforcement learning. Paladyn, Journal of Behavioral Robotics, 1:14–24, 2010. doi:10.2478/s13230-010-0002-4

[17] A. Saltelli, K. Chan, and E. M. Scott. Sensitivity analysis. Chichester: Wiley, 2000.

[18] Stefan Schaal. The SL simulation and real-time control software package. Technical report, University of Southern California, 2007.

[19] Matthew Schlesinger, Domenico Parisi, and Jonas Langer. Learning to reach by constraining the movement search space. Developmental Science, 3:67–80, 2000. doi:10.1111/1467-7687.00101

[20] F. Sehnke, C. Osendorfer, T. Rückstieß, A. Graves, J. Peters, and J. Schmidhuber. Parameter-exploring policy gradients. Neural Networks, 23(4):551–559, 2010. doi:10.1016/j.neunet.2009.12.004

[21] Andrew Stout, George D. Konidaris, and Andrew G. Barto. Intrinsically motivated reinforcement learning: A promising framework for developmental robot learning. In AAAI, 2005. doi:10.21236/ADA440079

[22] Freek Stulp. Adaptive exploration for continual reinforcement learning. In International Conference on Intelligent Robots and Systems (IROS), 2012. doi:10.1109/IROS.2012.6385818

[23] Freek Stulp and Pierre-Yves Oudeyer. Emergent proximo-distal maturation through adaptive exploration. In International Conference on Development and Learning (ICDL), 2012. Paper of Excellence Award. doi:10.1109/DevLrn.2012.6400586

[24] Freek Stulp and Olivier Sigaud. Path integral policy improvement with covariance matrix adaptation. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.

[25] Evangelos Theodorou, Jonas Buchli, and Stefan Schaal. A generalized path integral control approach to reinforcement learning. Journal of Machine Learning Research, 11:3137–3181, 2010.

[26] Sebastian B. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie-Mellon University, 1992.

[27] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992. doi:10.1007/BF00992696

Received: 2012-12-15
Accepted: 2013-03-27
Published Online: 2013-05-10
Published in Print: 2012-09-01

© Freek Stulp et al.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
