Reinforcement learning to adjust parametrized motor primitives to new situations

Autonomous Robots

Abstract

Humans manage to adapt learned movements very quickly to new situations by generalizing learned behaviors from similar situations. In contrast, robots currently often need to re-learn the complete movement. In this paper, we propose a method that learns to generalize parametrized motor plans by adapting a small set of global parameters, called meta-parameters. We employ reinforcement learning to learn the required meta-parameters to deal with the current situation, described by states. We introduce an appropriate reinforcement learning algorithm based on a kernelized version of the reward-weighted regression. To show its feasibility, we evaluate this algorithm on a toy example and compare it to several previous approaches. Subsequently, we apply the approach to three robot tasks, i.e., the generalization of throwing movements in darts, of hitting movements in table tennis, and of throwing balls, where the tasks are learned on several different real, physical robots, i.e., a Barrett WAM, a BioRob, the JST-ICORP/SARCOS CBi, and a Kuka KR 6.
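The algorithm named in the abstract, a kernelized version of the reward-weighted regression, essentially performs a cost-weighted kernel regression from situation descriptors (states) to meta-parameters. The following is a minimal sketch of that idea, assuming a Gaussian kernel, a diagonal cost weighting, and Gaussian exploration; all function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def gauss_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel between row-wise state sets A (n x d) and B (m x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / bandwidth ** 2)

def predict_meta_parameters(S, Gamma, costs, s_query, lam=1.0, bandwidth=1.0):
    """Cost-weighted kernel regression of meta-parameters on states.

    S      : (n, d) previously visited states
    Gamma  : (n, p) meta-parameters tried in those states
    costs  : (n,)   costs obtained (low cost = high reward = high influence)
    Returns the predictive mean and variance for the query state.
    """
    K = gauss_kernel(S, S, bandwidth)                  # kernel matrix of seen states
    C = np.diag(costs)                                 # cost weighting (inverse reward weights)
    k = gauss_kernel(s_query[None, :], S, bandwidth)   # (1, n) cross-kernel to the query
    mean = k @ np.linalg.solve(K + lam * C, Gamma)     # low-cost samples dominate the prediction
    var = 1.0 + lam - (k @ np.linalg.solve(K + lam * C, k.T)).item()
    return mean.ravel(), var

# Exploration loop (sketch): sample meta-parameters from N(mean, var) for the
# current state, execute the primitive, record the obtained cost, and append
# the new (state, meta-parameter, cost) sample to S, Gamma, costs.
```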



Notes

  1. Note that the dynamical systems motor primitives ensure the stability of the movement generation but cannot guarantee the stability of the movement execution (Ijspeert et al. 2002; Schaal et al. 2007).

  2. The equality \((\boldsymbol{\Phi}^{T}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})^{-1}\boldsymbol{\Phi}^{T}\mathbf{R}=\boldsymbol{\Phi}^{T}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}+\lambda\mathbf{R}^{-1})^{-1}\) is straightforward to verify by left and right multiplying the non-inverted terms: \(\boldsymbol{\Phi}^{T}\mathbf{R}(\boldsymbol{\Phi}\boldsymbol{\Phi}^{T}+\lambda\mathbf{R}^{-1})=(\boldsymbol{\Phi}^{T}\mathbf{R}\boldsymbol{\Phi}+\lambda\mathbf{I})\boldsymbol{\Phi}^{T}\).
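This identity can also be checked numerically; the sketch below verifies it for a random feature matrix and a random diagonal reward-weight matrix (variable names are illustrative, not the paper's notation).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 6, 4, 0.5
Phi = rng.standard_normal((n, m))            # feature matrix
R = np.diag(rng.uniform(0.1, 1.0, size=n))   # diagonal reward-weight matrix

lhs = np.linalg.solve(Phi.T @ R @ Phi + lam * np.eye(m), Phi.T @ R)
rhs = Phi.T @ np.linalg.inv(Phi @ Phi.T + lam * np.linalg.inv(R))
assert np.allclose(lhs, rhs)                 # both sides of the identity agree
```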

References

  • Barto, A., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.

  • Bays, P., & Wolpert, D. (2007). Computational principles of sensorimotor control that minimise uncertainty and variability. Journal of Physiology, 578, 387–396.

  • Bentivegna, D. C., Ude, A., Atkeson, C. G., & Cheng, G. (2004). Learning to act from observation and practice. International Journal of Humanoid Robotics, 1(4), 585–611.

  • Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.

  • Caruana, R. (1997). Multitask learning. Machine Learning, 28, 41–75.

  • Cheng, G., Hyon, S., Morimoto, J., Ude, A., Hale, J. G., Colvin, G., Scroggin, W., & Jacobsen, S. C. (2007). CB: A humanoid research platform for exploring neuroscience. Advanced Robotics, 21(10), 1097–1114.

  • Dayan, P., & Hinton, G. E. (1997). Using expectation-maximization for reinforcement learning. Neural Computation, 9(2), 271–278.

  • Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4–6), 495–506.

  • Engel, Y., Mannor, S., & Meir, R. (2005). Reinforcement learning with Gaussian processes. In Proc. int. conf. machine learning (pp. 201–208).

  • Grimes, D. B., & Rao, R. P. N. (2008). Learning nonparametric policies by imitation. In Proc. int. conf. intelligent robots and system (pp. 2022–2028).

  • Huber, M., & Grupen, R. (1998). Learning robot control—using control policies as abstract actions. In NIPS’98 workshop: abstraction and hierarchy in reinforcement learning.

  • Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Learning attractor landscapes for learning motor primitives. In Advances in neural information processing systems (Vol. 15, pp. 1523–1530).

  • Jaakkola, T., Jordan, M. I., & Singh, S. P. (1993). Convergence of stochastic iterative dynamic programming algorithms. In Advances in neural information processing systems (Vol. 6, pp. 703–710).

  • Jetchev, N., & Toussaint, M. (2009). Trajectory prediction: learning to map situations to robot trajectories. In Proc. int. conf. machine learning (p. 57).

  • Kober, J., & Peters, J. (2011a). Learning elementary movements jointly with a higher level task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 338–343).

  • Kober, J., & Peters, J. (2011b). Policy search for motor primitives in robotics. Machine Learning, 84(1–2), 171–203.

  • Kober, J., Mülling, K., Krömer, O., Lampert, C. H., Schölkopf, B., & Peters, J. (2010a). Movement templates for learning of hitting and batting. In Proc. IEEE int. conf. robotics and automation (pp. 853–858).

  • Kober, J., Oztop, E., & Peters, J. (2010b). Reinforcement learning to adjust robot movements to new situations. In Proc. robotics: science and systems conf. (pp. 33–40).

  • Kronander, K., Khansari-Zadeh, M. S., & Billard, A. (2011). Learning to control planar hitting motions in a minigolf-like task. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 710–717).

  • Lampariello, R., Nguyen-Tuong, D., Castellini, C., Hirzinger, G., & Peters, J. (2011). Trajectory planning for optimal robot catching in real-time. In Proc. IEEE int. conf. robotics and automation (pp. 3719–3726).

  • Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proc. int. conf. uncertainty in artificial intelligence (pp. 354–361).

  • Lens, T., Kunz, J., Trommer, C., Karguth, A., & von Stryk, O. (2010). BioRob-Arm: A quickly deployable and intrinsically safe, light-weight robot arm for service robotics applications. In 41st international symposium on robotics/6th German conference on robotics (pp. 905–910).

  • Masters Games Ltd (2010). The rules of darts. http://www.mastersgames.com/rules/darts-rules.htm.

  • McGovern, A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proc. int. conf. machine learning (pp. 361–368).

  • McGovern, A., Sutton, R. S., & Fagg, A. H. (1997). Roles of macro-actions in accelerating reinforcement learning. In Grace Hopper celebration of women in computing.

  • Mülling, K., Kober, J., & Peters, J. (2010). Learning table tennis with a mixture of motor primitives. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 411–416).

  • Mülling, K., Kober, J., & Peters, J. (2011). A biomimetic approach to robot table tennis. Adaptive Behavior, 19(5), 359–376.

  • Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., & Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems, 47(2–3), 79–91.

  • Park, D. H., Hoffmann, H., Pastor, P., & Schaal, S. (2008). Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In Proc. IEEE-RAS int. conf. humanoid robots (pp. 91–98).

  • Pastor, P., Hoffmann, H., Asfour, T., & Schaal, S. (2009). Learning and generalization of motor skills by learning from demonstration. In Proc. IEEE int. conf. robotics and automation (pp. 1293–1298).

  • Peters, J., & Schaal, S. (2008a). Learning to control in operational space. The International Journal of Robotics Research, 27(2), 197–212.

  • Peters, J., & Schaal, S. (2008b). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.

  • Pongas, D., Billard, A., & Schaal, S. (2005). Rapid synchronization and accurate phase-locking of rhythmic motor primitives. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 2911–2916).

  • Rasmussen, C. E., & Williams, C. K. (2006). Gaussian processes for machine learning. Cambridge: MIT Press.

  • Russell, S. (1998). Learning agents for uncertain environments (extended abstract). In Proc. eleventh annual conference on computational learning theory (pp. 101–103). New York: ACM.

  • Schaal, S., Mohajerian, P., & Ijspeert, A. J. (2007). Dynamics systems vs. optimal control—a unifying view. Progress in Brain Research, 165(1), 425–445.

  • Schmidt, R., & Wrisberg, C. (2000). Motor learning and performance (2nd edn.). Champaign: Human Kinetics.

  • Sutton, R., & Barto, A. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press.

  • Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (Vol. 12, pp. 1057–1063).

  • Ude, A., Gams, A., Asfour, T., & Morimoto, J. (2010). Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Transactions on Robotics, 26(5), 800–815.

  • Urbanek, H., Albu-Schäffer, A., & van der Smagt, P. (2004). Learning from demonstration repetitive movements for autonomous service robotics. In Proc. IEEE/RSJ int. conf. intelligent robots and systems (pp. 3495–3500).

  • Welling, M. (2010). The Kalman filter. Lecture notes.

  • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.

  • Wulf, G. (2007). Attention and motor skill learning. Champaign: Human Kinetics.

Acknowledgements

The project receives funding from the European Community’s Seventh Framework Programme under grant agreements no. ICT-248273 GeRT and no. ICT-270327 CompLACS. The authors thank Prof. K. Wöllhaf from the University of Applied Sciences Ravensburg-Weingarten for supporting the Kuka KR 6 experiment.

Author information

Corresponding author

Correspondence to Jens Kober.

Appendix: Motor primitive meta-parameters

The motor primitives based on dynamical systems (Ijspeert et al. 2002; Schaal et al. 2007; Kober et al. 2010a) have six natural meta-parameters: the initial position \(\mathbf {\mathrm {x}}_{1}^{0}\), the initial velocity \(\mathbf {\mathrm {x}}_{2}^{0}\), the goal g, the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\), the amplitude A, and the duration T. The meta-parameters modify the global movement by rescaling it spatially or temporally, or by reshaping it with respect to the desired boundary conditions. In the table tennis task, the initial position and velocity are determined by the phase preceding the hitting phase. In Fig. 24 we illustrate the influence of the goal, goal-velocity, and duration meta-parameters on the movement generation.

Fig. 24

In this figure, we demonstrate the influence of the goal, goal-velocity, and duration meta-parameters. The movement represents the hitting phase of the table tennis experiment (Sect. 3.3), and we show the variation of the meta-parameters employed in this task. The ball is hit at the end of the movement. In these plots we only vary a single meta-parameter at a time and keep the others fixed. In (a) the goal g is varied, which makes it possible to hit the ball at different locations and with different orientations. In (b) the duration T is varied, which makes it possible to time the hit. In (c) the goal velocity \(\dot{ \mathbf {\mathrm {g}}}\) is varied, which makes it possible to aim at different locations on the opponent’s side of the table
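To make the roles of these meta-parameters concrete, the following is a minimal single-degree-of-freedom sketch of how the initial state, goal, amplitude, and duration enter a dynamical-systems motor primitive in the standard transformation-system form (Ijspeert et al. 2002). The goal-velocity handling of the hitting variant (Kober et al. 2010a) is omitted, and all names and gain values are illustrative rather than the paper's implementation.

```python
import numpy as np

def run_primitive(y0, yd0, g, A, T, forcing, dt=0.002,
                  alpha_z=25.0, beta_z=6.25, alpha_x=8.0):
    """Integrate a single-DoF dynamical-systems motor primitive.

    Meta-parameters: initial position y0, initial velocity yd0, goal g,
    amplitude A, and duration T. `forcing(x)` is the learned shape function
    of the phase variable x (e.g. a weighted sum of basis functions).
    """
    tau = T                      # temporal scaling: the duration stretches the movement in time
    y, yd, x = y0, yd0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        f = A * forcing(x)       # the amplitude rescales the learned shape spatially
        ydd = (alpha_z * (beta_z * (g - y) - tau * yd) + f) / tau ** 2
        yd += ydd * dt
        y += yd * dt
        x += (-alpha_x * x / tau) * dt   # phase decays to 0, ending the movement at the goal g
        traj.append(y)
    return np.array(traj)

# Example: a bell-shaped forcing term; shifting g moves the end point,
# changing T retimes the same shape, and A scales its spatial extent.
traj = run_primitive(y0=0.0, yd0=0.0, g=1.0, A=0.5, T=1.0,
                     forcing=lambda x: np.exp(-50.0 * (x - 0.5) ** 2))
```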

Cite this article

Kober, J., Wilhelm, A., Oztop, E. et al. Reinforcement learning to adjust parametrized motor primitives to new situations. Auton Robot 33, 361–379 (2012). https://doi.org/10.1007/s10514-012-9290-3
