Towards Jumping Skill Learning by Target-guided Policy Optimization for Quadruped Robots

  • Research Article
  • Published in: Machine Intelligence Research

Abstract

Endowing quadruped robots with the skill of forward jumping helps them overcome barriers and traverse complex terrain. In this paper, a model-free control architecture for quadruped robot jumping, combining target-guided policy optimization with deep reinforcement learning (DRL), is presented. First, the jump is divided into a take-off phase and a flight-landing phase, and an optimal policy is constructed for each phase with soft actor-critic (SAC). Second, the policy learning is designed with expectations and penalties spanning the overall jumping process, together with extrinsic excitations; corresponding rewards and constraints are provided for a successful take-off, an excellent flight attitude, and stable standing after landing. To avoid the low efficiency of random exploration, a curiosity module is introduced as a source of extrinsic rewards. Additionally, the target-guided module encourages the robot to explore ever closer to the desired jumping target. Simulation results indicate that the quadruped robot realizes complete forward jumping locomotion with good horizontal and vertical distances, as well as excellent motion attitudes.
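
As a rough, non-authoritative illustration of the two exploration aids named in the abstract, the Python/PyTorch sketch below computes a curiosity bonus as the prediction error of a learned forward dynamics model, plus a target-guided shaping term that grows as the landing point nears the desired jump target. The state and action dimensions, network sizes, scale coefficients, and the exponential distance kernel are all illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 24, 8  # assumed dimensions, for illustration only

    class ForwardModel(nn.Module):
        """Learned forward dynamics; its prediction error acts as the curiosity bonus."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
                nn.Linear(128, STATE_DIM),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    def curiosity_bonus(model, state, action, next_state, scale=0.1):
        # Poorly predicted transitions are treated as novel and rewarded,
        # steering exploration away from already well-visited states.
        with torch.no_grad():
            pred = model(state, action)
        return scale * torch.mean((pred - next_state) ** 2, dim=-1)

    def target_guided_bonus(landing_xy, target_xy, scale=1.0):
        # Shaping term that increases as the landing point approaches the
        # desired jump target (the exponential kernel is an assumed form).
        return scale * torch.exp(-torch.norm(landing_xy - target_xy, dim=-1))

    # Example: augment the task reward of one transition with both bonuses.
    fm = ForwardModel()
    s, a, s2 = torch.randn(STATE_DIM), torch.randn(ACTION_DIM), torch.randn(STATE_DIM)
    r_bonus = curiosity_bonus(fm, s, a, s2) + target_guided_bonus(
        torch.tensor([1.2, 0.0]), torch.tensor([1.5, 0.0]))

In the paper's framing, such bonuses would be added to the phase-specific SAC rewards; training the forward model itself (on replayed transitions) is omitted here for brevity.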


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61773374) and the National Key Research and Development Program of China (No. 2017YFB1300104).

Author information

Corresponding author

Correspondence to Wei Zou.

Ethics declarations

The authors declare that they have no conflict of interest regarding this work.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Chi Zhang received the B.Sc. degree in automation from Shenyang University of Chemical Technology, China in 2015, and the M.Sc. degree in control engineering jointly from Harbin University of Science and Technology and the Institute of Automation, Chinese Academy of Sciences, China in 2018. He is currently a Ph.D. candidate in computer applied technology at the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include robotics, intelligent control and reinforcement learning.

E-mail: zhangchi2015@ia.ac.cn

ORCID iD: 0000-0001-5527-3362

Wei Zou received the B.Sc. degree from Inner Mongolia University of Science and Technology, China in 1997, the M.Sc. degree from Shandong University, China in 2000, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, China in 2003, all in control science and engineering. He is currently a professor with the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include visual control and intelligent robots.

E-mail: wei.zou@ia.ac.cn (Corresponding author)

ORCID iD: 0000-0003-4215-5361

Ningbo Cheng received the B.Sc. and M.Sc. degrees in mechanical engineering from Harbin University of Science and Technology, China in 2004 and 2007, respectively, and the Ph.D. degree in mechanical engineering from Tsinghua University, China in 2012. He is currently an assistant professor with the Research Center of Precision Sensing and Control, Chinese Academy of Sciences, China.

His research interests include parallel robots and control.

E-mail: ningbo.cheng@ia.ac.cn

ORCID iD: 0000-0002-7504-4097

Shuomo Zhang received the B.Sc. degree in mechanical engineering from Shandong University, China in 2016, and the M.Sc. degree in control science and engineering from the Institute of Automation, Shanghai Jiao Tong University, China in 2020. He is currently a Ph.D. candidate in computer applied technology at the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include quadruped locomotion and optimal control.

E-mail: zhangshuomo2020@ia.ac.cn

ORCID iD: 0009-0009-7869-0826

About this article

Cite this article

Zhang, C., Zou, W., Cheng, N. et al. Towards Jumping Skill Learning by Target-guided Policy Optimization for Quadruped Robots. Mach. Intell. Res. (2024). https://doi.org/10.1007/s11633-023-1429-5

