Towards Jumping Skill Learning by Target-guided Policy Optimization for Quadruped Robots

  • Research Article
  • Published in: Machine Intelligence Research

Abstract

Endowing quadruped robots with the skill of forward jumping helps them overcome barriers and traverse complex terrain. In this paper, a model-free control architecture for quadruped robot jumping, combining target-guided policy optimization with deep reinforcement learning (DRL), is presented. First, the jump is divided into a take-off phase and a flight-landing phase, and an optimal policy is constructed for each phase with soft actor-critic (SAC). Second, the policy learning is designed with expectations and penalties spanning the overall jumping process, together with extrinsic excitations; corresponding rewards and constraints are provided for a successful take-off, an excellent flight attitude, and stable standing after landing. To avoid the low efficiency of random exploration, a curiosity module is introduced as a source of extrinsic rewards. Additionally, the target-guided module encourages the robot to explore ever closer to the desired jumping target. Simulation results indicate that the quadruped robot realizes complete forward jumping locomotion with good horizontal and vertical distances, as well as excellent motion attitudes.
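
As a rough, non-authoritative illustration of the two exploration aids named in the abstract, the Python/PyTorch sketch below computes a curiosity bonus as the prediction error of a learned forward dynamics model, plus a target-guided shaping term that grows as the landing point nears the desired jump target. The state and action dimensions, network sizes, scale coefficients, and the exponential distance kernel are all illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 24, 8  # assumed dimensions, for illustration only

    class ForwardModel(nn.Module):
        """Learned forward dynamics; its prediction error acts as the curiosity bonus."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
                nn.Linear(128, STATE_DIM),
            )

        def forward(self, state, action):
            return self.net(torch.cat([state, action], dim=-1))

    def curiosity_bonus(model, state, action, next_state, scale=0.1):
        # Poorly predicted transitions are treated as novel and rewarded,
        # steering exploration away from already well-visited states.
        with torch.no_grad():
            pred = model(state, action)
        return scale * torch.mean((pred - next_state) ** 2, dim=-1)

    def target_guided_bonus(landing_xy, target_xy, scale=1.0):
        # Shaping term that increases as the landing point approaches the
        # desired jump target (the exponential kernel is an assumed form).
        return scale * torch.exp(-torch.norm(landing_xy - target_xy, dim=-1))

    # Example: augment the task reward of one transition with both bonuses.
    fm = ForwardModel()
    s, a, s2 = torch.randn(STATE_DIM), torch.randn(ACTION_DIM), torch.randn(STATE_DIM)
    r_bonus = curiosity_bonus(fm, s, a, s2) + target_guided_bonus(
        torch.tensor([1.2, 0.0]), torch.tensor([1.5, 0.0]))

In the paper's framing, such bonuses would be added to the phase-specific SAC rewards; training the forward model itself (on replayed transitions) is omitted here for brevity.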


Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61773374) and the National Key Research and Development Program of China (No. 2017YFB1300104).

Author information

Corresponding author

Correspondence to Wei Zou.

Ethics declarations

The authors declare that they have no conflict of interest regarding this work.

Additional information

Colored figures are available in the online version at https://link.springer.com/journal/11633

Chi Zhang received the B.Sc. degree in automation from Shenyang University of Chemical Technology, China in 2015, and the M.Sc. degree in control engineering jointly from Harbin University of Science and Technology and the Institute of Automation, Chinese Academy of Sciences, China in 2018. He is currently a Ph.D. candidate in computer applied technology at the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include robotics, intelligent control and reinforcement learning.

E-mail: zhangchi2015@ia.ac.cn

ORCID iD: 0000-0001-5527-3362

Wei Zou received the B.Sc. degree from Inner Mongolia University of Science and Technology, China in 1997, the M.Sc. degree from Shandong University, China in 2000, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, China in 2003, all in control science and engineering. He is currently a professor with the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include visual control and intelligent robots.

E-mail: wei.zou@ia.ac.cn (Corresponding author)

ORCID iD: 0000-0003-4215-5361

Ningbo Cheng received the B.Sc. and M.Sc. degrees in mechanical engineering from Harbin University of Science and Technology, China in 2004 and 2007, respectively, and the Ph.D. degree in mechanical engineering from Tsinghua University, China in 2012. He is currently an assistant professor with the Research Center of Precision Sensing and Control, Chinese Academy of Sciences, China.

His research interests include parallel robots and control.

E-mail: ningbo.cheng@ia.ac.cn

ORCID iD: 0000-0002-7504-4097

Shuomo Zhang received the B.Sc. degree in mechanical engineering from Shandong University, China in 2016, and the M.Sc. degree in control science and engineering from the Institute of Automation, Shanghai Jiao Tong University, China in 2020. He is currently a Ph.D. candidate in computer applied technology at the Institute of Automation, Chinese Academy of Sciences, China.

His research interests include quadruped locomotion and optimal control.

E-mail: zhangshuomo2020@ia.ac.cn

ORCID iD: 0009-0009-7869-0826

About this article

Cite this article

Zhang, C., Zou, W., Cheng, N. et al. Towards Jumping Skill Learning by Target-guided Policy Optimization for Quadruped Robots. Mach. Intell. Res. (2024). https://doi.org/10.1007/s11633-023-1429-5

