AdaptNet: Policy Adaptation for Physics-Based Character Control

Abstract
Motivated by humans' ability to adapt existing skills when learning new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies so that new behaviors can be learned quickly from similar tasks in comparison to learning from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy that augments the original state embedding to support modest changes in a behavior and further modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new locomotion styles, new task targets, changes in character morphology, and extensive changes in environment. Furthermore, it exhibits a significant increase in learning efficiency, as indicated by greatly reduced training times when compared to training from scratch or to other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet.
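The two-tier adaptation described in the abstract can be sketched in a deliberately minimal, hypothetical form: a frozen pretrained layer, an additive injection on the state embedding (tier 1), and a zero-initialized low-rank residual on the layer itself (tier 2). All names here (`AdaptedPolicy`, `inject`, `A`, `B`) are illustrative assumptions, not the paper's implementation, and the real system operates on a full policy network rather than a single linear layer.

```python
# Minimal sketch of two-tier policy adaptation (assumed structure, not the
# paper's code): tier 1 shifts the state embedding, tier 2 adds a low-rank
# residual to a frozen pretrained layer.
import math
import random

random.seed(0)


def linear(W, b, x):
    """Dense layer: W @ x + b, with W as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]


class AdaptedPolicy:
    """Frozen pretrained layer plus two trainable adaptation tiers."""

    def __init__(self, dim):
        # Pretrained weights: kept frozen during adaptation.
        self.W = [[random.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)]
                  for _ in range(dim)]
        self.b = [0.0] * dim
        # Tier 1: additive injection in the embedding space, zero-initialized.
        self.inject = [0.0] * dim
        # Tier 2: rank-1 residual A * (B . z); A starts at zero so the adapted
        # policy initially reproduces the original one exactly.
        self.A = [0.0] * dim
        self.B = [random.gauss(0, 1) for _ in range(dim)]

    def forward(self, z):
        # Tier 1: modest change via a shift of the state embedding.
        z = [zi + di for zi, di in zip(z, self.inject)]
        # Frozen pretrained layer.
        h = linear(self.W, self.b, z)
        # Tier 2: substantive change via a low-rank residual on the layer.
        proj = sum(bj * zj for bj, zj in zip(self.B, z))
        return [hi + ai * proj for hi, ai in zip(h, self.A)]
```

Zero-initializing both tiers means the adapted policy starts out identical to the pretrained one, so fine-tuning departs smoothly from the existing behavior; this mirrors the zero-init convention used by low-rank adapters such as LoRA.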