A geometric optimal control approach for imitation and generalization of manipulation skills
Introduction
Skillful manipulation is not only a matter of how precisely a person can perform a task, but also of how well one can cope with complex and changing scenarios and exploit system redundancy to counteract perturbations. This observation appears across a variety of research areas, including biomechanics, neuroscience, sport science, control, and robotics, with related formulations including the minimal intervention principle [1], the uncontrolled manifold [2], and optimal feedback control [3]. In the uncontrolled manifold concept, for example, the human is assumed not to be very stiff in the directions that do not matter for the task, so wide variations are expected in the motion along those directions. The central nervous system does not focus on these task-irrelevant directions. Instead, its control effort is concentrated on the variables that are crucial for the task, which therefore show low variability [4].
Synergies in the system can be defined by considering a more general notion of variation, namely correlation. Synergistic systems maintain their functionality by controlling not just a single variable but all the linked elements together. Studies such as [5] show that this capability is the main principle that helps biological systems in nature deal with complex situations.
Different coordinate systems can be used to describe a task. Sternad et al. [4] have shown that the choice of the coordinate system plays an important role in (co)variation modeling. Each coordinate system can be seen as a different set of features. Some features (coordinate systems) exploit the structure of the task by being tolerant to errors in different ways, which makes them more advantageous than others. This advantage has both computational and geometrical aspects; see [6] for an overview. Moreover, studies in human motion planning [7], [8] suggest that humans can model a task in different or even “mixed” coordinate frames, such as a body reference frame (the putamen) and a gaze- or head-centered frame (parietal cortex), which may be well addressed by the concept of multiple candidate geometries formulated as an optimal control problem.
Most graspable shapes in our human-made environment, either the whole object or a local part of it, can be described by three main types of coordinate systems: Cartesian, cylindrical and spherical, as shown in Fig. 1. In addition to object shapes, some tasks can be represented more efficiently in a specific coordinate system. As shown in the second row of Fig. 1, rotating an object (e.g., opening/closing a door, turning a page of a book) can be described in a cylindrical system, while wiping a table and pointing to an object are better described in a prismatic (Cartesian) and a spherical system, respectively. Defining the task on the proper manifold allows it to be represented with more relevant geometric features, enabling the robot to extract and learn the skill more easily.
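To make the benefit of a well-chosen coordinate system concrete, the following is a minimal sketch, assuming synthetic data that mimics a door handle moving on a circular arc. The function names `to_cylindrical` and `to_spherical`, the arc radius and the sample sizes are illustrative assumptions, not part of the article.

```python
import numpy as np

def to_cylindrical(p, axis=2):
    """Cartesian (x, y, z) -> (radius, azimuth, height) about the chosen axis."""
    p = np.asarray(p, dtype=float)
    # Indices of the two coordinates orthogonal to the axis, in cyclic order.
    i, j = (axis + 1) % 3, (axis + 2) % 3
    radius = np.hypot(p[..., i], p[..., j])
    azimuth = np.arctan2(p[..., j], p[..., i])
    return np.stack([radius, azimuth, p[..., axis]], axis=-1)

def to_spherical(p):
    """Cartesian (x, y, z) -> (radius, azimuth, polar angle measured from +z)."""
    p = np.asarray(p, dtype=float)
    radius = np.linalg.norm(p, axis=-1)
    azimuth = np.arctan2(p[..., 1], p[..., 0])
    # Avoid a division by zero at the origin.
    safe_radius = np.where(radius > 0, radius, 1.0)
    polar = np.arccos(np.clip(p[..., 2] / safe_radius, -1.0, 1.0))
    return np.stack([radius, azimuth, polar], axis=-1)

# Hypothetical data: points on a door-like arc of radius 0.8 around the z axis.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, np.pi / 2, size=50)
heights = rng.uniform(0.9, 1.1, size=50)
points = np.stack([0.8 * np.cos(angles), 0.8 * np.sin(angles), heights], axis=-1)

cyl = to_cylindrical(points, axis=2)
radius_std = cyl[:, 0].std()        # spread of the radius in cylindrical coordinates
xy_std = points[:, :2].std(axis=0)  # spread of x and y in Cartesian coordinates
```

In this synthetic case the radius is essentially constant in cylindrical coordinates, whereas both x and y vary widely in Cartesian coordinates, so the task constraint shows up as a single low-variance feature only in the cylindrical description.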
In this article, we introduce a motion planning approach that can benefit from different coordinate systems and present an approach to extract the most relevant one by statistical analysis. We use a small set of demonstrations and a set of Riemannian manifolds to estimate Gaussian distributions for each coordinate system. The skill generalization problem is formulated as a general optimal control problem (OCP). The cost function of the OCP is defined using the Gaussian distributions constructed in the optimal coordinate system with the reference reproduced by Gaussian mixture regression (GMR). The processing pipeline of the approach is shown in Fig. 2.
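The reference retrieval step can be illustrated with Gaussian conditioning, which is the building block of GMR. The sketch below is a simplification under stated assumptions: it uses a single Gaussian in a Euclidean space rather than a mixture on a Riemannian manifold, and the helper `gaussian_condition` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_condition(mu, sigma, query, in_idx, out_idx):
    """Condition a joint Gaussian on the input dimensions taking the value `query`.

    Returns the conditional mean and covariance of the output dimensions.
    """
    mu_in, mu_out = mu[in_idx], mu[out_idx]
    s_ii = sigma[np.ix_(in_idx, in_idx)]
    s_oi = sigma[np.ix_(out_idx, in_idx)]
    s_oo = sigma[np.ix_(out_idx, out_idx)]
    gain = s_oi @ np.linalg.inv(s_ii)
    mean = mu_out + gain @ (query - mu_in)
    cov = s_oo - gain @ s_oi.T
    return mean, cov

# Hypothetical demonstration data: time t and a 1D position x = 2 t + 1 plus noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
x = 2.0 * t + 1.0 + 0.01 * rng.standard_normal(50)
data = np.stack([t, x], axis=1)

mu = data.mean(axis=0)
sigma = np.cov(data, rowvar=False)

# Retrieve the expected position and its variance at t = 0.5.
mean, cov = gaussian_condition(mu, sigma, np.array([0.5]), in_idx=[0], out_idx=[1])
```

The conditional mean serves as a reference and the conditional covariance indicates how much variation the demonstrations allow around it. With several mixture components, GMR combines such conditionals weighted by each component's responsibility for the query.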
In our previous work [9], we used open-loop control in an optimal control framework to reproduce the task. However, the (i)LQR method can also provide optimal feedback gains, which can be used to cope with external disturbances. The values of these gains determine how the system reacts to disturbances, and the appropriate reaction depends strongly on the task. In practice, the gains are often set manually as diagonal matrices. They are also typically defined exclusively in a Euclidean space, which limits applications involving objects and actions characterized by other geometries. In learning from demonstration (LfD), task-dependent feedback gains have been learned by considering the variations in the data. We show in this article how modeling these variations in different types of coordinate systems can improve the system’s autonomous behavior. Modeling the data in multiple types of coordinate systems reveals more subtle correlations between the states of the system and provides a more systematic way to exploit the variations.
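The link between data variation and feedback gains can be sketched with a standard finite-horizon LQR. The example below is not the article's implementation: it assumes a 2D point mass (double integrator), a hypothetical position covariance and a hypothetical control weight, and it uses the inverse covariance (precision matrix) as the tracking weight.

```python
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Backward Riccati recursion for the control law u_t = -K_t (x_t - x_ref)."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains[0] is the gain applied at the first time step

dt = 0.01
# 2D point mass with state [position, velocity] and acceleration commands.
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])

# Hypothetical demonstration covariance: consistent in x, highly variable in y.
pos_cov = np.diag([1e-4, 1e-1])
Q = np.zeros((4, 4))
Q[:2, :2] = np.linalg.inv(pos_cov)  # precision matrix used as tracking weight
R = 1e-3 * np.eye(2)                # assumed control effort weight

K0 = lqr_gains(A, B, Q, R, horizon=200)[0]
stiffness_x, stiffness_y = K0[0, 0], K0[1, 1]
```

The resulting gain is stiffer along x, where the demonstrations agree, and more compliant along y, where they vary, which is the behavior suggested by the minimal intervention principle. A full (non-diagonal) covariance would produce coupled gains in the same way.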
Feedback gains are also necessary in collaborative tasks to keep humans safe (by not applying excessive force) and to provide more intuitive interactions. However, feedback gains alone are not enough: they allow the system to react to spatial disturbances, but they do not account for temporal ones. This may not be a major issue when the robot interacts with a predictable environment. A human, however, is highly unpredictable and introduces many sources of spatiotemporal disturbances to the system. It is therefore preferable to have a time-independent controller, usually called a policy in the literature. To achieve this, we implement a phase estimation technique that decouples the time variable of the system, i.e., the phase, from the real-world clock and calculates the phase as a function of the robot states. This improvement allows us to obtain a time-independent policy from the iLQR method without changing the optimization itself.
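One simple way to compute a phase from the robot state is to project the state onto the reference trajectory. The sketch below is only an illustration of the idea: the nearest-point search, the forward-only window and the function name `estimate_phase` are assumptions, and the article's estimator may differ.

```python
import numpy as np

def estimate_phase(state, reference, prev_index=0, window=20):
    """Return (phase in [0, 1], index) of the reference point closest to `state`.

    The search starts at the previous index and looks at most `window` steps
    ahead, so the phase cannot move backwards or skip far ahead. This
    restriction is an assumption of the sketch.
    """
    start = prev_index
    stop = min(prev_index + window, len(reference))
    dists = np.linalg.norm(reference[start:stop] - state, axis=1)
    index = start + int(np.argmin(dists))
    return index / (len(reference) - 1), index

# Hypothetical reference: a straight line from (0, 0) to (1, 0) with 101 points.
reference = np.stack([np.linspace(0.0, 1.0, 101), np.zeros(101)], axis=1)

# The robot is slightly off the path, near x = 0.1.
phase, idx = estimate_phase(np.array([0.1, 0.05]), reference, prev_index=0)

# If a human holds the robot in place, the phase stays put however long it takes.
phase_held, idx_held = estimate_phase(np.array([0.1, 0.05]), reference, prev_index=idx)
```

With such an estimate, the feedback gain and reference associated with the returned index can be applied instead of those associated with the elapsed time, so a pause caused by a human does not make the controller run ahead of the robot.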
In [9], we also represented invariant features of observed manipulation in a coordinate system chosen among Cartesian, cylindrical with axis, and spherical coordinates. The system was defined at the level of the robot kinematics, where a linear system reproduced the time-driven movement in an optimal control framework with a non-linear cost function. Beyond the combination of our preliminary contributions on imitation of manipulation skills using multiple geometries, the contributions of this research are the following: (1) We propose an approach to improve the generalization capability of manipulation skills by fully considering different types of coordinate systems, including Cartesian, cylindrical with three distinct main axes, and spherical coordinates. (2) We introduce a motion planning approach using an OCP defined in different coordinate systems, both at the level of the task and at the level of the robot kinematic structure. (3) We define precision matrices within an optimal control formulation, resulting in the automatic determination of feedback gains for the controller from sparse demonstrations represented in multiple coordinate systems. The associated feedback controllers consider these different objectives in a coordinated manner, allowing the robot to exploit task variations with diverse geometries. (4) We validate our method on manipulation tasks with a real robot in both autonomous and shared-control modes.
The notation is summarized in Table 1. In the remainder of the article, we summarize related work in Section 2 and give an overview of the background in Section 3. The method is explained in Section 4. Section 5 presents the experiments, and we discuss the results in Section 6. We conclude the article in Section 7 by summarizing the contributions, limitations and future work.
Related work
The aim of the proposed approach is to utilize the geometric data obtained from few demonstrations to enhance the robot adaptation capability. We utilize optimal control techniques to facilitate the transfer of the desired behaviors to different situations. Additionally, we seek to explore the benefits of leveraging this information in a shared control setting, where the robot determines which directions or geometric information to prioritize while enabling the human operator to control the
(iterative) linear quadratic regulator control ((i)LQR)
The OCP formulation can be used to solve trajectory optimization and planning problems, by considering a time window covering the entire task. In an OCP, a cost function is minimized with respect to control commands over a time window, subject to a function describing the system evolution by starting from an initial state . The general discrete form of OCP consists of a cost subject to the dynamics where and .
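For reference, a generic discrete-time OCP of this kind can be written as follows. The symbols are a common choice and are assumed here (state $\boldsymbol{x}_t$, control command $\boldsymbol{u}_t$, horizon $T$, stage cost $c_t$, terminal cost $c_T$, dynamics $\boldsymbol{f}$); the article's own notation may differ.

```latex
\begin{aligned}
\min_{\boldsymbol{u}_{1:T-1}} \quad
  & c_T(\boldsymbol{x}_T) + \sum_{t=1}^{T-1} c_t(\boldsymbol{x}_t, \boldsymbol{u}_t) \\
\text{s.t.} \quad
  & \boldsymbol{x}_{t+1} = \boldsymbol{f}(\boldsymbol{x}_t, \boldsymbol{u}_t),
  \qquad \boldsymbol{x}_1 \ \text{given}.
\end{aligned}
% iLQR iterates on a local approximation around a nominal trajectory
% (\hat{\boldsymbol{x}}_t, \hat{\boldsymbol{u}}_t):
\begin{aligned}
\Delta\boldsymbol{x}_{t+1} &\approx \boldsymbol{A}_t \Delta\boldsymbol{x}_t + \boldsymbol{B}_t \Delta\boldsymbol{u}_t,
\qquad
\boldsymbol{A}_t = \frac{\partial \boldsymbol{f}}{\partial \boldsymbol{x}_t}\Big|_{\hat{\boldsymbol{x}}_t, \hat{\boldsymbol{u}}_t},
\quad
\boldsymbol{B}_t = \frac{\partial \boldsymbol{f}}{\partial \boldsymbol{u}_t}\Big|_{\hat{\boldsymbol{x}}_t, \hat{\boldsymbol{u}}_t}, \\
\Delta\boldsymbol{u}_t &= \boldsymbol{k}_t + \boldsymbol{K}_t \Delta\boldsymbol{x}_t .
\end{aligned}
```

In this form, each iLQR iteration yields a feedforward term $\boldsymbol{k}_t$ and a feedback gain $\boldsymbol{K}_t$ from a quadratic approximation of the cost along the linearized dynamics, which is why the method provides feedback gains in addition to an open-loop plan.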
Manifold selection
We assume that we have gathered a set of data such as , where is the total number of demonstrations. The choice of the manifold to express the demonstrations affects the learning and control procedure. So the first step is to find the most relevant manifold to represent the task. We assume that this manifold is time-dependent, and use the same manifold for all the time steps in a stage. In the following, we describe the procedure of selecting a manifold at time , and in the
Experiments
We first analyze the controller proposed in different coordinate systems by introducing a point-mass reaching task. For the whole pipeline, we took grasping and box-opening as two typical manipulation tasks in our daily life to validate our approach in the presence and absence of perturbation, both in a simulation and on a real robot. As a last experiment, we verify our method on a shared control human–robot collaboration using a virtual guidance system and policy extraction utilizing the phase
Discussion
One of the biggest challenges in LfD is that we are limited to few demonstrations. Typically, the recorded demonstrations cannot describe all the aspects of the task if we try to learn those in a black-box manner. These challenges motivated researchers to gather information from other sources, such as providing structures and representations that can generalize to a large range of manipulation skills.
In this article, we proposed a winner-takes-all strategy by estimating which manifold
Conclusion
In this article, we proposed a learning from demonstration approach considering different types of coordinate systems. The goal is to reduce the number of demonstrations required by a robot to acquire manipulation skills. The data distribution is extracted using a model of Gaussian distributions on Riemannian manifolds. We showed how standard optimal control formulation could be easily extended to other manifolds, by learning the structure of the cost and the precision matrices used in the cost
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the China Scholarship Council, China (CSC, No. 202006120159) funded by the Major Research Plan of the National Natural Science Foundation of China (No. 92048301), and by the SWITCH project (https://switch-project.github.io/) funded by the Swiss National Science Foundation.
References (56)
- et al. Motor planning explains human behaviour in tasks with multiple solutions. Robot. Auton. Syst. (2013)
- et al. Learning compliant robotic movements based on biomimetic motor adaptation. Robot. Auton. Syst. (2021)
- et al. Generalization of orientation trajectories and force–torque profiles for learning human assembly skill. Robot. Comput.-Integr. Manuf. (2022)
- et al. Generalizing demonstrated motion trajectories using coordinate-free shape descriptors. Robot. Auton. Syst. (2019)
- et al. Fuzzy gaussian mixture models. Pattern Recognit. (2012)
- et al. The explicit linear quadratic regulator for constrained systems. Automatica (2002)
- et al. Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognit. (2014)
- et al. A minimal intervention principle for coordinated movement.
- et al. The uncontrolled manifold concept: identifying control variables for a functional task. Exp. Brain Res. (1999)
- et al. Principles of sensorimotor learning. Nat. Rev. (2011)
- Coordinate dependence of variability analysis. PLoS Comput. Biol.
- Synergies: Atoms of brain and behavior.
- Geometric and Numerical Foundations of Movements.
- Movement timing and invariance arise from several geometries. PLoS Comput. Biol.
- Compliance and force control for computer controlled manipulators. IEEE Trans. Syst. Man Cybernet.
- A survey of simple geometric primitives detection methods for captured 3D data.
- Fit4CAD: A point cloud benchmark for fitting simple geometric primitives in CAD objects. Comput. Graph.
- Supervised fitting of geometric primitives to 3d point clouds.
- Introducing geometric constraint expressions into robot constrained motion specification and control. IEEE Robot. Autom. Lett.
- Learning from humans.
- Learning from demonstration (programming by demonstration).
- C-LEARN: Learning geometric constraints from demonstrations for multi-step manipulation in shared autonomy.
- Inferring geometric constraints in human demonstrations.
- Optimization-based hierarchical motion planning for autonomous racing.
- Optimal path planning and speed control integration strategy for ugvs in static and dynamic environments. IEEE Trans. Veh. Technol.
- Motor skills learning and generalization with adapted curvilinear gaussian mixture model. J. Intell. Robot. Syst.
Boyang Ti received the B.S. degree in mechanical engineering from the Dalian University of Technology (DUT), Dalian, China, in 2017. From 2021 to 2022, he was an intern at Robot Learning & Interaction group of Idiap Research Institute. He is currently pursuing the Ph.D. degree with the State Key Laboratory of Robotics and System, Harbin Institute of Technology (HIT). His research interests include human–robot collaboration, learning from demonstration, robotic skill learning, and optimal control. Website: https://tflqw.github.io.
Amirreza Razmjoo is a research assistant at the Idiap Research Institute and a Ph.D. student at Ecole Polytechnique Fédérale de Lausanne (EPFL). He received his master’s degree from Sharif University of Technology and his bachelor’s degree from the University of Tehran. His research is focused on physical human–robot interaction, learning from demonstration, and optimal control.
Dr Yongsheng Gao received the B.Sc., M.Sc., and Ph.D. degrees from the State Key Laboratory of Robotics Institute, Harbin Institute of Technology (HIT), Harbin, China, in 1994, 2001, and 2007, respectively, where he is currently an Associate Professor. His research interests include pathological tremor suppression, teleoperation robots, and biomedical signal processing.
Dr Jie Zhao received the B.S. and Ph.D. degrees in mechatronics engineering from the Harbin Institute of Technology (HIT), Harbin, China, in 1990 and 1996, respectively. He is currently a Professor with the School of Mechatronics Engineering, HIT, where he is also the Director of the State Key Laboratory of Robotics and Systems. He is the Leader of the Subject Matter Expert Group of Intelligent Robots in the National 863 Program supervised by the Ministry of Science and Technology of China. His research interests include industrial robots and bionic robots.
Dr Sylvain Calinon is a Senior Research Scientist at the Idiap Research Institute, heading the Robot Learning & Interaction group. He is also a Lecturer at the Ecole Polytechnique Fédérale de Lausanne (EPFL). From 2009 to 2014, he was a Team Leader at the Italian Institute of Technology. From 2007 to 2009, he was a Postdoc in the Learning Algorithms and Systems Laboratory, EPFL, where he obtained his PhD in 2007. His research interests cover robot learning, human–robot collaboration, optimal control and model-based optimization. Website: https://calinon.ch.