A nonparametric Bayesian approach toward robot learning by demonstration
Highlights
► A method for learning by demonstration is proposed.
► The method is based on nonparametric Bayesian statistics.
► Our approach improves on state-of-the-art GMR-based methods.
Introduction
In recent years, robot learning by demonstration has become one of the most active research topics in the field of robotics. Robot learning by demonstration encompasses methods by which a robot can learn new skills by simple observation of a human teacher, similar to the way humans learn new skills by imitation [1], [2], [3], [4], [5], [6], [7], [8]. Successful robot learning by demonstration methodologies can be of great benefit to the robotics community, since they largely obviate the need to program a robot to perform each task, which can be tedious and expensive; moreover, by making robots more user-friendly, they increase the appeal of deploying robots in real-life environments.
Toward this end, robotics researchers have utilized a multitude of methodologies from research areas as diverse as machine learning, computer vision [9], and human–robot interaction [10]. Learning by demonstration algorithms may comprise learning an approximation to the state-action mapping (mapping function), or learning a model of the world dynamics and deriving a policy from this information (system model). Mapping function learning comprises classification-based and regression-based approaches. Classification approaches categorize their input into discrete classes; the input to the classifier is the robot state, and the discrete output classes are robot actions. Gaussian Mixture Models (GMMs), decision trees, Bayesian networks, and hidden Markov models are typical methods used to perform the classification task. Regression approaches map demonstration states to continuous action spaces resulting from combining multiple demonstration set actions. As such, regression approaches typically apply to low-level trajectory-based learning by demonstration, and not to high-level behaviors. Finally, the system model approach uses a state transition model of the world, and from this derives a policy, typically by means of reinforcement learning (RL). As such, it usually has the drawback of high computational demands, due to the large dimensionality of the search space of the RL algorithm.
Recently, several researchers have also considered developing libraries of dynamic movement primitives (DMPs) as a way to facilitate generalization of the learned models to new unseen situations [11], [12]. DMPs are sets of differential equations that represent the task’s dynamics. Generalization using DMPs is effected by parameterizing them with new appropriate start and goal positions to generalize to novel situations, with the advantage of good robustness to perturbation. This is typically performed by application of regression methods based on local weighting of training data at execution time.
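As a rough illustration of the idea that a DMP is a set of differential equations re-parameterized by start and goal positions, the sketch below Euler-integrates a one-dimensional discrete DMP in a commonly used spring-damper form. The function name `rollout_dmp`, the gain values, and the zero default forcing term are assumptions made for this example; they are not taken from [11], [12].

```python
def rollout_dmp(y0, goal, tau=1.0, dt=0.001, alpha_z=25.0, beta_z=6.25,
                alpha_x=3.0, forcing=lambda x: 0.0):
    """Euler-integrate a 1-D discrete DMP and return the position trajectory.

    Assumed (illustrative) formulation:
        tau * dz/dt = alpha_z * (beta_z * (goal - y) - z) + f(x)
        tau * dy/dt = z
        tau * dx/dt = -alpha_x * x          (canonical system, x(0) = 1)
    With beta_z = alpha_z / 4 the unforced system is critically damped.
    """
    y, z, x = y0, 0.0, 1.0
    trajectory = [y]
    steps = int(round(1.5 * tau / dt))      # integrate slightly past tau
    for _ in range(steps):
        # The forcing term is scaled by the phase x and the movement amplitude,
        # so it vanishes as x decays and the goal attractor takes over.
        f = forcing(x) * x * (goal - y0)
        dz = (alpha_z * (beta_z * (goal - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory


# Generalizing to a new situation only changes the start and goal parameters.
reach_a = rollout_dmp(y0=0.0, goal=1.0)
reach_b = rollout_dmp(y0=0.5, goal=-2.0)
```

In a learned DMP the `forcing` callable would be fitted to demonstrations (for example by locally weighted regression); here it is left at zero, so the rollout is simply a smooth approach to the goal.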
In this work, we focus on trajectory-based learning by demonstration techniques. The most popular lines of work in this field investigate the utility of probabilistic generative models, such as Gaussian mixture regression (GMR) and derivatives [13], hidden Markov models [14], and Gaussian process regression [2]. GMR, in particular, has been shown to be very successful in encoding demonstrations, extracting their underlying constraints, and reproducing smooth generalized motor trajectories, while incurring low computational costs [15], [1]. GMR-based approaches toward learning by demonstration rely on the postulation of a Gaussian mixture model to encode the covariance relations between different variables (either in the task space, or in the robot joint space). If the correlations vary significantly between regions, then each local region of the state space visited during the demonstrations will need a few Gaussians to encode these local dynamics. Given the required number of Gaussians and a set of training data (human-generated demonstrations), the expectation-maximization (EM) algorithm is then employed to estimate the parameters of the model.
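To make the regression step concrete, the following minimal sketch computes the GMR prediction, that is, the conditional mean of a scalar response given a scalar predictor, from an already fitted two-dimensional Gaussian mixture. It assumes the mixture parameters are available (for instance from EM); the function names and the parameter layout are hypothetical choices for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, weights, means, covs):
    """Conditional mean E[y | x] under a joint GMM over (x, y).

    weights[k] : mixing proportion of component k
    means[k]   : (mu_x, mu_y)
    covs[k]    : ((s_xx, s_xy), (s_xy, s_yy))
    """
    # Responsibility of each component for the query point, using only the
    # marginal over the predictor x.
    resp = [w * gaussian_pdf(x, m[0], c[0][0])
            for w, m, c in zip(weights, means, covs)]
    total = sum(resp)  # may underflow to 0 for queries far from all components
    y_hat = 0.0
    for r, m, c in zip(resp, means, covs):
        # Per-component linear regressor: mu_y + s_xy / s_xx * (x - mu_x).
        local = m[1] + c[0][1] / c[0][0] * (x - m[0])
        y_hat += (r / total) * local
    return y_hat
```

With a single component the prediction reduces to ordinary linear regression; with several components it is a smooth, responsibility-weighted blend of local linear models, which is why each region with distinct correlations needs its own Gaussians.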
The most common data-driven methodologies for GMR model selection, that is, determination of the appropriate number of GMR model component densities, are typically based on the popular Bayesian information criterion (BIC) for finite mixture models [16], or other related likelihood-based or entropy-based model size selection criteria [17]. However, such model selection methods suffer from significant drawbacks. To begin with, they entail training multiple models (to select from), a tedious procedure which can be applied only to a limited extent, due to its computational demands. Moreover, the effectiveness of the BIC is contingent on a number of conditions, which are not necessarily fulfilled in real-life application scenarios [17]; thus, BIC-based approximations are rather prone to yielding noisy model size estimates. Most significantly, likelihood- and entropy-based model selection criteria are notorious for their strong tendency to overfit, hence often leading to over-estimation of the required model size [18].
Dirichlet process mixture (DPM) models are flexible Bayesian nonparametric models which have become very popular in statistics over the last few years for performing nonparametric density estimation [19], [20], [21]. Briefly, a realization of a DPM can be seen as an infinite mixture of distributions with given parametric shape (e.g., Gaussian). This theory is based on the observation that, as the number of component distributions in an ordinary finite mixture model tends to infinity, the model tends in the limit to one with a Dirichlet process prior [20], [22]. Indeed, although theoretically a DPM model has an infinite number of parameters, it turns out that inference for the model is possible, since only the parameters of a finite number of mixture components need to be represented explicitly; this can be done by means of an elegant and computationally efficient truncated variational Bayesian approximation [23]. Eventually, as a part of the model fitting procedure, the nonparametric Bayesian inference scheme induced by a DPM model yields a posterior distribution on the proper number of model component densities [24], rather than selecting a fixed number of mixture components. Hence, the obtained nonparametric Bayesian formulation eliminates the need to do inference (or make arbitrary choices) on the number of mixture components necessary to represent the modeled data.
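For reference, BIC-based selection of the mixture size can be sketched as follows, using the standard definition BIC = -2 ln L + k ln N, where L is the maximized likelihood, k the number of free parameters, and N the number of samples. The helper names are hypothetical, the parameter count assumes full covariance matrices, and the log-likelihood values in the usage line are made-up numbers for illustration only.

```python
import math


def gmm_num_free_params(n_components, dim):
    """Free parameters of a full-covariance GMM with K components in D dims."""
    mixing = n_components - 1                             # weights sum to one
    means = n_components * dim
    covariances = n_components * dim * (dim + 1) // 2     # symmetric matrices
    return mixing + means + covariances


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_by_bic(candidates, dim, n_samples):
    """Pick the component count with the lowest BIC.

    candidates maps each candidate K to the maximized log-likelihood of a
    model that has already been trained with K components.
    """
    return min(
        candidates,
        key=lambda k: bic(candidates[k], gmm_num_free_params(k, dim), n_samples),
    )


# Illustrative, made-up log-likelihoods for K = 1, 2, 3 on 200 two-dimensional samples.
best_k = select_by_bic({1: -520.0, 2: -480.0, 3: -478.0}, dim=2, n_samples=200)
```

Note that `candidates` must hold one trained model per value of K, which is exactly the repeated-training cost described above.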
Under this motivation, in this work we introduce a nonparametric Bayesian approach toward Gaussian mixture regression, with application to robot learning by demonstration. Our approach is based on the consideration of a GMR model with a countably infinite number of constituent states, and is effected by utilization of a Dirichlet process (DP) prior distribution; we shall be referring to this new model as the Dirichlet process Gaussian mixture regression (DPGMR) model. Inference for the DPGMR model is conducted using an elegant variational Bayesian algorithm, and is facilitated by means of a stick-breaking construction of the DP prior, which allows for the derivation of a computationally tractable expression of the model variational posteriors. Our novel mixture regression methodology is subsequently applied to yield a nonparametric Bayesian approach toward robot learning by demonstration, the efficacy of which is illustrated by considering a number of demanding robot learning by demonstration scenarios.
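The stick-breaking construction mentioned above can be illustrated with a short sampling sketch: mixture weights are obtained by repeatedly breaking off a Beta(1, α)-distributed fraction of the remaining unit-length stick, and a truncation level caps the number of explicitly represented components. This only draws weights from the prior; it is not the variational inference algorithm of the DPGMR model, and the function name and arguments are assumptions made for this example.

```python
import random


def truncated_stick_breaking(alpha, truncation, seed=None):
    """Draw mixture weights from a truncated stick-breaking prior.

    v_k ~ Beta(1, alpha) and pi_k = v_k * prod_{j<k} (1 - v_j); the last
    fraction is set to 1 so that the truncated weights sum to one.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0                      # length of the stick not yet assigned
    for k in range(truncation):
        is_last = (k == truncation - 1)
        v = 1.0 if is_last else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


# Smaller alpha concentrates mass on the first few components.
weights = truncated_stick_breaking(alpha=2.0, truncation=20, seed=0)
```

Because the weights decay quickly, most of the probability mass typically sits on a few components, which is what allows a finite truncation to stand in for the countably infinite mixture.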
The remainder of this paper is organized as follows: In Section 2, Gaussian mixture regression as applied to robot learning by demonstration is introduced in a concise manner. In Section 3, we provide a brief review of concepts from the field of Dirichlet process mixture models, which form a cornerstone of nonparametric Bayesian statistics. In Section 4, we derive the proposed nonparametric Bayesian approach toward robot learning by demonstration. In Section 5, the experimental evaluation of the proposed algorithm is performed. The final section concludes this paper.
Gaussian mixture regression for robot learning by demonstration
Let us consider the current position of the moving end-effector of a robot as the predictor variable of our machine learning algorithm, and the velocity that must be adopted by the robot’s end-effector at the next time-step, in order to comply with the learnt trajectory, as the algorithm’s response variable. GMR postulates a model of the conditional expectation of the set of response variables given the set of predictor variables, by exploiting the information available in a set of
Dirichlet process mixture models
Dirichlet process models were first introduced by Ferguson [30]. A DP is characterized by a base distribution $G_0$ and a positive scalar $\alpha$, usually referred to as the innovation parameter, and is denoted as $\mathrm{DP}(\alpha, G_0)$. Essentially, a DP is a distribution placed over a distribution. Let us suppose we randomly draw a sample distribution $G$ from a DP, and, subsequently, we independently draw random variables from $G$: Integrating out $G$, the joint distribution
Proposed approach
Let the observed data comprise a set of predictor variables and a set of response variables, the joint distribution of which is represented by means of a postulated GMR model. We want to model this data by means of a nonparametric Bayesian formulation of the GMR model. For this purpose, we postulate a GMR model with a countably infinite number of states. To formulate such a model, we begin by postulating a Gaussian DPM model for the joint distribution of the predictor and response variables, and we further derive the
Experimental evaluation
In this section, we present our experimental evaluation of the DPGMR algorithm in a series of applications dealing with robot learning by demonstration. More specifically, we compare algorithm performance against well-established, state-of-the-art methods in the field of robotics, namely Gaussian mixture regression (GMR) [1], and Gaussian process regression (GPR) [36], [37]. We have considered three application scenarios with potential practical applicability under a one- and a multi-shot
Conclusions
In this paper, we presented a nonparametric Bayesian approach toward trajectory-based robot learning by demonstration. The proposed approach is based on the postulation of a Gaussian mixture regression model comprising a countably infinite number of states, and is facilitated by the imposition of a Dirichlet process prior over the model states. The proposed approach allows for the automatic determination of the proper number of GMR model states, without the need of resorting to model order
Acknowledgment
This work has been partially funded by the EU FP7 ALIZ-E project (grant 248116).
References (41)
- et al., A survey of robot learning from demonstration, Robotics and Autonomous Systems (2009)
- et al., Programming-by-demonstration of reaching motions–a next-state-planner approach, Robotics and Autonomous Systems (2010)
- et al., Discovering optimal imitation strategies, Robotics and Autonomous Systems (2004)
- et al., Learning human arm movements by imitation: evaluation of a biologically inspired connectionist architecture, Robotics and Autonomous Systems (2001)
- et al., A survey of robot learning from demonstration, Robotics and Autonomous Systems (2009)
- et al., Hierarchical attentive multiple models for execution and recognition (HAMMER), Robotics and Autonomous Systems (2006)
- et al., Discriminative and adaptive imitation in uni-manual and bi-manual tasks, Robotics and Autonomous Systems (2006)
- et al., Robot programming by demonstration
- et al., A developmental roadmap for learning by imitation in robots, IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics (2007)
- P. Pastor, H. Hoffmann, T. Asfour, S. Schaal, Learning and generalization of motor skills by learning from...
- Visual learning by imitation with motor representations, IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics
- Task-specific generalization of discrete and periodic dynamic movement primitives, IEEE Transactions on Robotics
- Supervised learning from incomplete data via an EM approach, Advances in Neural Information Processing Systems
- Estimating the dimension of a model, The Annals of Statistics
- Signal modeling and classification using a robust latent space model based on distributions, IEEE Transactions on Signal Processing
- Bayesian nonparametric inference for random distributions and related functions, Journal of the Royal Statistical Society B
- Markov chain sampling methods for Dirichlet process mixture models, Journal of Computational and Graphical Statistics
Sotirios P. Chatzis received the M. Eng. degree in Electrical and Computer Engineering with distinction from the National Technical University of Athens, in 2005, and the Ph.D. degree in Machine Learning, in 2008, from the same institution. From January 2009 till June 2010 he was a Postdoctoral Fellow with the University of Miami, USA. Currently, he is a post-doctoral researcher with the Department of Electrical and Electronic Engineering, Imperial College London. His major research interests comprise machine learning theory and methodologies with a special focus on hierarchical Bayesian models, reservoir computing, robot learning by demonstration, copulas, quantum statistics, and Bayesian nonparametrics. His Ph.D. research was supported by the Bodossaki Foundation, Greece, and the Greek Ministry for Economic Development, whereas he was awarded the Dean’s scholarship for Ph.D. studies, being the best performing Ph.D. student of his class. In his first five years as a researcher he has first-authored 23 papers in the most prestigious journals of his research field.
Dimitrios Korkinof received the Diploma in Electrical & Computer Engineering (M.Sc. equivalent) from the Aristotle University of Thessaloniki, Greece. He graduated in 2010 with excellent academic achievement and 3rd in his class. He is currently pursuing a Ph.D. at the Department of Electrical & Electronic Engineering of Imperial College London, where he is researching aspects of statistical machine learning theory with applications to robotics. His current academic interests include Bayesian statistics, variational inference, stochastic processes and nonparametric methods for computer vision, action recognition and other robotics-related applications.
Yiannis Demiris is a senior lecturer at Imperial College London. He has significant expertise in cognitive systems, assistive robotics, multi-robot systems, human–robot interaction and learning by demonstration, in particular in action perception and learning. Dr Demiris’ research is funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC), the Royal Society, BAE Systems, and the EU FP7 program through projects ALIZ-E and EFAA, both addressing novel machine learning approaches to human–robot interaction. Additionally, the group collaborates with the BBC’s Research and Development Department on the “Learning Human Action Models” project. Dr Demiris has guest edited special issues of the IEEE Transactions on SMC-B specifically on Learning by Observation, Demonstration, and Imitation, and of the Adaptive Behavior Journal on Developmental Robotics. He has organized six international workshops on Robot Learning, BioInspired Machine Learning, Epigenetic Robotics, and Imitation in Animals and Artifacts (AISB), was the chair of the IEEE International Conference on Development and Learning (ICDL) for 2007, as well as the program chair of the ACM/IEEE International Conference on Human–Robot Interaction (HRI) 2008. He is a Senior Member of IEEE, and a member of the Institute of Engineering & Technology of Britain (IET).