A nonparametric Bayesian approach toward robot learning by demonstration

doi:10.1016/j.robot.2012.02.005

Robotics and Autonomous Systems

Volume 60, Issue 6, June 2012, Pages 789-802

https://doi.org/10.1016/j.robot.2012.02.005

Abstract

In recent years, many authors have applied machine learning methodologies to robot learning by demonstration. Gaussian mixture regression (GMR) is one of the most successful methodologies used for this purpose. A major limitation of GMR models concerns automatic selection of the proper number of model states, i.e., the number of model component densities. Existing methods, including likelihood- or entropy-based criteria, tend to yield noisy model size estimates while imposing heavy computational requirements. Recently, Dirichlet process (infinite) mixture models have emerged as a cornerstone of nonparametric Bayesian statistics and as promising candidates for clustering applications where the number of clusters is unknown a priori. Motivated by this, and to resolve the aforementioned issues of GMR-based methods for robot learning by demonstration, in this paper we introduce a nonparametric Bayesian formulation of the GMR model, the Dirichlet process GMR model. We derive an efficient variational Bayesian inference algorithm for the proposed model, and we experimentally investigate its efficacy as a robot learning by demonstration methodology in a number of demanding scenarios.

Highlights

► A method for learning by demonstration is proposed.
► The method is based on nonparametric Bayesian statistics.
► Our approach improves state-of-the-art GMR-based methods.

Introduction

In recent years, robot learning by demonstration has become one of the most active research topics in robotics. It encompasses methods by which a robot can learn new skills by simply observing a human teacher, similar to the way humans learn new skills by imitation [1], [2], [3], [4], [5], [6], [7], [8]. Successful robot learning by demonstration methodologies can be of great benefit to the robotics community: they largely obviate the need to program a robot to perform each task, which can be tedious and expensive, and, by making robots more user-friendly, they increase the appeal of applying robots in real-life environments.

Toward this end, robotics researchers have utilized a multitude of methodologies from research areas as diverse as machine learning, computer vision [9], and human–robot interaction [10]. Learning by demonstration algorithms may learn an approximation to the state-action mapping (mapping function), or learn a model of the world dynamics and derive a policy from this information (system model). Mapping function learning comprises classification-based and regression-based approaches. Classification approaches categorize their input into discrete classes: the input to the classifier is the robot state, and the discrete output classes are robot actions. Gaussian mixture models (GMMs), decision trees, Bayesian networks, and hidden Markov models are typical methods used for the classification task. Regression approaches map demonstration states to continuous action spaces resulting from combining multiple demonstration set actions. As such, regression approaches typically apply to low-level trajectory-based learning by demonstration, and not to high-level behaviors. Finally, the system model approach uses a state transition model of the world, and from this derives a policy, typically by means of reinforcement learning (RL). It usually has the drawback of high computational demands, due to the large dimensionality of the search space of the RL algorithm.

Recently, several researchers have also considered developing libraries of dynamic movement primitives (DMPs) as a way to facilitate generalization of the learned models to new, unseen situations [11], [12]. DMPs are sets of differential equations that represent the task’s dynamics. Generalization to novel situations is effected by parameterizing the DMPs with appropriate new start and goal positions, with the advantage of good robustness to perturbation. This is typically performed by applying regression methods based on local weighting of training data at execution time.

In this work, we focus on trajectory-based learning by demonstration techniques. The most popular lines of work in this field investigate the utility of probabilistic generative models, such as Gaussian mixture regression (GMR) and derivatives [13], hidden Markov models [14], and Gaussian process regression [2]. GMR, in particular, has been shown to be very successful in encoding demonstrations, extracting their underlying constraints, and reproducing smooth generalized motor trajectories, while imposing considerably low computational costs [15], [1]. GMR-based approaches toward learning by demonstration rely on the postulation of a Gaussian mixture model to encode the covariance relations between different variables (either in the task space, or in the robot joint space). If the correlations vary significantly between regions, then each local region of the state space visited during the demonstrations will need a few Gaussians to encode these local dynamics. Given the required number of Gaussians and a set of training data (human-generated demonstrations), the expectation-maximization (EM) algorithm is employed to estimate the parameters of the model.

The most common data-driven methodologies for GMR model selection, that is, determination of the appropriate number of GMR model component densities, are typically based on the popular Bayesian information criterion (BIC) for finite mixture models [16], or other related likelihood-based or entropy-based model size selection criteria [17]. However, such model selection methods suffer from significant drawbacks. To begin with, they entail training multiple models (to select from), a tedious procedure which can be applied only to a limited extent, due to its computational demands. Moreover, the effectiveness of the BIC is contingent on a number of conditions, which are not necessarily fulfilled in real-life application scenarios [17]; thus, BIC-based approximations are rather prone to yielding noisy model size estimates. Most significantly, likelihood- and entropy-based model selection criteria are notorious for their proneness to overfitting, which often leads to over-estimation of the required model size [18].
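To make the BIC-based selection procedure concrete, the following minimal sketch scores candidate model sizes with the Schwarz criterion, BIC = -2 log L + k log N, where k counts the free parameters of a full-covariance Gaussian mixture. The function names and the log-likelihood values in the usage note are hypothetical and serve only as illustration; in practice each log-likelihood would come from a separately trained model, which is exactly the cost noted above.

```python
import math


def gmm_num_free_params(n_components: int, dim: int) -> int:
    """Count free parameters of a full-covariance Gaussian mixture."""
    weights = n_components - 1                      # mixing weights sum to one
    means = n_components * dim                      # one mean vector per component
    covariances = n_components * dim * (dim + 1) // 2  # symmetric matrices
    return weights + means + covariances


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Schwarz criterion; lower values indicate a preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def select_model_size(log_likelihoods: dict, dim: int, n_samples: int) -> int:
    """Return the candidate number of components with the lowest BIC.

    `log_likelihoods` maps each candidate size K to the maximized
    log-likelihood of a model trained with K components.
    """
    return min(
        log_likelihoods,
        key=lambda k: bic(log_likelihoods[k], gmm_num_free_params(k, dim), n_samples),
    )
```

For instance, with made-up log-likelihoods `{1: -520.0, 2: -450.0, 3: -448.0}` for 200 two-dimensional samples, `select_model_size` picks K = 2: the small likelihood gain from a third component does not offset its parameter penalty.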

Dirichlet process mixture (DPM) models are flexible Bayesian nonparametric models which have become very popular in statistics over the last few years for performing nonparametric density estimation [19], [20], [21]. Briefly, a realization of a DPM can be seen as an infinite mixture of distributions with a given parametric shape (e.g., Gaussian). This view rests on the observation that an ordinary finite mixture model tends in the limit, as the number of component distributions goes to infinity, to a model with a Dirichlet process prior [20], [22]. Although a DPM model theoretically has an infinite number of parameters, inference for the model is possible, since only the parameters of a finite number of mixture components need to be represented explicitly; this can be done by means of a computationally efficient truncated variational Bayesian approximation [23]. As part of the model fitting procedure, the nonparametric Bayesian inference scheme induced by a DPM model yields a posterior distribution on the proper number of model component densities [24], rather than selecting a fixed number of mixture components. Hence, the obtained nonparametric Bayesian formulation eliminates the need to perform inference (or make arbitrary choices) on the number of mixture components necessary to represent the modeled data.

Under this motivation, in this work we introduce a nonparametric Bayesian approach toward Gaussian mixture regression, with application to robot learning by demonstration. Our approach is based on the consideration of a GMR model with a countably infinite number of constituent states, and is effected by utilization of a Dirichlet process (DP) prior distribution; we shall be referring to this new model as the Dirichlet process Gaussian mixture regression (DPGMR) model. Inference for the DPGMR model is conducted using an elegant variational Bayesian algorithm, and is facilitated by means of a stick-breaking construction of the DP prior, which allows for the derivation of a computationally tractable expression of the model variational posteriors. Our novel mixture regression methodology is subsequently applied to yield a nonparametric Bayesian approach toward robot learning by demonstration, the efficacy of which is illustrated by considering a number of demanding robot learning by demonstration scenarios.
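The stick-breaking construction mentioned above can be sketched in a few lines. This is a generic illustration of a truncated stick-breaking draw of mixture weights, not the paper's inference algorithm: each stick fraction is drawn from a Beta(1, α) distribution, and the last weight absorbs the remaining mass so that the truncated weights sum to one. The function name, the truncation level, and the value of `alpha` are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> list:
    """Draw mixture weights from a truncated stick-breaking construction.

    alpha      -- positive concentration parameter of the Dirichlet process
    truncation -- number of components kept explicitly (assumed >= 1)
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(fraction * remaining)    # pi_k = v_k * prod_{j<k} (1 - v_j)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # final component takes the leftover mass
    return weights
```

Small values of `alpha` concentrate most of the mass on the first few components, while larger values spread it over more components, which is how the prior expresses uncertainty about the effective model size.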

The remainder of this paper is organized as follows: In Section 2, Gaussian mixture regression as applied to robot learning by demonstration is introduced in a concise manner. In Section 3, we provide a brief review of Dirichlet process mixture models, a cornerstone of nonparametric Bayesian statistics. In Section 4, we derive the proposed nonparametric Bayesian approach toward robot learning by demonstration. In Section 5, we present the experimental evaluation of the proposed algorithm. The final section concludes this paper.

Gaussian mixture regression for robot learning by demonstration

Let us consider the current position of the moving end-effector of a robot as the predictor variable $β$ of our machine learning algorithm, and the velocity that must be adopted by the robot’s end-effector at the next time-step, in order to comply with the learnt trajectory, as the algorithm’s response variable $\dot{β}$. GMR postulates a model of the conditional expectation of the set of response variables $\dot{β}$ given the set of predictor variables $β$, by exploiting the information available in a set of
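As a hedged illustration of the conditional expectation that GMR computes, the sketch below handles the simplest case of a scalar predictor (position) and a scalar response (velocity), given an already fitted joint Gaussian mixture. It uses the standard Gaussian conditioning formula per component and blends the component predictions by their responsibilities; the `Component` type, the field names, and the parameter values in the usage note are assumptions for illustration, and the multivariate case replaces the scalar ratios with matrix operations.

```python
import math
from typing import NamedTuple, Sequence


class Component(NamedTuple):
    """One joint Gaussian over (position x, velocity y); names are illustrative."""
    weight: float   # mixing weight
    mean_x: float   # mean of the predictor
    mean_y: float   # mean of the response
    var_x: float    # variance of the predictor
    cov_xy: float   # covariance between predictor and response


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x: float, components: Sequence[Component]) -> float:
    """Conditional expectation E[y | x] under the joint mixture."""
    # Unnormalized responsibility of each component for the query position.
    activations = [c.weight * normal_pdf(x, c.mean_x, c.var_x) for c in components]
    total = sum(activations)  # may underflow to 0 for queries far from all components
    prediction = 0.0
    for activation, c in zip(activations, components):
        # Per-component linear regressor from Gaussian conditioning.
        conditional_mean = c.mean_y + c.cov_xy / c.var_x * (x - c.mean_x)
        prediction += activation / total * conditional_mean
    return prediction
```

With a single component `Component(1.0, 0.0, 1.0, 2.0, 1.0)`, the prediction at position 2.0 is `1.0 + (1.0 / 2.0) * 2.0 = 2.0`, which is ordinary linear regression; with several components, the result is a smooth blend of such local linear models.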

Dirichlet process mixture models

Dirichlet process models were first introduced by Ferguson [30]. A DP is characterized by a base distribution $G_{0}$ and a positive scalar $α$, usually referred to as the innovation parameter, and is denoted as $DP(G_{0}, α)$. Essentially, a DP is a distribution placed over a distribution. Let us suppose we randomly draw a sample distribution $G$ from a DP, and, subsequently, we independently draw $N$ random variables $\{Θ_{n}^{*}\}_{n = 1}^{N}$ from $G$:

$$G | \{G_{0}, α\} \sim DP(G_{0}, α)$$

$$Θ_{n}^{*} | G \sim G, \quad n = 1, \dots, N.$$

Integrating out $G$, the joint distribution
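The clustering behavior of draws from a DP can be illustrated with the standard Pólya urn (Blackwell-MacQueen) scheme, which generates the variables sequentially without representing $G$ explicitly. This is a textbook construction offered as a sketch; the source text is truncated at this point and may present the result differently. The function name, the Gaussian base distribution, and the parameter values in the test are assumptions.

```python
import random
from typing import Callable, List


def polya_urn_draws(
    n: int,
    alpha: float,
    base_sampler: Callable[[random.Random], float],
    rng: random.Random,
) -> List[float]:
    """Draw n values whose joint law is that of i.i.d. draws from G ~ DP(G0, alpha).

    base_sampler draws one value from the base distribution G0.
    """
    draws: List[float] = []
    for i in range(n):
        # With i previous draws, a fresh value from G0 is taken with
        # probability alpha / (alpha + i); the first draw is always fresh.
        if rng.random() < alpha / (alpha + i):
            draws.append(base_sampler(rng))
        else:
            # Otherwise repeat a previous draw chosen uniformly at random,
            # so frequently seen values are more likely to be repeated.
            draws.append(rng.choice(draws))
    return draws
```

Because earlier values are reused with positive probability, the draws fall into a small number of distinct groups, and a larger `alpha` produces more distinct values. This is the property that makes the DP useful as a prior when the number of clusters is unknown.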

Proposed approach

Let $y = \{y_{n}\}_{n = 1}^{N}$, with $y_{n} = \{β_{n}, \dot{β}_{n}\}$, be the set of predictor and response variables whose joint distribution is represented by means of a postulated GMR model. We want to model this data by means of a nonparametric Bayesian formulation of the GMR model. For this purpose, we postulate a GMR model with a countably infinite number of states. To formulate such a model, we begin by postulating a Gaussian DPM model for the joint distribution of $β$ and $\dot{β}$, and we further derive the

Experimental evaluation

In this section, we present our experimental evaluation of the DPGMR algorithm in a series of applications dealing with robot learning by demonstration. More specifically, we compare algorithm performance against well-established, state-of-the-art methods in the field of robotics, namely Gaussian mixture regression (GMR) [1], and Gaussian process regression (GPR) [36], [37]. We have considered three application scenarios with potential practical applicability under a one- and a multi-shot

Conclusions

In this paper, we presented a nonparametric Bayesian approach toward trajectory-based robot learning by demonstration. The proposed approach is based on the postulation of a Gaussian mixture regression model comprising a countably infinite number of states, and is facilitated by the imposition of a Dirichlet process prior over the model states. The proposed approach allows for the automatic determination of the proper number of GMR model states, without the need of resorting to model order

Acknowledgment

This work has been partially funded by the EU FP7 ALIZ-E project (grant 248116).

References (41)

B. Argall et al., A survey of robot learning from demonstration, Robotics and Autonomous Systems (2009)

A. Skoglund et al., Programming-by-demonstration of reaching motions–a next-state-planner approach, Robotics and Autonomous Systems (2010)

A. Billard et al., Discovering optimal imitation strategies, Robotics and Autonomous Systems (2004)

A. Billard et al., Learning human arm movements by imitation: evaluation of a biologically inspired connectionist architecture, Robotics and Autonomous Systems (2001)

B.D. Argall et al., A survey of robot learning from demonstration, Robotics and Autonomous Systems (2009)

Y. Demiris et al., Hierarchical attentive multiple models for execution and recognition (HAMMER), Robotics and Autonomous Systems (2006)

A.G. Billard et al., Discriminative and adaptive imitation in uni-manual and bi-manual tasks, Robotics and Autonomous Systems (2006)

A. Billard et al., Robot programming by demonstration

M. Lopes et al., A developmental roadmap for learning by imitation in robots, IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics (2007)

P. Pastor, H. Hoffmann, T. Asfour, S. Schaal, Learning and generalization of motor skills by learning from...

M. Lopes et al., Visual learning by imitation with motor representations, IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics (2005)

A. Ude et al., Task-specific generalization of discrete and periodic dynamic movement primitives, IEEE Transactions on Robotics (2010)

P. Pastor, H. Hoffmann, T. Asfour, S. Schaal, Learning and generalization of motor skills by learning from...

Z. Ghahramani et al., Supervised learning from incomplete data via an EM approach, Advances in Neural Information Processing Systems (1994)

D. Lee, Y. Nakamura, Mimesis scheme using a monocular vision system on a humanoid robot, in: Proc. IEEE International...

G. Schwarz, Estimating the dimension of a model, The Annals of Statistics (1978)

G. McLachlan, D. Peel, Finite Mixture Models, Wiley Series in Probability and Statistics,...

S. Chatzis et al., Signal modeling and classification using a robust latent space model based on $t$ distributions, IEEE Transactions on Signal Processing (2008)

S. Walker et al., Bayesian nonparametric inference for random distributions and related functions, Journal of the Royal Statistical Society B (1999)

R. Neal, Markov chain sampling methods for Dirichlet process mixture models, Journal of Computational and Graphical Statistics (2000)

Sotirios P. Chatzis received the M. Eng. degree in Electrical and Computer Engineering with distinction from the National Technical University of Athens, in 2005, and the Ph.D. degree in Machine Learning, in 2008, from the same institution. From January 2009 till June 2010 he was a Postdoctoral Fellow with the University of Miami, USA. Currently, he is a post-doctoral researcher with the Department of Electrical and Electronic Engineering, Imperial College London. His major research interests comprise machine learning theory and methodologies with a special focus on hierarchical Bayesian models, reservoir computing, robot learning by demonstration, copulas, quantum statistics, and Bayesian nonparametrics. His Ph.D. research was supported by the Bodossaki Foundation, Greece, and the Greek Ministry for Economic Development, whereas he was awarded the Dean’s scholarship for Ph.D. studies, being the best performing Ph.D. student of his class. In his first five years as a researcher he has first-authored 23 papers in the most prestigious journals of his research field.

Dimitrios Korkinof received the Diploma in Electrical & Computer engineering (M.Sc. equivalent) from the Aristotle University of Thessaloniki, Greece. He graduated in 2010 with excellent academic achievement and 3rd in his class.

He is currently pursuing a Ph.D. at the Department of Electrical & Electronic Engineering of Imperial College London, where he is researching aspects of statistical machine learning theory with applications to robotics.

His current academic interests include Bayesian statistics, variational inference, stochastic processes and nonparametric methods for computer vision, action recognition and other robotics-related applications.

Yiannis Demiris is a senior lecturer at Imperial College London. He has significant expertise in cognitive systems, assistive robotics, multi-robot systems, human–robot interaction and learning by demonstration, in particular in action perception and learning. Dr Demiris’ research is funded by the UK’s Engineering and Physical Sciences Research Council (EPSRC), the Royal Society, BAE Systems, and the EU FP7 program through projects ALIZ-E and EFAA, both addressing novel machine learning approaches to human–robot interaction. Additionally, the group collaborates with the BBC’s Research and Development Department on the “Learning Human Action Models” project. Dr Demiris has guest edited special issues of the IEEE Transactions on SMC-B specifically on Learning by Observation, Demonstration, and Imitation, and of the Adaptive Behavior Journal on Developmental Robotics. He has organized six international workshops on Robot Learning, BioInspired Machine Learning, Epigenetic Robotics, and Imitation in Animals and Artifacts (AISB), was the chair of the IEEE International Conference on Development and Learning (ICDL) for 2007, as well as the program chair of the ACM/IEEE International Conference on Human–Robot Interaction (HRI) 2008. He is a Senior Member of IEEE, and a member of the Institute of Engineering & Technology of Britain (IET).
