Elsevier

Neural Networks

Volume 72, December 2015, Pages 123-139
2015 Special Issue
Off-line simulation inspires insight: A neurodynamics approach to efficient robot task learning

https://doi.org/10.1016/j.neunet.2015.09.002

Abstract

There is currently an increasing demand for robots able to acquire the sequential organization of tasks from social learning interactions with ordinary people. Interactive learning by demonstration and communication is a promising topic in current robotics research. However, the efficient acquisition of generalized task representations that allow the robot to adapt to different users and contexts is a major challenge. In this paper, we present a dynamic neural field (DNF) model that is inspired by the hypothesis that the nervous system uses the off-line re-activation of initial memory traces to incrementally incorporate new information into structured knowledge. To achieve this, the model combines fast activation-based learning to robustly represent sequential information from single task demonstrations with slower, weight-based learning during internal simulations to establish longer-term associations between neural populations representing individual subtasks. The efficiency of the learning process is tested in an assembly paradigm in which the humanoid robot ARoS learns to construct a toy vehicle from its parts. User demonstrations with different serial orders together with the correction of initial prediction errors allow the robot to acquire generalized task knowledge about possible serial orders and the longer-term dependencies between subgoals in very few social learning interactions. This success is shown in a joint action scenario in which ARoS uses the newly acquired assembly plan to construct the toy together with a human partner.

Introduction

Currently, new generations of robots are being built that are supposed to interact closely with ordinary people in their working and living environments. These robots have to master a wide variety of everyday tasks that cannot be completely designed in advance by experts as in traditional industrial applications (Schaal, 2007). A major challenge of current robotics research is thus to endow robots with an adaptive, efficient and user-friendly instruction method that would allow ordinary people to teach the robot new tasks in an open-ended manner. Ideally, naïve users may bring their own learning experiences in social interactions with other people to the robotics domain. Learning by observing and imitating others' behaviors and their consequences is a powerful social learning mechanism for human-to-human knowledge transfer (Bandura, 1971). It is attractive for the robotics domain as well, since learning by observation significantly speeds up skill acquisition compared to individual discovery in potentially dangerous trial-and-error learning. While a full-blown, human-like social learning capacity for robots still remains a distant goal, major progress has been made over the last decade in various research directions of the programming by demonstration approach (for review papers and collections see, e.g., Billard et al., 2008, Dautenhahn and Nehaniv, 2002). Most robotics experiments thus far have focused on the level of transferring motor skills for object manipulation from human to robot. Other research has started at the more abstract level of learning serial tasks defined by a sequence of subgoals or events in domains like assembly work (Ikeuchi & Suehiro, 1994), navigation (Nicolescu & Matarić, 2003) or household manipulation (Pardowitz, Knoop, Dillmann, & Zöllner, 2007).
Recent developments stress the importance of social learning cues such as verbal feedback and communicative gestures to guide the real-time learning process in an incremental manner (Otero et al., 2008, Thomaz and Breazeal, 2008). However, little attention has been paid thus far to the generality of the acquired task knowledge and to the efficiency of the learning process in terms of the number of demonstrations needed (Pardowitz et al., 2007, Wu and Demiris, 2010). To act as an intelligent and flexible co-worker, it is not sufficient that the robot memorizes a demonstrated task, such as assembling a piece of furniture or laying the dinner table, as a simple linear sequence of events. The robot should be able to adapt the serial order of task execution to the preferences of different users, and at the same time should be able to understand and represent the task structure where the achievement of multiple independent subgoals may enable a final outcome. This in turn requires an ability to connect temporally nonadjacent subtasks. Most importantly, since multiple demonstrations of the same task would be time-consuming and annoying for users, the acquisition of generalized task knowledge in very few demonstrations is crucial for user acceptance.

In this paper we present a neurodynamics approach to robot task learning that takes inspiration from a hypothesis about how the nervous system might efficiently consolidate initial memory traces of sequential events into structured knowledge. Converging lines of evidence from computational theories (e.g., connectionist networks, McClelland, O’Reilly, & McNaughton, 1995, for review see O’Reilly & Norman, 2002) and neurophysiological studies (Euston, Tatsuno, & McNaughton, 2007, for review see Sutherland & McNaughton, 2000) support the notion of two complementary learning systems that allow distributed neural structures to gradually integrate new input patterns without disturbing previously stored information. A fast system is responsible for the rapid storage of newly demonstrated sequential events. The spontaneous re-activation of these memory traces during “off-line states”, in which the neural system is not processing external inputs, facilitates the gradual adjustment of synaptic weights in associative networks of the slow system encoding generalized knowledge of the sequential structure. To model the two complementary learning systems, we apply the theoretical framework of dynamic neural fields (DNFs), which has been shown to provide key processing mechanisms for applications in cognitive modeling (Schöner, 2008) and in cognitive robotics (Erlhagen & Bicho, 2006). Most importantly, DNFs explain the existence of self-sustained activity in neural populations as the result of reciprocal positive feedback between neighboring neurons (Amari, 1977). Persistent activity has been reported in many areas of higher association cortices and is commonly believed to support a multitude of relevant cognitive functions such as working memory, decision making, and the learning of associations between events separated in time (Curtis and Lee, 2010, Miller, 2000).
The intrinsically stable dynamics of population activity modeled by DNFs allows us to implement not only the short-term maintenance of task-relevant sequential information but also the active rehearsal of this information during off-line learning periods not constrained by the time course of observed events.

To test the learning model in real-world robotics experiments, we adopt a construction task that we have used in previous work to test a DNF architecture for natural and fluent human–robot interactions (HRI) (Bicho et al., 2011, Bicho et al., 2010). The main goal of the present study is to acquire in a social learning situation with human tutors the knowledge about possible serial orders of executing the assembly steps. This shared task knowledge was predefined by the designer in our earlier HRI studies. To this end, one or more human tutors first show the humanoid robot ARoS the assembly work consisting of a series of assembly steps necessary to construct a toy object from its parts. The tutor then provides immediate verbal feedback about predicted next steps when the robot tries to reproduce the serial order from memory. Demonstrations with different serial orders together with the integration of prediction errors in the associative learning process allow the robot to acquire generalized task knowledge in very few social learning interactions. This success is shown in a joint action scenario in which ARoS uses the newly acquired assembly plan to construct the toy together with a human partner.

The rest of the paper is structured as follows: Section 2 describes the construction task, the learning paradigm and the robotic platform ARoS. Section 3 presents an overview of the basic processing principles of the dynamic neural field framework. Section 4 contains the description of the DNF-based learning model. Experimental results are presented in Section 5. The paper finishes with a discussion of results and future work in Section 6.

Task description

By observing a human tutor, the robot has to learn the sequential structure of individual assembly steps necessary to construct a toy vehicle from its components (Fig. 1). The vehicle has a round platform with an axle as its base (BA). On each side, a wheel (RW and LW) has to be attached to the axle and subsequently fixed with a nut (RN and LN, respectively). Subsequently, 4 columns (GC, BC, RC, MC) identified by their different colors (green, blue, red and magenta, respectively) have to be

The Dynamic Neural Field framework

Dynamic neural fields (DNFs) represent a theoretical framework for developing cognitive control architectures that is consistent with fundamental principles of cortical information processing in distributed networks of connected neuronal populations (Erlhagen and Bicho, 2006, Schöner, 2008). Task-relevant information is represented by supra-threshold activity patterns (or bumps) of neural populations. These patterns are initially triggered by a transient input from external sources
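The self-sustained bump described above can be illustrated with a small simulation. The following is a minimal sketch, assuming a one-dimensional Amari-type field with a Heaviside firing function, a Gaussian excitatory kernel with constant global inhibition, and forward-Euler integration. The function name `simulate_field` and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def simulate_field(input_amplitude, n=61, steps_on=100, steps_off=200):
    """Integrate a 1D Amari-type field; return the activation after input removal.

    All parameter values below are illustrative assumptions, not the paper's.
    """
    tau, dt = 10.0, 1.0                        # time constant and Euler step
    h = -2.0                                   # resting level (below threshold 0)
    a_exc, sigma_exc, g_inh = 2.0, 4.0, 0.5    # local excitation, global inhibition
    centre, sigma_in = n // 2, 3.0             # position and width of the input

    # Interaction kernel indexed by the distance between two field sites.
    kernel = [a_exc * math.exp(-d * d / (2 * sigma_exc ** 2)) - g_inh
              for d in range(n)]
    # Gaussian-shaped external input centred on the field.
    stimulus = [input_amplitude * math.exp(-(i - centre) ** 2 / (2 * sigma_in ** 2))
                for i in range(n)]

    u = [h] * n                                # start at the resting level
    for step in range(steps_on + steps_off):
        # Heaviside output: only sites with u > 0 contribute to the interaction.
        active = [j for j in range(n) if u[j] > 0.0]
        input_present = step < steps_on        # the input is transient
        new_u = []
        for i in range(n):
            recurrent = sum(kernel[abs(i - j)] for j in active)
            s = stimulus[i] if input_present else 0.0
            new_u.append(u[i] + dt / tau * (-u[i] + h + s + recurrent))
        u = new_u
    return u


strong = simulate_field(5.0)   # supra-threshold input: a bump should outlast it
weak = simulate_field(1.5)     # sub-threshold input: the field returns to rest
```

In this sketch the strong input drives a local population above threshold, and recurrent excitation then keeps it active after the input is switched off, while global inhibition keeps the rest of the field silent. The weak input never reaches threshold, so no memory of it remains.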

DNF model of task learning

Fig. 5 presents a schematic view of the model architecture with two interconnected modules. It reflects the idea of two complementary learning systems that has been proposed as a solution for natural and artificial learning systems to avoid the potentially “catastrophic” interference of old and new memories (McClelland et al., 1995). Converging lines of neurophysiological evidence suggest that Prefrontal Cortex–Hippocampus interactions might constitute a neural substrate for this hypothesis.
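The division of labor between a fast and a slow system can be sketched in a few lines. This is a toy illustration, not the DNF model of the paper: the fast system is reduced to a list of stored demonstration sequences, and the slow system to a table of association weights that is nudged by a small amount on every off-line replay. The function names, the learning rule, the threshold, and the two demonstration orders are all hypothetical; only the part labels (BA, RW, LW, RN, LN) come from the task description.

```python
def rehearse(demonstrations, learning_rate=0.1, replays=20):
    """Replay stored sequences off-line and slowly strengthen successor weights.

    Hypothetical rule: each replay moves a weight a fraction `learning_rate`
    of the remaining distance towards 1.0.
    """
    weights = {}
    for _ in range(replays):
        for sequence in demonstrations:
            for current, following in zip(sequence, sequence[1:]):
                old = weights.get((current, following), 0.0)
                weights[(current, following)] = old + learning_rate * (1.0 - old)
    return weights


def predict_next(weights, current, threshold=0.5):
    """Return all subtasks whose association with `current` exceeds the threshold."""
    return sorted(following for (previous, following), w in weights.items()
                  if previous == current and w >= threshold)


# Two hypothetical demonstrations with different serial orders (fast memory).
demonstrations = [
    ["BA", "RW", "RN", "LW", "LN"],
    ["BA", "LW", "LN", "RW", "RN"],
]

weights = rehearse(demonstrations)
options_after_base = predict_next(weights, "BA")   # ['LW', 'RW']
```

A single replay leaves every weight at 0.1, below the assumed threshold, so no prediction is made. Only repeated rehearsal consolidates the associations, and because both demonstrations are replayed, both serial orders remain available as valid next steps.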

Results

We report results of three learning experiments in which the robot ARoS observed different tutors executing the assembly sequence with varying temporal orders. In the last experiment, an error occurred and was eventually corrected following additional demonstrations. Each experiment consists of observation, off-line rehearsal and execution phases. During observation, the human has all parts within reach to perform the whole sequence. During execution, objects are distributed on the table, and

Discussion and future work

For decades, the idea has been discussed in animal and human research that off-line periods after learning such as awake resting or sleep promote the gradual incorporation of newly acquired information into long-term memory representations. Off-line improvements without further practice have been reported in a wide variety of perceptual, motor and complex cognitive tasks (Stickgold, 2005). The mental replay of memory traces of recent behavioral experiences has been hypothesized to play an

Acknowledgments

The work was funded by FCT - Fundação para a Ciência e Tecnologia, through the PhD Grants SFRH/BD/48529/2008 and SFRH/BD/41179/2007 and Project NETT: Neural Engineering Transformative Technologies, EU-FP7 ITN (nr. 289146) and the FCT-Research Center CMAT (PEst-OE/MAT/UI0013/2014).

References (48)

  • A.R. Seitz et al.

    A common framework for perceptual learning

    Current Opinion in Neurobiology

    (2007)
  • G.R. Sutherland et al.

    Memory trace reactivation in hippocampal and neocortical neuronal ensembles

    Current Opinion in Neurobiology

    (2000)
  • A.L. Thomaz et al.

    Teachable robots: Understanding human teaching behavior to build more effective robot learners

    Artificial Intelligence

    (2008)
  • S.-i. Amari

    Dynamics of pattern formation in lateral-inhibition type neural fields

    Biological Cybernetics

    (1977)
  • A. Bandura

    Social learning theory

    (1971)
  • E. Bicho et al.

    Integrating verbal and nonverbal communication in a dynamic neural field architecture for human–robot interaction

    Frontiers in Neurorobotics

    (2010)
  • A.G. Billard et al.

    Robot programming by demonstration

  • A. Cleeremans et al.

    Learning the structure of event sequences

    Journal of Experimental Psychology: General

    (1991)
  • S. Coombes et al.

    Exotic dynamics in a firing rate model of neural tissue with threshold accommodation

  • K. Dautenhahn et al.

    Imitation in animals and artifacts

    (2002)
  • W. Erlhagen et al.

    The dynamic neural field approach to cognitive robotics

    Journal of Neural Engineering

    (2006)
  • W. Erlhagen et al.

    Dynamic field theory of movement preparation

    Psychological Review

    (2002)
  • D.R. Euston et al.

    Fast-forward playback of recent memory sequences in prefrontal cortex during sleep

    Science

    (2007)
  • F. Ferreira et al.

    Learning a musical sequence by observation: A robotics implementation of a dynamic neural field model
