Elsevier

Neural Networks

Volume 22, Issue 2, March 2009, Pages 144-154

2009 Special Issue
Cross-modal and scale-free action representations through enaction

https://doi.org/10.1016/j.neunet.2009.01.007

Abstract

Embodied action representation and action understanding are the first steps toward understanding what it means to communicate. We present a biologically plausible mechanism for the representation and recognition of actions in a neural network of spiking neurons, based on the learning mechanism of spike-timing-dependent plasticity (STDP). We show how grasping is represented through multi-modal integration between the visual and tactile maps across multiple temporal scales. The network evolves into a small-world organization with scale-free dynamics, promoting efficient inter-modal binding of the neural assemblies with accurate timing. Finally, it acquires the qualitative properties of the mirror neuron system: its representation of an action can be triggered by observing that action performed by someone else.

Introduction

Before infants articulate their first words, social cognition begins to develop through non-verbal communication and the understanding of actions performed by others. Perceiving the movements, gestures and actions of someone else can help us understand (or guess) that person's intentions, desires and emotions.

These capacities for non-verbal communication are argued to arise from pragmatic representations, generally implicit, that emerge from the intertwining of perception and action within the brain (Hiraki, 2006, Rizzolatti et al., 1996). These representations constitute the body schema, which automatically activates motor representations in the prefrontal and frontal areas. It follows that observing someone else acting, recognizing the action and understanding it may then result from a direct pairing between the visually observed action and our own motor representation of it. In other words, the observer mentally “simulates” the action from his or her own experience of it (Gallese, 2005), which leads to a “resonance entrainment” in the observer's motor system (Rizzolatti and Craighero, 2004, Rizzolatti et al., 1996, Rizzolatti et al., 2001). This phenomenon is attributed to mirror neurons, located in area F5 of the premotor cortex (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996), which respond to action-related visual stimuli such as graspable objects or the actions of other individuals.

Of particular importance, mirror neurons show temporal congruence between visual and motor neurons (Oztop, Kawato, & Arbib, 2006): they fire with accurate timing both to observed actions and to actions whose end state is hidden. Visual representations of an observed action are therefore temporally linked to our own motor representations of the same action, a product of associative learning in line with the generalist theories of imitation (Brass and Heyes, 2005, Heyes, 2001). According to these theories, imitation is facilitated by the general organization of motor control rather than by a special-purpose mechanism dedicated to imitation. Mirror neurons are thus not innate systems but are acquired from learned perceptual-motor links. Other evidence from developmental psychology tends to confirm that the timing between sensory and motor representations is crucial for infants to acquire the meaning of an action. For instance, infants detect timing correlations and the sequential order of events early on, e.g., synchrony and contingency (Prince & Hollich, 2005). Moreover, in interceptive actions such as reaching and grasping, detecting synchrony between different sensory and/or motor channels is particularly important for determining the right timing of contact or of preparatory actions (Corbetta et al., 2000, Prince and Hollich, 2005). More complex cognitive abilities (e.g., imitation, self-agency and social interaction) may develop from these newly acquired affordances (Heyes, 2004, Meltzoff and Moore, 1977, Nadel et al., 2005, Rochat, 2003, Zukow-Goldring, 2005). Taken together, these considerations suggest that exploiting the mechanism(s) regulating timing at the neural level can reveal some of the principle(s) behind action representation, cognitive development and social interaction.

At the neural level, the mechanism that adjusts synaptic strength according to the timing delays between spikes is spike-timing-dependent plasticity, termed STDP (cf. Bi and Poo (1998) and also Abbott and Nelson (2000)). The temporal structure of complex actions, for instance, is decomposed with millisecond precision into ordered sequences of neural rules in canonical motor neurons and in the mirror neuron system (Changeux and DeBevoise, 2004, Lestou et al., 2008, Rizzolatti et al., 1996). More precisely, information processing in large networks of spiking neurons is performed both in the temporal domain (i.e., the time delays between spikes) and in the spatial domain (i.e., the spatial locations of the neurons). It is therefore the coherence of the local dynamics among neuron pairs that will (or will not) produce coherence at the network scale, that is, a functional integration of the different parallel processes in the maps into a dynamical representation of the body in action.
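
To make the timing dependence concrete, the sketch below implements a standard pair-based STDP window with exponential decay. It is an illustration under assumptions, not the rule or parameters used in this paper: the amplitudes A_PLUS and A_MINUS, the time constants TAU_PLUS and TAU_MINUS, the all-to-all spike pairing and the weight bounds are hypothetical values chosen for readability. A presynaptic spike that precedes a postsynaptic spike by a few milliseconds strengthens the synapse, while the reverse order weakens it, and the effect fades as the delay grows.

```python
import math

# Hypothetical parameters (not taken from the paper) for a pair-based,
# exponential STDP window.
A_PLUS = 0.01     # maximal potentiation per spike pair
A_MINUS = 0.012   # maximal depression per spike pair
TAU_PLUS = 20.0   # potentiation time constant (ms)
TAU_MINUS = 20.0  # depression time constant (ms)


def stdp_dw(delta_t: float) -> float:
    """Weight change for one spike pair, with delta_t = t_post - t_pre in ms.

    delta_t > 0 (pre fires before post) potentiates the synapse;
    delta_t < 0 (post fires before pre) depresses it.
    """
    if delta_t > 0:
        return A_PLUS * math.exp(-delta_t / TAU_PLUS)
    if delta_t < 0:
        return -A_MINUS * math.exp(delta_t / TAU_MINUS)
    return 0.0


def apply_stdp(w: float, pre_spikes, post_spikes,
               w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Sum all-to-all pair contributions, then clip the weight to [w_min, w_max]."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w += stdp_dw(t_post - t_pre)
    return min(max(w, w_min), w_max)


if __name__ == "__main__":
    for dt in (-40, -10, -2, 2, 10, 40):
        print(f"dt = {dt:+4d} ms -> dw = {stdp_dw(dt):+.5f}")
```

The asymmetric window is what lets a network learn causal order: repeated "pre then post" pairings build up a connection, while "post then pre" pairings erode it.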

Our main objective is to understand how such global integration of the neural dynamics is produced during physical interactions, that is, how functional connectivity in the network permits the representation of an action from the differentiated processes carried out in the sensory and motor maps, yielding a structured multi-modal neural code. In this paper, we demonstrate how actions are represented at the neural level as accurate spatio-temporal clusters, sparsely encoded over distant neural maps and ruled by the learning mechanism of STDP. We set up a grasping experiment in which the temporal sequence of the action is acquired (or “represented”) through the neural interaction between the visual and tactile modalities. This functional integration between the sensory and motor maps, termed “vertical association” by Brass and Heyes (2005), produces the emergent structure of reentrant or mirroring maps, a result of their entanglement due to embodiment. Interestingly, reentry achieves the cross-modal linkage between the tactile and visual maps, giving the neural system the capabilities of an associative memory. These include inter-modal activation (the capacity to trigger one modality from another) and anticipation (the capacity to anticipate the next state of the other modality), which together combine the features of coupled forward and inverse models: predicting the sensory consequences of a motor command, and transforming a desired sensory state into a motor command that can achieve it (Oztop et al., 2006). Since the network produces inter-modal associations, information can be retrieved from the activation of another modality. It follows that the observation of an action (i.e., with only visual information available) will induce the simulation of the missing modality (i.e., haptic perception). This is the qualitative property observed in the mirror neuron system.
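
The retrieval of a missing modality can be illustrated, in a deliberately simplified form, with a Hebbian outer-product (hetero-associative) memory that links visual patterns to tactile patterns. This is a swapped-in, non-spiking technique and not the paper's STDP network; the bipolar (+1/-1) patterns and the names VISION_A, TOUCH_A, VISION_B and TOUCH_B are hypothetical. After training on paired experiences of seeing and touching, presenting a visual pattern alone reconstructs the associated tactile pattern.

```python
# Illustrative only: a Hebbian outer-product hetero-associative memory,
# NOT the spiking STDP network of the paper. Patterns are bipolar (+1/-1).


def train(pairs):
    """Build W[i][j] = sum_k tactile_k[i] * vision_k[j] from (vision, tactile) pairs."""
    n_v = len(pairs[0][0])
    n_t = len(pairs[0][1])
    w = [[0.0] * n_v for _ in range(n_t)]
    for vision, tactile in pairs:
        for i in range(n_t):
            for j in range(n_v):
                w[i][j] += tactile[i] * vision[j]
    return w


def recall_tactile(w, vision):
    """Infer the missing tactile pattern from a visual pattern alone (sign of W @ v)."""
    out = []
    for row in w:
        s = sum(wij * vj for wij, vj in zip(row, vision))
        out.append(1 if s >= 0 else -1)
    return out


# Hypothetical "seen" and "felt" patterns for two actions (orthogonal visual codes).
VISION_A, TOUCH_A = [1, 1, -1, -1], [1, -1, 1]
VISION_B, TOUCH_B = [1, -1, 1, -1], [-1, -1, 1]

W = train([(VISION_A, TOUCH_A), (VISION_B, TOUCH_B)])

if __name__ == "__main__":
    print(recall_tactile(W, VISION_A))  # [1, -1, 1]
    print(recall_tactile(W, VISION_B))  # [-1, -1, 1]
```

Because the two visual codes are orthogonal, each recall is exact; with overlapping or noisy patterns, this simple linear memory degrades gracefully but no longer guarantees perfect retrieval.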

In the first section, we present the framework employed to design our neural network. Next, we study how the network acquires an appropriate perception–action matching from repeated experiences of seeing and touching, which allows it to reproduce the qualitative properties of canonical and mirror neurons: firing to executed actions and to observed actions. We then discuss the relevance of our findings to cross-modal binding and to functional integration in the brain. We propose that the neural organization of the mirror neuron system is mediated by the regulatory mechanism of STDP, so that action representation and action understanding use the same pathways.

Framework

In comparison with classical feed-forward neural networks, information processing in recurrent networks of spiking neurons is not based on the statistical modeling of the available data but rather on the parallel processing of the neurons combined in a self-organized fashion (i.e., assembling the relative spatio-temporal coordinations). We define, in this part, the network architecture, the neuron model used in our experiments and the reinforcement mechanism of spike-timing-dependent plasticity
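
The snippet above does not state which neuron model the network uses. As an assumption only, the sketch below uses the Izhikevich (2003) simple spiking model, a common choice in STDP network simulations, with its published "regular spiking" parameters (a = 0.02, b = 0.2, c = -65, d = 8). The constant input current i_ext and the 1 ms time step (split into two 0.5 ms half-steps for v) are hypothetical simulation settings, not values from this paper.

```python
def simulate_izhikevich(i_ext, t_ms=1000, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Return spike times (ms) of one Izhikevich neuron under constant input i_ext.

    Model (assumed here, not confirmed by the paper):
        v' = 0.04 v^2 + 5 v + 140 - u + I
        u' = a (b v - u)
        if v >= 30 mV: v <- c, u <- u + d
    Defaults are the 'regular spiking' parameters.
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spikes = []
    for t in range(t_ms):
        if v >= 30.0:            # spike detected: record it and reset
            spikes.append(t)
            v = c
            u += d
        for _ in range(2):       # two 0.5 ms half-steps for numerical stability
            v += 0.5 * (0.04 * v * v + 5.0 * v + 140.0 - u + i_ext)
        u += a * (b * v - u)     # 1 ms step for the slow recovery variable
    return spikes


if __name__ == "__main__":
    for current in (0.0, 10.0):
        print(f"I = {current:4.1f}: {len(simulate_izhikevich(current))} spikes in 1 s")
```

With no input the neuron settles at rest and stays silent, while a constant input of 10 produces tonic firing with spike-frequency adaptation; in a full network, i_ext would be replaced by the summed synaptic input from the visual and tactile maps.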

Experiments of eye–hand coordination and grasping

We reproduce the experimental series conducted by Rizzolatti et al. (1996) illustrating the qualitative aspects of mirror neurons and of canonical neurons: inter-modal binding, action representation and action understanding under temporal constraints. These neurons combine visuo-motor properties to represent an action sequence and to fire with precise timing. In our experiments, we investigate the conditions for such a situation to arise in a network of spiking neurons that would lead from the

Discussion

We present a biologically plausible mechanism for the representation of an action in a multi-modal neural network based on the learning mechanism of STDP. The temporal structure of complex actions (e.g., grasping) is decomposed with millisecond precision into ordered sequences of neural rules. The assembly of these many small scripts from contingent visuo-tactile inputs produces, at the body level, coherent clusters spanning hundreds of milliseconds and extending across the whole network.
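
A toy sketch of how millisecond-scale pairwise rules can assemble into an ordered sequence: a few neurons fire one after another with a fixed lag, this firing is replayed repeatedly, and a pair-based STDP rule is applied to every directed connection. All parameters (five neurons, 5 ms lag, 50 repetitions, the amplitudes and the time constant) are hypothetical, and the chain spans only 20 ms rather than hundreds of milliseconds; it illustrates the ordering principle, not the paper's network or results.

```python
import math

A_PLUS, A_MINUS = 0.01, 0.012  # hypothetical STDP amplitudes
TAU = 20.0                     # hypothetical STDP time constant (ms)


def stdp_dw(delta_t):
    """Pair-based STDP: potentiate if pre precedes post (delta_t > 0), else depress."""
    if delta_t > 0:
        return A_PLUS * math.exp(-delta_t / TAU)
    if delta_t < 0:
        return -A_MINUS * math.exp(delta_t / TAU)
    return 0.0


def train_sequence(n_neurons=5, lag_ms=5.0, repetitions=50, w0=0.5):
    """Replay neurons 0..n-1 firing in order (lag_ms apart) and apply STDP.

    w[i][j] is the weight of the directed connection i -> j (i presynaptic).
    Weights are clipped to [0, 1]; self-connections stay at 0.
    """
    w = [[w0 if i != j else 0.0 for j in range(n_neurons)] for i in range(n_neurons)]
    spike_times = [k * lag_ms for k in range(n_neurons)]
    for _ in range(repetitions):
        for i in range(n_neurons):          # presynaptic neuron
            for j in range(n_neurons):      # postsynaptic neuron
                if i == j:
                    continue
                dw = stdp_dw(spike_times[j] - spike_times[i])
                w[i][j] = min(max(w[i][j] + dw, 0.0), 1.0)
    return w


if __name__ == "__main__":
    for row in train_sequence():
        print(" ".join(f"{x:.2f}" for x in row))
```

After training, every forward connection (from an earlier to a later neuron) is stronger than the corresponding backward one, and links between neighbours in the sequence are the strongest, so the weight matrix itself encodes the order in which the neurons fired.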

Acknowledgment

The authors would like to acknowledge the Asada ERATO Synergistic project which provided the grant for this research.

References (53)

  • Tani, J., et al. (2004). Self-organization of distributedly represented multiple behavior schemata in a mirror system: Reviews of robot experiments using RNNPB. Neural Networks.
  • Wolpert, D. M., et al. (2001). Perspectives and problems in motor learning. Trends in Cognitive Sciences.
  • Abbott, L., et al. (2000). Synaptic plasticity: Taming the beast. Nature Neuroscience.
  • Alirezaei, H., Nagakubo, A., & Kuniyoshi, Y. (2007a). A deformable and deformation sensitive tactile distribution...
  • Alirezaei, H., Nagakubo, A., & Kuniyoshi, Y. (2007b). A highly stretchable tactile distribution sensor for smooth...
  • Aoki, T., et al. (2007). Synchrony-induced switching behavior of spike pattern attractors created by spike-timing-dependent plasticity. Neural Computation.
  • Barsalou, L. (2008). Grounded cognition. Annual Review of Psychology.
  • Berthoz, A. (1997). Le Sens du Mouvement.
  • Bi, G., et al. (1998). Activity-induced synaptic modifications in hippocampal culture, dependence of spike timing, synaptic strength and cell type. Journal of Neuroscience.
  • Buzsaki, G. (2006). Rhythms of the brain.
  • Changeux, J., et al. (2004). The physiology of truth: Neuroscience and human knowledge.
  • Edelman, G. M. (1987). Neural Darwinism: The theory of neuronal group selection.
  • Edelman, G. M., et al. (2000). A universe of consciousness (Consciousness).
  • Falck-Ytter, T., et al. (2006). Infants predict other people’s action goals. Nature Neuroscience.
  • Gallese, V. (2005). Embodied simulation: From neurons to phenomenal experience. Phenomenology and the Cognitive Sciences.
  • Gallese, V., et al. (1996). Action recognition in the premotor cortex. Brain.