Recognizing speech in a novel accent: the motor theory of speech perception reframed

  • Original Paper
Biological Cybernetics

Abstract

The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener’s native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
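
The core tenet of Part 2 can be illustrated with a minimal sketch. This is not the model of the paper: the toy lexicon, the pseudo-count prior, the equal-length alignment, the uniform word prior and all function names below are assumptions made for illustration only. The listener keeps pseudo-counts linking each heard sound to each native phoneme, picks the best-scoring word hypothesis, and reinforces the sound-to-phoneme links implied by that hypothesis.

```python
# Hypothetical toy lexicon: word -> native phoneme sequence (not from the paper).
LEXICON = {
    "this": ("th", "i", "s"),
    "that": ("th", "a", "t"),
    "then": ("th", "e", "n"),
    "them": ("th", "e", "m"),
    "zen": ("z", "e", "n"),
}
PHONEMES = sorted({p for seq in LEXICON.values() for p in seq})


def initial_counts(same=5.0, other=0.1):
    """Pseudo-counts linking each heard sound to each native phoneme.

    Assumed prior: the speaker mostly sounds like the listener."""
    return {s: {p: (same if s == p else other) for p in PHONEMES}
            for s in PHONEMES}


def prob(counts, sound, phoneme):
    """Estimate of P(native phoneme | heard sound) from normalized counts."""
    row = counts[sound]
    return row[phoneme] / sum(row.values())


def word_scores(sounds, counts):
    """Normalized score of each lexicon word for one heard sound sequence.

    Simplifications: uniform word prior, no insertions or deletions."""
    scores = {}
    for word, phonemes in LEXICON.items():
        if len(phonemes) != len(sounds):
            continue
        score = 1.0
        for s, p in zip(sounds, phonemes):
            score *= prob(counts, s, p)
        scores[word] = score
    total = sum(scores.values())
    return {w: sc / total for w, sc in scores.items()}


def recognize_and_adapt(sounds, counts):
    """Pick the best word hypothesis, then reinforce its sound-phoneme links."""
    scores = word_scores(sounds, counts)
    best = max(scores, key=scores.get)
    for s, p in zip(sounds, LEXICON[best]):
        counts[s][p] += 1.0
    return best


counts = initial_counts()
# Score of "them" for the accented form "zem" before any adaptation.
before = word_scores(("z", "e", "m"), counts)["them"]
# The speaker says "this" and "that" with "th" realized as "z".
heard = [("z", "i", "s"), ("z", "a", "t")]
recognized = [recognize_and_adapt(h, counts) for h in heard]
# Score of "them" for "zem" after adapting on the two earlier words.
after = word_scores(("z", "e", "m"), counts)["them"]
print(recognized, before, after)
```

In this toy run the first two accented words each have a single close lexical match, so the hypothesized words strengthen the link from the sound "z" to the phoneme "th". The later form "zem", initially scored equally as "them" and "zen", is then scored mostly as "them", which mirrors the claim that updating on word hypotheses improves recognition of later words on average.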

Notes

  1. See, for example, the article http://www.nationmaster.com/encyclopedia/Phonology.

  2. “She” and “her” will stand in for “he or she” and “his or her” respectively, unless the context makes clear which gender is intended.

  3. By contrast, in vision the correspondence problem is the challenge of matching features extracted from the two retinas (or from the one retina at different times) that correspond to the same feature in the external world.

  4. The Brown verbal frequency is the frequency of occurrence in verbal language derived from the London-Lund Corpus of English Conversation by Brown (1984).

References

  • Adda-Decker M (2001) Towards multilingual interoperability in automatic speech recognition. Speech Commun 35(1):5–20

  • Arbib MA (2005) Interweaving protosign and protospeech: further developments beyond the mirror. Interact Stud Soc Behav Commun Biol Artif Syst 6:145–171

  • Arbib MA (2006) Aphasia, apraxia and the evolution of the language-ready brain. Aphasiology 20:1–30

  • Arbib MA (2008) Mirror neurons & language. In: Stemmer B, Whitaker H (eds) Handbook of the neuroscience of language. Elsevier Science, Amsterdam, pp 237–246

  • Arbib MA (2010) Mirror system activity for action and language is embedded in the integration of dorsal & ventral pathways. Brain and Language 112:12–24

  • Arbib MA (2012) How the brain got language: the mirror system hypothesis. Oxford University Press, New York

  • Arbib MA, Rizzolatti G (1997) Neural expectations: a possible evolutionary path from manual skills to language. Commun Cogn 29:393–424

  • International Phonetic Association (1999) The handbook of the International Phonetic Association. Cambridge University Press, Cambridge

  • Bahl LR, Jelinek F (1975) Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition. IEEE Trans Inf Theory 21(4):404–411

  • Barrett AM, Foundas AL, Heilman KM (2005) Speech and gesture are mediated by independent systems. Behav Brain Sci 28:125–126

  • Basirat A, Sato M, Schwartz J-L, Kahane P, Lachaux J-P (2008) Parieto-frontal gamma band activity during the perceptual emergence of speech forms. NeuroImage 42(1):404–413

  • Best C, McRoberts G, Goodell E (2001) Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. J Acoust Soc Am 109(2):775–794

  • Bonaiuto JB, Arbib MA (2010) Extending the mirror neuron system model, II: what did I just do? A new role for mirror neurons. Biol Cybern 102:341–359

  • Bonaiuto JB, Rosta E, Arbib MA (2007) Extending the mirror neuron system model, I: audible actions and invisible grasps. Biol Cybern 96:9–38

  • Bradlow AR, Bent T (2008) Perceptual adaptation to non-native speech. Cognition 106(2):707

  • Brown GD (1984) A frequency count of 190,000 words in the London-Lund Corpus of English conversation. Behav Res Methods 16(6):502–532

  • Buccino G, Lui F, Canessa N, Patteri I, Lagravinese G, Benuzzi F, Porro CA, Rizzolatti G (2004) Neural circuits involved in the recognition of actions performed by nonconspecifics: an fMRI study. J Cogn Neurosci 16(1):114–126

  • Eisner F, McQueen JM (2005) The specificity of perceptual learning in speech processing. Atten Percept Psychophys 67(2):224–238

  • Fagg AH, Arbib MA (1998) Modeling parietal-premotor interactions in primate control of grasping. Neural Netw 11(7–8):1277–1303

  • Ferrari PF, Gallese V, Rizzolatti G, Fogassi L (2003) Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur J Neurosci 17(8):1703–1714

  • Ferrari PF, Rozzi S, Fogassi L (2005) Mirror neurons responding to observation of actions made with tools in monkey ventral premotor cortex. J Cogn Neurosci 17(2):212–226

  • Ferrari PF, Visalberghi E, Paukner A, Fogassi L, Ruggiero A, Suomi SJ (2006) Neonatal imitation in rhesus macaques. PLoS Biol 4(9):e302

  • Francis A, Baldwin K, Nusbaum H (2000) Effects of training on attention to acoustic cues. Percept Psychophys 62(8):1668–1680. doi:10.3758/BF03212164

  • Francis AL, Nusbaum HC (2002) Selective attention and the acquisition of new phonetic categories. J Exp Psychol Hum Percept Perform 28(2):349–366

  • Galantucci B, Fowler CA, Turvey MT (2006) The motor theory of speech perception reviewed. Psychon Bull Rev 13(3):361–377

  • Gales M, Young S (2007) The application of hidden Markov models in speech recognition. Found Trends Signal Process 1:195–304

  • Gallese V, Fogassi L, Fadiga L, Rizzolatti G (2002) Action representation and the inferior parietal lobule. In: Prinz W, Hommel B (eds) Attention & performance XIX. Common mechanisms in perception and action. Oxford University Press, Oxford

  • Goldinger SD (1998) Echoes of echoes? An episodic theory of lexical access. Psychol Rev 105(2):251

  • Goldstein L, Byrd D, Saltzman E (2006) The role of vocal tract gestural action units in understanding the evolution of phonology. In: Arbib MA (ed) From action to language via the mirror system. Cambridge University Press, Cambridge, pp 215–249

  • Goldstone RL (1998) Perceptual learning. Annu Rev Psychol 49(1):585–612

  • Goodale MA, Milner AD (1992) Separate visual pathways for perception and action. Trends Neurosci 15:20–25

  • Grossberg S (2003) Resonant neural dynamics of speech perception. J Phon 31(3):423–445

  • Guenther FH, Ghosh SS, Tourville JA (2006) Neural modeling and imaging of the cortical interactions underlying syllable production. Brain Lang 96(3):280–301

  • Hawkins S (2003) Roles and representations of systematic fine phonetic detail in speech understanding. J Phon 31(3):373–405

  • Hickok G (2009) The functional neuroanatomy of language. Phys Life Rev 6:121–143

  • Hickok G, Poeppel D (2004) Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92(1–2):67–99

  • Hickok G, Poeppel D (2009) Motor influence of speech perception: the view from Grenoble. Talking brains news and views on the neural organization of language (Blog moderated by Greg Hickok and David Poeppel) http://talkingbrains.blogspot.com/2009/2004/motor-influence-of-speech-perception.html

  • Hintzman DL (1986) Schema abstraction in a multiple-trace memory model. Psychol Rev 93:411–428

  • Jaynes ET (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge

  • Kirchhoff K (1998) Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments. In: Proceedings of ICSLP, Citeseer, pp 891–894

  • Klatt DH (1979) Speech perception: a model of acoustic-phonetic analysis and lexical access. J Phon 7(312):1–26

  • Kohler E, Keysers C, Umilta MA, Fogassi L, Gallese V, Rizzolatti G (2002) Hearing sounds, understanding actions: action representation in mirror neurons. Science 297(5582):846–848

  • Kröger BJ, Kannampuzha J, Neuschaefer-Rube C (2009) Towards a neurocomputational model of speech production and perception. Speech Commun 51(9):793–809

  • Kuhl PK, Miller JD (1975) Speech perception by the chinchilla: voiced-voiceless distinction in alveolar plosive consonants. Science 190:69–72

  • Liberman AM, Mattingly IG (1985) The motor theory of speech perception revised. Cognition 21:1–36

  • Liberman AM, Whalen DH (2000) On the relation of speech to language. Trends Cogn Sci 4(5):187–196

  • Lindblom B (1990) Explaining phonetic variation: a sketch of the H&H theory. Speech Prod Speech Model 55:403–439

  • Lotto AJ, Hickok GS, Holt LL (2009) Reflections on mirror neurons and speech perception. Trends Cogn Sci 13(3):110–114

  • Lotto AJ, Kluender KR, Holt LL (1997) Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica). J Acoust Soc Am 102(2 Pt 1):1134–1140

  • Luria AR (1973) The working brain. Penguin Books, Harmondsworth

  • MacNeilage PF (1998) The frame/content theory of evolution of speech production. Behav Brain Sci 21:499–546

  • MacNeilage PF, Davis BL (2005) The frame/content theory of evolution of speech: comparison with a gestural origins theory. Interact Stud Soc Behav Commun Biol Artif Syst 6:173–199

  • Massaro DW, Chen TH (2008) The motor theory of speech perception revisited. Psychon Bull Rev 15(2):453–457; discussion 458–462

  • Meltzoff AN, Moore MK (1977) Imitation of facial and manual gestures by human neonates. Science 198:75–78

  • Moineau S, Dronkers NF, Bates E (2005) Exploring the processing continuum of single-word comprehension in aphasia. J Speech Lang Hear Res 48(4):884–896

  • Moulin-Frier C, Laurent R, Bessière P, Schwartz J-L, Diard J (2012) Adverse conditions improve distinguishability of auditory, motor and perceptuo-motor theories of speech perception: an exploratory Bayesian modeling study. Lang Cogn Process 27:1240–1263 (7–8 Special Issue: Speech Recognition in Adverse Conditions) doi:10.1080/01690965.2011.645313

  • Norris D, McQueen JM, Cutler A (2003) Perceptual learning in speech. Cogn Psychol 47(2):204–238

  • Oztop E, Arbib MA (2002) Schema design and implementation of the grasp-related mirror neuron system. Biol Cybern 87(2):116–140

  • Oztop E, Bradley NS, Arbib MA (2004) Infant grasp learning: a computational model. Exp Brain Res 158(4):480–503

  • Pierrehumbert J (2002) Word-specific phonetics. Lab Phonol 7:101–139

  • Pinto J, Szoke I (2008) Fast approximate spoken term detection from sequence of phonemes. The 31st annual international ACM SIGIR conference 20–24 July 2008, Singapore

  • Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286

  • Rauschecker JP (1998) Parallel processing in the auditory cortex of primates. Audiol Neurootol 3:86–103

  • Rauschecker JP, Tian B (2000) Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc Natl Acad Sci 97(22):11800–11806. doi:10.1073/pnas.97.22.11800

  • Rizzolatti G, Arbib M (1998) Language within our grasp. Trends Neurosci 21:188–194

  • Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169–192

  • Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition of motor actions. Cogn Brain Res 3:131–141

  • Sato M, Baciu M, Lœvenbruck H, Schwartz JL, Cathiard MA, Segebarth C, Abry C (2004) Multistable representation of speech forms: a functional MRI study of verbal transformations. NeuroImage 23(3):1143–1151

  • Schwartz J-L, Boë L-J, Abry C (2007) Linking dispersion-focalization theory and the maximum utilization of the available distinctive features principle in a perception-for-action-control theory. Oxford University Press, Oxford

  • Schwartz J-L, Basirat A, Ménard L, Sato M (2012) The perception-for-action-control theory (PACT): a perceptuo-motor theory of speech perception. J Neurolinguistics 25(5):336–354

  • Skipper JI, Goldin-Meadow S, Nusbaum HC, Small SL (2007) Speech-associated gestures, Broca’s area, and the human mirror system. Brain Lang 101(3):260–277

  • Studdert-Kennedy M, Goldstein L (2003) Launching language: the gestural origin of discrete infinity. Stud Evol Lang 3:235–254

  • Umiltà MA, Escola L, Intskirveli I, Grammont F, Rochat M, Caruana F, Jezzini A, Gallese V, Rizzolatti G (2008) When pliers become fingers in the monkey motor system. Proc Natl Acad Sci USA 105(6):2209–2213

  • Ungerleider LG, Mishkin M (1982) Two cortical visual systems. In: Ingle DJ, Goodale MA, Mansfield RJW (eds) Analysis of visual behavior. The MIT Press, Cambridge

  • van Wassenhove V, Grant KW, Poeppel D (2005) Visual speech speeds up the neural processing of auditory speech. Proc Natl Acad Sci USA 102(4):1181–1186

  • Viterbi AJ (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 13(2):260–269

  • Weinberger HS (2010) The speech accent archive. George Mason University http://accent.gmu.edu/index.php

  • Whalen DH, Noiray A, Iskarous K, Bolanos L (2009) Relative contribution of jaw and tongue to the vowel height dimension in American English. J Acoust Soc Am 125(4):2698–2698

  • Wilson M (1988) MRC psycholinguistic database: machine-usable dictionary, version 2.00. Behav Res Methods Instrum Comput 20:6–10

Author information

Corresponding author

Correspondence to Clément Moulin-Frier.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (doc 70 KB)

About this article

Cite this article

Moulin-Frier, C., Arbib, M.A. Recognizing speech in a novel accent: the motor theory of speech perception reframed. Biol Cybern 107, 421–447 (2013). https://doi.org/10.1007/s00422-013-0557-3

  • DOI: https://doi.org/10.1007/s00422-013-0557-3
