Multisensory vocal communication in primates and the evolution of rhythmic speech

  • Review
  • Behavioral Ecology and Sociobiology

Abstract

The integration of the visual and auditory modalities during human speech perception is the default mode of speech processing. That is, visual speech perception is not a capacity that is “piggybacked” onto auditory-only speech perception. Visual information from the mouth and other parts of the face is used by all perceivers to enhance auditory speech. This integration is ubiquitous and automatic and is similar across all individuals and all cultures. The two modalities seem to be integrated even at the earliest stages of human cognitive development. If multisensory speech is the default mode of perception, then this should be reflected in the evolution of vocal communication. The purpose of this review is to describe the data that reveal that human speech is not uniquely multisensory. In fact, the default mode of communication is multisensory in nonhuman primates as well, though it may emerge along a different developmental trajectory. Speech production, however, exhibits a unique bimodal rhythmic structure in that both the acoustic output and the movements of the mouth are rhythmic and tightly correlated. This structure is absent in most monkey vocalizations. One hypothesis is that the bimodal speech rhythm may have evolved through the rhythmic facial expressions of ancestral primates, as indicated by mounting comparative evidence focusing on the lip-smacking gesture.

Acknowledgments

The authors gratefully acknowledge the scientific contributions of, and numerous discussions with, the following people: Adrian Bartlett, Chand Chandrasekaran, Ipek Kulahci, Darshana Narayanan, Stephen Shepherd, Daniel Takahashi, and Hjalmar Turesson. This work was supported by NIH R01NS054898 and the James S. McDonnell Scholar Award.

Author information

Corresponding author

Correspondence to Asif A. Ghazanfar.

Additional information

Communicated by J. Higham

This manuscript is part of the special issue “Multimodal Communication”. Guest Editors: James P. Higham and Eileen A. Hebets

About this article

Cite this article

Ghazanfar, A.A. Multisensory vocal communication in primates and the evolution of rhythmic speech. Behav Ecol Sociobiol 67, 1441–1448 (2013). https://doi.org/10.1007/s00265-013-1491-z

  • DOI: https://doi.org/10.1007/s00265-013-1491-z
