Abstract
The integration of the visual and auditory modalities during human speech perception is the default mode of speech processing. That is, visual speech perception is not a capacity that is “piggybacked” onto auditory-only speech perception. All perceivers use visual information from the mouth and other parts of the face to enhance auditory speech. This integration is ubiquitous and automatic, and it is similar across individuals and across cultures. The two modalities appear to be integrated even at the earliest stages of human cognitive development. If multisensory speech is the default mode of perception, then this should be reflected in the evolution of vocal communication. The purpose of this review is to describe the data revealing that human speech is not uniquely multisensory. In fact, the default mode of communication is multisensory in nonhuman primates as well, although it may emerge along a different developmental trajectory. Speech production, however, exhibits a unique bimodal rhythmic structure: both the acoustic output and the movements of the mouth are rhythmic and tightly correlated. This structure is absent in most monkey vocalizations. One hypothesis, suggested by mounting comparative evidence on the lip-smacking gesture, is that the bimodal speech rhythm may have evolved through the rhythmic facial expressions of ancestral primates.
Acknowledgments
The authors gratefully acknowledge the scientific contributions and numerous discussions with the following people: Adrian Bartlett, Chand Chandrasekaran, Ipek Kulahci, Darshana Narayanan, Stephen Shepherd, Daniel Takahashi, and Hjalmar Turesson. This work was supported by NIH R01NS054898 and the James S. McDonnell Scholar Award.
Additional information
Communicated by J. Higham
This manuscript is part of the special issue “Multimodal Communication”. Guest Editors: James P. Higham and Eileen A. Hebets
Cite this article
Ghazanfar, A.A. Multisensory vocal communication in primates and the evolution of rhythmic speech. Behav Ecol Sociobiol 67, 1441–1448 (2013). https://doi.org/10.1007/s00265-013-1491-z
DOI: https://doi.org/10.1007/s00265-013-1491-z