Abstract
Speech alignment is the tendency for interlocutors to unconsciously imitate one another’s speaking style. Alignment also occurs when a talker is asked to shadow recorded words (e.g., Shockley, Sabadini, & Fowler, 2004). In two experiments, we examined whether alignment could be induced with visual (lipread) speech and with auditory speech. In Experiment 1, we asked subjects to lipread and shadow out loud a model silently uttering words. The results indicate that shadowed utterances sounded more similar to the model’s utterances than did subjects’ nonshadowed read utterances. This suggests that speech alignment can be based on visual speech. In Experiment 2, we tested whether raters could perceive alignment across modalities. Raters were asked to judge the relative similarity between a model’s visual (silent video) utterance and subjects’ audio utterances. The subjects’ shadowed utterances were again judged as more similar to the model’s than were read utterances, suggesting that raters are sensitive to cross-modal similarity between aligned words.
References
Arnold, P., & Hill, F. (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology, 92, 339–355.
Calvert, G. A., Bullmore, E., Brammer, M. J., Campbell, R., Iversen, S. D., Woodruff, P., et al. (1997). Silent lipreading activates the auditory cortex. Science, 276, 593–596.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76, 893–910.
Davis, C., & Kim, J. (2001). Repeating and remembering foreign language words: Implications for language teaching systems. Artificial Intelligence Review, 16, 37–47.
Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience, 15, 399–402.
Fowler, C. A. (2004). Speech as a supermodal or amodal phenomenon. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processing (pp. 189–201). Cambridge, MA: MIT Press.
Fowler, C. A., Brown, J. M., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory & Language, 49, 396–413.
Gentilucci, M., & Bernardis, P. (2007). Imitation during phoneme production. Neuropsychologia, 45, 608–615.
Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: Communication, context, and consequences. In H. Giles, N. Coupland, & J. Coupland (Eds.), Contexts of accommodation: Developments in applied sociolinguistics (pp. 1–68). Cambridge: Cambridge University Press.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251–279.
Goldinger, S. D., & Azuma, T. (2004). Episodic memory reflected in printed word naming. Psychonomic Bulletin & Review, 11, 716–722.
Grant, K. W., & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. Journal of the Acoustical Society of America, 108, 1197–1208.
Gregory, S. W. (1990). Analysis of fundamental frequency reveals covariation in interview partners’ speech. Journal of Nonverbal Behavior, 14, 237–251.
Kamachi, M., Hill, H., Lander, K., & Vatikiotis-Bateson, E. (2003). “Putting the face to the voice”: Matching identity across modality. Current Biology, 13, 1709–1714.
Kaufmann, J. M., & Schweinberger, S. R. (2005). Speaker variations influence speechreading speed for dynamic faces. Perception, 34, 595–610.
Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: Evidence from stimulus-response compatibility. Journal of Experimental Psychology: Human Perception & Performance, 26, 634–647.
Kozhevnikov, V., & Chistovich, L. (1965). Speech: Articulation and perception (JPRS Publication 50, 543). Washington, DC: Joint Publications Research Service.
Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Lachs, L., & Pisoni, D. B. (2004a). Crossmodal source identification in speech perception. Ecological Psychology, 16, 159–187.
Lachs, L., & Pisoni, D. B. (2004b). Cross-modal source information and spoken word recognition. Journal of Experimental Psychology: Human Perception & Performance, 30, 378–396.
Lachs, L., & Pisoni, D. B. (2004c). Specification of cross-modal source information in isolated kinematic displays of speech. Journal of the Acoustical Society of America, 116, 507–518.
Lander, K., & Davies, R. (2008). Does face familiarity influence speechreadability? Quarterly Journal of Experimental Psychology, 61, 961–967.
MacSweeney, M., Amaro, E., Calvert, G. A., Campbell, R., David, A. S., McGuire, P., et al. (2000). Silent speechreading in the absence of scanner noise: An event-related fMRI study. NeuroReport, 11, 1729–1733.
MacSweeney, M., Calvert, G. A., Campbell, R., McGuire, P. K., David, A. S., Williams, S. C. R., et al. (2002). Speechreading circuits in people born deaf. Neuropsychologia, 40, 801–807.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Meltzoff, A. N., & Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Development & Parenting, 6, 179–192.
Mills, A. E. (1987). The development of phonology in the blind child. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 145–162). Hillsdale, NJ: Erlbaum.
Nakamura, M., Iwano, K., & Furui, S. (2008). Differences between acoustic characteristics of spontaneous and read speech and their effects on recognition performance. Computer Speech & Language, 22, 171–184.
Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: The role of perception. Journal of Language & Social Psychology, 21, 422–432.
Natale, M. (1975). Convergence of mean vocal intensity in dyadic communication as a function of social desirability. Journal of Personality & Social Psychology, 32, 790–804.
Navarra, J., & Soto-Faraco, S. (2007). Hearing lips in a second language: Visual articulatory information enables the perception of L2 sounds. Psychological Research, 71, 4–12.
Nygaard, L. C. (2005). The integration of linguistic and non-linguistic properties of speech. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 390–414). Malden, MA: Blackwell.
Pardo, J. S. (2004). Acoustic-phonetic convergence among interacting talkers. Journal of the Acoustical Society of America, 115, 2608.
Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119, 2382–2393.
Pardo, J. S., & Remez, R. E. (2006). The perception of speech. In M. Traxler & M. A. Gernsbacher (Eds.), The handbook of psycholinguistics (2nd ed., pp. 201–248). New York: Academic Press.
Porter, R. J., Jr., & Castellanos, F. X. (1980). Speech production measures of speech perception: Rapid shadowing of VCV syllables. Journal of the Acoustical Society of America, 67, 1349–1356.
Porter, R. J., Jr., & Lubker, J. F. (1980). Rapid reproduction of vowel-vowel sequences: Evidence for a fast and direct acoustic-motoric linkage. Journal of Speech & Hearing Research, 23, 593–602.
Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–114). Hillsdale, NJ: Erlbaum.
Rosenblum, L. D. (2005). The primacy of multimodal speech perception. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 51–78). Malden, MA: Blackwell.
Rosenblum, L. D., Miller, R. M., & Sanchez, K. (2007). Lip-read me now, hear me better later: Cross-modal transfer of talker-familiarity effects. Psychological Science, 18, 392–396.
Rosenblum, L. D., Niehus, R. P., & Smith, N. M. (2007). Look who’s talking: Recognizing friends from visible articulation. Perception, 36, 157–159.
Rosenblum, L. D., Smith, N. M., Nichols, S. M., Hale, S., & Lee, J. (2006). Hearing a face: Cross-modal speaker matching using isolated visible speech. Perception & Psychophysics, 68, 84–93.
Rosenblum, L. D., Yakel, D. A., Baseer, N., Panchal, A., Nordarse, B. C., & Niehus, R. P. (2002). Visual speech information for face recognition. Perception & Psychophysics, 64, 220–229.
Sancier, M. L., & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25, 421–436.
Schweinberger, S. R., & Soukup, G. R. (1998). Asymmetric relationships among perceptions of facial identity, emotion, and facial speech. Journal of Experimental Psychology: Human Perception & Performance, 24, 1748–1765.
Sheffert, S. M., & Fowler, C. A. (1995). The effects of voice and visible speaker change on memory for spoken words. Journal of Memory & Language, 34, 665–685.
Sheffert, S. M., & Olson, E. (2004). Audiovisual speech facilitates voice learning. Perception & Psychophysics, 66, 352–362.
Shockley, K., Sabadini, L., & Fowler, C. A. (2004). Imitation in shadowing words. Perception & Psychophysics, 66, 422–429.
Shockley, K., Santana, M. V., & Fowler, C. A. (2003). Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception & Performance, 29, 326–332.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.
Sundara, M., Namasivayam, A. K., & Chen, R. (2001). Observation-execution matching system for speech: A magnetic stimulation study. NeuroReport, 12, 1341–1344.
Thalheimer, W., & Cook, S. (2002, August). How to calculate effect sizes from published research articles: A simplified methodology. Retrieved November 31, 2002, from http://work-learning.com/effect_sizes.htm.
Yakel, D. A., Rosenblum, L. D., & Fortier, M. A. (2000). Effects of talker variability on speechreading. Perception & Psychophysics, 62, 1405–1412.
Additional information
This research was supported by NIDCD Grant 1R01DC008957-01.
Miller, R.M., Sanchez, K. & Rosenblum, L.D. Alignment to visual speech information. Attention, Perception, & Psychophysics 72, 1614–1625 (2010). https://doi.org/10.3758/APP.72.6.1614