Abstract
In this research, we investigated the effects of voice and face information on the perceptual learning of talkers and on long-term memory for spoken words. In the first phase, listeners were trained over several days to identify voices from words presented auditorily or audiovisually. The training data showed that visual information about speakers enhanced voice learning, revealing cross-modal connections in talker processing akin to those observed in speech processing. In the second phase, the listeners completed an auditory or audiovisual word recognition memory test in which equal numbers of words were spoken by familiar and unfamiliar talkers. The data showed that words presented by familiar talkers were more likely to be retrieved from episodic memory, regardless of modality. Together, these findings provide new information about the representational code underlying familiar talker recognition and the role of stimulus familiarity in episodic word recognition.
Additional information
Note—This article was accepted by the previous editorial team, headed by Neil Macmillan.
Cite this article
Sheffert, S.M., Olson, E. Audiovisual speech facilitates voice learning. Perception & Psychophysics 66, 352–362 (2004). https://doi.org/10.3758/BF03194884