Research report
Electrophysiological evidence for prelinguistic infants' word recognition in continuous speech

https://doi.org/10.1016/j.cogbrainres.2004.12.009

Abstract

Children begin to talk at about age one. The vocabulary they need to do so must be built on perceptual evidence and, indeed, infants begin to recognize spoken words long before they talk. Most of the utterances infants hear, however, are continuous, without pauses between words, so constructing a vocabulary requires them to decompose continuous speech in order to extract the individual words. Here, we present electrophysiological evidence that 10-month-old infants recognize two-syllable words they have previously heard only in isolation when these words are presented anew in continuous speech. Moreover, they only need roughly the first syllable of the word to begin doing this. Thus, prelinguistic infants command a highly efficient procedure for segmentation and recognition of spoken words in the absence of an existing vocabulary, allowing them to tackle effectively the problem of bootstrapping a lexicon out of the highly variable, continuous speech signals in their environment.

Introduction

Learning a language from birth entails many steps. One essential step is building a vocabulary of the words of the mother tongue. From the fact that children begin their attempts to talk at around age one, it is clear that the initial steps in vocabulary building have been taken in the first year of life. This is a formidable achievement, especially given the fact that most of the utterances infants hear in the first year of life are not words in isolation, but continuous speech without pauses between the words.

The continuity of speech presents one of the greatest challenges to listeners of all ages and all languages. Boundaries between individual words in an utterance are not marked by reliable and consistent signals; yet recognizing the individual words which make up an utterance is necessary if the utterance is to be understood. Thus, the individual words must be extracted from the utterance. Fig. 1 illustrates how hard this can be. The three spectrograms in the upper part of the figure represent three isolated utterances of the same word (hofnar ‘court jester’). The three utterances are not at all the same: they differ both in duration and in spectral quality. The same word also occurs within the sentence which is shown in the lower part of the figure. There are no pauses before or after hofnar in the sentence context, and the acoustic shape of the word's onset and offset has been influenced by the preceding and following phonemes.

If it is challenge enough for the adult listener, the continuity of speech presents a very serious problem indeed to the infant listener attempting to build up an initial stock of word forms based on the available input. Word forms must be recognized as such even though they vary in acoustic form in different contexts, and even though their boundaries in a sentence context are often unmarked. Speech to infants is in this respect not different from speech between adults; in the largest available sample of speech input to an infant listener [16], continuous speech was found to account for 67% of all utterances. Of all the words the infant heard, only 9% of them were uttered in isolation. Thus, the utterance in Fig. 1 – which, as it happens, is taken from the materials of the present study – is a fair approximation of the kind of continuity problem presented daily to infant listeners. (Note that it was thus spoken in an animated, hyper-articulated style characteristic of speech to infants; variability and contextual influence in speech can in fact be far more extreme than is illustrated here).

Nonetheless, infants contrive to cope with this problem: they recognize recurring word forms within continuous speech and construct an initial set of words which, around the end of their first year, they begin to attempt to utter. Infants are thus indeed capable of segmenting words from the surrounding speech context. This step in language acquisition is taken in the first year of life, before meaning is attached to words [8]. In this first year, infants start to learn how to segment continuous speech into discrete units roughly corresponding to individual words. The first indications of word segmentation from context are based simply on acoustic form. There is abundant evidence of young infants' competence in segmenting and recognizing words, coming principally from studies using the Headturn Preference Paradigm (HPP). This method compares summed listening time for stimuli of one type versus another, with longer listening time taken to indicate a preference. In a two-stage Familiarization and Test version of HPP, infants from 7.5 to 12 months of age have been shown to listen longer to short passages containing words they had just been familiarized with than to similar passages containing unfamiliar words [7], [8], [9], [10], [12]. This suggests that the infants not only showed a preference for familiar words (over novel words), but also had been able to recognize these newly familiar words even though they were embedded in continuous speech; thus, they must have been able to segment the words from the surrounding continuous speech.
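The HPP comparison described above can be illustrated with a minimal sketch. It assumes one list of per-trial listening times (in seconds) for each stimulus type; the function names and the numbers are hypothetical and are not taken from any of the cited studies.

```python
def summed_listening_time(trial_durations_s):
    """Sum listening time (seconds) over all trials of one stimulus type."""
    return sum(trial_durations_s)


def familiarity_preference(familiar_trials_s, unfamiliar_trials_s):
    """Difference in summed listening time; a positive value means the
    infant listened longer to passages containing familiarized words."""
    return (summed_listening_time(familiar_trials_s)
            - summed_listening_time(unfamiliar_trials_s))


# Hypothetical listening times for one infant (illustrative only).
familiar = [8.0, 7.5, 9.0, 6.5]
unfamiliar = [6.0, 5.5, 7.0, 6.5]

difference = familiarity_preference(familiar, unfamiliar)
print(difference)  # positive here, i.e. a preference for the familiar passages
```

A measure of this kind yields one summary value per infant and condition, which is why it cannot show when, within a word, recognition begins.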

HPP, however, is an indirect measure of segmentation, and it is not possible to investigate with HPP how rapidly segmentation occurs. We wished to look more closely at the time course of word segmentation from continuous speech, and in order to achieve the high temporal resolution necessary for this question, we turned to event-related brain potentials (ERPs). Using ERPs enables us to see what happens in the infant's brain as a particular word in the speech stream is heard; thus, it gives us the opportunity to assess the time needed to segment and recognize this word from speech, as well as to determine whether words are necessarily recognized by infants as undivided wholes or whether recognition of a previously heard word in continuous speech can be initiated on the basis of part of the word.

Little is known as yet about the ERP responses corresponding to the beginnings of word recognition in infants. The Mismatch Negativity (MMN) paradigm, a passive oddball paradigm in which an unexpected change in a series of stimuli usually results in a negative-going increase in ERP amplitude, has proven to be an extremely useful method for studying auditory discrimination of tones, phonemes or syllables [4], and studies have also been conducted on discrimination of (isolated) pseudowords (e.g., in 4- and 5-month-old infants: see Ref. [17]) and (isolated) words (4- to 7-year-old children: see Ref. [11]). However, this type of paradigm is less optimal for answering the current research question, for which more complex stimuli, for example, spoken sentences, are required. To study word recognition from continuous speech, we need a paradigm in which it is possible to present (both isolated words and) full sentences.

For this, we exploited an ERP paradigm previously used in memory research [14], but in a novel way. The ERP procedure that we used had separate Familiarization and Test phases, by analogy with the two-phase HPP studies. In the Familiarization phase, we presented our participants, 28 prelinguistic 10-month-old Dutch infants, with lists of isolated Dutch words. Each list consisted of 10 tokens of the same two-syllable word (e.g., python ‘python’, hofnar ‘court jester’). The words were low in frequency and hence unlikely to be known by 10-month-olds. All had stress on the first syllable; this is a very common word structure in both English and Dutch [2], [5], and the headturn preference response has been consistently observed for such words in both languages [7], [10], [12]. The 10 tokens of any given word were each pronounced separately, so no two were identical, and each was spoken in the animated manner typical of speech directed to infants; the utterances depicted in Fig. 1 are taken from our materials. The Test phase, which immediately followed each word list, comprised eight sentences, four of which contained the familiarized words and four of which contained novel words (see Table 1 for an example of a Familiarization phase and a Test phase).
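The structure of one Familiarization and Test block, as described above, can be sketched as a small data structure. This is an illustration only: the names `Block` and `make_block` are hypothetical, the use of a single novel word per block is a simplifying assumption, the pairing of hofnar with python is arbitrary, and the shuffled sentence order is assumed because the text above does not specify an order.

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    familiarization: list  # 10 separately spoken tokens of one word
    test: list             # 8 sentence slots as (condition, target word)


def make_block(familiar_word, novel_word, seed=0):
    # Familiarization phase: 10 distinct tokens of the same two-syllable word.
    familiarization = [(familiar_word, token) for token in range(1, 11)]
    # Test phase: 4 sentences with the familiarized word, 4 with a novel word.
    test = [("familiar", familiar_word)] * 4 + [("novel", novel_word)] * 4
    # Assumption: sentence order is randomized; the seed keeps it reproducible.
    random.Random(seed).shuffle(test)
    return Block(familiarization, test)


block = make_block("hofnar", "python")
print(len(block.familiarization), len(block.test))
```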

Participants

Twenty-eight Dutch 10-month-old infants (mean age 308 days, range 288–320 days; 10 female) participated. Sixteen additional infants were tested but excluded from further analyses because they failed to complete enough of the experiment or because the data were too noisy due to movement artifact. The parent(s) gave informed consent for participation of their infant in the study. All infants came from monolingual Dutch families without left-handedness in the immediate family. No neurological or

Results

We examined the ERP response during familiarization in order to establish criteria for the recognition response we could expect during the Test phase. Thus, we first analyzed the ERP response across the ten trials of the Familiarization phase. ERP responses were calculated for each pair of successive trials (that is, word positions: e.g., position 1/2 is the average of the words in positions 1 and 2). The grand average waveform (Fig. 2a) shows an extended positivity for position 1/2, starting at
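The pairing of word positions can be sketched as follows. This is a minimal illustration, assuming one waveform (a list of amplitude samples) per word position and non-overlapping pairs (1/2, 3/4, and so on); the function names and amplitude values are hypothetical, and the averaging across words and infants that a grand average involves is not shown.

```python
def average_waveforms(waveforms):
    """Sample-by-sample mean of equally long waveforms."""
    count = len(waveforms)
    return [sum(samples) / count for samples in zip(*waveforms)]


def pairwise_position_averages(trials):
    """Average each non-overlapping pair of successive word positions.

    trials: list of waveforms ordered by word position (position 1 first).
    Returns a dict keyed by labels such as "1/2".
    """
    averages = {}
    for first in range(0, len(trials) - 1, 2):
        label = f"{first + 1}/{first + 2}"
        averages[label] = average_waveforms(trials[first:first + 2])
    return averages


# Toy data: 10 positions, 3 samples each, with made-up amplitudes.
trials = [[float(position)] * 3 for position in range(1, 11)]
paired = pairwise_position_averages(trials)
print(list(paired))
```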

Discussion

The method we have developed has allowed us to see a cortical effect of word Familiarity in the 10-month-old's brain. The effect takes the form of a reduced positivity with increasing familiarity. In the Familiarization phase, we observed that the effect started very early on in the word (at about 160 ms). The two-syllable words were on average 710 ms long, so the Familiarity response started while the infants were hearing the early parts of the words. In the Test phase, we observed further

Acknowledgments

This research was supported by a SPINOZA grant from the Netherlands Organization for Scientific Research (NWO) to AC. We thank Dan Swingley and Marlies Wassenaar for useful assistance, Herb Clark, James McQueen, Elizabeth Johnson and two anonymous reviewers for comments on the manuscript, and Dennis Pasveer for making Fig. 1.
