Learning at a distance I. Statistical learning of non-adjacent dependencies
Introduction
A question of long-standing interest concerns the mechanisms by which human learners acquire their native language. We know, from numerous empirical studies and theoretical discussions, that this process requires contributions from both nature and nurture—that is, from both the linguistic environment to which learners are exposed and some innate predispositions of human learners to process and learn temporally organized patterns in particular ways (see Chomsky, 1965; Gleitman & Newport, 1995; Marcus, 2001; Pinker, 1994; Seidenberg, 1997; for discussion). However, little is known about the precise processes by which this learning occurs or the mechanisms responsible for its rapidity and success.
In recent work we have shown that adults, young children, and infants are capable of computing transitional probabilities¹ among adjacent syllables in rapidly presented streams of speech, and of using these statistics to group syllables into word-like units (Aslin, Saffran, & Newport, 1998; Saffran, Aslin, & Newport, 1996a; Saffran, Newport, & Aslin, 1996b; Saffran, Newport, Aslin, Tunick, & Barrueco, 1997). We believe this statistical learning mechanism may play an important role in various aspects of language acquisition—at minimum in the process of word segmentation, but potentially in the acquisition of syntax and morphology as well (Mintz, Newport, & Bever, 2002; Morgan, Meier, & Newport, 1987; Newport & Aslin, 2000; Saffran, 2001, 2002). However, the extent of the capabilities of this statistical learning mechanism, and the levels and types of language patterns that may be acquired with the help of such a computational device, are still unknown. In the present paper we take an important step beyond our earlier results, asking whether learners are capable of computing not only adjacent sound regularities, but also regularities among sounds that are not adjacent to one another, and if so, what types of non-adjacent regularities they can easily acquire.
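The statistic at the heart of these studies is simple to state: the transitional probability of Y given X is the frequency of the pair XY divided by the frequency of X. A minimal sketch makes the word-segmentation logic concrete (the syllable stream below is a hypothetical toy example, not one of the experimental languages):

```python
from collections import Counter

def transitional_probabilities(stream):
    """TP(Y | X) = frequency(pair XY) / frequency(X), over adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# Toy stream: three two-syllable "words" (bi-da, ku-pa, go-la) concatenated
# in varying order with no pauses between them.
stream = "bi da ku pa go la bi da go la ku pa bi da".split()
tps = transitional_probabilities(stream)

print(tps[("bi", "da")])  # 1.0: within-word pair, "da" always follows "bi"
print(tps[("da", "ku")])  # 0.5: across-word pair, what follows "da" varies
```

Peaks in transitional probability mark syllable pairs that belong together; dips mark likely word boundaries, which is the grouping behavior the cited studies demonstrate in listeners.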
As noted above, our first work focused on asking whether learners could acquire statistical regularities among immediately adjacent syllables. Indeed, most words in natural languages are composed of consistent sound patterns among adjacent syllables, and the transitional probability relations we examined in our miniature language studies were similar to those exhibited in real human languages (Harris, 1955). But natural languages exhibit other types of regularities as well, including certain types of non-adjacent patterns (Chomsky, 1957). Any mechanism used broadly in language acquisition must therefore, in some way, be capable of learning non-adjacent regularities (Chomsky, 1957; Miller & Chomsky, 1963).
What types of non-adjacent regularities do natural languages include? In many languages, words contain regular patterns among syllables or phonemic segments that are not immediately adjacent. For example, in Tagalog, some words may receive infixes: sounds inserted within the word stem to mark a specific tense or aspect. In Semitic languages, words may be built from a consonant pattern, such as k-t-b, with varying vowel patterns inserted between the consonants to signal time or number. Similarly, syntactic structure may involve dependencies between words that are quite distant from one another: sentence subjects that agree with verbs many words away, or wh-question words that replace noun phrases much later in the sentence. However, a central finding of modern linguistics has been that such non-adjacent relations are quite selective and display limits that hold universally across the languages of the world; a main enterprise of theoretical linguistics of all flavors has been to capture these limitations in a set of principles or universal constraints (Chomsky, 1965, 1981, 1995).
How might a learning mechanism—and in particular, a statistical learning mechanism—operate with regard to non-adjacent dependencies? An important problem for this type of computational mechanism (as for any learning device) concerns how to limit its operations, so that the patterns of language are correctly acquired, but without an unmanageable explosion in the number of computations that must be performed to do the learning (Chomsky, 1965, 1981; Wexler & Culicover, 1980). In order to acquire even the simplest adjacent patterns that we have studied in 4-word, 2-min experiments with infants, learners must be performing the running computation of 20 different transitional probabilities, each over 45 occurrences of the component syllables and 15–45 occurrences of syllable pairings.² A learning mechanism additionally capable of computing and acquiring non-adjacent dependencies, while necessary for language learning, opens a computational Pandora’s box: In order to find consistent non-adjacent regularities, such a device might have to keep track of the probabilities relating all the syllables one away, two away, three away, etc. If such a device were to keep track of regularities among many types of elements—syllables, features, phonemic segments, and the like—the problem would grow exponentially. But, as noted, non-adjacent regularities in natural languages take only certain forms. The problem is finding just these forms and not becoming overwhelmed by the other possibilities.
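The growth in bookkeeping can be sketched directly (the stream and function below are hypothetical illustrations, not the experimental materials): a learner that tracks dependencies at lag 2, lag 3, and so on must maintain a separate co-occurrence table for every lag it considers, and for an inventory of N elements each table can hold up to N² entries.

```python
from collections import Counter

def lagged_pair_counts(stream, lag):
    """Count pairs (X, Y) in which Y occurs `lag` positions after X."""
    return Counter(zip(stream, stream[lag:]))

stream = "bi da ku pa go la bi da go la ku pa bi da".split()

# One full table of statistics per lag: the number of distinct pair
# statistics a learner must track accumulates with every added lag.
total = 0
for lag in (1, 2, 3):
    table = lagged_pair_counts(stream, lag)
    total += len(table)
    print(f"lag {lag}: {len(table)} distinct pairs tracked")
```

Even in this toy stream of 14 syllables, extending the learner from lag 1 to lags 1–3 multiplies the number of running statistics severalfold; with more element types (features, segments, syllables) the load compounds further, which is the explosion described above.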
There are several possible ways of thinking about solutions to this problem. One possibility is that the statistical learning mechanism we have discovered is, in fact, a simple and low-level mechanism, limited to quick calculations among adjacent sound units. If this were the case, it would have to feed its results to another mechanism—perhaps a language acquisition device that is built to expect the properties exhibited by natural languages—in order to acquire the full range of constructions of human languages.
A second possibility is that the statistical learning mechanism we have discovered might itself be capable of a broader range of computations, among both adjacent and non-adjacent elements. But if so, what kinds of non-adjacent relations is it capable of acquiring? What might be the limits on such a learning device? Is it a very broad computational mechanism, capable of computing many patterns, both those that natural languages exhibit and also those that natural languages do not? If so, the constraints on patterns that appear in natural languages would have to be provided, during learning, by another source (e.g., a substantive language acquisition device, or a constraint on on-line processing). Alternatively, the particular computations this device can perform could closely match, in their selectivities, the patterns that natural languages exhibit. If the latter, this would suggest that some of the constraints on natural language structure might arise from constraints on the computational abilities of this mechanism.
In the present paper we address this question through a series of empirical studies of the learning of non-adjacent regularities. We begin with patterns that are, as much as possible, identical to those we have previously studied, except that they incorporate non-adjacent, rather than adjacent, regularities. As we will see, however, human learning of non-adjacent regularities appears to be extremely selective, even in our laboratory studies. Our studies therefore move on to examine the types of non-adjacent patterns that learners do and do not readily acquire. As we will show, the findings we obtain across these studies match remarkably well with the types of patterns natural languages do and do not commonly exhibit. In a companion paper, we examine this type of learning in a different primate species—cotton-top tamarin monkeys—to ask whether these selectivities are shared across species or specific to our own. Taken together, these papers begin to shed light on how statistical learning mechanisms and universals of language might interact.
Section snippets
Experiment 1: Non-adjacent syllables
In our previous studies (Aslin et al., 1998; Saffran et al., 1996a, Saffran et al., 1996b), subjects readily learned words comprised of consistent sequences of adjacent syllables, discriminating them from non-occurring sequences of the same syllables, and also from sequences of the same syllables that had occurred with less consistency. These results demonstrate that human learners can acquire syllable groupings by computing, online and very rapidly, a set of statistics concerning how adjacent …
Further explorations of these negative findings
Over a series of eight different experiments, involving a total of 51 subjects, we manipulated a number of variables to see whether we could demonstrate successful learning of non-adjacent syllable regularities. First, we increased the length of exposure subjects were given to the language (running some subjects for 2 sessions, across 2 consecutive days, rather than one). We also tried an implicit rather than explicit learning procedure (as in Saffran et al., 1997), since some miniature …
Experiment 2: Non-adjacent syllables versus non-adjacent phonemic segments
In the present experiment, we built languages with two different types of non-adjacent regularities, but with other aspects of their structure fairly similar. One type of language involved non-adjacent syllables, like the languages we studied in the experiments described above, with transitional probabilities of 1.0 between the first and third syllables of a 3-syllable sequence, while the intervening second syllable varied. In contrast, the second type of language involved patterned …
Experiment 2A: Control for the number of syllable frames
The structure of the non-adjacent syllable languages used in this experiment was identical to that used in Experiment 2, except that the number of word-frames was reduced from three to two. This also resulted in a reduction in the number of words in the language, from six to four. By the metric of syllable frames, then, these languages were equal in simplicity to the non-adjacent segment languages. By other metrics—for example, the number of total words in the language—these languages are much …
Experiment 3: Non-adjacent phonemic segments (vowels)
In this experiment we built a new type of language with patterned regularities among non-adjacent phonemic segments: this time among the vowels, skipping over the consonants. In this type of language, we created transitional probabilities of 1.0 among the vowels of a 3-syllable sequence, while the consonants that intervened between these vowels varied. These languages were similar to the non-adjacent syllable and non-adjacent segment languages of Experiment 2 in other ways: in the inventory of …
General discussion
The aim of the present experiments was to investigate learners’ ability to acquire non-adjacent regularities among speech sounds. In previous work we have demonstrated that human learners have a remarkable capacity to compute complex co-occurrence statistics among speech sounds (as well as other types of auditory stimuli), and to do so rapidly, online, and simultaneously over a fairly large number of sounds across a continuous stream of speech (Aslin et al., 1998; Newport & Aslin, 2000; Saffran …
Conclusions
We believe that the present results provide a new and important step in understanding the nature of statistical learning and the ways in which it might be pertinent to the acquisition and structure of natural languages. The present studies asked whether statistical learning is limited to computations on adjacent sound sequences only, or rather whether learners can also perform computations on non-adjacent sound sequences. If a statistical learning mechanism could conduct its computations on …
Acknowledgements
We are grateful to Toby Calandra, Elizabeth Johnson, Kirti Sharma, Kelly Kinde, and Joanne Esse for their extensive help in conducting these experiments, to Marc Hauser, Jessica Maye, and an anonymous reviewer for comments on an earlier draft of the paper, to Alex Pouget for helpful discussion of entropy, and to Katherine Demuth for insightful discussion of nonadjacent dependencies in natural languages. This research was supported in part by NIH Grant DC00167 to ELN, NIH Grant HD37082 to RNA, …
References (56)
- Aslin, R. N., Saffran, J. R., & Newport, E. L. (1998). Computation of conditional probability statistics by 8-month-old infants. Psychological Science.
- Bertoncini, J., & Mehler, J. (1981). Syllables as units in infant perception. Infant Behavior and Development.
- Bever, T. G. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (Ed.), Cognition and the development of language.
- Bregman, A. S. (1990). Auditory scene analysis. MIT Press.
- Chomsky, N. (1957). Syntactic structures. Mouton.
- Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
- Chomsky, N. (1981). Lectures on government and binding. Foris.
- Chomsky, N. (1995). The minimalist program. MIT Press.
- Cleeremans, A. (1993). Mechanisms of implicit learning: Connectionist models of sequence processing. MIT Press.
- Cleeremans, A., & McClelland, J. L. (1991). Learning the structure of event sequences. Journal of Experimental Psychology: General.
- Coady, J. A., & Aslin, R. N. (2003). Phonological neighbourhoods in the developing lexicon. Journal of Child Language.
- Fiser, J., & Aslin, R. N. (2002). Statistical learning of higher-order temporal structure from visual shape-sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition.
- Gleitman, L. R., & Newport, E. L. (1995). The invention of language by children: Environmental and biological influences on the acquisition of language. In L. R. Gleitman & M. Liberman (Eds.), An invitation to cognitive science. MIT Press.
- Gleitman, L. R., & Rozin, P. (1977). The structure and acquisition of reading I: Relations between orthographies and the structure of language. In A. S. Reber & D. L. Scarborough (Eds.), Toward a psychology of reading. Erlbaum.
- Goldsmith, J. A. (1990). Autosegmental and metrical phonology. Blackwell.
- Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological Science.
- Harris, Z. S. (1955). From phoneme to morpheme. Language.
- Hauser, M. D., Newport, E. L., & Aslin, R. N. (2001). Segmentation of the speech stream in a nonhuman primate: Statistical learning in cotton-top tamarins. Cognition.
- Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition.
- Morgan, J. L., Meier, R. P., & Newport, E. L. (1987). Structural packaging in the input to language learning: Contributions of intonational and morphological marking of phrases to the acquisition of language. Cognitive Psychology.
- Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science.
- Saffran, J. R. (2001). The use of predictive dependencies in language learning. Journal of Memory and Language.
- Saffran, J. R. (2002). Constraints on statistical language learning. Journal of Memory and Language.
- Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition.
- Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996). Word segmentation: The role of distributional cues. Journal of Memory and Language.
- Savin, H. B., & Bever, T. G. (1970). The nonperceptual reality of the phoneme. Journal of Verbal Learning and Verbal Behavior.