
Cognitive Psychology

Volume 48, Issue 2, March 2004, Pages 127-162

Learning at a distance I. Statistical learning of non-adjacent dependencies

https://doi.org/10.1016/S0010-0285(03)00128-2

Abstract

In earlier work we have shown that adults, young children, and infants are capable of computing transitional probabilities among adjacent syllables in rapidly presented streams of speech, and of using these statistics to group adjacent syllables into word-like units. In the present experiments we ask whether adult learners are also capable of such computations when the only available patterns occur in non-adjacent elements. In the first experiment, we present streams of speech in which precisely the same kinds of syllable regularities occur as in our previous studies, except that the patterned relations among syllables occur between non-adjacent syllables (with an intervening syllable that is unrelated). Under these circumstances we do not obtain our previous results: learners are quite poor at acquiring regular relations among non-adjacent syllables, even when the patterns are objectively quite simple. In subsequent experiments we show that learners are, in contrast, quite capable of acquiring patterned relations among non-adjacent segments—both non-adjacent consonants (with an intervening vocalic segment that is unrelated) and non-adjacent vowels (with an intervening consonantal segment that is unrelated). Finally, we discuss why human learners display these strong differences in learning differing types of non-adjacent regularities, and we conclude by suggesting that these contrasts in learnability may account for why human languages display non-adjacent regularities of one type much more widely than non-adjacent regularities of the other type.

Introduction

A question of long-standing interest concerns the mechanisms by which human learners acquire their native language. We know, from numerous empirical studies and theoretical discussions, that this process requires contributions from both nature and nurture—that is, from both the linguistic environment to which learners are exposed and some innate predispositions of human learners to process and learn temporally organized patterns in particular ways (see Chomsky, 1965; Gleitman & Newport, 1995; Marcus, 2001; Pinker, 1994; Seidenberg, 1997; for discussion). However, little is known about the precise processes by which this learning occurs or the mechanisms responsible for its rapidity and success.

In recent work we have shown that adults, young children, and infants are capable of computing transitional probabilities¹ among adjacent syllables in rapidly presented streams of speech, and of using these statistics to group syllables into word-like units (Aslin, Saffran, & Newport, 1998; Saffran, Aslin, & Newport, 1996a; Saffran, Newport, & Aslin, 1996b; Saffran, Newport, Aslin, Tunick, & Barrueco, 1997). We believe this statistical learning mechanism may play an important role in various aspects of language acquisition—at minimum in the process of word segmentation, but also potentially in the acquisition of syntax and morphology as well (Mintz, Newport, & Bever, 2002; Morgan, Meier, & Newport, 1987; Newport & Aslin, 2000; Saffran, 2001, Saffran, 2002). However, the extent of the capabilities of this statistical learning mechanism, and the levels and types of language patterns that may be acquired with the help of such a computational device, are still unknown. In the present paper we take an important step beyond our earlier results, asking whether learners are capable of computing not only adjacent sound regularities, but also regularities among sounds that are not adjacent to one another, and if so, what types of non-adjacent regularities they can easily acquire.
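As a reminder of the statistic at issue (our gloss of the standard definition used in the word-segmentation literature, e.g., Saffran et al., 1996a), the transitional probability of syllable Y given the immediately preceding syllable X is estimated as:

```latex
TP(Y \mid X) \;=\; \frac{\text{frequency of the pair } XY}{\text{frequency of } X}
```

Word boundaries are then hypothesized at local dips in this statistic, since syllable pairs within a word recur more consistently than pairs that span a word boundary.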

As noted above, our first work focused on asking whether learners could acquire statistical regularities among immediately adjacent syllables. Indeed, most words in natural languages are comprised of consistent sound patterns among adjacent syllables, and the transitional probability relations we examined in our miniature language studies were similar to those exhibited in real human languages (Harris, 1955). But natural languages exhibit other types of regularities as well, including certain types of non-adjacent patterns (Chomsky, 1957). Any mechanism used broadly in language acquisition must therefore, in some way, be capable of learning non-adjacent regularities (Chomsky, 1957; Miller & Chomsky, 1963).

What types of non-adjacent regularities do natural languages include? In many languages, words contain regular patterns among syllables or phonemic segments that are not immediately adjacent. For example, in Tagalog, some words may receive infixes: sounds inserted within the word stem to mark a specific tense or aspect. In Semitic languages, words may be built from a consonant pattern, such as k-t-b, with varying vowel patterns inserted between the consonants to signal time or number. Similarly, syntactic structure may involve dependencies between words that are quite distant from one another: sentence subjects that agree with verbs many words away, or wh-question words that replace noun phrases occurring much later in the sentence. However, a central finding of modern linguistics has been that such non-adjacent relations are quite selective and display limits that hold universally across the languages of the world; a main enterprise of theoretical linguistics of all flavors has been to capture these limitations in a set of principles or universal constraints (Chomsky, 1965, Chomsky, 1981, Chomsky, 1995).

How might a learning mechanism—and in particular, a statistical learning mechanism—operate with regard to non-adjacent dependencies? An important problem for this type of computational mechanism (as for any learning device) concerns how to limit its operations, so that the patterns of language are correctly acquired, but without an unmanageable explosion in the number of computations that must be performed to do the learning (Chomsky, 1965, Chomsky, 1981; Wexler & Culicover, 1980). In order to acquire even the simplest adjacent patterns that we have studied in 4-word, 2-min experiments with infants, learners must be performing the running computation of 20 different transitional probabilities, each over 45 occurrences of the component syllables and 15–45 occurrences of syllable pairings.² A learning mechanism additionally capable of computing and acquiring non-adjacent dependencies, while necessary for language learning, opens a computational Pandora’s box: In order to find consistent non-adjacent regularities, such a device might have to keep track of the probabilities relating all the syllables one away, two away, three away, etc. If such a device were to keep track of regularities among many types of elements—syllables, features, phonemic segments, and the like—this problem grows exponentially. But, as noted, non-adjacent regularities in natural languages take only certain forms. The problem is finding just these forms and not becoming overwhelmed by the other possibilities.
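The combinatorial concern in this paragraph can be made concrete with a short sketch. The function below is our illustration, not the authors' procedure, and the syllable names are hypothetical; it estimates transitional probabilities at an arbitrary lag, so that tracking non-adjacent dependencies requires one full table of pairwise statistics for every lag the learner considers:

```python
from collections import Counter

def transitional_probabilities(stream, lag=1):
    """Estimate TP_lag(y | x): the probability that y occurs `lag`
    positions after x, computed as count(x .. y) / count(x)."""
    # Count each unit x that has a successor `lag` positions later,
    # and each (x, y) pair separated by exactly `lag` positions.
    unit_counts = Counter(stream[:-lag])
    pair_counts = Counter(zip(stream[:-lag], stream[lag:]))
    return {(x, y): n / unit_counts[x] for (x, y), n in pair_counts.items()}

# A toy stream (hypothetical syllables) in which "pa" perfectly predicts
# "bi" two positions later, while the intervening syllable varies:
stream = ["pa", "do", "bi", "pa", "ku", "bi", "pa", "ti", "bi"]

adjacent = transitional_probabilities(stream, lag=1)     # one table for lag 1
nonadjacent = transitional_probabilities(stream, lag=2)  # another for lag 2

print(nonadjacent[("pa", "bi")])  # prints 1.0: a consistent non-adjacent dependency
```

Note that a learner open to every lag and every element type (syllables, segments, features) would need one such table per lag per element type, which is the explosion of candidate statistics the text describes.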

There are several possible ways of thinking about solutions to this problem. One possibility is that the statistical learning mechanism we have discovered is, in fact, a simple and low-level mechanism, limited to quick calculations among adjacent sound units. If this were the case, it would have to feed its results to another mechanism—perhaps a language acquisition device that is built to expect the properties exhibited by natural languages—in order to acquire the full range of constructions of human languages.

A second possibility is that the statistical learning mechanism we have discovered might itself be capable of a broader range of computations, among both adjacent and non-adjacent elements. But if so, what kinds of non-adjacent relations is it capable of acquiring? What might be the limits on such a learning device? Is it a very broad computational mechanism, capable of computing many patterns, both those that natural languages exhibit and also those that natural languages do not exhibit? If so, the constraints on patterns that appear in natural languages would have to be provided, during learning, by another source (e.g., a substantive language acquisition device, or a constraint on on-line processing). Alternatively, the particular computations this device can perform and the patterns that natural languages exhibit could be closely matched in their selectivities. If the latter, this would suggest that some of the constraints on natural language structure might arise from constraints on the computational abilities this mechanism exhibits.

In the present paper we address this question through a series of empirical studies of the learning of non-adjacent regularities. We begin with patterns that are, as much as possible, identical to those we have previously studied, except that they incorporate non-adjacent, rather than adjacent, regularities. As we will see, however, human learning of non-adjacent regularities appears to be extremely selective, even in our laboratory studies. Our studies therefore move on to examine those types of non-adjacent patterns that learners do and do not readily acquire. As we will show, the findings we obtain across these studies match remarkably well with the types of patterns natural languages do and do not commonly exhibit. In a companion paper, we examine this type of learning in a different primate species—cotton-top tamarin monkeys—to see whether these selectivities are shared across species or specific to our own. Taken together, these papers begin to shed some light on how statistical learning mechanisms and universals of language might interact.

Section snippets

Experiment 1: Non-adjacent syllables

In our previous studies (Aslin et al., 1998; Saffran et al., 1996a, Saffran et al., 1996b), subjects readily learned words comprised of consistent sequences of adjacent syllables, discriminating them from non-occurring sequences of the same syllables, and also from sequences of the same syllables that had occurred with less consistency. These results demonstrate that human learners can acquire syllable groupings by computing, online and very rapidly, a set of statistics concerning how adjacent

Further explorations of these negative findings

Over a series of eight different experiments, involving a total of 51 subjects, we manipulated a number of variables to see whether we could demonstrate successful learning of non-adjacent syllable regularities. First, we increased the length of exposure subjects were given to the language (running some subjects for 2 sessions, across 2 consecutive days, rather than one). We also tried an implicit rather than explicit learning procedure (as in Saffran et al., 1997), since some miniature

Experiment 2: Non-adjacent syllables versus non-adjacent phonemic segments

In the present experiment, we built languages with two different types of non-adjacent regularities, but with other aspects of their structure fairly similar. One type of language involved non-adjacent syllables, like the languages we studied in the experiments described above, with transitional probabilities of 1.0 between the first and third syllables of a 3-syllable sequence, while the intervening second syllable varied. In contrast, the second type of language involved patterned

Experiment 2A: Control for the number of syllable frames

The structure of the non-adjacent syllable languages used in this experiment was identical to that used in Experiment 2, except that the number of word-frames was reduced from three to two. This also resulted in a reduction in the number of words in the language, from six to four. By the metric of syllable frames, then, these languages were equal in simplicity to the non-adjacent segment languages. By other metrics—for example, the number of total words in the language—these languages are much

Experiment 3: Non-adjacent phonemic segments (vowels)

In this experiment we built a new type of language with patterned regularities among non-adjacent phonemic segments: this time among the vowels, skipping over the consonants. In this type of language, we created transitional probabilities of 1.0 among the vowels of a 3-syllable sequence, while the consonants that intervened between these vowels varied. These languages were similar to the non-adjacent syllable and non-adjacent segment languages of Experiment 2 in other ways: in the inventory of

General discussion

The aim of the present experiments was to investigate learners’ ability to acquire non-adjacent regularities among speech sounds. In previous work we have demonstrated that human learners have a remarkable capacity to compute complex co-occurrence statistics among speech sounds (as well as other types of auditory stimuli), and to do so rapidly, online, and simultaneously over a fairly large number of sounds across a continuous stream of speech (Aslin et al., 1998; Newport & Aslin, 2000; Saffran

Conclusions

We believe that the present results provide a new and important step in understanding the nature of statistical learning and the ways in which it might be pertinent to the acquisition and structure of natural languages. The present studies asked whether statistical learning is limited to computations on adjacent sound sequences only, or rather whether learners can also perform computations on non-adjacent sound sequences. If a statistical learning mechanism could conduct its computations on

Acknowledgements

We are grateful to Toby Calandra, Elizabeth Johnson, Kirti Sharma, Kelly Kinde, and Joanne Esse for their extensive help in conducting these experiments, to Marc Hauser, Jessica Maye, and an anonymous reviewer for comments on an earlier draft of the paper, to Alex Pouget for helpful discussion of entropy, and to Katherine Demuth for insightful discussion of nonadjacent dependencies in natural languages. This research was supported in part by NIH Grant DC00167 to ELN, NIH Grant HD37082 to RNA,

References (56)

  • R.N. Aslin et al. Computation of conditional probability statistics by 8-month-old infants. Psychological Science (1998)
  • T.G. Bever. The cognitive basis for linguistic structures
  • A.S. Bregman. Auditory scene analysis (1990)
  • N.A. Chomsky. Syntactic structures (1957)
  • N.A. Chomsky. Aspects of the theory of syntax (1965)
  • N.A. Chomsky. Lectures on government and binding (1981)
  • N.A. Chomsky. The minimalist program (1995)
  • A. Cleeremans. Mechanisms of implicit learning: Connectionist models of sequence processing (1993)
  • A. Cleeremans et al. Learning the structure of event sequences. Journal of Experimental Psychology: General (1991)
  • J.A. Coady et al. Phonological neighbourhoods in the developing lexicon. Journal of Child Language (2003)
  • Creel, S. C., Newport, E. L., & Aslin, R. N. (submitted). Distant melodies: Statistical learning of non-adjacent...
  • J. Fiser et al. Statistical learning of higher-order temporal structure from visual shape-sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition (2002)
  • L.R. Gleitman et al. The invention of language by children: Environmental and biological influences on the acquisition of language
  • L.R. Gleitman et al. The structure and acquisition of reading I: Relations between orthographies and the structure of language
  • Goldsmith, J. (1976). Autosegmental phonology. Unpublished doctoral dissertation,...
  • J. Goldsmith. Autosegmental and metrical phonology (1990)
  • R.L. Gomez. Variability and detection of invariant structure. Psychological Science (2002)
  • Z.S. Harris. From phoneme to morpheme. Language (1955)