Elsevier

Brain and Language

Volume 104, Issue 2, February 2008, Pages 159-169
Inadequate and infrequent are not alike: ERPs to deviant prosodic patterns in spoken sentence comprehension

https://doi.org/10.1016/j.bandl.2007.03.005

Abstract

The current study on German investigates Event-Related brain Potentials (ERPs) for the perception of sentences with intonations which are infrequent (i.e. vocatives) or inadequate in daily conversation. These ERPs are compared to the processing correlates for sentences in which the syntax-to-prosody relations are congruent and used frequently during communication. Results show that perceiving an adequate but infrequent prosodic structure does not result in the same brain responses as encountering an inadequate prosodic pattern. While an early negative-going ERP followed by an N400 is observed for both the infrequent and the inadequate syntax-to-prosody association, only the inadequate intonation also elicits a P600.

Introduction

Prosody, or suprasegmental phonology, is an inherent aspect of spoken language. In particular, pitch, intensity, and duration variations, as well as speech pauses, are prosodic features which co-occur with other linguistic information (e.g. semantic and syntactic) in spoken language (Selkirk, 1984; Shattuck-Hufnagel and Turk, 1996). Although the term “prosody” is commonly used (especially in the literature on language processing), the expression “suprasegmental phonology” better emphasises that prosodic phenomena often relate to elements in speech which are larger than a single segment. In particular, syllables present the relevant domains for word-level prosodic features like stress (récord vs. recórd). However, the suprasegmental course of the global fundamental frequency of a speaker also spans complete syntactic phrases or sentences, allowing for, e.g., a differentiation between questions and statements in intonation languages like English, Dutch or German.

Numerous psycholinguistic studies have revealed prosodic influences on the processing of the syntactic structure of spoken utterances. The majority of these studies report that the congruence between the prosodic and syntactic structure of sentences facilitates parsing (Marslen-Wilson et al., 1992; Pynte and Prieur, 1996; Schafer et al., 2000; Schepman and Rodway, 2000). On the other hand, inconsistencies between the prosodic and the syntactic interpretation source can cause garden path phenomena resulting in processing difficulties (Marslen-Wilson et al., 1992; Warren et al., 1995; Speer et al., 1996). Moreover, studies have shown that listeners use prosody very early during language comprehension to determine the syntactic continuation of sentences (Marslen-Wilson et al., 1992; Warren et al., 1995). In a cross-modal naming paradigm, Marslen-Wilson et al. (1992) tested listeners’ abilities to use suprasegmental prosodic cues for the disambiguation of the following sentences:

  • (1)

    The workers considered the last offer from the management was a real insult.

  • (2)

    The workers considered the last offer from the management of the factory.

For this purpose, volunteers were first presented acoustically with sentence fragments like the italicized parts of examples (1) and (2). Second, the visual probe word ‘was’ followed. This probe was an appropriate continuation of the prosody of sentence fragment (1), but not of (2). The naming latencies for both continuations were measured. Results showed that the naming latencies for the probe word following fragment (2), where it was inappropriate, were longer than for the probe consistent with the preceding prosody (1). This result provided evidence against a default syntactic parsing mechanism in auditory language comprehension as suggested by Frazier (1979, 1987, 1990). Frazier suggested that a default parse would always be in favour of a minimal attachment construction, i.e. a direct object continuation. The visual probe ‘was’, however, only allows for a complement clause continuation, thus a non-minimal attachment option. According to these assumptions, subjects are required to reparse the sentence in favour of the non-preferred option. In turn, it should take subjects longer to name the probe after the presentation of sentence fragment (1). Yet, naming latencies were longer for the probe following the prosody of stimulus (2), the minimal attachment option. This finding suggests that listeners do not automatically construct a purely syntactically driven minimal attachment structure but incorporate prosodic cues to syntactic structures already into initial parsing stages.

Further behavioural studies reveal that prosody can also guide the computation of syntactic phrase structures in globally ambiguous sentences (Lehiste, 1972, 1973; Price et al., 1991; Ferreira et al., 1996). Listeners were able to identify the intended meaning of a globally ambiguous sentence by using the prosodic correlates of syntactic boundaries (e.g. prefinal lengthening of syllables, pitch contour variation, and speech pauses). Thus, prosodic factors substantially contribute to the structuring and interpretation of spoken utterances. Prosody may sometimes even guide the syntactic analysis of single sentences. In contrast, a prosodic pattern which is not congruent with a particular syntactic structure can also induce processing difficulties or garden path effects.

Event-Related brain Potentials (ERPs) have been shown to be a valuable tool for the investigation of the time course of prosodic influences on language perception and the temporal dimensions of processing difficulties caused by syntax–prosody mismatches.

One of the first ERP studies concerned with these aspects was conducted on German by Steinhauer, Alter, and Friederici (1999). Listeners were presented with sentence conditions differing in their underlying syntactic structure, which in turn led to varying prosodic phrasing patterns.

  • (3)

    [Peter verspricht Anna zu arbeiten]IPh1 [und das Büro zu putzen.]IPh2

    ‘Peter promises Anna to work and the office to clean.’ (literal)

  • (4)

    [Peter verspricht]IPh1 [Anna zu entlasten]IPh2 [und das Büro zu putzen.]IPh3

    ‘Peter promises Anna to support and the office to clean.’ (literal)

  • (5)

    *[Peter verspricht]IPh1 [Anna zu arbeiten]IPh2 [und das Büro zu putzen.]IPh3

    ‘Peter promises Anna to work and the office to clean.’ (literal)

In example (3) the noun phrase ‘Anna’ is the object of the first verb ‘verspricht’ (‘promises’) due to the intransitivity of the second verb ‘zu arbeiten’ (‘to work’). As a result, one Intonational Phrase (IPh; Selkirk, 1984) is formed by the complete fragment. The IPh boundary (as proven by acoustic analyses of pitch, duration, and pause patterns) is expressed at and after the second verb. In example (4) the transitivity of the second verb induces the formation of two within-sentence IPh boundaries, namely at the right edges of the first verb ‘verspricht’ (‘promises’) and the second verb ‘zu entlasten’ (‘to support’). The ERP data of the listeners revealed a centro-posterior positive-going waveform with a latency of 500 ms to the position of each IPh boundary. This result was interpreted as evidencing the on-line structuring of spoken sentences by means of the prosodic boundaries, i.e. the closure of a major prosodic phrase. In accordance with these findings, the ERP component was termed Closure Positive Shift (CPS). An additional condition (5) in the experimental setting served to determine whether an inadequate prosodic phrasing is sufficient to elicit mismatch effects (Steinhauer et al., 1999; Friederici, 2004). This syntax–prosody mismatch condition (5) was created by combining the IPh1 from example (4) with the consecutive noun + verb complex ‘Anna zu arbeiten’ (‘Anna to work’) from example (3). In turn, the syntactic structure of the sentence requires the attachment of the noun ‘Anna’ to the first verb. However, the prosodic phrasing indicates that the noun is attached to the second verb.

The behavioural and ERP data confirmed that the mismatch between prosody and syntax was detected by listeners. In particular, a biphasic N400–P600 pattern was elicited on the verb ‘zu arbeiten’ (‘to work’). This result indicates that prosody–syntax mismatches can indeed induce garden path effects as previously reported in studies employing syntactic violations proper (Osterhout & Holcomb, 1992) or manipulating the ease of syntactic integration (Kaan, Harris, Gibson, & Holcomb, 2000). Steinhauer et al. (1999) interpreted the N400–P600 complex in their study as reflecting a lexical re-access due to the violation of the intransitive argument structure of the second verb (N400) followed by a revision of the attachment site of the noun ‘Anna’ (P600). Hence, mismatches between the syntactic and the prosodic structure of sentences can induce the same ERP deflections as the detection and revision of semantic–syntactic violations proper (i.e. N400–P600 effects; Steinhauer et al., 1999).

Yet, incongruities in sentence-level prosodic contours have also been shown to elicit ERP responses apart from N400/P600 patterns. For example, Magne et al. (2005) report a sustained negativity from 150 to 1050 ms for French accentuation patterns which are unexpected and inappropriate on the final words of dialogues. Although the effect is initially (between 150 and 300 ms) located at frontal electrode sites, later time periods yield effects which are widely distributed over the scalp surface. In sentence-medial positions, the polarity of the effect is reversed and occurs somewhat later than the negativity. This positive-going ERP to inappropriate sentence-medial accentuation is statistically significant in the time range of 450–1050 ms. A further early deflection for prosodic mismatch detection has been reported by Schön, Magne, and Besson (2004). In their study on French, changes in the pitch of sentence endings elicited a negative-going ERP between 50 and 200 ms over temporal electrode sites. This effect has also been replicated with 8-year-old children (Magne, Schön, & Besson, 2006).

With respect to sentence-level prosodic processing in German, Eckstein and Friederici (2006) also describe a widely distributed negativity starting around 100 ms and yielding statistical differences between 300 and 500 ms. This particular ERP is evoked when listeners expect the continuation of a sentence’s intonation contour but encounter a sentence-final prosody. On the other hand, over-pronounced pitch accents in sentence-initial positions have been shown to result in a widely distributed expectancy-related negativity between 250 and 350 ms (Heim & Alter, 2006). The given interpretations of these findings can be summarized briefly as follows: the early prosody-driven negativities are supposed to reflect automatic aspects of prosodic processing on the sentence level. In particular, they are thought to arise from mismatches between expectations about an intonation contour which are built up on-line and the prosodic pattern listeners actually encounter (i.e. a target-actual comparison).

To our knowledge, no ERP study has previously investigated whether sentence-level prosodic patterns which occur infrequently in everyday language bear the same processing consequences as syntactic–prosodic mismatches proper and/or mere prosodic incongruities. Thus, the current study was designed to investigate whether perceiving such an infrequent intonation (i.e. a vocative intonation) elicits ERP deflections similar to those for the processing of an inadequate prosodic pattern, i.e. a syntax–prosody mismatch.

Vocatives or so-called calling contours are, as opposed to declarative sentences or questions, prosodically rather stereotyped or stylized expressions (Ladd, 1978). Vocatives have been described as spoken chants which convey a call with either a warning character or an attempt to attract attention or help (Pike, 1945; Fox, 1969; Lewis, 1970; Leben, 1976; Ladd, 1978). In phonological terms, the intonation contour of a vocative typically comprises a downstepped tonal sequence from one stable pitch level to another (Gussenhoven, 1993; Ladd, 1996). Caspers (2000) conducted a behavioural experiment on vocative structures in Dutch concerned with the question of their general communicative intentions. Volunteers were presented with written contexts supporting a vocative or a default (new information) interpretation or not, and target words realized with different intonations. Subjects had to indicate on a 1–4 scale whether the spoken vocative suited the preceding context. Caspers (2000) found that vocatives are readily accepted when accompanying information which is not forcibly new for the listeners. Highest acceptability rates were achieved when the vocative served to single out a particular element from background information, thus increasing its salience. However, when vocatives were associated with information which was forcibly new for listeners, the acceptability ratings were substantially lower.

The current study is concerned with additional aspects of speech processing and employs an experimental methodology allowing for more fine-grained assertions on the temporal dimensions of perception, i.e. Event-Related brain Potentials (ERPs). Moreover, the employment of vocative intonation contours results in a processing condition with a somewhat intermediate status between correct and frequently used prosodic forms (i.e. declarative and coordination sentences) and syntactic–prosodic mismatches (as derived by cross-splicing). We hypothesize that the prosodic patterns which are infrequent but correct for listeners (i.e. vocatives) do not hinder the processes of syntactic structure building as inadequate syntax–prosody mismatches do (see Steinhauer et al., 1999). However, the uncommon intonation of vocative contours might decrease the ease with which successive words can be integrated into a semantic phrase structure. Thus, infrequent prosodic patterns (vocatives) are expected to elicit an N400 similar to words with a low frequency of occurrence (Van Petten & Kutas, 1987). Truly inadequate prosodic contours (syntax–prosody mismatches) should likewise induce integration difficulties as reflected by an N400. In contrast to infrequent prosodies they should, however, also promote the reanalysis of a formerly assigned syntactic structure (reflected by a P600) when a syntactic-prosodically non-matching word is perceived.

Participants

Twenty-four (12 female) volunteers took part in the ERP study (mean age of 24.8; SD = 3.1). All participants were native speakers of German, had no hearing or neurological impairment and were right-handed as assessed by a German version of the Edinburgh Handedness Inventory (Oldfield, 1971).

Materials

Overall, four experimental sentence conditions were created (see Table 1). Each of the four conditions consisted of 40 sentences in German. In all conditions, the combination of the first nouns and verbs (e.g.

EEG recordings

The electroencephalogram (EEG) was recorded in a sound-proof and electromagnetically shielded cabin. The recordings were sampled at a rate of 250 Hz from 25 Ag/AgCl cap-mounted electrodes (FP1 + 2, FZ, F3 + 4, F7 + F8, FT3 + 4, FT7 + 8, T7 + 8, CZ, C3 + 4, CP5 + 6, PZ, P3 + 4, P7 + 8, O1 + 2) according to the international 10–20 system (Jasper, 1958). Recordings were online referenced to the left mastoid and re-referenced offline to averaged mastoids. The electrooculogram (EOG) was recorded in order to control for
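The offline re-referencing step mentioned above (from a left-mastoid reference to averaged mastoids) can be illustrated with a minimal sketch. It assumes, beyond what the text states, that the right mastoid was recorded as an additional channel against the same left-mastoid reference; the function name and the sample values are hypothetical and serve only as illustration. If a scalp channel is recorded as x = V_scalp − V_left and the right mastoid as m = V_right − V_left, then the signal relative to the mastoid average is x − m/2.

```python
def rereference_to_averaged_mastoids(channels, right_mastoid):
    """Re-reference left-mastoid-referenced data to the mean of both mastoids.

    channels:      dict mapping electrode name -> list of samples (microvolts),
                   each recorded against the left mastoid (assumption).
    right_mastoid: list of samples of the right mastoid, also recorded
                   against the left mastoid (assumption).
    """
    n_samples = len(right_mastoid)
    rereferenced = {}
    for name, samples in channels.items():
        if len(samples) != n_samples:
            raise ValueError(f"channel {name} has a different length")
        # x - m/2 equals V_scalp - (V_left + V_right) / 2
        rereferenced[name] = [x - 0.5 * m for x, m in zip(samples, right_mastoid)]
    return rereferenced


# Hypothetical two-sample recording at electrode CZ
example = rereference_to_averaged_mastoids({"CZ": [4.0, 6.0]}, [2.0, -2.0])
print(example["CZ"])
```

The subtraction of half the right-mastoid signal is the only operation needed because the left mastoid is, by construction, zero in the online-referenced data.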

Behavioural data

Participants show high acceptability rates for the highly frequent adequate prosodies of conditions DEC and COO (96.7% vs. 94.3%). A two-tailed t-test revealed no significant difference. For the infrequent (condition VOC) and inadequate (condition CRS) syntax–prosody associations, however, acceptability judgements are lower. While the intonation of condition CRS is judged as acceptable in 81.7% of all trials, condition VOC is rated as acceptable in only 60.3% (significant difference: t[46] = 2.95; p < .01).
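As a rough plausibility check of the reported statistic, the two-tailed p-value for t = 2.95 can be recomputed from Student's t distribution. The sketch below assumes that the bracketed 46 denotes the degrees of freedom; it integrates the t density numerically with Simpson's rule using only the Python standard library, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p-value: 1 minus the probability mass within [-|t|, |t|]."""
    upper = abs(t_value)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_mass = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * half_mass)


p = two_tailed_p(2.95, 46)
print(f"two-tailed p for t(46) = 2.95: {p:.4f}")
```

Under these assumptions the resulting p-value lies below the .01 threshold given for the VOC vs. CRS comparison.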

Processing frequent adequate as opposed to infrequent adequate sentence prosodies

In Fig. 5, the comparison of the ERPs to the frequent adequate condition COO vs. the infrequent adequate condition VOC is displayed. The onset of the averages is congruent with the onset of the second noun (‘Patricia’). As apparent from Fig. 5, the ERPs for condition COO as compared to condition VOC display a widely distributed Early Negativity (EN) peaking at 120 ms. In addition, a second negativity is evident which peaks around 400 ms. The statistical analysis attests main effects Condition in

Discussion

The current study aimed at investigating the consequences of perceiving prosodic (or suprasegmental phonological) patterns which are infrequent or inadequate with respect to a syntactic phrase structure. Behavioural and ERP correlates for the perception of the deviant prosodies were compared to responses to adequate syntax–prosody relations used frequently in everyday speech.

With respect to the behavioural results, the high acceptability rates for the conditions COO and DEC are in

Acknowledgments

Both first authors contributed equally to the paper. We thank Angela Friederici for the opportunity to collect the ERP data in the Max-Planck-Institute for Cognitive and Brain Sciences in Leipzig. Moreover, we thank Caroline Féry for discussions on the prosodic aspects of the study, and Sylvia Stasch for her support in ERP data collection. We also thank two anonymous reviewers for helpful comments on an earlier version of the paper.

This project has been supported by the Human Frontier Science

References (49)

  • R.C.N. D’Arcy et al. (2004). Separating phonological and semantic processing in auditory sentence processing: A high-resolution event-related brain potential study. Human Brain Mapping.
  • M.T. Diaz & T.Y. Swaab (in press). Electrophysiological differentiation of phonological and semantic integration in...
  • K. Eckstein et al. (2006). It’s early: Event-related potential evidence for initial interaction of syntax and prosody in speech comprehension. Journal of Cognitive Neuroscience.
  • F. Ferreira et al. (1996). Exploring the use of prosody during language comprehension using the auditory moving window technique. Journal of Psycholinguistic Research.
  • A. Fox (1969). A forgotten English tone. Le Maître Phonétique.
  • L. Frazier (1979). On comprehending sentences: Syntactic parsing strategies.
  • L. Frazier. Sentence processing: A tutorial review.
  • L. Frazier. Exploring the architecture of the language-processing system.
  • A.D. Friederici (2004). Event-related brain potential studies in language. Current Neurology and Neuroscience Reports.
  • C. Gussenhoven (1993). The Dutch foot and the chanted call. Journal of Linguistics.
  • S. Heim et al. (2006). Prosodic pitch accents in language comprehension and production: ERP data and acoustic analyses. Acta Neurobiologiae Experimentalis.
  • P.J. Holcomb et al. (1990). Auditory and visual semantic priming in lexical decision: a comparison using event-related brain potentials. Language and Cognitive Processes.
  • P.J. Holcomb et al. (1991). Natural speech processing: An analysis using event-related potentials. Psychobiology.
  • H.H. Jasper (1958). Report of the committee on the methods of clinical examination in electroencephalography. Electroencephalography and Clinical Neurophysiology.