Inadequate and infrequent are not alike: ERPs to deviant prosodic patterns in spoken sentence comprehension
Introduction
Prosody, or suprasegmental phonology, is an inherent aspect of spoken language. In particular, pitch, intensity, and duration variations, as well as speech pauses, are prosodic features which co-occur with other linguistic information (e.g. semantic and syntactic) in spoken language (Selkirk, 1984, Shattuck-Hufnagel and Turk, 1996). Although the term “prosody” is commonly used (especially in the literature on language processing), the expression “suprasegmental phonology” better emphasises that prosodic phenomena often relate to elements in speech which are larger than a single segment. In particular, syllables are the relevant domains for word-level prosodic features like stress (récord vs. recórd). However, the suprasegmental course of a speaker’s global fundamental frequency also spans complete syntactic phrases or sentences, allowing for, e.g., a differentiation between questions and statements in intonation languages like English, Dutch or German.
Numerous psycholinguistic studies have revealed prosodic influences on the processing of the syntactic structure of spoken utterances. The majority of these studies report that the congruence between the prosodic and syntactic structure of sentences facilitates parsing (Marslen-Wilson et al., 1992, Pynte and Prieur, 1996, Schafer et al., 2000, Schepman and Rodway, 2000). On the other hand, inconsistencies between the prosodic and the syntactic interpretation source can cause garden path phenomena resulting in processing difficulties (Marslen-Wilson et al., 1992, Warren et al., 1995, Speer et al., 1996). Moreover, studies have shown that prosody is used by listeners very early during language comprehension to determine the continuation of sentences by syntactic means (Marslen-Wilson et al., 1992, Warren et al., 1995). In a cross-modal naming paradigm, Marslen-Wilson et al. tested listeners’ abilities to use suprasegmental prosodic cues for the disambiguation of the following sentences:
- (1)
The workers considered the last offer from the management was a real insult.
- (2)
The workers considered the last offer from the management of the factory.
For this purpose, volunteers were first presented acoustically with sentence fragments like the italicized parts of examples (1) and (2). Second, the visual probe word ‘was’ followed. This probe was an appropriate continuation of the prosody of sentence fragment (1), but not of (2). The naming latencies for both continuations were measured. Results showed that naming latencies were longer when the probe was inappropriate for the preceding prosody (2) than when it was consistent with it (1). This result provided evidence against a default syntactic parsing mechanism in auditory language comprehension as suggested by Frazier (1979, 1987, 1990). Frazier suggested that a default parse would always favour a minimal attachment construction, i.e. a direct object continuation. The visual probe ‘was’, however, only allows for a complement clause continuation, thus a non-minimal attachment option. According to these assumptions, subjects are required to reparse the sentence in favour of the non-preferred option. In turn, it should take subjects longer to name the probe after the presentation of sentence fragment (1). Yet naming latencies were longer for the probe following the prosody of stimulus (2), the minimal attachment option. This finding suggests that listeners do not automatically construct a purely syntactically driven minimal attachment structure but incorporate prosodic cues to syntactic structure already at initial parsing stages.
Further behavioural studies reveal that prosody can also guide the computation of syntactic phrase structures in globally ambiguous sentences (Lehiste, 1972, Lehiste, 1973, Price et al., 1991, Ferreira et al., 1996). Listeners were able to identify the intended meaning of a globally ambiguous sentence by using the prosodic correlates of syntactic boundaries (e.g. prefinal lengthening of syllables, pitch contour variation, and speech pauses). Thus, prosodic factors substantially contribute to the structuring and interpretation of spoken utterances. Prosody may sometimes even guide the syntactic analysis of single sentences. Conversely, a prosodic pattern which is not congruent with a particular syntactic structure can induce processing difficulties, i.e. garden path effects.
Event-Related brain Potentials (ERPs) have been shown to be a valuable tool for the investigation of the time course of prosodic influences on language perception and the temporal dimensions of processing difficulties caused by syntax–prosody mismatches.
One of the first ERP studies concerned with these aspects was conducted on German by Steinhauer, Alter, and Friederici (1999). Listeners were presented with sentence conditions differing in their underlying syntactic structure, which in turn led to varying prosodic phrasing patterns.
- (3)
[Peter verspricht Anna zu arbeiten]IPh1 [und das Büro zu putzen.]IPh2
‘Peter promises Anna to work and the office to clean.’ (literal)
- (4)
[Peter verspricht]IPh1 [Anna zu entlasten]IPh2 [und das Büro zu putzen.]IPh3
‘Peter promises Anna to support and the office to clean.’ (literal)
- (5)
*[Peter verspricht]IPh1 [Anna zu arbeiten]IPh2 [und das Büro zu putzen.]IPh3
‘Peter promises Anna to work and the office to clean.’ (literal)
In example (3) the noun phrase ‘Anna’ is the object of the first verb ‘verspricht’ (‘promises’) due to the intransitivity of the second verb ‘zu arbeiten’ (‘to work’). As a result, one Intonational Phrase (IPh; Selkirk, 1984) is formed by the complete fragment. The IPh boundary (as confirmed by acoustic analyses of pitch, duration, and pause patterns) is expressed at and after the second verb. In example (4) the transitivity of the second verb induces the formation of two within-sentence IPh boundaries, namely at the right edges of the first verb ‘verspricht’ (‘promises’) and the second verb ‘zu entlasten’ (‘to support’). The ERP data of the listeners revealed a centro-posterior positive-going waveform with a latency of 500 ms at the position of each IPh boundary. This result was interpreted as evidence for the on-line structuring of spoken sentences by means of prosodic boundaries, i.e. the closure of a major prosodic phrase. In accordance with these findings, the ERP component was termed Closure Positive Shift (CPS). An additional condition (5) in the experimental setting served to determine whether an inadequate prosodic phrasing is sufficient to elicit mismatch effects (Steinhauer et al., 1999, Friederici, 2004). This syntax–prosody mismatch condition (5) was created by combining the IPh1 from example (4) with the consecutive noun + verb complex ‘Anna zu arbeiten’ (‘Anna to work’) from example (3). As a consequence, the syntactic structure of the sentence requires the attachment of the noun ‘Anna’ to the first verb. However, the prosodic phrasing indicates that the noun is attached to the second verb.
The behavioural and ERP data confirmed that the mismatch between prosody and syntax was detected by listeners. In particular, a biphasic N400–P600 pattern was elicited on the verb ‘zu arbeiten’ (‘to work’). This result indicates that prosody–syntax mismatches can indeed induce garden path effects as previously reported in studies employing syntactic violations proper (Osterhout & Holcomb, 1992) or manipulating the ease of syntactic integration (Kaan, Harris, Gibson, & Holcomb, 2000). Steinhauer et al. (1999) interpreted the N400–P600 complex in their study as reflecting a lexical re-access due to the violation of the intransitive argument structure of the second verb (N400) followed by a revision of the attachment site of the noun ‘Anna’ (P600). Hence, mismatches between the syntactic and the prosodic structure of sentences can induce the same ERP deflections as the detection and revision of semantic–syntactic violations proper (i.e. N400–P600 effects; Steinhauer et al., 1999).
Yet incongruities in sentence-level prosodic contours have also been shown to elicit ERP responses other than N400/P600 patterns. For example, Magne et al. (2005) report a sustained negativity from 150 to 1050 ms for French accentuation patterns which are unexpected and inappropriate on the final words of dialogues. Although the effect is initially (between 150 and 300 ms) found at frontal electrode sites, later time periods yield effects which are broadly distributed over the scalp surface. In sentence-medial positions, the polarity of the effect is reversed and the effect occurs somewhat later than the negativity. This positive-going ERP to inappropriate sentence-medial accentuation is statistically significant in the time range of 450–1050 ms. A further early deflection for prosodic mismatch detection has been reported by Schön, Magne, and Besson (2004). In their study on French, changes in the pitch of sentence endings elicited a negative-going ERP between 50 and 200 ms over temporal electrode sites. This effect has also been replicated with 8-year-old children (Magne, Schön, & Besson, 2006).
With respect to sentence-level prosodic processing in German, Eckstein and Friederici (2006) describe a likewise widely distributed negativity starting around 100 ms and yielding statistical differences between 300 and 500 ms. This particular ERP is evoked when listeners expect the continuation of a sentence’s intonation contour but encounter a sentence-final prosody. On the other hand, over-pronounced pitch accents in sentence-initial positions have been shown to result in a widely distributed expectancy-related negativity between 250 and 350 ms (Heim & Alter, 2006). The interpretations given for these findings can be summarized briefly as follows: the early prosody-driven negativities are supposed to reflect automatic aspects of prosodic processing at the sentence level. In particular, they are thought to arise from mismatches between expectations about an intonation contour which are built up on-line and the prosodic pattern listeners actually encounter (i.e. a target-actual comparison).
To our knowledge, no ERP study has previously investigated whether sentence-level prosodic patterns which occur infrequently in everyday language have the same processing consequences as syntactic–prosodic mismatches proper and/or mere prosodic incongruities. Thus, the current study was designed to investigate whether perceiving such an infrequent intonation (i.e. a vocative intonation) elicits ERP deflections similar to those elicited by an inadequate prosodic pattern, i.e. a syntax–prosody mismatch.
Vocatives or so-called calling contours are, as opposed to declarative sentences or questions, prosodically rather stereotyped or stylized expressions (Ladd, 1978). Vocatives have been described as spoken chants which convey a call with either a warning character or an attempt to attract attention or help (Pike, 1945, Fox, 1969, Lewis, 1970, Leben, 1976, Ladd, 1978). In phonological terms, the intonation contour of a vocative typically comprises a downstepped tonal sequence from one stable pitch level to another (Gussenhoven, 1993, Ladd, 1996). Caspers (2000) conducted a behavioural experiment on vocative structures in Dutch concerned with the question of their general communicative intentions. Volunteers were presented with written contexts which either did or did not support a vocative or a default (new information) interpretation, and with target words realized with different intonations. Subjects had to indicate on a 1–4 scale whether the spoken vocative suited the preceding context. Caspers (2000) found that vocatives are readily accepted when accompanying information which is not necessarily new for the listeners. Acceptability rates were highest when the vocative served to single out a particular element from background information, thus increasing its salience. However, when vocatives were associated with information which was necessarily new for listeners, the acceptability ratings were substantially lower.
The current study is concerned with additional aspects of speech processing and employs an experimental methodology allowing for more fine-grained assertions on the temporal dimensions of perception, i.e. Event-Related brain Potentials (ERPs). Moreover, the employment of vocative intonation contours results in a processing condition with a somewhat intermediate status between correct and frequently used prosodic forms (i.e. declarative and coordination sentences) and syntactic–prosodic mismatches (as derived by cross-splicing). We hypothesize that prosodic patterns which are infrequent but correct for listeners (i.e. vocatives) do not hinder the processes of syntactic structure building as inadequate syntax–prosody mismatches do (see Steinhauer et al., 1999). However, the uncommon intonation of vocative contours might decrease the ease with which successive words can be integrated into a semantic phrase structure. Thus, infrequent prosodic patterns (vocatives) are expected to elicit an N400 similar to that elicited by words with a low frequency of occurrence (Van Petten & Kutas, 1987). Truly inadequate prosodic contours (syntax–prosody mismatches) should likewise induce integration difficulties as reflected by an N400. In contrast to infrequent prosodies they should, however, also promote the reanalysis of a formerly assigned syntactic structure (reflected by a P600) when a syntactic-prosodically non-matching word is perceived.
Participants
Twenty-four (12 female) volunteers took part in the ERP study (mean age of 24.8; SD = 3.1). All participants were native speakers of German, had no hearing or neurological impairment and were right-handed as assessed by a German version of the Edinburgh Handedness Inventory (Oldfield, 1971).
Materials
Overall, four experimental sentence conditions were created (see Table 1). Each of the four conditions consisted of 40 sentences in German. In all conditions, the combination of the first nouns and verbs (e.g.
EEG recordings
The electroencephalogram (EEG) was recorded in a sound-proof and electromagnetically shielded cabin. The recordings were sampled at a rate of 250 Hz from 25 Ag/AgCl cap-mounted electrodes (FP1 + 2, FZ, F3 + 4, F7 + F8, FT3 + 4, FT7 + 8, T7 + 8, CZ, C3 + 4, CP5 + 6, PZ, P3 + 4, P7 + 8, O1 + 2) according to the international 10–20 system (Jasper, 1958). Recordings were online referenced to the left mastoid and re-referenced offline to averaged mastoids. The electrooculogram (EOG) was recorded in order to control for
Behavioural data
Participants show high acceptability rates for the highly frequent adequate prosodies of conditions DEC and COO (96.7% vs. 94.3%). A two-tailed t-test revealed no significant difference between them. For the infrequent (condition VOC) and inadequate (condition CRS) syntax–prosody associations, however, acceptability judgements are lower. While the intonation of condition CRS is judged as acceptable in 81.7% of all trials, condition VOC is rated as acceptable in only 60.3% (significant difference: t[46] = 2.95; p ⩽ .01).
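The degrees of freedom reported for the CRS vs. VOC comparison (t[46]) are what a pooled-variance two-sample t-test yields with 24 values per condition (24 + 24 − 2 = 46). The following is a minimal sketch of such a test, assuming that per-participant acceptability rates are compared as two independent samples. The text does not state which test variant was actually computed, and the function name `pooled_t` and the numbers below are hypothetical illustrations, not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical acceptability rates (%), 24 values per condition.
# These are illustrative numbers only, NOT the data of the study.
crs_rates = [80.0, 85.0, 75.0, 90.0] * 6
voc_rates = [60.0, 65.0, 55.0, 70.0] * 6

t_value, df = pooled_t(crs_rates, voc_rates)
print(f"t[{df}] = {t_value:.2f}")
```

Obtaining the two-tailed p-value additionally requires the cumulative distribution of Student's t, which a statistics library would normally supply; the sketch stops at the test statistic and its degrees of freedom.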
Processing frequent adequate as opposed to infrequent adequate sentence prosodies
In Fig. 5, the comparison of the ERPs to the frequent adequate condition COO vs. the infrequent adequate condition VOC is displayed. The onset of the averages is congruent with the onset of the second noun (‘Patricia’). As apparent from Fig. 5, the ERPs for condition COO as compared to condition VOC display a widely distributed Early Negativity (EN) peaking at 120 ms. In addition, a second negativity is evident which peaks around 400 ms. The statistical analysis attests main effects Condition in
Discussion
The current study aimed at investigating the consequences of perceiving prosodic (i.e. suprasegmental phonological) patterns which are infrequent or inadequate with respect to a syntactic phrase structure. Behavioural and ERP correlates of the perception of the deviant prosodies were compared to responses to adequate syntax–prosody relations commonly used in everyday speech.
With respect to the behavioural results, the high acceptability rates for the conditions COO and DEC are in
Acknowledgments
Both first authors contributed equally to the paper. We thank Angela Friederici for the opportunity to collect the ERP data in the Max-Planck-Institute for Cognitive and Brain Sciences in Leipzig. Moreover, we thank Caroline Féry for discussions on the prosodic aspects of the study, and Sylvia Stasch for her support in ERP data collection. We also thank two anonymous reviewers for helpful comments on an earlier version of the paper.
This project has been supported by the Human Frontier Science
References (49)
- The interpretation of prosodic patterns at points of syntactic structure ambiguity: Evidence for cue trading relations. Journal of Memory and Language (1991)
- et al. Event-related potential sensitivity to acoustic and semantic properties of terminal words in sentences. Brain and Language (1992)
- et al. The effects of processing requirements on neuropsychological responses to spoken sentences. Brain and Language (1990)
- et al. ERP effects of listening to speech: Semantic ERP effects. Neuropsychologia (2000)
- et al. ERP effects of listening to speech compared to reading: The P600/SPS to syntactic violations in spoken sentences and rapid serial visual presentation. Neuropsychologia (2000)
- The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia (1971)
- et al. An ERP study of continuous speech processing: I. Semantics, syntax, and prosody in native English speakers. Cognitive Brain Research (2003)
- et al. Ambiguous words in context: An event-related potential analysis of the time-course of meaning activation. Journal of Memory and Language (1987)
- Experiments on the meaning of four types of single-accent intonation patterns in Dutch. Language and Speech (2000)
- et al. Event related brain potential components reflect phonological and semantic processing of the terminal word of spoken sentences. Journal of Cognitive Neuroscience (1994)
- Separating phonological and semantic processing in auditory sentence processing: A high-resolution event-related brain potential study. Human Brain Mapping
- It’s early: Event-related potential evidence for initial interaction of syntax and prosody in speech comprehension. Journal of Cognitive Neuroscience
- Exploring the use of prosody during language comprehension using the auditory moving window technique. Journal of Psycholinguistic Research
- A forgotten English tone. Le Maître Phonétique
- On comprehending sentences: Syntactic parsing strategies
- Sentence processing: A tutorial review
- Exploring the architecture of the language-processing system
- Event-related brain potential studies in language. Current Neurology and Neuroscience Reports
- The Dutch foot and the chanted call. Journal of Linguistics
- Prosodic pitch accents in language comprehension and production: ERP data and acoustic analyses. Acta Neurobiologiae Experimentalis
- Auditory and visual semantic priming in lexical decision—a comparison using event-related brain potentials. Language and Cognitive Processes
- Natural speech processing: An analysis using event-related potentials. Psychobiology
- Report of the committee on the methods of clinical examination in electroencephalography. Electroencephalography and Clinical Neurophysiology