Abstract
The aim of this study was to dissociate the contributions of memory-based (cognitive) and adaptation-based (sensory) mechanisms underlying deviance detection in the context of natural speech. Twenty healthy right-handed native speakers of English participated in an event-related functional magnetic resonance imaging (fMRI) scan in which the natural speech stimuli /de:/ (“deh”), /deI/ (“day”), /te:/ (“teh”) and /teI/ (“tay”) served as standards and deviants within an “oddball” paradigm designed to elicit the mismatch negativity component. Thus, “oddball” blocks could involve either a word deviant (“day”), resulting in a “word advantage” effect, or a non-word deviant (“deh” or “tay”). We utilized an experimental protocol controlling for refractoriness similar to that used previously when deviance detection was studied in the context of tones. Results showed that the cognitive and sensory mechanisms of deviance detection were located in the anterior and posterior auditory cortices, respectively, as was previously found in the context of tones. The cognitive effect, which was most robust for the word deviant, diminished in the “oddball” condition. In addition, the results indicated that the lexical status of the speech stimulus interacts with acoustic factors, exerting a top-down modulation of the extent to which novel sounds gain access to the subject’s awareness through memory-based processes. Thus, the more salient the deviant stimulus is, the more likely it is to be released from the effects of adaptation exerted by the posterior auditory cortex.
Introduction
The mismatch negativity (MMN) component of the event-related potentials (ERPs) is assumed to reflect the operation of a pre-attentive memory-based comparison mechanism (e.g., Näätänen et al. 2005). For a wide range of stimuli, there is evidence that the MMN is elicited by a cognitive mechanism based on auditory sensory memory that compares the incoming deviant stimulus with the standard template (Näätänen and Alho 1997; Näätänen et al. 2001).
The contrasting view is that attention switch to novel sounds is based on the transient frequency-specific adaptation of posterior auditory-cortex feature-detector neurons (Desimone 1992; Näätänen 1990, 1992; Ulanovsky et al. 2003; Jääskeläinen et al. 2004). Specifically, it has been suggested that MMN arises because of selective adaptation of the N1 response by preceding standard stimuli (sensory component) leading to its attenuation. This attenuation is interpreted in terms of neural refractoriness (Näätänen 1992). Thus, this account of MMN elicitation does not rely on memory representations and is sensorial in nature.
Previous research has shown that the MMN is sensitive to acoustic as well as to phonetic attributes of phonemes (e.g., Tampas et al. 2005). Furthermore, it was demonstrated that the MMN reflects the activation of memory networks for language sounds and spoken words (e.g., Pulvermüller et al. 2003; Shtyrov et al. 2005; Pulvermüller and Shtyrov 2006), the latter referred to as the “word advantage” effect (Pettigrew et al. 2004).
The design of the current functional magnetic resonance imaging (fMRI) study combined modified versions of protocols utilized in previous studies (i.e., Pettigrew et al. 2004; Opitz et al. 2005). This design made it possible to disentangle cognitive and sensory contributions to change detection (Opitz et al. 2005; Maess et al. 2007) when the deviant stimulus could be either a word (“day”), resulting in the “word advantage” effect, or a non-word (“deh” or “tay”).
We hypothesized that the sensory component would be located bilaterally in regions posterior to Heschl’s gyrus (HG), including the posterior superior temporal gyrus (STG), regardless of the identity of the deviant stimulus (“deh”, “day” or “tay”). This component relies on frequency-specific neurons in the auditory cortex responsible for the repetition-related decrement of N1 and its counterpart, the N1m (Romani et al. 1982; Jääskeläinen et al. 2004; Opitz et al. 2005). Moreover, it was shown that the region posterior to HG is broadly tuned with respect to phonetic features (Ahveninen et al. 2006).
In contrast, it was shown that areas anterior to HG process sound-identity cues such as speech (Binder et al. 2000; Obleser et al. 2007; Ahveninen et al. 2006) and pitch (Warren and Griffiths 2003). Therefore, we assumed that the cognitive component, which involves the representation of the current auditory event (Schröger 1997), would differentiate between the deviant word (“day”) and the deviant non-words (“deh” and “tay”), i.e., show the “word advantage” effect, and would be located anterior to HG.
Methods
Subjects
Twenty right-handed adult healthy subjects, native speakers of English, 8 women and 12 men, participated in the study. Subjects ranged in age from 23 to 28. All subjects gave informed consent to participate in the study.
Experimental protocol
The procedure is a modification of the one used by Opitz et al. (2005). Each subject participated in 12 functional imaging runs. The speech stimuli were grouped into two types of blocks: “oddball” and “control”. In the first type of block, the “oddball” block, deviant exemplars occurred quasi-randomly embedded within standard stimuli so that the frequency of occurrence throughout the block was balanced among exemplars. The interval between two successive deviants varied quasi-randomly with gaps of 4, 6, 8, 10 or 12 standards, with the constraint that gap size was balanced throughout the block. In the “oddball” blocks, deviants appeared 42 times out of a total of 354 stimuli (probability of occurrence = 12%).
In the “control” blocks the same physical deviants and standards as in the “oddball” blocks (deviant-counterparts and standard-counterparts, respectively) occurred quasi-randomly, while deviants were constrained by the same spacing rule mentioned above. However, each of the “control” blocks contained eight different equiprobable stimuli, including the deviant and standard counterparts. Thus, in the control runs the seven stimuli besides the deviant-counterpart served as “filler” or contextual stimuli, which were added to the sequence so that the deviant-counterpart appeared with the same probability as any other stimulus in the sequence. Each of the eight different stimuli in the “control” block was repeated 42 times and appeared with equal probability, which was identical to that of the deviants in the “oddball” block (12%). There were three blocks of each type (“oddball”/“control”). Each block was repeated twice. In total, 12 blocks of approximately 6 min each were randomly presented for each subject within a session.
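The counts in the block design can be checked with a few lines of arithmetic. The following is a minimal sketch that only restates the numbers given in the protocol; the constant and function names are illustrative and are not taken from the authors' presentation scripts.

```python
# Counts taken from the protocol description; all names are illustrative.
ODDBALL_DEVIANTS = 42   # deviant presentations per "oddball" block
ODDBALL_TOTAL = 354     # total stimuli per "oddball" block
CONTROL_STIMULI = 8     # distinct equiprobable stimuli per "control" block
CONTROL_REPEATS = 42    # presentations of each stimulus per "control" block


def oddball_deviant_probability():
    """Probability that a given stimulus in an oddball block is a deviant."""
    return ODDBALL_DEVIANTS / ODDBALL_TOTAL


def control_stimulus_probability():
    """Probability of any one stimulus in a control block."""
    total = CONTROL_STIMULI * CONTROL_REPEATS
    return CONTROL_REPEATS / total


def total_blocks(block_types=2, blocks_per_type=3, repetitions=2):
    """Two block types, three blocks of each type, each presented twice."""
    return block_types * blocks_per_type * repetitions


if __name__ == "__main__":
    print(f"oddball deviant probability:  {oddball_deviant_probability():.3f}")
    print(f"control stimulus probability: {control_stimulus_probability():.3f}")
    print(f"blocks per session:           {total_blocks()}")
```

Both probabilities are close to the 12% quoted in the text (42/354 for the oddball deviants and 1/8 for each control stimulus), which is the property that lets the control blocks match the deviant's rate of occurrence without a repeating standard.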
Three non-words, /de:/ “deh”; /te:/ “teh”; and /teI/ “tay”, and one word, /deI/ “day”, were recorded from a male native English speaker in a sound-proof chamber. These stimuli comprised the following standard-deviant pairs that resulted in the three “oddball” runs: (1) deh (standard)–day (deviant); (2) day–deh and (3) teh–tay. Thus, “deh” and “day” swapped their roles as standard and deviant in the second “oddball” block, whereas the third block controlled for the acoustic change associated with the transition from a monophthong (/e:/) to a diphthong (/eI/) occurring in the first standard-deviant pair. Four additional stimuli, /pe:/ “peh”; /peI/ “pay”; /be:/ “beh”; and /beI/ “bay”, together with the four stimuli mentioned above, were embedded within three control runs, each containing a deviant-counterpart (either “day”, “deh” or “tay”) appearing with the same probability and obeying the same spacing rule as in the oddball blocks.
Three exemplars for each stimulus (e.g., deh1, deh2, deh3) were selected (out of a pool of 24 recordings per stimulus) on the basis of acoustic similarity (see Table 1 for the values of the lowest three formants). The parameters that were used to choose similar exemplars for each stimulus included the shape of the spectrogram at the voice onset, vowel durations, pitch and formant values (Hz) of the first three formants at the beginning (ca. 84 ms) and end (ca. 168 ms) of the /e/ segment (ca. 252 ms duration) of each stimulus. The stimuli were truncated to 280 ms and normalized to the same loudness level by using Adobe Audition 1.5 trial version software package. Spectral analysis of the stimuli was conducted by PRAAT software version 4.3.19 (http://www.praat.org).
The reason for using three tokens for each consonant-vowel stimulus was to control as much as possible for acoustic factors, other than those inherent in the structure of the stimulus, which could confound the elicitation of a deviant response. Using three different exemplars for each consonant-vowel (CV) stimulus diminished the likelihood of a contingency developing between a specific deviant-standard pair because of an uncontrolled acoustic facet associated with either the deviant or the standard.
As can be seen in Table 1, the frequency parameters across homogeneous exemplars (i.e., the three representatives of the same CV) were very similar to each other. In addition, the F1/F2 ratio in “teh” and “tay” is similar to the F1/F2 ratio in “deh” and “day”, respectively. Thus, the comparison between the responses to “teh” and “tay” could serve as a suitable control for the latter pair (Jacobsen et al. 2004).
The speech stimuli appeared randomly within each block and their occurrence was balanced throughout the block for standards as well as deviants. Stimuli were presented with a stimulus onset asynchrony (SOA) of 1 s at 95 dB SPL via headphones to subjects wearing ear-plugs (see Table 2 for an example of “control” and “oddball” blocks in the case of the “deh”–“day” contrast). Stimulus presentation was carried out by E-Prime 1.1 (1.1.4.1) (Psychology Software Tools http://www.pstnet.com).
Contrasting “oddball” deviants and standards with the corresponding stimuli in “control” runs (deviant and standard counterparts) allowed disentangling the two mechanisms of deviance detection, namely, the sensory one based on refractoriness (Näätänen and Picton 1987) from the cognitive one based on memory-based processes (Näätänen 1990; Näätänen and Winkler 1999).
The subjects were instructed to count every stimulus, press a button when they reached 100 and then reset their count and restart counting from zero again. This task was chosen to control for a possible contamination by N2b and P3b ERP-components which are elicited when deviants are being discriminated from standards (Donchin et al. 1997; Opitz et al. 2005).
Behavioral assessment of discrimination between stimuli
Thirteen subjects (out of the 20 participating in the study) rated the stimuli in a separate behavioral session performed outside of the magnet. Stimuli were presented simultaneously with recorded MR scanner noise. The stimuli and the scanner noise were both presented at 62 dB SPL, which was a comfortable hearing level for the stimuli and the noise presented together.
Subjects were presented with “triplets” comprised of stimuli containing /e:/ and /eI/. In each block, 15 triplets were presented, separated by a 2 s interval of silence allowing the subject to respond. The stimuli were the same as those presented in the fMRI experiment. There were several patterns of triplet presentation in each block, as follows. There were three triplets in which “day” was presented first followed by two “deh”-s; three triplets in which “day” was presented last and preceded by two “deh”-s; three triplets where “deh” was presented first followed by two “day”-s and three triplets where “deh” was last and preceded by two “day”-s. In addition, there were three triplets in which “day”, “deh” and “tay” each appeared in the middle position, respectively, while the neighboring stimuli (in the first and third positions) were either the /eI/ or /e:/ counterparts (i.e., “deh” “day” “deh”; “day” “deh” “day” and “teh” “tay” “teh”).
Overall, four blocks of 15 triplets each were presented to the subjects. The order of triplets was randomized within each block. The three different exemplars representing each CV stimulus were balanced across the four blocks. The subjects were required to indicate the outlier in each triplet by pressing the key (either “1”, “2” or “3” on the keyboard) that corresponded to the position of the outlier in the triplet. The outliers were assigned mainly to the extreme positions (1 or 3) in the triplet to simulate more closely the “oddball” design in which the deviant is surrounded by repetitive standards. The trials with the outlier appearing in the middle position were introduced to minimize the probability of guessing the identity of the third stimulus in the triplet after hearing the first two which were non-identical. Thus, subjects could guess the identity of the outlier with more confidence only after hearing two identical stimuli in a row.
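The composition of one behavioral block can be written out explicitly. The sketch below is an assumption-laden illustration: the function name build_block and the seed argument are hypothetical, and the balancing of the three exemplars per CV across blocks is not modelled.

```python
import random


def build_block(seed=None):
    """Return the 15 (triplet, outlier_position) pairs of one behavioral block.

    Positions are 1-based, matching the response keys "1", "2" and "3".
    """
    triplets = []
    # Outlier in an extreme position: three triplets for each of four patterns.
    for outlier, other in (("day", "deh"), ("deh", "day")):
        for _ in range(3):
            triplets.append(((outlier, other, other), 1))  # outlier first
            triplets.append(((other, other, outlier), 3))  # outlier last
    # Outlier in the middle position: one triplet per deviant type.
    for outlier, other in (("day", "deh"), ("deh", "day"), ("tay", "teh")):
        triplets.append(((other, outlier, other), 2))
    # The order of triplets is randomized within the block.
    random.Random(seed).shuffle(triplets)
    return triplets


if __name__ == "__main__":
    for triplet, position in build_block(seed=1):
        print(" ".join(triplet), "-> outlier at position", position)
```

Twelve of the fifteen triplets place the outlier at position 1 or 3, and three place it in the middle, which reflects the stated emphasis on the extreme positions.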
Data acquisition parameters
Data were collected on a 3T Siemens Trio scanner. Each study began with two localizers: a three-plane localizer and a multiple-slice sagittal localizer. These were followed by the acquisition of twenty-five 6 mm T1-weighted axial slices (TR = 300 ms, TE = 2.47 ms, flip angle = 60°, FOV = 220 mm, 256 acquisition matrix). For each subject, 12 functional imaging scans were collected with slices in the same locations as the anatomical T1-weighted data. Functional images were recorded using a gradient-echo EPI sequence (TR = 1,550 ms, TE = 30 ms, flip angle = 80°, FOV = 220 mm, 64 acquisition matrix). Each functional run involved the acquisition of 245 volumes with twenty-five 6-mm axial slices. Images were converted to Analyze format and the first ten volumes of each functional series were removed to account for the approach to steady-state magnetization, leaving 235 volumes for analysis.
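The functional acquisition parameters imply an in-plane voxel size and a run duration that can be derived directly. The following sketch only recombines the values stated above; the names are illustrative.

```python
# Functional EPI parameters as stated in the text; names are illustrative.
FOV_MM = 220.0           # field of view
MATRIX = 64              # acquisition matrix (64 x 64)
TR_S = 1.55              # repetition time in seconds (1,550 ms)
VOLUMES_ACQUIRED = 245
VOLUMES_DISCARDED = 10


def in_plane_voxel_mm():
    """In-plane voxel edge length: field of view divided by matrix size."""
    return FOV_MM / MATRIX


def run_duration_s(n_volumes):
    """Duration of n_volumes acquired back to back at the stated TR."""
    return n_volumes * TR_S


def volumes_analysed():
    """Volumes left after discarding the initial non-steady-state volumes."""
    return VOLUMES_ACQUIRED - VOLUMES_DISCARDED


if __name__ == "__main__":
    print(f"in-plane voxel size: {in_plane_voxel_mm():.4f} mm")
    print(f"run duration: {run_duration_s(VOLUMES_ACQUIRED) / 60:.2f} min")
    print(f"volumes analysed: {volumes_analysed()}")
```

The in-plane size of about 3.44 mm and the run length of a little over 6 min agree with the acquired voxel size and the block duration reported elsewhere in the Methods.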
Image analysis
Preprocessing
First, using sinc interpolation, the data from each slice were adjusted for slice acquisition time and then all data were motion corrected using SPM99 for six rigid body motions (displacement in the x, y, z direction and rotation: for pitch, roll, yaw). Flags were set for de-correlation and masking so a pixel was set to zero for every time point if it moved outside the volume. Functional image data were motion corrected by realigning the time sequence imaging to the first image in the middle run of the sequence using SPM99.
Individual subject data (responses to the deviants, standards and control stimuli) were analyzed using a General Linear Model on each voxel in the entire brain volume (Rajeevan et al. 2007). The data were normalized to a signal measure of 100 and were spatially smoothed with an 8.08 mm Gaussian kernel to account for variations in the location of activation across participants. The output maps are normalized beta-maps which are in the acquired space (3.44 mm × 3.44 mm × 6 mm).
To take these data into a common reference space, three registrations were calculated within the Yale BioImage Suite software package (http://www.bioimagesuite.org/, Papademetris et al. 2006) using the intensity-only component of the method reported in Papademetris et al. (2004). The first registration is a linear registration between the individual subject’s raw functional image and that subject’s 2D anatomical image. The 2D anatomical image was then linearly registered to the individual’s 3D anatomical image. The 3D image differs from the 2D image in that it has a 1 × 1 × 1 mm resolution, whereas the 2D z-dimension is set by slice thickness and its x–y dimensions are set by voxel size. Finally, a non-linear registration was computed between the individual 3D anatomical image and a reference 3D image. The reference brain used was the Colin27 Brain (Holmes et al. 1998) commonly applied in SPM and other software packages. All three registrations were applied sequentially to the individual normalized beta-maps to bring all data into the common reference space.
Statistical analyses
We used a two-stage random-effects model to analyze the data. In the first stage, statistical maps were calculated for the comparison of interest for each subject as described above. In the second stage, across-subject analyses were conducted and the distribution of the individual subjects’ statistics was tested for significance.
Within subject analyses
For each of the three deviants (“deh”, “day” and “tay”) the following six types of maps were calculated: (1) Deviant maps, computed from oddball runs by extracting the response to the low-frequency (i.e., rare) stimuli. (2) Standard maps, computed from oddball runs by extracting the response to the high-frequency (i.e., frequent) stimuli. (3) Deviant control maps, computed from control runs by extracting the response to the same physical stimuli that served as deviants in the oddball runs (deviant-counterparts). (4) Standard control maps, computed from control runs by extracting the response to the same physical stimuli that served as standards in the oddball runs (standard-counterparts). The contrasts extracting each deviant-counterpart and standard-counterpart stimulus were carried out against the same baseline that formed part of the “filler” stimuli, i.e., “beh”, “peh”, “bay” and “pay”. The stimuli that served as deviants and standards (i.e., “deh”, “day”, “teh”, “tay”) were taken out of the baseline so that their extraction could be conducted against a common baseline. (5) A Word map, computed from control runs by contrasting the response to the meaningful word (“day”) with the non-words that formed part of the “filler” stimuli, i.e., “deh”, “beh”, “peh”. (6) A Non-word map, computed from control runs by contrasting the response to the non-word (“tay”) with the non-words that formed part of the “filler” stimuli (i.e., “deh”, “beh”, “peh”). The non-word “tay” was excluded from the baseline, since it was the only non-word with a diphthong (/eI/), and to avoid contrasting it with itself.
Across subject analyses
Composite maps. Two types of composites were created: three composites of deviant maps, one for each deviant (“deh”, “day”, “tay”) and three composites of the deviant-control maps, one composite for each control run that included either “deh”, “day” or “tay” as a deviant-counterpart, respectively.
Paired t tests. The contrast between deviants and their respective counterparts isolates the cognitive component of deviance detection. For this contrast, a paired t test was calculated between the Deviant map and the Deviant control map. The contrast between standards and their respective counterparts accounts for the sensory component based on refractoriness. For this contrast, a paired t test was calculated between the Standard map and the Standard control map.
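The logic of the two contrasts can be illustrated with a voxel-wise paired t statistic. This is a minimal pure-Python sketch, not the authors' analysis software: the function names are hypothetical, each map is assumed to be a list of per-subject lists of voxel values, and thresholding and cluster correction are left out.

```python
import math


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("differences have zero variance")
    return mean / math.sqrt(var / n), n - 1


def paired_t_map(map_a, map_b):
    """Apply paired_t at every voxel.

    map_a and map_b hold one list of voxel values per subject, for example
    the Deviant maps and the Deviant control maps of the same subjects.
    """
    n_voxels = len(map_a[0])
    return [
        paired_t([subj[v] for subj in map_a], [subj[v] for subj in map_b])[0]
        for v in range(n_voxels)
    ]
```

Calling paired_t_map with Deviant and Deviant control maps corresponds to the cognitive contrast, and calling it with Standard and Standard control maps corresponds to the refractoriness contrast. With n = 20 subjects each voxel-wise test has 19 degrees of freedom.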
Statistical images were corrected for multiple comparisons over the whole brain using a magnitude threshold derived from Monte–Carlo simulation that takes into account the number of contiguous activated voxels (Forman et al. 1995). Individual voxel thresholds were set at P < 0.001 for both the composites and paired t test maps. Data were corrected for multiple comparisons by spatial extent of contiguous supra-threshold individual voxels (experiment-wise P < 0.001 for a cluster). In a Monte–Carlo simulation within the AFNI software package, using a smoothing kernel of 8.08 mm and a connection radius of 7.72 mm on 3.44 mm × 3.44 mm × 6 mm voxels, it was determined that an activation volume of 1,278 μl satisfied the P < 0.001 threshold.
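The extent threshold can be translated from a volume into a number of voxels. The sketch below assumes the acquired voxel size of 3.44 mm × 3.44 mm × 6 mm given in the Preprocessing section; it does not reproduce the Monte–Carlo simulation itself, and the names are illustrative.

```python
VOXEL_MM = (3.44, 3.44, 6.0)   # acquired voxel size (assumed from Preprocessing)
CLUSTER_VOLUME_UL = 1278.0     # extent threshold reported from the simulation


def voxel_volume_ul(dims_mm=VOXEL_MM):
    """Volume of one voxel in microlitres (1 cubic mm equals 1 microlitre)."""
    x, y, z = dims_mm
    return x * y * z


def cluster_extent_voxels(volume_ul=CLUSTER_VOLUME_UL, dims_mm=VOXEL_MM):
    """Cluster extent expressed as a (possibly fractional) number of voxels."""
    return volume_ul / voxel_volume_ul(dims_mm)


if __name__ == "__main__":
    print(f"voxel volume:   {voxel_volume_ul():.4f} microlitres")
    print(f"cluster extent: {cluster_extent_voxels():.2f} voxels")
```

Under this assumption one voxel occupies about 71 μl, so the 1,278 μl threshold corresponds to a cluster of roughly 18 contiguous voxels.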
Regions of interest analyses. The different regions of interest (ROIs) were identified on the basis of the results obtained by the paired t test maps prior to cluster-size correction (P < 0.001) reflecting the cognitive and sensory mechanisms of change-detection. Then, each participant’s model estimate of the percent signal change for each region of activation, averaged across voxels within the region was calculated for each of the six composite maps (three deviant maps and three deviant-control maps). The ROIs identified were analyzed in a 3 (ROI: anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus) × 2 (task-type: “oddball”/“control”) × 3 (deviant: “deh”, “day”, “tay”) × 2 (laterality: left/right) repeated measures ANOVA with subjects (n = 20) as a random factor.
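The structure of the ROI analysis can be made concrete by enumerating the cells of the repeated measures design and by averaging a map over the voxels of a region. This is a sketch under assumptions: the ROI mask is taken to be a list of booleans aligned with the voxel values, and the names are hypothetical.

```python
from itertools import product

ROIS = (
    "anterior auditory cortex",
    "posterior auditory cortex",
    "superior/middle frontal gyrus",
)
TASKS = ("oddball", "control")
DEVIANTS = ("deh", "day", "tay")
SIDES = ("left", "right")


def design_cells():
    """All ROI x task-type x deviant x laterality combinations."""
    return list(product(ROIS, TASKS, DEVIANTS, SIDES))


def roi_mean(values, mask):
    """Average the map values over the voxels flagged in the ROI mask."""
    selected = [v for v, inside in zip(values, mask) if inside]
    if not selected:
        raise ValueError("ROI mask selects no voxels")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    print(len(design_cells()), "cells per subject")
```

Each subject therefore contributes one averaged percent signal change value to each of the 3 × 2 × 3 × 2 = 36 cells.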
ROI analysis for the Word/Non-word maps. To further investigate the “word effect”, each participant’s model estimate of the percent signal change for each region of activation, averaged across voxels within the region, was calculated for the Word map and Non-word map. Both types of maps were calculated from control blocks. Specifically, to create the Word map “day” was contrasted with a baseline comprised of “filler” or contextual stimuli that were non-words (i.e., “beh”, “peh”, “deh”, “teh”). To create the Non-word map “tay” was contrasted with the same baseline. The delta blood-oxygen-level-dependent (BOLD) measures were subjected to a 2 (Laterality: left/right) × 3 (ROI: anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus) × 2 [Lexical Status: word (“day”) vs. non-word (“tay”)] repeated measures ANOVA with subjects (n = 20) as a random factor.
Results
Behavioral results
There were no differences in response accuracy scores, computed across the three possible positions of the outlier, among the three CVs. The response accuracy for “day” was 95.88% (±5.01), for “tay” 96.70% (±5.15) and for “deh” 95.33% (±4.22) (the number in brackets denotes the standard deviation). Paired t tests run on the individual response accuracy data for each CV confirmed that there was no difference in accuracy levels between CVs (in all three possible comparisons P > 0.1). We also examined the response accuracy scores for the first and third positions in the triplet for each of the CVs. The response accuracy scores ranged between 98 and 99% among the three different CV stimuli as well as between the first and third positions (since there were only four triplets per block in which the outlier appeared in the middle, the middle position was excluded from the latter calculation).
Table 3 displays the reaction time data for each deviant and for each of the three possible positions of the outlier within a triplet. As expected, the reaction time for the third position in the triplet was the shortest. To evaluate this trend statistically, a two-way ANOVA was run with Stimulus (“day”, “tay”, “deh”) and Position (first, second, third) as factors and with Greenhouse–Geisser adjustment to account for violations of sphericity. Only the Position factor was significant [F(2,38) = 28.70, P < 0.001]. Paired comparisons (Bonferroni corrected, P < 0.05) confirmed that reaction time associated with stimuli in the first and second positions in the triplet was prolonged relative to that associated with the third stimulus in the triplet. Taken together, there is a dissociation between response accuracy and reaction time data. While no differences in response accuracy were found, reaction time was shorter when the outlier CV stimulus was in the third position in the triplet.
This confirms our prediction that reaction time will be shorter in case of an outlier presented in the third position following two identical stimuli. However, the fact that the main effect of Stimulus as well as the interaction between Stimulus and Position did not reach significance confirms that the stimuli were equally discriminable in the context of the “oddball” paradigms used in this experiment.
Composite maps
Deviance effect
The spread of activation associated with “tay” deviant was very similar to that in response to “day” (Fig. 1). To elucidate the positions and extents of the activated brain regions associated with the similar responses to the deviants “day” and “tay”, images in the coronal and sagittal planes were included (Fig. 1, bottom panels).
The size of the right STG region activated in the word condition (i.e., “day”) was similar to that activated in the non-word condition (i.e., “tay”) (Table 4). In contrast, the size of the left STG region activated in the word condition was almost twofold larger than that activated by the “tay” condition. In both hemispheres, the brain regions activated in the “day” and the “tay” conditions were larger than those observed for the “deh” condition. Furthermore, the “day”, “tay” and “deh” stimuli appear to activate different regions of the STG. The “day” stimulus activated parts of the posterior STG (i.e., BA 22/42), whereas the “tay” and “deh” stimuli activated large parts of the superior and middle temporal cortices (i.e., BA 21/22) (Table 4).
Similar brain regions to those implicated in the deviance response were also found in the study of Opitz et al. (2005) in the context of tones. Specifically, in that study the deviance effect also implicated HG (primary auditory cortex) and the superior temporal plane (Talairach coordinates of peak location: −49, −14, 9; 53, −21, 10, respectively).
Deviant control maps
While activity extended from the STG across the lateral sulcus to the central sulcus in response to “day”, “deh” had a similar but more focal response in the vicinity of the primary auditory cortex. To elucidate the positions and extents of the activated brain regions associated with the more similar responses to the deviants “day” and “deh”, images in the coronal and sagittal planes were included (Fig. 2, bottom panels). In response to “tay” activity in the STG did not reach significance and activation was only observed in the left precentral gyrus (Fig. 2, Table 5).
STG activation was most extensive over the left STG for the deviant “day”; for “tay” it did not reach statistical significance, and for “deh” it was more than five times smaller (Fig. 2, Table 5). In addition, in response to “day” a negative differential BOLD was evident in the left superior frontal gyrus and left occipital pole (Fig. 2, Table 5).
Paired t test maps
Cognitive effect
The cognitive effect was derived by contrasting the Deviant maps with the corresponding Deviant control maps. Only the contrast map for “day” showed activations at the significance level used throughout this study (P < 0.001, corrected for multiple comparisons), reflecting the “word-advantage” effect. The negative differential BOLD associated with the cognitive effect implicated bilaterally the region extending from the postcentral gyrus to HG including parts of the Sylvian fissure, lateral sulcus and insula (Table 6, Fig. 3). The location of this region is proximal to the one obtained by Opitz et al. (2005) that was associated with the cognitive mechanism and located bilaterally in a non-primary auditory area within the lateral temporal lobe in the anterior rim of HG (Talairach coordinates of peak location: −42, −13, 6 and 49, −12, 7).
Refractoriness effect
In the t test images associated with refractoriness (Fig. 4), a negative differential BOLD could be observed posterior to HG in response to each of the standard-deviant pairs (Table 7). For the refractoriness effect when “deh” served as the standard, activity also extended to the superior temporal sulcus (Fig. 4, left panel). Opitz et al. (2005) also found activations in a proximal brain region in the lateral aspect of the posterior rim of HG bilaterally that was associated with the sensory mechanism of change detection (Talairach coordinates of peak location: −51, −18, 7 and 53, −19, 4).
From here on, we will refer to the region posterior to HG (Fig. 4, Table 7) as posterior auditory cortex. This region was defined functionally as showing a refractoriness effect in the current study. The region anterior to HG extending from the postcentral gyrus to HG (Fig. 3, Table 6) will be referred to as anterior auditory cortex. This region was defined functionally as showing a cognitive effect in the current study. This terminology is based on the one used by Opitz et al. (2005) that described the regions implicated in the sensory and cognitive effects as residing in the vicinity of the posterior and anterior rim of HG, respectively.
In addition, when the sensory component was extracted for “deh” standard (Fig. 4, left panel and Table 7) positive differential BOLD was observed in the superior/middle frontal gyrus as well as in the anterior cingulate cortex (ACC). Thus, the positive differential BOLD observed in the ACC and in the frontal gyrus is a direct result of contrasting “deh” with a baseline of “day” deviants to create the map reflecting the sensory component for “deh” standard.
ROI analysis
Six ROIs were chosen to be included in the ROI analysis. These ROIs were defined on the basis of previous findings and the data obtained in the current study. We have chosen the ROIs which were directly linked to the mechanisms associated with change detection, namely, the cognitive and sensory effects, or those that might be related to both change-detection and to the processing of linguistic stimuli. This is described in more detail in the section “Discussion”.
The following ROIs were identified on the basis of the paired t test maps. From the paired t test map reflecting the cognitive effect (Fig. 3) the anterior auditory cortex was chosen bilaterally. From the paired t test map reflecting the refractoriness effect when “day” served as a standard the posterior auditory cortex was chosen bilaterally (Fig. 4, middle panel). From the paired t test map reflecting the sensory component when “deh” served as the standard the superior/middle frontal gyrus was chosen bilaterally (Fig. 4, left panel, Table 7). Note that all the ROIs were defined based on the t-maps prior to cluster-size correction (P < 0.001). All except the right superior/middle frontal gyrus were apparent in the t test maps after cluster-size correction.
Figure 5 displays the average delta BOLD for each condition (“oddball”/“control”) and hemisphere (left/right) as a function of deviant stimulus and ROI. From Fig. 5 it can be seen that the largest differences in delta BOLD between conditions (“oddball”/“control”) are evident over the left hemisphere, mainly the anterior auditory cortex and superior frontal gyrus. These were larger for “day” than “deh” and smallest for “tay”. The latter was associated with diminished levels of delta BOLD within the posterior auditory cortex in the control condition.
The ROI data was subjected to a four-way ANOVA with Laterality (Left, Right), ROI (anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus), Deviant (“deh”, “day”, “tay”) and Task type (Control, Oddball) as independent factors, with an appropriate adjustment for sphericity (Greenhouse–Geisser correction). A main effect of ROI was found [F(2,38) = 49.17, P < 0.001], as well as a Laterality × ROI [F(2,38) = 5.39, P < 0.05], Deviant × ROI [F(4,76) = 4.23, P < 0.05], Task Type × ROI [F(2,38) = 7.25, P < 0.01], and Deviant × Task Type × ROI [F(4,76) = 4.60, P < 0.01] interactions. The Laterality × ROI interaction was due to enhanced delta BOLD over the left hemisphere associated with the anterior auditory cortex (P < 0.05, Bonferroni corrected).
The Deviant × Task Type × ROI interaction is depicted in Fig. 6. A set of paired comparisons (Bonferroni corrected, P < 0.05) was conducted separately within each task-type (“oddball”/“control”). In the “oddball” condition the comparisons were conducted among the three deviants for the posterior auditory cortex only, where a gradient of delta BOLD could be observed as a function of deviant type. In the “control” condition comparisons were performed among the three deviants for each of the ROIs. In the “oddball” condition a significantly larger delta BOLD was found in response to “tay” than to “deh” in the posterior auditory cortex. In the “control” condition delta BOLD to the word “day” in the anterior and posterior auditory cortices was significantly larger than that associated with the non-word “tay”.
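The Bonferroni adjustment applied to such paired comparisons can be sketched as follows. The code assumes that the family of comparisons consists of the three pairwise contrasts among the deviants within one ROI, which the text does not state explicitly; the function names are hypothetical and any p values passed in are supplied by the caller.

```python
from itertools import combinations

DEVIANTS = ("deh", "day", "tay")


def pairwise_contrasts(levels=DEVIANTS):
    """All unordered pairs of deviant types."""
    return list(combinations(levels, 2))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison whose uncorrected p falls below alpha / m.

    p_values maps a (level_a, level_b) pair to its uncorrected p value,
    and m is the number of comparisons in the family.
    """
    if not p_values:
        return {}
    threshold = alpha / len(p_values)
    return {pair: p < threshold for pair, p in p_values.items()}
```

With three comparisons in the family, an uncorrected p value has to fall below 0.05 / 3 ≈ 0.0167 to count as significant at the corrected 0.05 level.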
In a separate set of paired comparisons (Bonferroni corrected, P < 0.05) the averages of delta BOLD between conditions (“oddball”/“control”) were compared for each of the three deviant stimuli and each of the ROIs (across hemispheres). A larger delta BOLD was found for the response to “day” in the “control” condition relative to the “oddball” condition implicating the anterior auditory cortex.
ROI analysis for the Word and Non-word maps
To further investigate the “word effect”, each participant’s model estimate of the percent signal change for each region of activation was calculated separately for the Word map and the Non-word map (Fig. 7, Table 8). The ROI data were subjected to a three-way ANOVA with Laterality (left, right), ROI (anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus) and Lexical Status [word (“day”), non-word (“tay”)] as independent factors, with an appropriate adjustment for sphericity (Greenhouse–Geisser correction). A main effect of ROI was found [F(2,38) = 21.03, P < 0.001] as well as a Lexical Status × ROI interaction [F(2,38) = 6.05, P < 0.01] (Fig. 8). Paired comparisons (Bonferroni corrected, P < 0.05) found a significant difference between “day” and “tay” only within the anterior auditory cortex.
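The Greenhouse–Geisser adjustment for sphericity scales the degrees of freedom of a within-subject effect by an estimate of epsilon. The following is a minimal sketch for a single within-subject factor, assuming the standard estimate based on the double-centered covariance matrix of the conditions; the data are hypothetical, the function names are illustrative, and the original analysis software may differ in detail.

```python
def covariance_matrix(data):
    """Sample covariance matrix of the columns of a subjects-by-conditions table."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(k)] for i in range(k)]


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon for one within-subject factor with k levels.

    The value lies between 1 / (k - 1) (maximal violation of sphericity)
    and 1 (sphericity holds).
    """
    s = covariance_matrix(data)
    k = len(s)
    row_means = [sum(row) / k for row in s]
    grand_mean = sum(row_means) / k
    # Double-center the covariance matrix (it is symmetric, so row means
    # and column means coincide).
    centered = [[s[i][j] - row_means[i] - row_means[j] + grand_mean
                 for j in range(k)] for i in range(k)]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def corrected_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom by epsilon."""
    return epsilon * df_effect, epsilon * df_error


# Hypothetical percent signal change for five subjects in three ROIs.
roi_data = [
    [0.42, 0.31, 0.10],
    [0.55, 0.29, 0.05],
    [0.38, 0.40, 0.12],
    [0.61, 0.35, 0.02],
    [0.47, 0.22, 0.09],
]
eps = greenhouse_geisser_epsilon(roi_data)
print(eps, corrected_df(2, 8, eps))
```

When epsilon equals 1 the degrees of freedom are unchanged; smaller values make the F test more conservative.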
The distribution of percent signal change across subjects
To learn more about the consistency and extent of activation of specific brain regions, data were analyzed for individual subjects. Specifically, the percent signal change of individual subjects was computed for each of the six ROIs chosen for the group-level analysis as well as for two additional brain regions that might be implicated in speech perception, namely the left occipital pole and the left precentral gyrus. The results showed that the majority of subjects within each statistical test showed the same pattern of activity (see Table 9).
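One simple way to quantify such consistency is to count the subjects whose individual difference between two conditions has the same sign as the group mean difference. This is a minimal sketch under that assumption; the text does not specify how "the same pattern" was operationalized, and the values below are hypothetical.

```python
def fraction_consistent(cond_a, cond_b):
    """Share of subjects whose (a - b) difference has the same sign
    as the group mean difference."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    mean_diff = sum(diffs) / len(diffs)
    if mean_diff == 0:
        return 0.0  # no group-level direction to agree with
    same_sign = sum(1 for d in diffs if d * mean_diff > 0)
    return same_sign / len(diffs)


# Hypothetical percent signal change in one ROI for four subjects.
word = [1.0, 2.0, 3.0, 0.5]      # e.g. responses to a word deviant
non_word = [0.5, 1.0, 1.0, 1.0]  # e.g. responses to a non-word deviant

print(fraction_consistent(word, non_word))  # 0.75
```

A value above 0.5 indicates that most subjects follow the direction of the group effect.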
Discussion
Main findings
Overall, the results of the current study confirmed our hypotheses. Specifically, the sensory mechanism was located in the posterior auditory cortex (Fig. 4, Table 7) and the cognitive mechanism in the anterior auditory cortex (Fig. 3, Table 6). These locations match, respectively, those found in the context of tones (Opitz et al. 2005). Moreover, as expected, the cognitive effect was statistically significant only when the deviant was a word (i.e., “day”).
The similarity between the brain regions activated by speech in this study and the regions activated by tones (Opitz et al. 2005) could not be explained by the analysis of the physical sound features. First, the contribution of the acoustic component to the cognitive mechanism in this study was removed by contrasting deviants with their physically identical control counterparts. Second, our data suggest that the salience of the sounds, either tones or speech stimuli, accounts for the activation of the anterior auditory cortex (Jääskeläinen et al. 2004; Ahveninen et al. 2006) (see also ‘Support for the existence of “what” and “where” auditory streams’). Thus, the acoustic component could only account for the similar activation observed in the posterior HG. This is also in line with previous findings that link dorsal temporal brain areas including HG and planum temporale with the analysis of auditory features of complex sounds (Binder et al. 1996; Wessinger et al. 2001; Hall et al. 2002, 2003; Seifritz et al. 2002).
Activation loci in superior and middle temporal areas that were activated in the Deviant (Fig. 1) and Deviant control maps (Fig. 2) have previously been reported to contribute to lexical and semantic processing (Price 2000; Salmelin et al. 2000; Scott and Johnsrude 2003; Hickok and Poeppel 2004). Similarly, the left dominance for processing intelligible speech that was found in our study in the “oddball” condition matches previous results that link the left posterior STG to higher level linguistic processes (Narain et al. 2003).
Support for the existence of “what” and “where” auditory streams
The results of this study are in agreement with the segregation of the auditory system into “what” and “where” processing streams (Jääskeläinen et al. 2004; Ahveninen et al. 2006) associated with the analysis of auditory object content and location features that reside in areas anterior and posterior to primary auditory cortex, respectively (Rauschecker and Tian 2000). It was suggested (Jääskeläinen et al. 2004) that the “where” system is responsible both for fast analysis of sound location which is important for attentional orienting, and for detecting the degree of sound novelty, which affects its degree of distractibility. Specifically, the degree to which unattended novel sounds distracted visual forced-choice task performance coincided with the extent that the posterior N1 response was released from inhibition. On the other hand, the anterior N1 response was associated with the processing of fine object features (Jääskeläinen et al. 2004). Similarly, the results in our study indicate that the deviant word was released from inhibition in the “control” condition because of its lexical status and that this “word advantage effect” was located anterior to HG. This is in agreement with the finding that neurons in the anterior auditory cortex are more sharply tuned to phonetic features of sounds (Ahveninen et al. 2006).
The effect of speech perception
Despite the general similarity to the results obtained in the context of tones, some of the results are specific to speech perception, as follows. In the “oddball” condition the brain regions activated in response to the deviants “day” and “tay” were larger than those observed for the “deh” deviant (Deviant maps, Fig. 1). In addition, in the control condition the precentral gyrus was activated in response to both “day” and “tay” (Fig. 2). These similarities between “day” and “tay” may reflect an extra processing effort associated with the diphthong vowel /eI/ shared by “day” and “tay” (Sonty et al. 2003; Bohland and Guenther 2006). However, the fact that the similarity between the responses to “day” and “tay” diminished in the control condition (Fig. 2) may indicate that the /eI/ diphthong associated with deviant “day” was more salient among repetitive “deh” standards (“oddball” condition) than among the variable filler stimuli (“control” condition) (Nordby et al. 1994; Sabri and Campbell 2000).
The more similar responses to “day” and “deh” in the control condition (Fig. 2) may be explained by the shared /e:/ monophthong, which might have elicited an expectation to hear a meaningful word (“day”) when presented with “deh”. This expectation could account for the more extensive response in the STG relative to “tay” (Fig. 2). This interpretation is strengthened by the finding that activation in the left anterior auditory cortex dissociates the “oddball” and “control” conditions (Scott et al. 2006), as well as by the observed increase in the percent signal change in that region associated with both “deh” and “day” (Fig. 5, “Left” panels). Thus, in the “control” condition (Fig. 2), an interaction between lexical and acoustic features affected the results.
There were additional findings indicating left hemisphere dominance. STG activation for “day” was most extensive over the left STG, whereas for “tay” it did not reach statistical significance and for “deh” it was more than five times smaller (Deviant control maps, Fig. 2, Table 5). Furthermore, the “day” control map revealed negative differential BOLD in the left superior frontal gyrus as well as in the left occipital pole (Fig. 2, Table 5). Left hemisphere dominance was also evident in the middle/superior frontal gyrus in the sensory mechanism map for “deh” (Fig. 4, Table 7). In addition, the four-way ANOVA (Fig. 5) showed enhanced delta BOLD over the left hemisphere in the anterior auditory cortex (Laterality × ROI interaction).
This pattern of left hemisphere dominance is in agreement with a number of imaging and clinical studies (for a review see Tervaniemi and Hugdahl 2003). The left middle/superior frontal gyrus activations are in line with previous findings implicating these brain regions in tasks engaging phonological working memory (Paulesu et al. 1993; Burton et al. 2000; Siok et al. 2003; LoCasto et al. 2004) and with the suggestion that a significant portion of active frontal areas is recruited for extracting acoustic information and maintaining it in memory (LoCasto et al. 2004). The negative differential activation evident in the left occipital pole associated with the “day” control map (Fig. 2) may indicate that meaningful words are more likely to elicit activity in visual processing regions (Billingsley-Marshall et al. 2007).
It is noteworthy that the sensory mechanism map for “deh” revealed a positive differential BOLD in the ACC (Fig. 4, Table 7). The ACC is implicated in initiating or inhibiting action and is considered to be part of a larger network that includes medial/lateral frontal, prefrontal and temporal regions (Wang et al. 2005; Dias et al. 2006; Gold et al. 2006). Hence, it is possible that in our study inhibitory activations occurred in the ACC (as well as in the superior frontal gyrus) in response to “day” since it was a meaningful stimulus that interfered with the main counting task (see also Rinne et al. 2005). Since standard “deh” was contrasted with deviant “day” to create the “deh” standard map, the positive differential BOLD located at the ACC and middle/superior frontal gyrus (Fig. 4, “deh”) might have been caused by a reduced activation associated with the response to deviant “day”.
A possible effect of the lexical status in the “oddball” condition
It appears that the lexical status of the deviant stimulus affected its processing not only in the “control” condition but also in the “oddball” condition, in which the effects of adaptation caused by the repeating standards were more pronounced. Specifically, the “day” stimulus activated parts of the posterior STG (i.e., BA 22/42), whereas the “tay” and “deh” stimuli activated large parts of the superior and middle temporal cortices (i.e., BA 21/22) (Table 4). However, the size of the left STG region activated by the “day” deviant was almost twofold larger than that activated by the “tay” deviant (Deviant maps, Fig. 1, Table 4). This is in agreement with the assumption that the left posterior STG is the focus of a multi-modal network associated with language comprehension (Aboitiz and Garcia 1997; Narain et al. 2003). More strongly left-lateralized posterior superior-temporal activation is associated with analysis of speech sounds for mapping onto higher levels of language processing (e.g., syllable, word) (Price et al. 1992; Zatorre et al. 1996). Thus, although the word advantage effect was not salient in the “oddball” condition, the different spread of activation between “day” and the non-words may indicate differential processing based on the lexical status of the deviant stimulus.
Summary
Taken together, the results of the current study corroborate the existence of two independent mechanisms contributing to the change-detection response (Opitz et al. 2005; Hoshiyama et al. 2007; Maess et al. 2007): a sensory mechanism, reflected by the different refractory states of the subpopulations activated by the standard and the deviant, and a cognitive mechanism, which relies on auditory sensory memory representations and gives rise to the word advantage effect. Thus, on the one hand, our results support the view that the MMN represents a change-detection mechanism functionally and spatially distinct from an afferent input population (N1 generators) (Näätänen et al. 2005). On the other hand, our results indicate that a release from the inhibitory effects of adaptation is a prerequisite for the full realization of the significance of the deviant stimulus. This is in line with the adaptation hypothesis, which assumes that the posterior auditory cortex gates novel sounds to awareness (Jääskeläinen et al. 2004).
Conclusion
Our findings serve to unify the two opposing views suggested by Jääskeläinen et al. (2004) and Näätänen et al. (2005). Specifically, the gate to awareness for auditory deviation (Jääskeläinen et al. 2004; Näätänen et al. 2005) relies on adaptation that modulates the extent to which novel sounds will be accessible to memory-based processes. In the case of the present study, the sensory component serves to modulate the salience of the speech stimulus by the degree to which it will be accessible to cortical memory traces for speech sounds (Pulvermüller and Shtyrov 2006). Furthermore, the lexical status of the speech stimulus interacts with acoustic factors, exerting a top-down effect on the novelty value of the auditory object that affects, in turn, its degree of accessibility to the cognitive component.
References
Aboitiz F, Garcia V (1997) The evolutionary origin of the language areas in the human brain. A neuroanatomical perspective. Brain Res Brain Res Rev 25:381–396
Ahveninen J, Jääskeläinen IP, Raij T, Bonmassar G, Devore S, Hamalainen M, Levanen S, Lin FH, Sams M, Shinn-Cunningham BG, Witzel T, Belliveau JW (2006) Task-modulated “what”, “where” pathways in human auditory cortex. Proc Natl Acad Sci USA 103:14608–14613
Billingsley-Marshall RL, Clear T, Mencl WE, Simos PG, Swank PR, Men D, Sarkari S, Castillo EM, Papanicolaou AC (2007) A comparison of functional MRI and magnetoencephalography for receptive language mapping. J Neurosci Methods 161:306–313
Binder JR, Frost JA, Hammeke TA, Rao SM, Cox RW (1996) Function of the left planum temporale in auditory and linguistic processing. Brain 119:1239–1247
Binder JR, Frost JA, Hammeke TA, Bellgowan PS, Springer JA, Kaufman JN, Possing ET (2000) Human temporal lobe activation by speech and nonspeech sounds. Cereb Cortex 10:512–528
Bohland JW, Guenther FH (2006) An fMRI investigation of syllable sequence production. Neuroimage 32:821–841
Burton MW, Small SL, Blumstein SE (2000) The role of segmentation in phonological processing: an fMRI investigation. J Cogn Neurosci 12:679–690
Desimone R (1992) The physiology of memory: recordings of things past. Science 258:245–246
Dias EC, McGinnis T, Smiley JF, Foxe JJ, Schroeder CE, Javitt DC (2006) Changing plans: neural correlates of executive control in monkey and human frontal cortex. Exp Brain Res 174:279–291
Donchin E, Spencer KM, Dien J (1997) The varieties of deviant experience: ERP manifestations of deviance processors. In: van Box GJM, Bocker KBE (eds) Brain and behavior: past, present and future. Tilburg University Press, Tilburg, pp 67–91
Forman SD, Cohen JD, Fitzgerald JD, Eddy WF, Mintun MA, Noll DC (1995) Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold. Magn Reson Med 33:636–647
Gold BT, Balota DA, Jones SJ, Powell DK, Smith CD, Andersen AH (2006) Dissociation of automatic and strategic lexical-semantics: functional magnetic resonance imaging evidence for differing roles of multiple frontotemporal regions. J Neurosci 26:6523–6532
Hall DA, Johnsrude IS, Haggard MP, Palmer AR, Akeroyd MA, Summerfield AQ (2002) Spectral and temporal processing in human auditory cortex. Cereb Cortex 12:140–149
Hall DA, Hart HC, Johnsrude IS (2003) Relationships between human auditory cortical structure and function. Audiol Neurootol 8:1–18
Hickok G, Poeppel D (2004) Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language. Cognition 92:67–99
Holmes CJ, Hoge R, Collins L, Woods R, Toga AW, Evans AC (1998) Enhancement of MR images using registration for signal averaging. J Comput Assist Tomogr 22:324–333
Hoshiyama M, Okamoto H, Kakigi R (2007) Priority of repetitive adaptation to mismatch response following undiscriminable auditory stimulation: a magnetoencephalographic study. Eur J Neurosci 25:854–862
Jääskeläinen IP, Ahveninen J, Bonmassar G, Dale AM, Ilmoniemi RJ, Levanen S, Lin FH, May P, Melcher J, Stufflebeam S, Tiitinen H, Belliveau JW (2004) Human posterior auditory cortex gates novel sounds to consciousness. Proc Natl Acad Sci USA 101:6809–6814
Jacobsen T, Schröger E, Alter K (2004) Pre-attentive perception of vowel phonemes from variable speech stimuli. Psychophysiology 41:654–659
LoCasto PC, Krebs-Noble D, Gullapalli RP, Burton MW (2004) An fMRI investigation of speech and tone segmentation. J Cogn Neurosci 16:1612–1624
Maess B, Jacobsen T, Schröger E, Friederici AD (2007) Localizing pre-attentive auditory memory-based comparison: magnetic mismatch negativity to pitch change. Neuroimage 37:561–571
Näätänen R (1990) The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behav Brain Sci 13:201–288
Näätänen R (1992) Attention and brain function. Erlbaum, Hillsdale
Näätänen R, Alho K (1997) Mismatch negativity-the measure for central sound representation accuracy. Audiol Neurootol 2:341–353
Näätänen R, Jacobsen T, Winkler I (2005) Memory-based or afferent processes in mismatch negativity (MMN): a review of the evidence. Psychophysiology 42:25–32
Näätänen R, Picton T (1987) The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology 24:375–425
Näätänen R, Tervaniemi M, Sussman E, Paavilainen P, Winkler I (2001) ‘Primitive intelligence’ in the auditory cortex. Trends Neurosci 24:283–288
Näätänen R, Winkler I (1999) The concept of auditory stimulus representation in cognitive neuroscience. Psychol Bull 125:826–859
Narain C, Scott SK, Wise RJ, Rosen S, Leff A, Iversen SD, Matthews PM (2003) Defining a left-lateralized response specific to intelligible speech using fMRI. Cereb Cortex 13:1362–1368
Nordby H, Hammerborg D, Roth WT, Hugdahl K (1994) ERPs for infrequent omissions and inclusions of stimulus elements. Psychophysiology 31:544–552
Obleser J, Zimmermann J, Van Meter J, Rauschecker JP (2007) Multiple stages of auditory speech perception reflected in event-related FMRI. Cereb Cortex 17:2251–2257
Opitz B, Schröger E, von Cramon DY (2005) Sensory and cognitive mechanisms for preattentive change detection in auditory cortex. Eur J Neurosci 21:531–535
Papademetris X, Jackowski M, Rajeevan N, DiStasio M, Okuda H, Constable RT, Staib LH (2006) BioImage suite: an integrated medical image analysis suite: an update. ISC/NA-MIC Workshop on Open Science at MICCAI 2006, http://hdl.handle.net/1926/209
Papademetris X, Jackowski AP, Schultz RT, Staib LH, Duncan JS (2004) Integrated intensity and point-feature nonrigid registration. MICCAI 1:763–770
Paulesu E, Frith CD, Frackowiak RS (1993) The neural correlates of the verbal component of working memory. Nature 362:342–345
Pettigrew CM, Murdoch BE, Ponton CW, Finnigan S, Alku P, Kei J, Sockalingam R, Chenery HJ (2004) Automatic auditory processing of English words as indexed by the mismatch negativity, using a multiple deviant paradigm. Ear Hear 25:284–301
Price CJ (2000) The anatomy of language: contributions from functional neuroimaging. J Anat 197:335–359
Price C, Wise R, Ramsey S, Friston K, Howard D, Patterson K (1992) Regional response differences within the human auditory cortex when listening to words. Neurosci Lett 146:179–182
Pulvermüller F, Shtyrov Y (2006) Language outside the focus of attention: the mismatch negativity as a tool for studying higher cognitive processes. Prog Neurobiol 79:49–71
Pulvermüller F, Shtyrov Y, Ilmoniemi RJ (2003) Spatio-temporal patterns of neural language processing: an MEG study using minimum-norm current estimates. Neuroimage 20:1020–1025
Rajeevan N, Papademetris X, Jackowski MP, Okuda H, Constable RT, Staib LH (2007) In: Abstract submitted to the organization for human brain mapping (OHBM) 13th annual meeting, Chicago, IL, USA
Rauschecker JP, Tian B (2000) Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc Natl Acad Sci USA 97:11800–11806
Rinne T, Degerman A, Alho K (2005) Superior temporal and inferior frontal cortices are activated by infrequent sound duration decrements: an fMRI study. Neuroimage 26:66–72
Romani GL, Williamson SJ, Kaufman L (1982) Tonotopic organization of the human auditory cortex. Science 216:1339–1340
Sabri M, Campbell KB (2000) Mismatch negativity to inclusions and omissions of stimulus features. Neuroreport 11:1503–1507
Salmelin R, Helenius P, Service E (2000) Neurophysiology of fluent and impaired reading: a magnetoencephalographic approach. J Clin Neurophysiol 17:163–174
Schröger E (1997) On the detection of auditory deviations: a pre-attentive activation model. Psychophysiology 34:245–257
Scott SK, Johnsrude IS (2003) The neuroanatomical and functional organization of speech perception. Trends Neurosci 26:100–107
Scott SK, Rosen S, Lang H, Wise RJ (2006) Neural correlates of intelligibility in speech investigated with noise vocoded speech—a positron emission tomography study. J Acoust Soc Am 120:1075–1083
Seifritz E, Esposito F, Hennel F, Mustovic H, Neuhoff JG, Bilecen D, Tedeschi G, Scheffler K, Di Salle F (2002) Spatiotemporal pattern of neural processing in the human auditory cortex. Science 297:1706–1708
Shtyrov Y, Pihko E, Pulvermüller F (2005) Determinants of dominance: is language laterality explained by physical or linguistic features of speech? Neuroimage 27:37–47
Siok WT, Jin Z, Fletcher P, Tan LH (2003) Distinct brain regions associated with syllable and phoneme. Hum Brain Mapp 18:201–207
Sonty SP, Mesulam MM, Thompson CK, Johnson NA, Weintraub S, Parrish TB, Gitelman DR (2003) Primary progressive aphasia: PPA and the language network. Ann Neurol 53:35–49
Talairach J, Tournoux P (1988) Co-planar stereotaxic atlas of the human brain. Thieme, New York
Tampas JW, Harkrider AW, Hedrick MS (2005) Neurophysiological indices of speech and nonspeech stimulus processing. J Speech Lang Hear Res 48:1147–1164
Tervaniemi M, Hugdahl K (2003) Lateralization of auditory-cortex functions. Brain Res Brain Res Rev 43:231–246
Ulanovsky N, Las L, Nelken I (2003) Processing of low-probability sounds by cortical neurons. Nat Neurosci 6:391–398
Wang C, Ulbert I, Schomer DL, Marinkovic K, Halgren E (2005) Responses of human anterior cingulate cortex microdomains to error detection, conflict monitoring, stimulus-response mapping, familiarity, and orienting. J Neurosci 25:604–613
Warren JD, Griffiths TD (2003) Distinct mechanisms for processing spatial sequences and pitch sequences in the human auditory brain. J Neurosci 23:5799–5804
Wessinger CM, VanMeter J, Tian B, Van Lare J, Pekar J, Rauschecker JP (2001) Hierarchical organization of the human auditory cortex revealed by functional magnetic resonance imaging. J Cogn Neurosci 13:1–7
Zatorre R, Meyer E, Gjedde A, Evans A (1996) PET studies of phonetic processing of speech: review, replication and reanalysis. Cereb Cortex 6:21–30
Acknowledgments
We thank Jen Roth, Michelle Hampson, Pawel Skudlarski, Robin J. Schafer, Füyûze S. Tokoglu and Jagriti Arora for informative discussions about statistical analyses. We thank Hedy Sarofin, Karen Martin, Terry Hickey and Cheryl McMurray for technical assistance. We also thank Alice Faber from Haskins Laboratories for advice on stimulus selection. This material is based upon work supported by NIH NS 051622. BioImage Suite (http://www.bioimagesuite.org) is supported, in part, by the NIH/NIHBIB under grant 1 R01 EB006494-01.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Laufer, I., Negishi, M., Rajeevan, N. et al. Sensory and cognitive mechanisms of change detection in the context of speech. Brain Struct Funct 212, 427–442 (2008). https://doi.org/10.1007/s00429-007-0167-8