Introduction

The mismatch negativity (MMN) component of event-related potentials (ERPs) is assumed to reflect the operation of a pre-attentive memory-based comparison mechanism (e.g., Näätänen et al. 2005). For a wide range of stimuli, there is evidence that the MMN is elicited by a cognitive mechanism based on auditory sensory memory that compares the incoming deviant stimulus with the standard template (Näätänen and Alho 1997; Näätänen et al. 2001).

The contrasting view is that attention switch to novel sounds is based on the transient frequency-specific adaptation of posterior auditory-cortex feature-detector neurons (Desimone 1992; Näätänen 1990, 1992; Ulanovsky et al. 2003; Jääskeläinen et al. 2004). Specifically, it has been suggested that MMN arises because of selective adaptation of the N1 response by preceding standard stimuli (sensory component) leading to its attenuation. This attenuation is interpreted in terms of neural refractoriness (Näätänen 1992). Thus, this account of MMN elicitation does not rely on memory representations and is sensorial in nature.

Previous research has shown that the MMN is sensitive to acoustic as well as to phonetic attributes of phonemes (e.g., Tampas et al. 2005). Furthermore, it was demonstrated that the MMN reflects the activation of memory networks for language sounds and spoken words (e.g., Pulvermüller et al. 2003; Shtyrov et al. 2005; Pulvermüller and Shtyrov 2006), the latter referred to as the “word advantage” effect (Pettigrew et al. 2004).

The design of the current functional magnetic resonance imaging (fMRI) study combined modified protocols from previous studies (i.e., Pettigrew et al. 2004; Opitz et al. 2005), which made it possible to disentangle cognitive and sensory contributions to change detection (Opitz et al. 2005; Maess et al. 2007) when the deviant stimulus was either a word (“day”), resulting in the “word advantage” effect, or a non-word (“deh” or “tay”).

We hypothesized that the sensory component would be located bilaterally in regions posterior to Heschl’s gyrus (HG), including the posterior superior temporal gyrus (STG), regardless of the identity of the deviant stimulus (“deh”, “day” or “tay”). This component relies on frequency-specific neurons in the auditory cortex responsible for the repetition-related decrement of N1 and its counterpart, the N1m (Romani et al. 1982; Jääskeläinen et al. 2004; Opitz et al. 2005). Moreover, it was shown that the region posterior to HG is broadly tuned with respect to phonetic features (Ahveninen et al. 2006).

In contrast, it was shown that areas anterior to HG process sound-identity cues such as speech (Binder et al. 2000; Obleser et al. 2007; Ahveninen et al. 2006) and pitch (Warren and Griffiths 2003). Therefore, we assumed that the cognitive component, which involves the representation of the current auditory event (Schröger 1997), would differentiate between the deviant word (“day”) and the deviant non-words (“deh” and “tay”), i.e., the “word advantage” effect, and would be located anterior to HG.

Methods

Subjects

Twenty right-handed healthy adult subjects (8 women and 12 men), all native speakers of English, participated in the study. Subjects ranged in age from 23 to 28 years. All subjects gave informed consent to participate in the study.

Experimental protocol

The procedure is a modification of the one used by Opitz et al. (2005). Each subject participated in 12 functional imaging runs. The speech stimuli were grouped into two types of blocks: “oddball” and “control”. In the “oddball” blocks, deviant exemplars were embedded quasi-randomly among standard stimuli so that the frequency of occurrence throughout the block was balanced among exemplars. The interval between two successive deviants varied quasi-randomly with gaps of 4, 6, 8, 10 or 12 standards, with the constraint that gap size was balanced throughout the block. In the “oddball” blocks, deviants appeared 42 times out of a total of 354 stimuli (probability of occurrence = 12%).
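The spacing rule can be illustrated with a minimal Python sketch. It assumes that each deviant is preceded by a run of standards whose length is drawn from a shuffled, approximately balanced pool of the five gap sizes. The function name, the seed and the balancing scheme are illustrative assumptions, and the sketch does not attempt to reproduce the exact 354-stimulus sequences used in the study.

```python
import random

GAP_SIZES = (4, 6, 8, 10, 12)  # allowed numbers of standards between deviants


def make_oddball_sequence(n_deviants, standard="S", deviant="D", seed=0):
    """Return a list of stimulus labels obeying the gap rule (illustrative only)."""
    rng = random.Random(seed)
    # Use each gap size as evenly as possible, then shuffle the order of gaps.
    repeats = -(-n_deviants // len(GAP_SIZES))  # ceiling division
    gaps = (list(GAP_SIZES) * repeats)[:n_deviants]
    rng.shuffle(gaps)
    sequence = []
    for gap in gaps:
        sequence.extend([standard] * gap)  # run of standards
        sequence.append(deviant)           # followed by one deviant
    return sequence


sequence = make_oddball_sequence(n_deviants=42)
# Deviant probability as reported in the text: 42 deviants in 354 stimuli.
reported_probability = 42 / 354
print(len(sequence), sequence.count("D"), round(reported_probability, 3))
```

Because 42 is not a multiple of five, the gap sizes can only be balanced approximately in this sketch; how the original sequences resolved this is not stated in the text.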

In the “control” blocks, the same physical deviants and standards as in the “oddball” blocks (deviant-counterparts and standard-counterparts, respectively) occurred quasi-randomly, with the deviant-counterparts constrained by the spacing rule described above. However, each “control” block contained eight different equiprobable stimuli, including the deviant and standard counterparts. Thus, in the control runs the seven stimuli other than the deviant-counterpart served as “filler” or contextual stimuli, which were added to the sequence so that the deviant-counterpart would appear with the same probability as any other stimulus in the sequence. Each of the eight stimuli in a “control” block was repeated 42 times, so that all stimuli appeared with equal probability, matching that of the deviants in the “oddball” block (12%). There were three blocks of each type (“oddball”/“control”). Each block was repeated twice. In total, 12 blocks of approximately 6 min each were presented in random order to each subject within a session.

Three non-words, /de:/ “deh”; /te:/ “teh”; and /teI/ “tay”, and one word, /deI/ “day”, were recorded from a male native English speaker in a sound-proof chamber. These stimuli formed the following standard-deviant pairs, which defined the three “oddball” runs: (1) deh (standard)–day (deviant); (2) day–deh; and (3) teh–tay. Thus, “deh” and “day” swapped their roles as standard and deviant in the second “oddball” block, whereas the third block controlled for the acoustic change associated with the transition from a monophthong (/e:/) to a diphthong (/eI/) occurring in the first standard-deviant pair. Four additional stimuli, /pe:/ “peh”; /peI/ “pay”; /be:/ “beh”; and /beI/ “bay”, together with the four stimuli mentioned above, were embedded within three control runs, each containing a deviant-counterpart (either “day”, “deh” or “tay”) appearing with the same probability and obeying the same spacing rule as in the oddball blocks.
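The composition of a control block can be sketched in a few lines of Python. The sketch assumes a plain shuffle and therefore ignores the spacing rule imposed on the deviant-counterpart; the function name and seed are illustrative assumptions rather than part of the original procedure.

```python
import random

# The eight consonant-vowel stimuli named in the text.
STIMULI = ["deh", "day", "teh", "tay", "peh", "pay", "beh", "bay"]
REPETITIONS = 42  # each stimulus is repeated 42 times per control block


def make_control_block(seed=0):
    """Return a shuffled equiprobable block (spacing rule NOT enforced)."""
    rng = random.Random(seed)
    block = [stim for stim in STIMULI for _ in range(REPETITIONS)]
    rng.shuffle(block)
    return block


block = make_control_block()
probability = block.count("day") / len(block)  # one stimulus in eight
n_blocks = 2 * 3 * 2  # block types x standard-deviant pairs x repetitions
print(len(block), probability, n_blocks)
```

With eight equiprobable stimuli the per-stimulus probability in this sketch is exactly one in eight, which is close to the 12% reported for the deviants in the “oddball” blocks.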

Three exemplars for each stimulus (e.g., deh1, deh2, deh3) were selected (out of a pool of 24 recordings per stimulus) on the basis of acoustic similarity (see Table 1 for the values of the lowest three formants). The parameters that were used to choose similar exemplars for each stimulus included the shape of the spectrogram at the voice onset, vowel durations, pitch and formant values (Hz) of the first three formants at the beginning (ca. 84 ms) and end (ca. 168 ms) of the /e/ segment (ca. 252 ms duration) of each stimulus. The stimuli were truncated to 280 ms and normalized to the same loudness level by using Adobe Audition 1.5 trial version software package. Spectral analysis of the stimuli was conducted by PRAAT software version 4.3.19 (http://www.praat.org).

Table 1 Pitch and frequencies of the main speech stimuli (in Hz)

The reason for using three tokens for each consonant-vowel stimulus was to control as much as possible for acoustic factors, other than those inherent in the structure of the stimulus, which could confound the elicitation of a deviant response. Using three different exemplars for each consonant-vowel (CV) stimulus diminished the likelihood of a contingency developing between a specific deviant-standard pair because of an uncontrolled acoustic facet associated with either the deviant or the standard.

As can be seen in Table 1, the frequency parameters across homogeneous exemplars (i.e., the three representatives of the same CV) were very similar to each other. In addition, the F1/F2 ratio in “teh” and “tay” is similar to the F1/F2 ratio in “deh” and “day”, respectively. Thus, the comparison between the responses to “teh” and “tay” could serve as a suitable control for the latter pair (Jacobsen et al. 2004).

The speech stimuli appeared randomly within each block and their occurrence was balanced throughout the block for standards as well as deviants. Stimuli were presented with a stimulus onset asynchrony (SOA) of 1 s at 95 dB SPL via headphones to subjects wearing ear-plugs (see Table 2 for an example of “control” and “oddball” blocks for the “deh”–“day” contrast). Stimulus presentation was carried out with E-Prime 1.1 (1.1.4.1) (Psychology Software Tools, http://www.pstnet.com).

Table 2 “Oddball” and “control” sequences for the “deh” (standard)-“day” (deviant) pair

Contrasting “oddball” deviants and standards with the corresponding stimuli in “control” runs (deviant and standard counterparts) allowed disentangling the two mechanisms of deviance detection, namely, the sensory one based on refractoriness (Näätänen and Picton 1987) from the cognitive one based on memory-based processes (Näätänen 1990; Näätänen and Winkler 1999).

The subjects were instructed to count every stimulus, press a button when they reached 100 and then reset their count and restart counting from zero again. This task was chosen to control for a possible contamination by N2b and P3b ERP-components which are elicited when deviants are being discriminated from standards (Donchin et al. 1997; Opitz et al. 2005).

Behavioral assessment of discrimination between stimuli

Thirteen subjects (out of the 20 participating in the study) rated the stimuli in a separate behavioral session performed outside of the magnet. Stimuli were presented simultaneously with recorded MR scanner noise. The stimuli and the scanner noise were both presented at 62 dB SPL, which was a comfortable hearing level for the stimuli and the noise presented together.

Subjects were presented with “triplets” composed of stimuli containing /e:/ and /eI/. In each block, 15 triplets were presented, separated by a 2 s interval of silence that allowed the subject to respond. The stimuli were the same as those presented in the fMRI experiment. There were several patterns of triplet presentation in each block, as follows. There were three triplets in which “day” was presented first, followed by two “deh”-s; three triplets in which “day” was presented last, preceded by two “deh”-s; three triplets in which “deh” was presented first, followed by two “day”-s; and three triplets in which “deh” was last, preceded by two “day”-s. In addition, there were three triplets in which “day”, “deh” and “tay” each appeared in the middle position, respectively, while the neighboring stimuli (in the first and third positions) were either the /eI/ or /e:/ counterparts (i.e., “deh” “day” “deh”; “day” “deh” “day”; and “teh” “tay” “teh”).

Overall, four blocks of 15 triplets each were presented to the subjects. The order of triplets was randomized within each block. The three different exemplars representing each CV stimulus were balanced across the four blocks. The subjects were required to indicate the outlier in each triplet by pressing the key (“1”, “2” or “3” on the keyboard) that corresponded to the position of the outlier in the triplet. The outliers were assigned mainly to the extreme positions (1 or 3) in the triplet to simulate more closely the “oddball” design, in which the deviant is surrounded by repetitive standards. The trials with the outlier in the middle position were introduced to minimize the probability of guessing the identity of the third stimulus in the triplet after hearing the first two, which were non-identical. Thus, subjects could guess the identity of the outlier with more confidence only after hearing two identical stimuli in a row.
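The structure of one triplet block can be summarized in a short Python sketch. It builds the 15 triplets from the patterns listed above and derives the correct response key for each; exemplar balancing across blocks is omitted, and all names as well as the seed are illustrative assumptions.

```python
import random

# Patterns with the outlier in position 1 or 3 (each presented three times).
EDGE_PATTERNS = [
    ("day", "deh", "deh"),
    ("deh", "deh", "day"),
    ("deh", "day", "day"),
    ("day", "day", "deh"),
]
# Patterns with the outlier in the middle position (each presented once).
MIDDLE_PATTERNS = [
    ("deh", "day", "deh"),
    ("day", "deh", "day"),
    ("teh", "tay", "teh"),
]


def outlier_position(triplet):
    """Return the 1-based position of the item that differs from the other two."""
    first, second, third = triplet
    if second == third and first != second:
        return 1
    if first == third and second != first:
        return 2
    if first == second and third != first:
        return 3
    raise ValueError("triplet has no unique outlier")


def make_block(seed=0):
    """Return one randomized block of 15 triplets."""
    rng = random.Random(seed)
    block = [p for p in EDGE_PATTERNS for _ in range(3)] + list(MIDDLE_PATTERNS)
    rng.shuffle(block)
    return block


block = make_block()
correct_keys = [outlier_position(triplet) for triplet in block]
print(len(block), correct_keys)
```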

Data acquisition parameters

Data were collected on a 3T Siemens Trio scanner. Each study began with two localizers: a three-plane localizer and a multiple-slice sagittal localizer. These were followed by the acquisition of twenty-five 6 mm T1-weighted axial slices (TR = 300 ms, TE = 2.47 ms, flip angle = 60°, FOV = 220 mm, 256 acquisition matrix). For each subject, 12 functional imaging scans were collected with slices in the same locations as the anatomical T1-weighted data. Functional images were recorded using a gradient-echo EPI sequence (TR = 1,550 ms, TE = 30 ms, flip angle = 80°, FOV = 220 mm, 64 acquisition matrix). Each functional run involved the acquisition of 245 volumes with twenty-five 6-mm axial slices. Images were converted to Analyze format and the first ten volumes of each functional series were removed to account for the approach to steady-state magnetization, leaving 235 volumes for analysis.
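The derived quantities implied by these acquisition parameters can be checked with a few lines of arithmetic. The sketch below only restates values given in the text; the variable names are illustrative.

```python
FOV_MM = 220.0           # field of view of the functional images
MATRIX = 64              # acquisition matrix of the functional images
SLICE_MM = 6.0           # slice thickness
TR_S = 1.55              # repetition time of the EPI sequence (1,550 ms)
VOLUMES_ACQUIRED = 245
VOLUMES_DISCARDED = 10

in_plane_mm = FOV_MM / MATRIX                       # in-plane voxel size
volumes_kept = VOLUMES_ACQUIRED - VOLUMES_DISCARDED
run_duration_s = VOLUMES_ACQUIRED * TR_S            # duration of one run

print(in_plane_mm, SLICE_MM, volumes_kept, run_duration_s / 60.0)
```

The in-plane voxel size obtained this way rounds to the 3.44 mm quoted for the acquired space, and the run duration is in line with blocks of approximately 6 min.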

Image analysis

Preprocessing

First, using sinc interpolation, the data from each slice were adjusted for slice acquisition time and then all data were motion corrected using SPM99 for six rigid body motions (displacement in the x, y, z direction and rotation: for pitch, roll, yaw). Flags were set for de-correlation and masking so a pixel was set to zero for every time point if it moved outside the volume. Functional image data were motion corrected by realigning the time sequence imaging to the first image in the middle run of the sequence using SPM99.

Individual subject data (responses to the deviants, standards and control stimuli) were analyzed using a General Linear Model on each voxel in the entire brain volume (Rajeevan et al. 2007). The data were normalized to a signal measure of 100 and were spatially smoothed with an 8.08 mm Gaussian kernel to account for variations in the location of activation across participants. The output maps are normalized beta-maps in the acquired space (3.44 mm × 3.44 mm × 6 mm).
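To give a sense of the extent of the smoothing relative to the voxel grid, the kernel size can be converted to a Gaussian standard deviation. The sketch assumes that the 8.08 mm figure denotes the full width at half maximum (FWHM), which is the usual convention but is not stated explicitly in the text.

```python
import math

FWHM_MM = 8.08                 # kernel size from the text (ASSUMED to be FWHM)
VOXEL_MM = (3.44, 3.44, 6.0)   # acquired voxel dimensions from the text

# Standard relation between the FWHM and the standard deviation of a Gaussian.
sigma_mm = FWHM_MM / (2.0 * math.sqrt(2.0 * math.log(2.0)))
# The same standard deviation expressed in voxel units along each axis.
sigma_voxels = tuple(sigma_mm / size for size in VOXEL_MM)

print(round(sigma_mm, 3), [round(s, 3) for s in sigma_voxels])
```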

To take these data into a common reference space, three registrations were calculated within the Yale BioImage Suite software package (http://www.bioimagesuite.org/, Papademetris et al. 2006) using the intensity-only component of the method reported in Papademetris et al. (2004). The first was a linear registration between the individual subject’s raw functional image and that subject’s 2D anatomical image. The 2D anatomical image was then linearly registered to the individual’s 3D anatomical image. The 3D image differs from the 2D image in that it has a 1 × 1 × 1 mm resolution, whereas the 2D z-dimension is set by slice thickness and its xy dimensions are set by voxel size. Finally, a non-linear registration was computed between the individual 3D anatomical image and a reference 3D image. The reference brain used was the Colin27 Brain (Holmes et al. 1998), commonly applied in SPM and other software packages. All three registrations were applied sequentially to the individual normalized beta-maps to bring all data into the common reference space.

Statistical analyses

We used a two-stage random-effects model to analyze the data. In the first stage, statistical maps were calculated for the comparison of interest for each subject as described above. In the second stage, across-subject analyses were conducted and the distribution of the individual subjects’ statistics was tested for significance.

Within subject analyses

For each of the three deviants (“deh”, “day” and “tay”) the following six types of maps were calculated: (1) Deviant maps, computed from oddball runs by extracting the response to the low-probability stimuli; (2) Standard maps, computed from oddball runs by extracting the response to the high-probability stimuli; (3) Deviant control maps, computed from control runs by extracting the response to the same physical stimuli that served as deviants in the oddball runs (deviant-counterparts); (4) Standard control maps, computed from control runs by extracting the response to the same physical stimuli that served as standards in the oddball runs (standard-counterparts). The contrasts extracting each deviant-counterpart and standard-counterpart stimulus were carried out against the same baseline, which consisted of the “filler” stimuli “beh”, “peh”, “bay” and “pay”. The stimuli that served as deviants and standards (i.e., “deh”, “day”, “teh”, “tay”) were excluded from the baseline so that their extraction could be conducted against a common baseline. (5) A Word map, computed from control runs by contrasting the response to the meaningful word (“day”) with the non-words that formed part of the “filler” stimuli, i.e., “deh”, “beh”, “peh”. (6) A Non-word map, computed from control runs by contrasting the response to the non-word (“tay”) with the non-words that formed part of the “filler” stimuli (i.e., “deh”, “beh”, “peh”). The non-word “tay” was excluded from the baseline, since it was the only non-word with a diphthong (/eI/), and to avoid contrasting it with itself.

Across subject analyses

Composite maps. Two types of composites were created: three composites of deviant maps, one for each deviant (“deh”, “day”, “tay”) and three composites of the deviant-control maps, one composite for each control run that included either “deh”, “day” or “tay” as a deviant-counterpart, respectively.

Paired t tests. The contrast between deviants and their respective counterparts isolates the cognitive component of deviance detection. For this contrast, a paired t test was calculated between the Deviant map and the Deviant control map. The contrast between standards and their respective counterparts accounts for the sensory component based on refractoriness. For this contrast, a paired t test was calculated between the Standard map and the Standard control map.
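For a single voxel or region, the paired comparison amounts to testing whether the mean across-subject difference between two maps departs from zero. The sketch below computes the paired t statistic by hand; the input values are made-up placeholders for illustration only and are not data from this study, and the function name is an assumption.

```python
import math
import statistics


def paired_t(map_a, map_b):
    """Return the paired t statistic and degrees of freedom for two samples."""
    diffs = [a - b for a, b in zip(map_a, map_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    return t_value, n - 1


# Placeholder per-subject values at one voxel (NOT data from the study).
deviant_map = [0.42, 0.31, 0.55, 0.28, 0.47]
deviant_control_map = [0.30, 0.25, 0.41, 0.30, 0.35]

t_value, dof = paired_t(deviant_map, deviant_control_map)
print(round(t_value, 3), dof)
```

In the study itself the test has 19 degrees of freedom (n = 20) and is evaluated at every voxel of the normalized beta-maps.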

Statistical images were corrected for multiple comparisons over the whole brain using a magnitude threshold derived from Monte–Carlo simulation that takes into account the number of contiguous activated voxels (Forman et al. 1995). Individual voxel thresholds were set at P < 0.001 for both the composites and paired t test maps. Data were corrected for multiple comparisons by spatial extent of contiguous supra-threshold individual voxels (experiment-wise P < 0.001 for a cluster). In a Monte–Carlo simulation within the AFNI software package, using a smoothing kernel of 8.08 mm, a connection radius of 7.72 mm on 3.48 mm × 3.48 mm × 6 mm voxels, it was determined that an activation volume of 1,278 μl satisfied the P < 0.001 threshold.
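The reported volume threshold can be translated into a minimum number of contiguous voxels. This conversion is an illustration based on the voxel dimensions quoted for the simulation; the text reports the threshold only as a volume.

```python
import math

VOXEL_MM = (3.48, 3.48, 6.0)   # voxel dimensions quoted for the simulation
CLUSTER_VOLUME_UL = 1278.0     # reported activation volume (1 ul = 1 mm^3)

voxel_volume_ul = VOXEL_MM[0] * VOXEL_MM[1] * VOXEL_MM[2]
# Smallest whole number of voxels whose combined volume reaches the threshold.
min_cluster_voxels = math.ceil(CLUSTER_VOLUME_UL / voxel_volume_ul)

print(round(voxel_volume_ul, 2), min_cluster_voxels)
```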

Regions of interest analyses. The different regions of interest (ROIs) were identified on the basis of the results obtained by the paired t test maps prior to cluster-size correction (P < 0.001) reflecting the cognitive and sensory mechanisms of change-detection. Then, each participant’s model estimate of the percent signal change for each region of activation, averaged across voxels within the region was calculated for each of the six composite maps (three deviant maps and three deviant-control maps). The ROIs identified were analyzed in a 3 (ROI: anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus) × 2 (task-type: “oddball”/“control”) × 3 (deviant: “deh”, “day”, “tay”) × 2 (laterality: left/right) repeated measures ANOVA with subjects (n = 20) as a random factor.

ROI analysis for the Word/Non-word maps. To further investigate the “word effect”, each participant’s model estimate of the percent signal change for each region of activation, averaged across voxels within the region, was calculated for the Word map and the Non-word map. Both types of maps were calculated from control blocks. Specifically, to create the Word map, “day” was contrasted with a baseline composed of “filler” or contextual stimuli that were non-words (i.e., “beh”, “peh”, “deh”, “teh”). To create the Non-word map, “tay” was contrasted with the same baseline. The delta blood-oxygen-level-dependent (BOLD) measures were subjected to a 2 (Laterality: left/right) × 3 (ROI: anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus) × 2 [Lexical Status: word (“day”) vs. non-word (“tay”)] repeated measures ANOVA with subjects (n = 20) as a random factor.

Results

Behavioral results

There were no differences in response accuracy scores, computed across the three possible positions of the outlier, among the three CVs. The response accuracy for “day” was 95.88% (±5.01), for “tay” 96.70% (±5.15) and for “deh” 95.33% (±4.22) (the number in brackets denotes the standard deviation). Paired t tests run on the individual response accuracy data for each CV confirmed that there was no difference in accuracy levels between CVs (in all three possible comparisons P > 0.1). We also examined the response accuracy scores for the first and third positions in the triplet for each of the CVs. The response accuracy scores ranged between 98 and 99% among the three different CV stimuli as well as between the first and third positions (since there were only four triplets per block in which the outlier appeared in the middle, the middle position was excluded from the latter calculation).

Table 3 displays the reaction time data for each deviant and for each of the three possible positions of the outlier within a triplet. As expected, the reaction time for the third position in the triplet was the shortest. To evaluate this trend statistically, a two-way ANOVA was run with Stimulus (“day”, “tay”, “deh”) and Position (first, second, third) as factors and with Greenhouse–Geisser adjustment to account for violations of sphericity. Only the Position factor was significant [F(2,38) = 28.70, P < 0.001]. Paired comparisons (Bonferroni corrected, P < 0.05) confirmed that reaction time associated with stimuli in the first and second positions in the triplet was prolonged relative to that associated with the third stimulus in the triplet. Taken together, there is a dissociation between response accuracy and reaction time data. While no differences in response accuracy were found, reaction time was shorter when the outlier CV stimulus was in the third position in the triplet.

Table 3 Reaction time (ms) in the behavioral triplet test

This confirms our prediction that reaction time will be shorter in case of an outlier presented in the third position following two identical stimuli. However, the fact that the main effect of Stimulus as well as the interaction between Stimulus and Position did not reach significance confirms that the stimuli were equally discriminable in the context of the “oddball” paradigms used in this experiment.

Composite maps

Deviance effect

The spread of activation associated with “tay” deviant was very similar to that in response to “day” (Fig. 1). To elucidate the positions and extents of the activated brain regions associated with the similar responses to the deviants “day” and “tay”, images in the coronal and sagittal planes were included (Fig. 1, bottom panels).

Fig. 1

Deviant maps: brain regions in response to deviant stimuli embedded within the “oddball” paradigms. BOLD contrasts were superimposed on a reference anatomical image (Holmes et al. 1998). Upper panels: axial sections displaying responses to the deviants “deh” (left), “day” (middle) and “tay” (right). Bottom panels: coronal (two left panels) and sagittal (two right panels) sections of the responses to the deviants “day” and “tay”. Axial sections range from z = −6 to 46 (by increments of 4 mm). Coronal sections range from y = −1 to 50 (by increments of 5 mm). Sagittal sections range from x = −57 to 59 (by increments of 8 mm). Display follows radiological convention (left side of the brain is displayed on the right side of the screen)

The size of the right STG region activated in the word condition (i.e., “day”) was similar to that activated in the non-word condition (i.e., “tay”) (Table 4). In contrast, the left STG region activated in the word condition was almost twice as large as that activated in the “tay” condition. In both hemispheres, the brain regions activated in the “day” and the “tay” conditions were larger than those observed for the “deh” condition. Furthermore, the “day”, “tay” and “deh” stimuli appear to activate different regions of the STG. The “day” stimulus activated parts of the posterior STG (i.e., BA 22/42), whereas the “tay” and “deh” stimuli activated large parts of the superior and middle temporal cortices (i.e., BA 21/22) (Table 4).

Table 4 Brain regions activated in Deviant maps (Fig. 1)

Similar brain regions to those implicated in the deviance response were also found in the study of Opitz et al. (2005) in the context of tones. Specifically, in that study the deviance effect also implicated HG (primary auditory cortex) and the superior temporal plane (Talairach coordinates of peak location: −49, −14, 9; 53, −21, 10, respectively).

Deviant control maps

While activity extended from the STG across the lateral sulcus to the central sulcus in response to “day”, “deh” had a similar but more focal response in the vicinity of the primary auditory cortex. To elucidate the positions and extents of the activated brain regions associated with the more similar responses to the deviants “day” and “deh”, images in the coronal and sagittal planes were included (Fig. 2, bottom panels). In response to “tay” activity in the STG did not reach significance and activation was only observed in the left precentral gyrus (Fig. 2, Table 5).

Fig. 2

Deviant control maps: brain regions in response to deviant-counterparts stimuli embedded within the “control” paradigms. Upper panels: axial sections displaying responses to the deviant controls “deh” (left), “day” (middle) and “tay” (right). Bottom panels: coronal (two left panels) and sagittal (two right panels) sections of the responses to the deviant controls “day” and “deh”. Additional display details as in Fig. 1

Table 5 Brain regions activated in Deviant control maps (Fig. 2)

The size of STG activation was the most extensive over the left STG for deviant “day” in comparison to “tay” where it did not reach statistical significance, and “deh” where it was more than five times smaller (Fig. 2, Table 5). In addition, in response to “day” a negative differential BOLD was evident in the left superior frontal gyrus and left occipital pole (Fig. 2, Table 5).

Paired t test maps

Cognitive effect

The cognitive effect was derived by contrasting the Deviant maps with the corresponding Deviant control maps. Only the contrast map for “day” showed activations at the significance level used throughout this study (P < 0.001, corrected for multiple comparisons), reflecting the “word-advantage” effect. The negative differential BOLD associated with the cognitive effect implicated bilaterally the region extending from the postcentral gyrus to HG including parts of the Sylvian fissure, lateral sulcus and insula (Table 6, Fig. 3). The location of this region is proximal to the one obtained by Opitz et al. (2005) that was associated with the cognitive mechanism and located bilaterally in a non-primary auditory area within the lateral temporal lobe in the anterior rim of HG (Talairach coordinates of peak location: −42, −13, 6 and 49, −12, 7).

Table 6 Brain regions implicated in the cognitive effect (Fig. 3)
Fig. 3

Brain regions implicated in the cognitive effect, showing less activation in the Deviant map than in the Deviant control map for “day”. BOLD contrasts were superimposed on a reference anatomical image (Holmes et al. 1998). Axial sections range from z = −6 to 46 (by increments of 4 mm)

Refractoriness effect

In the t test images associated with refractoriness (Fig. 4), a negative differential BOLD could be observed posterior to HG in response to each of the standard-deviant pairs (Table 7). For the refractoriness effect when “deh” served as the standard, activity also extended to the superior temporal sulcus (Fig. 4, left panel). Opitz et al. (2005) also found activations in a proximal brain region in the lateral aspect of the posterior rim of HG bilaterally that was associated with the sensory mechanism of change detection (Talairach coordinates of peak location: −51, −18, 7 and 53, −19, 4).

Fig. 4

Brain regions implicated in the sensory (refractoriness) effect, showing less activation in Standard maps than in Standard control maps. Responses for the contrasts with “deh” (left), “day” (middle) and “teh” (right) are shown. BOLD contrasts were superimposed on a reference anatomical image (Holmes et al. 1998). Axial sections range from z = −6 to 46 (by increments of 4 mm)

Table 7 Brain regions implicated in the sensory (refractoriness) effect (Fig. 4)

From here on, we will refer to the region posterior to HG (Fig. 4, Table 7) as posterior auditory cortex. This region was defined functionally as showing a refractoriness effect in the current study. The region anterior to HG extending from the postcentral gyrus to HG (Fig. 3, Table 6) will be referred to as anterior auditory cortex. This region was defined functionally as showing a cognitive effect in the current study. This terminology is based on the one used by Opitz et al. (2005) that described the regions implicated in the sensory and cognitive effects as residing in the vicinity of the posterior and anterior rim of HG, respectively.

In addition, when the sensory component was extracted for “deh” standard (Fig. 4, left panel and Table 7) positive differential BOLD was observed in the superior/middle frontal gyrus as well as in the anterior cingulate cortex (ACC). Thus, the positive differential BOLD observed in the ACC and in the frontal gyrus is a direct result of contrasting “deh” with a baseline of “day” deviants to create the map reflecting the sensory component for “deh” standard.

ROI analysis

Six ROIs were chosen to be included in the ROI analysis. These ROIs were defined on the basis of previous findings and the data obtained in the current study. We have chosen the ROIs which were directly linked to the mechanisms associated with change detection, namely, the cognitive and sensory effects, or those that might be related to both change-detection and to the processing of linguistic stimuli. This is described in more detail in the section “Discussion”.

The following ROIs were identified on the basis of the paired t test maps. From the paired t test map reflecting the cognitive effect (Fig. 3) the anterior auditory cortex was chosen bilaterally. From the paired t test map reflecting the refractoriness effect when “day” served as a standard the posterior auditory cortex was chosen bilaterally (Fig. 4, middle panel). From the paired t test map reflecting the sensory component when “deh” served as the standard the superior/middle frontal gyrus was chosen bilaterally (Fig. 4, left panel, Table 7). Note that all the ROIs were defined based on the t-maps prior to cluster-size correction (P < 0.001). All except the right superior/middle frontal gyrus were apparent in the t test maps after cluster-size correction.

Figure 5 displays the average delta BOLD for each condition (“oddball”/“control”) and hemisphere (left/right) as a function of deviant stimulus and ROI. Fig. 5 shows that the largest differences in delta BOLD between conditions (“oddball”/“control”) are evident over the left hemisphere, mainly in the anterior auditory cortex and superior frontal gyrus. These were larger for “day” than “deh” and smallest for “tay”. The latter was associated with diminished levels of delta BOLD within the posterior auditory cortex in the control condition.

Fig. 5

Average delta BOLD for each condition (“oddball”/“control”) and hemisphere (left/right) as a function of ROI and deviant stimulus (“deh”, “day” or “tay”). Anter_Aud anterior auditory cortex, Post_Aud posterior auditory cortex, Sup/Mid_Frontal superior/middle frontal gyrus

The ROI data was subjected to a four-way ANOVA with Laterality (Left, Right), ROI (anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus), Deviant (“deh”, “day”, “tay”) and Task type (Control, Oddball) as independent factors, with an appropriate adjustment for sphericity (Greenhouse–Geisser correction). A main effect of ROI was found [F(2,38) = 49.17, P < 0.001], as well as a Laterality × ROI [F(2,38) = 5.39, P < 0.05], Deviant × ROI [F(4,76) = 4.23, P < 0.05], Task Type × ROI [F(2,38) = 7.25, P < 0.01], and Deviant × Task Type × ROI [F(4,76) = 4.60, P < 0.01] interactions. The Laterality × ROI interaction was due to enhanced delta BOLD over the left hemisphere associated with the anterior auditory cortex (P < 0.05, Bonferroni corrected).

The Deviant × Task Type × ROI interaction is depicted in Fig. 6. A set of paired comparisons (Bonferroni corrected, P < 0.05) was conducted separately within each task type (“oddball”/“control”). In the “oddball” condition the comparisons were conducted among the three deviants for the posterior auditory cortex only, where a gradient of delta BOLD could be observed as a function of deviant type. In the “control” condition comparisons were performed among the three deviants for each of the ROIs. In the “oddball” condition a significantly larger delta BOLD was found in response to “tay” than to “deh” in the posterior auditory cortex. In the “control” condition delta BOLD to the word “day” in the anterior and posterior auditory cortices was significantly larger than that associated with the non-word “tay”.
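As an illustration of the Bonferroni logic applied to such paired comparisons, the sketch below adjusts a family of uncorrected P values by the number of comparisons. The P values are invented for the example and are not results of this study; the size of the comparison family (three pairwise contrasts among the deviants) is likewise an assumption made for illustration, and the function name is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted P values and significance flags.

    Each P value is multiplied by the number of comparisons in the
    family and capped at 1.0.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical uncorrected P values for three pairwise deviant comparisons
raw = {"tay vs deh": 0.004, "tay vs day": 0.030, "day vs deh": 0.200}
adjusted, significant = bonferroni(list(raw.values()))
for label, p_adj, sig in zip(raw, adjusted, significant):
    print(f"{label}: adjusted P = {p_adj:.3f}, significant = {sig}")
```

With three comparisons, an uncorrected P value must fall below roughly 0.0167 to remain significant at the 0.05 level.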

Fig. 6 ROI × stimulus (“deh”, “day”, “tay”) × condition (“oddball”/“control”) interaction. The interaction was obtained by calculating the individual percent signal change in each ROI (across hemispheres) for each of the six composite maps (three Deviant maps and three Deviant-control maps). See text for the results of the a priori tests. Abbreviations as in Fig. 5

In a separate set of paired comparisons (Bonferroni corrected, P < 0.05) the averages of delta BOLD between conditions (“oddball”/“control”) were compared for each of the three deviant stimuli and each of the ROIs (across hemispheres). For “day”, delta BOLD in the anterior auditory cortex was larger in the “control” condition than in the “oddball” condition.

ROI analysis for the Word and Non-word maps

To further investigate the “word effect”, each participant’s model estimate of the percent signal change for each region of activation was calculated separately for the Word map and Non-word map (Fig. 7, Table 8). The ROI data were subjected to a three-way ANOVA with Laterality (left, right), ROI (anterior auditory cortex, posterior auditory cortex, superior/middle frontal gyrus) and Lexical Status [word (“day”), non-word (“tay”)] as independent factors, with an appropriate adjustment for sphericity (Greenhouse–Geisser correction). A main effect of ROI was found [F(2,38) = 21.03, P < 0.001] as well as a Lexical Status × ROI interaction [F(2,38) = 6.05, P < 0.01] (Fig. 8). Paired comparisons (Bonferroni corrected, P < 0.05) found a significant difference between “day” and “tay” only within the anterior auditory cortex.

Fig. 7 Word and Non-word maps. Brain regions activated in response to the “day” deviant-counterpart (Word map) and the “tay” deviant-counterpart (Non-word map). Stimuli were contrasted against a baseline composed of the non-word “fillers” (i.e., “peh”, “beh”, “deh”, “teh”). Left panel: response to the Word map. Right panel: response to the Non-word map. BOLD contrasts were superimposed on a reference anatomical image (Holmes et al. 1998). Axial sections range from z = −6 to 46 (by increments of 4 mm)

Table 8 Brain regions activated in Word and Non-word maps in response to “day” and “tay”, respectively (Figs. 7, 8)
Fig. 8 ROI × Lexical Status interaction. The interaction was obtained by calculating the individual percent signal change in each ROI for the Word and Non-word maps. Abbreviations as in Fig. 5

The distribution of percent signal change across subjects

To learn more about the consistency and extent of activation of specific brain regions, data were analyzed for individual subjects. Specifically, the percent signal change of individual subjects was computed for each of the six ROIs chosen for the group-level analysis as well as for two additional brain regions that might be implicated in speech perception, namely the left occipital pole and the left precentral gyrus. For each statistical test, the majority of subjects showed the same pattern of activity (see Table 9).

Table 9 The distribution of percent signal change across subjects

Discussion

Main findings

Overall, the results of the current study confirmed our hypotheses. Specifically, the sensory mechanism was located in the posterior auditory cortex (Fig. 4, Table 7) and the cognitive mechanism in the anterior auditory cortex (Fig. 3, Table 6). These locations match, respectively, those found in the context of tones (Opitz et al. 2005). Moreover, as expected, the cognitive effect was statistically significant only when the deviant was a word (i.e., “day”).

The similarity between the brain regions activated by speech in this study and the regions activated by tones (Opitz et al. 2005) could not be explained by the analysis of the physical sound features. First, the contribution of the acoustic component to the cognitive mechanism in this study was removed by contrasting deviants with their physically identical control counterparts. Second, our data suggest that the salience of the sounds, whether tones or speech stimuli, accounts for the activation of the anterior auditory cortex (Jääskeläinen et al. 2004; Ahveninen et al. 2006) (see also ‘Support for the existence of “what” and “where” auditory streams’). Thus, the acoustic component could only account for the similar activation observed in the posterior HG. This is also in line with previous findings that link dorsal temporal brain areas including HG and planum temporale with the analysis of auditory features of complex sounds (Binder et al. 1996; Wessinger et al. 2001; Hall et al. 2002, 2003; Seifritz et al. 2002).

Activation loci in superior and middle temporal areas that were found in the Deviant (Fig. 1) and Deviant control maps (Fig. 2) have also been reported previously to contribute to lexical and semantic processing (Price 2000; Salmelin et al. 2000; Scott and Johnsrude 2003; Hickok and Poeppel 2004). Similarly, the left dominance for processing intelligible speech that was found in our study in the “oddball” condition matches previous results that link the left posterior STG to higher level linguistic processes (Narain et al. 2003).

Support for the existence of “what” and “where” auditory streams

The results of this study are in agreement with the segregation of the auditory system into “what” and “where” processing streams (Jääskeläinen et al. 2004; Ahveninen et al. 2006) associated with the analysis of auditory object content and location features that reside in areas anterior and posterior to primary auditory cortex, respectively (Rauschecker and Tian 2000). It was suggested (Jääskeläinen et al. 2004) that the “where” system is responsible both for fast analysis of sound location which is important for attentional orienting, and for detecting the degree of sound novelty, which affects its degree of distractibility. Specifically, the degree to which unattended novel sounds distracted visual forced-choice task performance coincided with the extent that the posterior N1 response was released from inhibition. On the other hand, the anterior N1 response was associated with the processing of fine object features (Jääskeläinen et al. 2004). Similarly, the results in our study indicate that the deviant word was released from inhibition in the “control” condition because of its lexical status and that this “word advantage effect” was located anterior to HG. This is in agreement with the finding that neurons in the anterior auditory cortex are more sharply tuned to phonetic features of sounds (Ahveninen et al. 2006).

The effect of speech perception

Despite the general similarity to the results obtained in the context of tones, some of the results are specific to speech perception, as follows. In the “oddball” condition the brain regions activated in response to the deviants “day” and “tay” were larger than those observed for the “deh” deviant (Deviant maps, Fig. 1). In addition, in the control condition the precentral gyrus was activated in response to both “day” and “tay” (Fig. 2). These similarities between “day” and “tay” may reflect an extra processing effort associated with the diphthong vowel /eI/ shared by “day” and “tay” (Sonty et al. 2003; Bohland and Guenther 2006). However, the fact that the similarity between the responses to “day” and “tay” diminished in the control condition (Fig. 2) may indicate that the /eI/ diphthong associated with deviant “day” was more salient among repetitive “deh” standards (“oddball” condition) than among the variable filler stimuli (“control” condition) (Nordby et al. 1994; Sabri and Campbell 2000).

The greater similarity between the responses to “day” and “deh” in the control condition (Fig. 2) may be explained by the shared /e:/ monophthong, which might have elicited an expectation to hear a meaningful word (“day”) when “deh” was presented. This expectation could account for the more extensive response in the STG relative to “tay” (Fig. 2). This interpretation is strengthened by the finding that activation in the left anterior auditory cortex dissociates the “oddball” and “control” conditions (Scott et al. 2006), as well as by the observed increase in the percent signal change in that region associated with both “deh” and “day” (Fig. 5, “Left” panels). Thus, in the “control” condition (Fig. 2), an interaction between lexical and acoustic features affected the results.

There were additional findings indicating left hemisphere dominance, as follows. STG activation was most extensive over the left STG for “day”, compared with “tay”, where it did not reach statistical significance, and “deh”, where it was more than five times smaller (Deviant control maps, Fig. 2, Table 5). Furthermore, the “day” control map revealed negative differential BOLD in the left superior frontal gyrus as well as in the left occipital pole (Fig. 2, Table 5). Left hemisphere dominance was also evident in the middle/superior frontal gyrus in the sensory mechanism map for “deh” (Fig. 4, Table 7). In addition, the four-way ANOVA (Fig. 5) showed a Laterality × ROI interaction that was due to enhanced delta BOLD over the left hemisphere in the anterior auditory cortex.

This pattern of left hemisphere dominance is in agreement with a number of imaging and clinical studies (for a review see Tervaniemi and Hugdahl 2003). The left middle/superior frontal gyrus activations are in line with previous findings implicating these brain regions in tasks engaging phonological working memory (Paulesu et al. 1993; Burton et al. 2000; Siok et al. 2003; LoCasto et al. 2004) and with the suggestion that a significant portion of active frontal areas is recruited for extracting acoustic information and maintaining it in memory (LoCasto et al. 2004). The negative differential activation evident in the left occipital pole associated with the “day” control map (Fig. 2) may indicate that meaningful words are more likely to elicit activity in visual processing regions (Billingsley-Marshall et al. 2007).

It is noteworthy that the sensory mechanism map for “deh” revealed a positive differential BOLD in the ACC (Fig. 4, Table 7). The ACC is implicated in initiating or inhibiting action and is considered to be part of a larger network that includes medial/lateral frontal, prefrontal and temporal regions (Wang et al. 2005; Dias et al. 2006; Gold et al. 2006). Hence, it is possible that in our study inhibitory activations occurred in the ACC (as well as in the superior frontal gyrus) in response to “day” since it was a meaningful stimulus that interfered with the main counting task (see also Rinne et al. 2005). Since standard “deh” was contrasted with deviant “day” to create the “deh” standard map, the positive differential BOLD located at the ACC and middle/superior frontal gyrus (Fig. 4, “deh”) might have been caused by a reduced activation associated with the response to deviant “day”.

A possible effect of the lexical status in the “oddball” condition

It appears that the lexical status of the deviant stimulus affected its processing not only in the “control” condition but also in the “oddball” condition, in which the effects of adaptation caused by the repeating standards were more pronounced. Specifically, the “day” stimulus activated parts of the posterior STG (i.e., BA 22/42), whereas the “tay” and “deh” stimuli activated large parts of the superior and middle temporal cortices (i.e., BA 21/22) (Table 4). However, the left STG region activated by the “day” deviant was almost twofold larger than that activated by the “tay” deviant (Deviant maps, Fig. 1, Table 4). This is in agreement with the assumption that the left posterior STG is the focus of a multi-modal network associated with language comprehension (Aboitiz and Garcia 1997; Narain et al. 2003). More strongly left-lateralized posterior superior-temporal activation is associated with analysis of speech sounds for mapping onto higher levels of language processing (e.g., syllable, word) (Price et al. 1992; Zatorre et al. 1996). Thus, although the word advantage effect was not salient in the “oddball” condition, the different spread of activation between “day” and the two non-words may indicate differential processing based on the lexical status of the deviant stimulus.

Summary

Taken together, the results of the current study corroborate the existence of two independent mechanisms contributing to the change-detection response (Opitz et al. 2005; Hoshiyama et al. 2007; Maess et al. 2007): a sensory mechanism, reflected by the different refractory states of the neuronal subpopulations activated by the standard and the deviant, and a cognitive mechanism that relies on auditory sensory memory representations and gives rise to the word advantage effect. Thus, on the one hand, our results support the view that the MMN represents a change-detection mechanism functionally and spatially distinct from an afferent input population (N1 generators) (Näätänen et al. 2005). On the other hand, our results indicate that a release from the inhibitory effects of adaptation is a prerequisite for the full realization of the significance of the deviant stimulus. This is in line with the adaptation hypothesis, which assumes that the posterior auditory cortex gates novel sounds to awareness (Jääskeläinen et al. 2004).

Conclusion

Our findings serve to unify the two opposing views suggested by Jääskeläinen et al. (2004) and Näätänen et al. (2005). Specifically, the gate to awareness for auditory deviation (Jääskeläinen et al. 2004; Näätänen et al. 2005) relies on adaptation that modulates the extent to which novel sounds will be accessible to memory-based processes. In the present study, the sensory component serves to modulate the salience of the speech stimulus by the degree to which it will be accessible to cortical memory traces for speech sounds (Pulvermüller and Shtyrov 2006). Furthermore, the lexical status of the speech stimulus interacts with acoustic factors, exerting a top-down effect on the novelty value of the auditory object that affects, in turn, its degree of accessibility to the cognitive component.