Introduction

Rett syndrome (RTT) is a severely incapacitating neurodevelopmental disorder affecting about 1-in-10,000 females1,2,3. RTT typically results from spontaneous mutations in the X-linked gene encoding methyl-CpG-binding protein 2 (MeCP2)2. Clinically, RTT is characterized as an age-specific (6–18 months) progressive loss of initially acquired intellectual, language, and motor abilities, with loss of purposeful hand use that is often supplanted by distinctive hand stereotypies and gait abnormalities1,3. Very little is known about cognitive processing, including perceptual capabilities and speech comprehension across the progressive clinical stages of RTT4,5. This is largely due to the nature of the impairments associated with RTT, which typically preclude the use of conventional cognitive evaluations which require verbal and/or gestural responses6. This leaves the field with non-representative outcome measures, making it difficult to assess neurocognitive function during the natural course of RTT or in response to treatments during clinical trials7. As such, there is an urgent need to develop objective quantitative measures of brain function (i.e. neural markers) that can be tracked in a noninvasive and unbiased manner.

Event-related potential (ERP) recordings are an increasingly appealing option in both patients with RTT and animal models as a means to assess information processing and cognitive capabilities8,9,10,11,12,13,14. This technique provides the opportunity to deliver objective quantitative measures of brain function, including cortical network dynamics, in the absence of overt behavioral responses from participants (e.g.15,16,17). The integrity of early auditory processing, auditory discrimination, and sensory memory can be studied with the ERP component known as mismatch negativity (MMN)18,19. MMN is a relatively automatic ERP response evoked by occasional changes in a regularly occurring stream of inputs. Assessing MMN is ideal for RTT in that no engagement from the listener is required.

In a recent study, we used high-density ERPs to examine the ability of individuals with RTT to automatically process pitch changes in a regularly occurring stream of auditory tones9. Accurate neural representation of time-varying spectral changes in the auditory system is clearly a critical component of processing the speech signal20. Individuals with RTT showed severely impaired basic auditory sensory processing such that their automatic representation of pitch discrimination in sensory memory was delayed and prolonged relative to their typically developing (TD) peers. However, the presence of an MMN response, although morphologically atypical and delayed, suggested that these individuals are capable of detecting frequency deviations within a continuous stream of auditory inputs.

To build on these previous findings, the current study sought to further characterize auditory sensory memory capabilities in RTT, by specifically exploring whether there is adequate cortical representation of auditory duration, another critical cue in speech perception21. This is the first study to examine this important characteristic of sensory memory in RTT. To further interrogate the robustness of sensory memory for duration, we examined whether these sensory memory traces could be sustained over longer and longer periods. It has been shown that the robustness of the memory trace represented by the MMN is directly related to the rate at which the regularly occurring stream of stimuli is presented22,23. To test this in RTT, we assessed the MMN for duration deviance using three stimulation rates. We hypothesized that MMN generation would be present in RTT individuals, but it would be more severely impacted over slower stimulation rates.

Methods

Participants

Twenty-five patients with RTT, confirmed by MECP2 mutations, and thirty age-matched neurologically TD individuals took part in the study. Participants with RTT were recruited through the Rett Syndrome Center of the Children’s Hospital at Montefiore, while TD participants were recruited from the local community. All RTT patients were female as this rare disease affects mostly females. Ten out of thirty TD participants were male. Seven data sets from RTT and three from the TD group were excluded from analysis due to noisy electroencephalography (EEG) data resulting in less than 20% accepted trials per condition. The final sample contained 18 females with RTT (mean age: 12.8 ± 4.7, ranged 6–22) and 27 TD controls (12.1 ± 4.8, ranged 6–26). There was no significant difference in age between the RTT and TD group (t(43) = 0.5, p = 0.61). All participants with RTT underwent genetic testing, and detailed phenotypic assessment that was accompanied by detailed medical history questionnaires completed by their caregivers. Symptom severity in RTT was measured using the Rett Syndrome Severity Scale (RSSS)24. This clinician-rated scale represents an aggregate measure of the severity of clinical symptoms, including motor function, seizures, respiratory irregularities, ambulation, scoliosis, and speech. Composite scores in the range of 0–7 correspond to a mild phenotype, 8–14 correspond to a moderate phenotype, and 15–21 to severe features. The RSSS score in the current RTT group ranged between 5 and 15 (Mean ± SD = 9.6 ± 4). Clinical demographics such as severity scores, ages (age of onset and regression) of all participants including those with unusable data and medication are listed in supplementary material (Supplementary Table 1 Clinical Demographics). There were no differences in age-range and RSSS scores between the seven excluded RTT datasets and those included in the final analysis.

Participants with RTT were excluded if they were experiencing ongoing regression, specifically in Stage II, which is also known as the rapid developmental regression stage. Other exclusion criteria included uncorrected hearing loss or ear infection on the day of EEG acquisition. TD participants were excluded if they had a familial history of a neurodevelopmental disorder, or any neurological or psychiatric disorder. All individuals in the TD group passed a hearing screen. While none of the RTT participants was deaf, hearing acuity could not be similarly assessed in the RTT participants.

The institutional review board of the University of Rochester and Albert Einstein College of Medicine approved this study. Written informed consent was obtained from parents or legal guardians. Where possible, informed assent from the participants was also obtained. Participants were modestly compensated at a rate of $15 per hour for their time in the laboratory. All aspects of the research conformed to the tenets of the Declaration of Helsinki.

Experimental design

We presented a simple auditory MMN paradigm while recording high-density EEG from TD and RTT participants. Experimental procedures were similar to those described in our previous paper9. All participants sat in a sound-attenuated and electrically-shielded booth (Industrial Acoustics Company, Bronx, NY) on a caregiver’s lap or in a chair/wheelchair. They watched a muted movie of their choice on a laptop (Dell Latitude E640) while passively listening to auditory stimuli presented at an intensity of 75 dB SPL using a pair of Etymotic insert earphones (Etymotic Research, Inc., Elk Grove Village, IL, USA). An oddball paradigm was implemented in which regularly occurring standard tones (85%) were randomly interspersed with deviant tones (15%). These tones had a frequency of 1000 Hz with a rise and fall time of 10 ms. Standard tones had a duration of 100 ms while deviant tones were 180 ms in duration. The tones were presented with stimulus onset asynchronies (SOAs) of either 450, 900, or 1800 ms in separate blocks, and each block consisted of 500, 250, or 125 trials, respectively (Supplementary Fig. 1A). Participants were presented with a total of 14 blocks (2 × 450 ms, 4 × 900 ms, and 8 × 1800 ms) equally distributed within the experimental session, resulting in 1000 trials per condition.
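The block arithmetic above can be checked with a short sketch. The constants are taken from the paragraph; the function names, and the choice to fix the deviant count at exactly 15% of each block before shuffling, are illustrative assumptions rather than the authors' stimulus-delivery code.

```python
import random

SOAS_MS = (450, 900, 1800)
TRIALS_PER_BLOCK = {450: 500, 900: 250, 1800: 125}
BLOCKS_PER_SOA = {450: 2, 900: 4, 1800: 8}
DEVIANT_PROB = 0.15  # 15% deviants, 85% standards


def design_summary():
    """Return per-SOA totals implied by the block structure."""
    summary = {}
    for soa in SOAS_MS:
        per_block = TRIALS_PER_BLOCK[soa]
        summary[soa] = {
            "total_trials": per_block * BLOCKS_PER_SOA[soa],
            "block_duration_s": per_block * soa / 1000.0,
        }
    return summary


def oddball_block(n_trials, deviant_prob=DEVIANT_PROB, seed=None):
    """Hypothetical block: fixed deviant count, order shuffled at random."""
    rng = random.Random(seed)
    n_dev = round(n_trials * deviant_prob)  # assumption: exact proportion
    seq = ["deviant"] * n_dev + ["standard"] * (n_trials - n_dev)
    rng.shuffle(seq)
    return seq


if __name__ == "__main__":
    for soa, info in design_summary().items():
        print(soa, info)
```

Under these numbers every block lasts the same time regardless of SOA, because the trial count halves each time the SOA doubles.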

EEG acquisition

A Biosemi ActiveTwo (Bio Semi B.V., Amsterdam, Netherlands) 72-electrode array was used to record continuous EEG signals. The setup includes an analog-to-digital converter, and fiber-optic pass-through to a dedicated acquisition computer (digitized at 512 Hz; DC-to-150 Hz pass-band). EEG data were referenced to an active common mode sense electrode and a passive driven right leg electrode.

EEG data processing

EEG data were processed and analyzed offline using custom scripts that included functions from the EEGLAB Toolbox25 for MATLAB (the MathWorks, Natick, MA, USA). EEG data were initially band-pass filtered at 1–40 Hz using a Chebyshev Type II filter. Continuous EEG data were passed through a channel rejection algorithm, which identified bad channels using measures of standard deviation and covariance with neighboring channels. Rejected channels were interpolated using the EEGLAB spherical interpolation. Data were then divided into epochs that started 100 ms before the presentation of each tone and extended to 800 ms post stimulus onset. Bad trials containing severe movement artifacts or particularly noisy events were rejected if voltages exceeded ±150 μV; this was followed by a second threshold set at two standard deviations over the mean of the maximum values for each epoch (the largest absolute value recorded in the first 500 ms of a given epoch, across all channels for each trial in each condition). The number of accepted trials for each condition and group is presented in Supplementary Fig. 2. All epochs were then baseline corrected to the 100 ms pre-stimulus interval. The epochs were next averaged as a function of stimulus condition to yield the auditory evoked potential to the standard and to the deviant tone. To maximize the ERP at fronto-central sites, the data were referenced to TP7, or TP8 if TP7 was a noisy channel in a given participant. This approach takes advantage of the inversion of the MMN that is seen between fronto-central and inferior temporo-parietal sites26,27,28.
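The epoch-level steps described above (two-stage trial rejection, baseline correction, re-referencing) can be sketched in NumPy. This is not the authors' MATLAB/EEGLAB code: the function names and the array layout (epochs × channels × samples, in μV) are assumptions, and "the first 500 ms of a given epoch" is read here as the first 500 ms counted from the epoch start, which is one possible interpretation.

```python
import numpy as np

FS = 512                  # sampling rate in Hz
T_MIN, T_MAX = -0.1, 0.8  # epoch limits in seconds


def epoch_times(fs=FS, t_min=T_MIN, t_max=T_MAX):
    """Time axis (s) for one epoch."""
    n = int(round((t_max - t_min) * fs))
    return t_min + np.arange(n) / fs


def reject_epochs(epochs, times, abs_limit=150.0, n_sd=2.0):
    """Two-stage rejection; returns a boolean mask of epochs to keep."""
    # Stage 1: drop epochs whose absolute voltage exceeds the fixed limit.
    keep = np.abs(epochs).max(axis=(1, 2)) <= abs_limit
    # Stage 2: per-epoch peak |V| in the first 500 ms (assumed: from epoch
    # start), thresholded at mean + n_sd * SD across the surviving epochs.
    win = times < times[0] + 0.5
    peak = np.abs(epochs[:, :, win]).max(axis=(1, 2))
    if keep.any():
        limit = peak[keep].mean() + n_sd * peak[keep].std()
        keep &= peak <= limit
    return keep


def baseline_correct(epochs, times):
    """Subtract each channel's mean over the pre-stimulus interval."""
    pre = times < 0
    return epochs - epochs[:, :, pre].mean(axis=2, keepdims=True)


def rereference(epochs, ch_names, ref="TP7"):
    """Re-reference every channel to a single electrode (TP7 or TP8)."""
    i = ch_names.index(ref)
    return epochs - epochs[:, i:i + 1, :]
```

A caller would apply `reject_epochs` within each condition, index the array with the returned mask, and then run `baseline_correct` and `rereference` before averaging.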

The window for measurement of the MMN was determined from the difference wave obtained by subtracting the grand mean ERP to standard tones from the grand mean ERP to deviant tones. The resulting distribution of activity showed maximal difference at ~225 ms (Fig. 2a–c). We then defined a time window of 10 ms centered around 225 ms (i.e. 220–230 ms) to obtain average MMN amplitudes for each individual for each SOA. Composite averages generated from FC3, FCz, and FC4 scalp electrodes were used for further statistical analysis.
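The amplitude measure described above reduces to a few array operations. The sketch below assumes per-participant averaged ERPs stored as channels × samples arrays and takes the difference wave as deviant minus standard, the convention under which the MMN is negative-going; the function name and argument layout are illustrative, not the authors' implementation.

```python
import numpy as np


def mmn_amplitude(standard, deviant, times, ch_names,
                  roi=("FC3", "FCz", "FC4"), window=(0.220, 0.230)):
    """Mean deviant-minus-standard amplitude over the ROI and time window.

    standard, deviant: arrays of shape (n_channels, n_samples)
    times: sample times in seconds, shape (n_samples,)
    """
    idx = [ch_names.index(ch) for ch in roi]
    # Composite fronto-central waveform, then the difference wave.
    diff = deviant[idx].mean(axis=0) - standard[idx].mean(axis=0)
    mask = (times >= window[0]) & (times <= window[1])
    return diff[mask].mean()
```

At a 512 Hz sampling rate the 220–230 ms window spans only about five samples, so each amplitude value is an average over a small number of points.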

Statistical analyses

The primary analysis employed a repeated measures analysis of variance (ANOVA) with SOA (450, 900, and 1800 ms) as a within-participant factor and group (RTT vs. TD) as a between-participants factor to examine main effects and their interaction on MMN amplitude. Planned post-hoc tests were used to follow significant ANOVA effects. Partial η2 was used to estimate effect sizes. Pearson correlations were used to assess the relationship between MMN amplitude, age, and RSSS. We also compared the correlation coefficients of age with the MMN for each SOA using a Fisher z-transformation to examine if MMN maturation was similar across SOAs29.
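For readers unfamiliar with the Fisher z-transformation, a minimal sketch of the textbook comparison of two correlation coefficients follows. It uses the independent-samples form of the test, which is an assumption: the correlations compared in this study come from the same participants, so the exact procedure of reference 29 may differ and this sketch should not be expected to reproduce the reported statistics.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Two-tailed z-test for the difference between two Pearson r values.

    Assumes the two correlations come from independent samples.
    Returns (z, p).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-tailed normal p-value
    return z, p
```

The transform `atanh(r)` makes the sampling distribution of the correlation approximately normal with variance 1/(n − 3), which is what allows a simple z-test on the difference.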

In a secondary exploratory analysis, we further interrogated this rich high-density EEG dataset using the statistical cluster plot (SCP) approach (see Fig. 4). This approach serves as a follow-up to the primary a-priori tests of the MMN, as a means to more fully describe the recorded data, and as such, it is to be considered exploratory. Any additional effects uncovered should be considered post-hoc, must be interpreted cautiously, and serve simply as hypothesis generation tools to be explored further in a follow-up study. SCPs are constructed by calculating pairwise two-tailed t-tests between the evoked responses to a given pair of experimental conditions across all time points and all recording sites (electrodes). Results of these tests are then displayed as a color intensity plot that spatio-temporally summarizes periods of statistical difference between conditions (here: standard versus deviant). The x and y axes signify time (in milliseconds) and electrode location (frontal to posterior scalp) respectively, while the color represents the t-value for each data point. Only points exceeding an alpha criterion of 0.05 or less are highlighted, and then only when this criterion is present for a minimum of nine consecutive data points (i.e. 18 ms at the current digitization rate), thereby reducing the probability of type I errors. That is, the likelihood of multiple false positive results occurring by chance at nine consecutive time points is exceedingly low if one assumes statistical independence between each time point. Of course, there is a degree of autocorrelation between temporally adjacent time points in EEG recordings that must be considered. Even for high autocorrelations, applying a criterion of nine consecutive time points is quite conservative in avoiding type I errors30,31.
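The consecutive-points criterion is the core of the SCP approach, and it can be sketched as follows. The sketch assumes paired t-tests across participants (data arranged as participants × channels × samples) and a caller-supplied critical t-value; all function names are hypothetical, and this is not the authors' plotting code.

```python
import numpy as np


def paired_t(cond_a, cond_b):
    """Pointwise paired t-values; inputs are (subjects, channels, samples)."""
    d = np.asarray(cond_a) - np.asarray(cond_b)
    n = d.shape[0]
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))


def consecutive_mask(sig, min_run=9):
    """Keep True values only inside runs of at least min_run samples."""
    sig = np.asarray(sig, dtype=bool)
    out = np.zeros_like(sig)
    n_ch, n_t = sig.shape
    for ch in range(n_ch):
        start = None
        for t in range(n_t + 1):
            on = t < n_t and sig[ch, t]
            if on and start is None:
                start = t                      # a run begins
            elif not on and start is not None:
                if t - start >= min_run:       # run long enough: keep it
                    out[ch, start:t] = True
                start = None
    return out


def statistical_cluster_plot(cond_a, cond_b, t_crit, min_run=9):
    """t-values where the run criterion is met, zero elsewhere."""
    t = paired_t(cond_a, cond_b)
    mask = consecutive_mask(np.abs(t) >= t_crit, min_run)
    return np.where(mask, t, 0.0)
```

The critical value depends on the group size (roughly 2.056 for a two-tailed test at 0.05 with 27 participants). Nine samples at 512 Hz correspond to about 17.6 ms, which the text rounds to 18 ms.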

Results

Figure 1 displays ERPs elicited by standard and deviant tones for each SOA, as well as the corresponding difference waves, over fronto-central scalp sites (the average of FC3, FCz, and FC4). Clear MMN responses can be seen for every SOA in TD controls, while this is only the case for the shortest SOA (450 ms) in the RTT group. These MMNs have typical topography with negativity over the fronto-central sites and positivity over the mastoids, as can be seen in Fig. 2. MMNs lasted from ~180 to 250 ms with a maximum at about 225 ms, so the average MMN amplitude within 220–230 ms latencies was chosen for statistical analyses. Repeated measures ANOVA revealed a main effect of Group (F(1,43) = 6.970, p = 0.012, η2 = 0.139), indicating an attenuated MMN in RTT as compared to TD. The main effect of SOA did not reach significance (F(2,42) = 1.897, p = 0.156, η2 = 0.042), whereas the Group by SOA interaction did (F(2,42) = 4.348, p = 0.017, η2 = 0.092, Greenhouse-Geisser correction applied), confirming the evident reduction of MMN in RTT at slower presentation rates (i.e. SOAs of 900 and 1800 ms). Post-hoc analyses also confirmed that TD and RTT groups differed significantly only for the longer SOAs of 900 ms (F(1,43) = 6.728, p = 0.013, η2 = 0.135) and 1800 ms (F(1,43) = 10.302, p = 0.003, η2 = 0.193), but not for the shortest SOA of 450 ms (F(1,43) = 0.046, p = 0.831, η2 = 0.001). The mean MMN amplitude for the different SOAs in each group can be seen in Fig. 3.
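The effect sizes for the between-group tests can be recovered from the F statistics with the standard identity for partial η2, namely F·df_effect / (F·df_effect + df_error). The short sketch below applies it to the Group main effect and the three post-hoc comparisons reported above; the function name is illustrative, and the sketch assumes the reported η2 values are partial η2, as stated in the Methods.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Group main effect and post-hoc group comparisons, all with df (1, 43).
    for label, f in [("Group", 6.970), ("900 ms", 6.728),
                     ("1800 ms", 10.302), ("450 ms", 0.046)]:
        print(label, round(partial_eta_squared(f, 1, 43), 3))
```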

Fig. 1: Grand mean waveforms for TD and RTT group over fronto-central electrodes (FC3, FCz, and FC4).

Auditory event-related potentials (ERPs) to standard tones (blue trace) and deviant tones (red trace) are presented with standard deviation, shaded in gray. The TD group produced classic ERP waveforms, while the RTT group exhibited less stereotyped responses with reduced ERP amplitude across conditions. A clear MMN (difference between standard and deviant traces) was present for all SOAs for the TD group. However, an MMN was present only for the shortest SOA in the RTT group.

Fig. 2: Topographic representation of the differences between deviant and standard tones across SOAs.

An MMN with a typical spatial distribution, with negativity at the frontal sites and positivity over the mastoids, is clearly seen in all conditions for the TD group, but only for the shortest SOA condition in the RTT group.

Fig. 3: Mean MMN amplitude for each SOA in RTT and TD groups.

Significant differences between the groups are marked by an asterisk (SOAs of 900 and 1800 ms).

MMN amplitude showed a significant negative linear correlation with age in the TD group, for SOAs of 450 and 900 ms (r(27) = −0.41, p(27) = 0.035 and r(27) = −0.39, p(27) = 0.045, respectively), and at a trend level for SOA of 1800 ms (r(27) = −0.32, p(27) = 0.1). There were no differences in correlation coefficients across SOAs in TD (z = −0.4295, p-value = 0.6675). For the RTT group, MMN amplitude was not linearly correlated with age (|r| < 0.21, p(18) > 0.4) or RSSS (|r| < 0.21, p(18) > 0.4).

In follow-up exploratory analysis, the SCP approach was used to assess the general time-course of significant MMN differences between conditions (i.e. standard versus deviant) for each of the three presentation rates and for both groups (Fig. 4). As can be seen in the left panels of the figure, a robust differential effect was seen at all three presentation rates for TD participants, confirming and extending the findings of the ANOVA reported above. In contrast, only at the fastest presentation rate (450 ms) was a differential effect evident in the RTT population, with a considerably later onset (~190 ms versus 160 ms) and somewhat shorter-lived duration of differential significance over a much more circumscribed set of channels. No effects of note were evident at the two slower presentation rates in the RTT group, again confirming the main findings of the ANOVA above.

Fig. 4: Statistical cluster plots (SCPs) depicting the outcome of running t-tests that compare standard and deviant responses generated by each group (TD and RTT) to each condition (450, 900, and 1800 ms SOAs), for all electrodes and all time points (–50 to 300 ms).

Significant effects are plotted when p ≤ 0.05 for at least nine consecutive data points (~18 ms at 512 Hz sampling rate). The direction of these effects is color coded: red when responses to deviant tones are significantly more positive, and blue when they are significantly more negative, relative to responses to standard tones. The gray areas indicate periods where no differences are observed. Time is plotted along the x-axis and electrode position is plotted on the y-axis. Starting from the top left corner of each graph, electrodes that are located next to each other are clustered into color-coded scalp regions (Frontal, Central, Temporal, and Occipital). These color-coded regions are displayed on the corresponding head map (right).

Discussion

The current data show that participants with RTT produce the MMN response for the shortest 450 ms SOA condition, albeit attenuated and delayed, indicating that they have a somewhat preserved ability to build an auditory sensory memory representation for duration and to discriminate duration deviance when a sequence of auditory signals is presented at a fast stimulation rate. However, these sensory memory traces for tone duration appear to have a substantially foreshortened temporal span in RTT, as evidenced by the highly atypical absence of a detectable MMN at longer SOAs of 900 and 1800 ms.

The presence of an MMN to auditory duration deviance in the current study corresponds with our previous findings using a pitch deviance paradigm, where the presence of an MMN pointed to the neural ability to represent frequency deviations in RTT, although the pitch-evoked MMN was morphologically atypical and substantially delayed in that study9. The presence of a duration-evoked MMN here similarly suggests the neural capacity to represent duration deviants. On the other hand, the fact that the duration MMN is absent at slower presentation rates might also suggest that sensory memory for this fundamental auditory feature is more impacted in RTT than is frequency representation. It is worth noting, however, that in our prior study, only a single frequency manipulation was used to generate the MMN, and that the difference between the standard and deviant (503 versus 996 Hz) was very large and therefore highly detectable. Unlike the current study, we did not parametrically vary the strength of the memory trace, either by changing the presentation rate or by narrowing the frequency difference. Taken together, the two studies suggest that while MMN is relatively preserved in RTT when the stimulation parameters are such that the deviation, be it in duration or pitch, is large and highly detectable, once the system is taxed to any degree, significant deficits begin to reveal themselves. Clearly, it will be important to follow up with parametric studies to assess the limits of the auditory sensory memory system in RTT for these and other fundamental auditory features (i.e., frequency, duration, location, and loudness).

It is also worth pointing out that the specificity of impaired MMN to the longer SOA conditions indicates that the MMN reduction was unlikely caused by factors such as attention or motivation, which under certain limited circumstances are known to modulate MMN amplitude32. The presence of an MMN in the fastest condition, and the fact that the blocks with different SOAs were equally distributed within the experimental session, indicate that it is the ability to sustain sensory memory representations of duration over longer periods that is reduced in RTT.

Our study also showed an increase in MMN amplitude with age in the TD group, consistent with previous findings33. Notably, the rate of MMN maturation was similar across SOAs, indicating that while duration discrimination ability is still developing from childhood into adulthood, the ability to sustain auditory sensory memory representations is already above 2 s within the age range of the participants in this study (6–26 years old). This result is in line with prior work showing that auditory sensory memory for frequency can last for at least 3 s in 6-year-old children34. In typical development, there is a gradual increase in the duration of sensory memory with development, such that no MMN is observed in children younger than 2 years old when SOAs are longer than 1 s, in 4-year-old children when SOAs are longer than 2 s, or in 6-year-old children with SOAs longer than 3 s. However, the MMN is prominent in each of these groups with faster presentation rates34,35,36.

One possible implication of the fact that patients with RTT showed no reliable MMN at 900 ms is that their sensory memory systems may have stagnated at a developmental stage compatible with that of a 1–2-year-old TD child. It is in this age range that patients with RTT usually experience developmental arrest that affects cognitive and motor functions, including language skills. Another measure of basic auditory function previously examined in RTT is the brainstem frequency following response, which measures responses that are phase-locked to the frequency of a periodic stimulus. Galbraith and colleagues, working with adults with RTT, found that the ability to entrain external high-frequency modulation was at a level more consistent with infant maturation levels seen in TDs37. Thus, it might be suggested that auditory system maturation is also halted at an early developmental stage.

One might argue that the decay in sensory memory could be specific to duration representation, since previous studies have suggested that memory representation for duration decays more rapidly with aging and cognitive decline than does frequency representation38,39,40. In our previous study of RTT patients, MMN data suggested that the lifetime of frequency representation is at least 900 ms, longer than observed here for duration, but as mentioned, that was the only SOA employed in that study9. Nonetheless, it is a distinct possibility that the duration system is more vulnerable than the frequency representation system, and this may relate to the separable cortical circuits responsible for generating these two varieties of the MMN26. Whereas the major generators of the frequency MMN are found in hierarchically early primary auditory cortex as well as fronto-parietal regions, a more complex set of regions was involved in generating the duration MMN, including secondary auditory cortex and a more extended set of fronto-parietal generators. It is plausible that higher-order cortical regions are more impacted by the disease than are early regions, and that processes reliant on these higher-order regions are therefore more vulnerable. However, because we have not yet parametrically varied the presentation rates for frequency deviations within the same cohort, this remains an open question for now.

The MMN measures in our study were not related to RTT severity as measured by the RSSS. While the RSSS accumulates all aspects of RTT symptomatology and is based on observational and clinical symptoms, the MMN is an objective measure of auditory sensory memory. The absence of correlation between the two measures suggests that cognitive and perceptual deficits, which are difficult to assess via behavioral methods, may be independent from core motor symptoms. Thus, our neurophysiological measure provides critical additional information about auditory perceptual and cognitive brain function in patients with RTT that is not directly related to the currently used clinical measures of RTT severity. Importantly, it also has implications for speech and language comprehension.

The attenuated memory span, as measured by MMN to stimulation with different SOAs, is characteristic of several neurological conditions. MMN generation is crucially dependent on the proper functioning of NMDA receptors, as shown by animal studies as well as pharmacological studies in healthy adults and patients with schizophrenia41,42. NMDA abnormalities have also been implicated in RTT. In a small post-mortem study that included nine independent RTT samples from prefrontal cortex, Blue and colleagues reported that NMDA receptor density was increased in samples from younger girls (2–8 years old), while it was decreased in samples from older girls (up to 30 years of age)43. Similar biphasic shifts in the direction of this NMDA effect with age were confirmed in prefrontal brain regions of an RTT animal model44. While there was no age-related difference in MMN changes across condition in our RTT patients, most of them were older than 8 years. NMDA receptor dysfunction has also been reported in healthy elderly subjects and those with Alzheimer’s disease45, which have both also been shown to exhibit shortened sensory memory39,46,47,48. Therefore, the absence of MMN in RTT for longer SOAs and the apparent reduction of MMN for the shortest SOA could, in theory, be linked to decreased NMDA receptor binding. Similar to what we see here, patients with schizophrenia show reduced MMN even with short SOAs of 450–500 ms49,50. Nevertheless, understanding the role of NMDA receptors in modulating the span of sensory memory traces in different neurological disorders needs further investigation.

The oddball paradigm with varied SOAs can be easily implemented in animals. The animal model of RTT, either with total Mecp2 knockout or deficient Mecp2 in specific neuronal subtypes11,51, is a potentially very valuable tool to track down the neurophysiological and molecular mechanisms underlying the drastic decay in auditory sensory memory representation over longer periods that we report in this study, as well as to investigate its response to treatment. Taking into account that such profound deficits in sensory memory might be of crucial importance for language acquisition, the index of MMN attenuation with increasing SOA can be a very promising biomarker not only for RTT subjects, but also for other neurodevelopmental disorders.

Study limitations

A limitation of the current study is the wide age-range of the participants, given that auditory responses continue to mature with typical development across the age-range tested33,52. Although age was correlated with MMN amplitude in the TD group, it was not associated with manipulations of stimulus rate. This suggests that group differences seen as a function of presentation rate were not affected by age, but rather, represent frank differences in brain function in RTT. A second limitation is that mutation subtype could not be effectively examined here due to the relatively restricted sample size. Neither were we able to consider potential differences as a function of classic versus atypical Rett phenotype. Both of these distinctions will be of great interest as this work progresses. We must also acknowledge that hearing testing was not conducted at the time of EEG acquisition in patients with RTT due to difficulties in assessing it in this population. However, the presence of an MMN for the shortest SOA in the RTT group indicates that they could detect and decode auditory information. Lastly, non-invasive recordings such as those conducted here cannot shed light on the mechanisms by which MECP2 protein loss leads to auditory cortical processing deficits. Work using similar paradigms in murine models of RTT will be highly instructive in this regard11,53.

Conclusions

This study confirms the preserved ability of RTT patients to automatically decode duration deviations in the auditory stream when stimuli are presented in a rapid stream, which we previously showed in this population for large frequency deviations. However, automatic detection of duration changes was highly atypical in RTT when the presentation rate of the stimulus stream was slowed down and the auditory sensory memory system was taxed, as indicated by the lack of obvious MMN responses at SOAs of 900 and 1800 ms. We speculate that this drastic attenuation in the duration of sensory memory might lead to significant problems in language acquisition in RTT, as well as having implications for other aspects of information processing. The exact mechanisms underlying this decay, as well as behavioral outcomes, represent important avenues of research to increase knowledge of RTT and its perceptual and cognitive sequelae.