Introduction

Autism spectrum disorder (ASD), which we refer to as autism in the remainder of this document (see Footnote 1), is defined by challenges with social communication and by restricted, repetitive patterns of behaviour (American Psychiatric Association, 2013). Since 2013, atypical sensory features have been included in the diagnostic criteria for autism. Sensory differences are highly common: approximately 90–94% of autistic individuals or their caregivers report atypical sensory behaviours, particularly in the auditory domain (Gomes et al., 2008). These may include covering one's ears to block out sounds or a preoccupation with particular sounds (Crane et al., 2009; Leekam et al., 2007; Tomchek & Dunn, 2007). Further, it has been posited that early, fundamental differences in auditory processing may affect downstream neural mechanisms that contribute to speech and communication skills (Boets et al., 2008; Russo et al., 2009). In this way, aspects of both diagnostic criteria for autism (i.e., sensory features and social communication) may be understood through differences in auditory processing (O'Connor, 2012). However, differences in definitions, measurements, and approaches to understanding sensory function make connections across levels of analysis challenging and call for interdisciplinary collaboration to deepen clinical conceptualization and theoretical foundations through empirical study (Cascio et al., 2016).

The human auditory system, like other sensory systems, comprises adjacent subcortical and cortical portions. Sound travels through the ear to the cochlea, brainstem, and auditory cortex (Musiek & Baran, 2018). Despite a comprehensive understanding of the neuroanatomy of the auditory system and well-documented auditory atypicalities in autism, little research has characterized subcortical and cortical processing within the same group of individuals. This may stem from academic silos focused on different questions, structures, and measurements, despite the close anatomical links between levels of the auditory system. Further, while these processes have been studied independently in autism, they are rarely (with some exceptions; Chen et al., 2021; El Shennawy et al., 2014; Samy et al., 2012) linked to behavioural or clinical outcomes; most studies stop at the binary determination that autistic participants perform better or worse than their typically developing (TD) peers. Here, we take a small step towards examining the relationship between subcortical and cortical auditory processes and behaviour within a developmental framework that integrates IQ matching (Burack et al., 2011a, 2011b, 2016). To this end, we compare a group of autistic and TD children of the same chronological age and non-verbal IQ on two measures of auditory speech processing and correlate these with behavioural measures of autistic traits and sensory features. Auditory Brainstem Responses (ABRs) were used to examine subcortical speech processing, and Event-Related Potentials (ERPs) were used to assess cortical speech processing. This study also provides a preliminary exploration of sensory brain-behaviour links across multiple units of analysis in autistic and TD children.

Speech Auditory Brainstem Response (Speech-ABR)

The speech-ABR is a neurophysiological measure with high temporal resolution that assesses responses generated by the brainstem nuclei and the inferior colliculus (IC) to a repeated consonant-vowel (CV) syllable (Moossavi et al., 2019). The CV syllable /da/ is the most frequently used stimulus because of its ubiquity across languages (Moossavi et al., 2019). Given the speech and language difficulties common among autistic individuals (Conti-Ramsden et al., 2006; Tager-Flusberg, 2006), the speech-ABR may be a relevant marker for differentiating clinical and TD populations. Speech-ABRs evoked by 40 ms syllables are broken down into seven peaks labelled V, A, C, D, E, F, and O (for a review, see Sanfins & Colella-Santos, 2016). Research evaluating speech-ABR latencies consistently finds that autistic children have longer monaural ipsilateral wave V, A, and O latencies than TD peers matched on chronological age (Chen et al., 2019, 2021; Ramezani et al., 2019; Russo et al., 2009). Waves V and A represent the onset of the consonant, while wave O reflects the transient response to the offset of the speech syllable (Chandrasekaran & Kraus, 2010). Thus, the delayed onset and offset responses in autistic individuals have been interpreted as reflecting impaired speech processing at the brainstem (Sanfins & Colella-Santos, 2016). However, most speech-ABR research has examined responses to stimuli presented monaurally to the right ear, despite the real-world relevance of binaural presentation (Sanfins et al., 2018). Further, cognitive ability is generally not accounted for across studies (though see Otto-Meyer et al., 2018 for a comparison of the neural stability of the speech-ABR in age- and IQ-matched autistic and TD individuals), despite some evidence that aspects of the speech-ABR are related to IQ and language abilities in autism (Russo et al., 2009).

Event-Related Potentials (ERPs)

ERPs are also a neurophysiological measure with high temporal resolution; by recording the electrical activity generated by postsynaptic potentials, they can index sensory, perceptual, attentional, and cognitive processes (for a review, see Luck, 2014). The P1 and the mismatch negativity (MMN) are commonly referenced in the study of auditory processing. The P1 is a positive exogenous sensory-perceptual ERP that peaks around 100 ms. It can be elicited by the presence of an auditory stimulus (e.g., a CV syllable) and is modulated by stimulus characteristics (Luck, 2014). The MMN is elicited using an auditory oddball paradigm, in which two sounds (a standard occurring 80% of the time and a deviant occurring 20% of the time) are presented in an interleaved sequence. The MMN is a negative difference wave obtained by subtracting the averaged neural response to the standards from the averaged response to the deviants. It peaks between 150 and 250 ms, primarily over frontocentral regions, and is thought to reflect auditory discrimination (Näätänen & Alho, 1995; Näätänen et al., 2007).
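As a concrete illustration of the subtraction described above, the following Python sketch computes an MMN-style difference wave from lists of single-trial epochs. It is a hypothetical, minimal example (one channel, plain lists of µV values, toy numbers); the function names are ours and are not part of any ERP toolbox.

```python
def average_epochs(epochs):
    """Average equal-length epochs (lists of µV values) sample by sample."""
    if not epochs:
        raise ValueError("need at least one epoch")
    n_samples = len(epochs[0])
    if any(len(epoch) != n_samples for epoch in epochs):
        raise ValueError("all epochs must have the same number of samples")
    return [sum(epoch[i] for epoch in epochs) / len(epochs) for i in range(n_samples)]


def mmn_difference_wave(standard_epochs, deviant_epochs):
    """Deviant average minus standard average, one value per sample (µV)."""
    standard_avg = average_epochs(standard_epochs)
    deviant_avg = average_epochs(deviant_epochs)
    return [dev - std for dev, std in zip(deviant_avg, standard_avg)]


if __name__ == "__main__":
    standards = [[0.0, 1.0, 2.0], [0.0, 1.0, 0.0]]  # average: [0.0, 1.0, 1.0]
    deviants = [[0.0, -1.0, -2.0]]                  # average: [0.0, -1.0, -2.0]
    print(mmn_difference_wave(standards, deviants))  # [0.0, -2.0, -3.0]
```

Because the standards and deviants are averaged separately before subtraction, the unequal trial counts of an 80/20 design do not bias the difference wave's expected value, although the deviant average remains noisier.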

Several studies have examined both the P1 and the MMN to speech sounds in the context of an oddball paradigm in autism (Čeponienė et al., 2003; Jansson-Verkasalo et al., 2003; Kadlaskar et al., 2021; Kujala et al., 2010; Lepistö et al., 2005; Piatti et al., 2021; Whitehouse & Bishop, 2008). Generally, results indicate that autistic and TD individuals demonstrate similar P1 latencies and amplitudes (Čeponienė et al., 2003; Kujala et al., 2010; Lepistö et al., 2006; Whitehouse & Bishop, 2008), although some studies have found smaller P1 amplitudes in autistic individuals (Jansson-Verkasalo et al., 2003; Lepistö et al., 2005). Importantly, smaller P1 amplitudes were reported in all studies in which the autistic and TD groups differed on IQ (Jansson-Verkasalo et al., 2003; Lepistö et al., 2005) but not in those in which groups were matched on age and IQ (Kujala et al., 2010; Lepistö et al., 2006; Whitehouse & Bishop, 2008; though Čeponienė et al., 2003 did not report IQ). Studies of the MMN to speech sounds also most commonly report no differences in amplitude or latency between autistic and TD individuals (Čeponienė et al., 2003; Green et al., 2020; Jansson-Verkasalo et al., 2003; Kemner et al., 1995; Kuhl et al., 2005; Kujala et al., 2010; Lepistö et al., 2005; Piatti et al., 2021; Weismüller et al., 2015; though see Lepistö et al., 2006 for latency differences in Asperger's syndrome). However, research examining auditory discrimination using magnetoencephalography (MEG) and the mismatch magnetic field (MMF), the magnetic analogue of the MMN, has found longer latencies in autistic children, which were more pronounced among non-speaking autistic children (Matsuzaki et al., 2019). Critically, nonverbal IQ accounted for nearly 60% of the variance in MMF latency (Matsuzaki et al., 2019), further highlighting the importance of developmental considerations in the design and interpretation of research (Burack et al., 2016).

Notably, most of the aforementioned ERP studies have focused on peak latency and peak amplitude measures, which extract the point at which the waveform is at its largest. However, some argue that there is nothing special about the point at which the voltage of an ERP reaches a local maximum (Luck, 2014). Research grounded in the developmental approach (Cicchetti & Pogge-Hesse, 1982; Zigler, 1967, 1969) has focused on understanding how and when in the processing chain differences between autistic and neurotypical individuals arise (Brodeur et al., 2018; Happé & Frith, 2006; Mottron, 2011; Russo et al., 2021), and how these may reflect differences in processing. To this end, we elected to use mean amplitude and fractional area latency to capture aspects of the ERP that circumvent the traditional focus on peaks and may be better suited to understanding how autistic individuals process auditory information.

Measurement

Though currently underutilized in autism research (see Cary et al., 2021; Kadlaskar et al., 2021; Piatti et al., 2021), alternative ERP measures such as mean amplitude and fractional area latency may help parse nuances in cortical functioning and address different research questions relevant to the study of autism. Mean amplitude is calculated as the average voltage of a waveform over a specified time window, and fractional area latency is the point in time at which a specified percentage (e.g., 30%, 50%) of the area under the waveform within that window has accrued (Luck, 2014). In this way, mean amplitude and fractional area latency may better reflect patterns of activity across a time window than a single value extracted at the local maximum. Fractional area latency also affords more specificity in measurement. For example, the 30% fractional area latency may capture the onset of cortical processing. This may be especially relevant in the study of sensory systems, which are fundamentally early, fast-paced, and hierarchically organized. Further, because autism is a low-incidence disorder (Maenner, 2018), sample sizes tend to be small and restricted to a subsection of autistic individuals (i.e., those with higher IQs and fewer support needs). Mean amplitude and fractional area latency are less susceptible to noise than peak-based measures (Luck, 2014), which may increase validity and help offset limitations related to measurement error (though future research examining the psychometric properties of these measures in autism may be warranted). By extending these measurement choices to autism research, we hope to provide new information about potential nuances in auditory processing.
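To make the two definitions concrete, the sketch below computes mean amplitude and fractional area latency on a single waveform sampled at evenly spaced times. It is a simplified illustration, not ERPLAB's implementation: it counts only the area on one side of 0 µV (positive for a P1-like component, negative for an MMN-like one) and does not interpolate between samples, both of which are assumptions; the function names are hypothetical.

```python
def mean_amplitude(times_ms, values_uv, start_ms, end_ms):
    """Average voltage (µV) of the samples whose time lies in [start_ms, end_ms]."""
    window = [v for t, v in zip(times_ms, values_uv) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the measurement window")
    return sum(window) / len(window)


def fractional_area_latency(times_ms, values_uv, start_ms, end_ms,
                            fraction=0.3, polarity="positive"):
    """Latency (ms) at which `fraction` of the window's area has accrued.

    Only area on the requested side of 0 µV is counted. Samples are assumed
    to be evenly spaced, so area is approximated by a running sum of values.
    """
    if polarity not in ("positive", "negative"):
        raise ValueError("polarity must be 'positive' or 'negative'")
    sign = 1.0 if polarity == "positive" else -1.0
    area = [(t, max(sign * v, 0.0))
            for t, v in zip(times_ms, values_uv) if start_ms <= t <= end_ms]
    total = sum(a for _, a in area)
    if total == 0:
        return None  # no area of the requested polarity in this window
    running = 0.0
    for t, a in area:
        running += a
        if running >= fraction * total:
            return t
    return area[-1][0]  # guard against floating-point round-off


if __name__ == "__main__":
    times = list(range(10))                   # 1 ms steps
    p1_like = [0, 1, 2, 3, 4, 3, 2, 1, 0, 0]  # toy positive deflection
    print(mean_amplitude(times, p1_like, 0, 9))                          # 1.6
    print(fractional_area_latency(times, p1_like, 0, 9))                 # 3
    print(fractional_area_latency(times, p1_like, 0, 9, fraction=0.5))   # 4
```

In the toy example the 30% point (3 ms) falls on the rising edge, before the 4 ms peak, which is why a low fraction is used as an onset-sensitive latency measure.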

Brain-Behaviour Relations

Little research has examined the relationship between subcortical and cortical auditory processing and behaviours relevant to autism (but see Samy et al., 2012), although there is some evidence that early perceptual auditory ERPs correlate with autistic traits and sensory features (Cary et al., 2021; Kaplan-Kahn et al., 2021). Specifically, larger P1 mean amplitudes (Kaplan-Kahn et al., 2021) and larger differences between standards and deviants at the P1 (Cary et al., 2021) have both been associated with higher levels of autistic traits, as measured by the Autism Quotient (AQ; Auyeung et al., 2008; Baron-Cohen et al., 2001, 2006), in both autistic and TD participants. With respect to the MMN to non-speech sounds, Cary et al. (2021) found that an earlier MMN latency was associated with more autistic traits and that a larger MMN amplitude was associated with greater Sensory Overresponsivity, measured through selected caregiver-report items on the Sensory Profile (SP; Dunn, 2014; Green et al., 2015; McKernan et al., 2020). While these studies provide emerging evidence that indices of sensory processing map onto autism-relevant diagnostic traits and features, such associations are not consistently reported in the literature (Andersson et al., 2013; Donkers et al., 2013; Kadlaskar et al., 2021; Ruiz-Martínez et al., 2020) and therefore require further research for clarification.

One study to date has focused on the causal links between speech-ABRs, structural magnetic resonance imaging (sMRI), and the Gesell developmental diagnosis scale (GDDS) among young autistic children matched on chronological age to their TD peers (mean age around 4 years; Chen et al., 2021). The authors found significant correlations between the speech-ABR (i.e., wave V and A latency, wave V amplitude) and the surface area of the left rostral middle frontal gyrus (lRMFG), which is involved in language and complex sentence construction (Chapman et al., 1992, 1998). This suggests a relationship between subcortical speech processing and cortical language systems in autistic children. Further, there was a significant indirect (mediation) effect in which the surface area of the lRMFG predicted GDDS language outcomes via wave V amplitude, which supports combining subcortical and cortical indices of auditory processing to understand the neural mechanisms and behaviours relevant to autism.

Taken together, these findings suggest that the onset, offset, and/or magnitude of both subcortical and cortical auditory processes may differ in autistic individuals when assessed using speech-ABRs and ERPs. However, because most research has examined these processes in isolation, with substantial methodological variation across studies (e.g., comparison groups, developmental considerations such as IQ matching, stimuli, recording electrodes, and dependent measures), it is difficult to integrate the results into a conceptualization of the full auditory system in autism. Accordingly, the primary aims of the present study were to present preliminary data describing subcortical and cortical auditory processing in a sample of autistic individuals and their developmentally matched TD peers, and to assess the relationship between subcortical and cortical auditory metrics and measures of autistic traits and sensory features.

To assess subcortical and cortical processes, the speech-ABR, the P1, and the MMN to speech sounds were compared between groups of autistic and TD children matched on chronological age and on cognitive ability as measured by the Perceptual Reasoning Index (PRI) of the Wechsler Abbreviated Scale of Intelligence—Second Edition (WASI-II; Wechsler, 2011). To assess brain-behaviour relationships, associations were examined between the speech-ABR, the P1, and the MMN and a set of a priori measures: Sensory Overresponsivity (Green et al., 2015; McKernan et al., 2020), the Sensitivity quadrant of the SP (Dunn, 2014), and the Attention to Detail and Communication subscales of the AQ (Auyeung et al., 2008; Baron-Cohen et al., 2001, 2006). While these correlations are exploratory, the choice of correlates was based on our previous work (Cary et al., 2021) and the findings of others (Ruiz-Martínez et al., 2020) demonstrating associations between ERP indices of cortical auditory processing of non-speech sounds, autistic traits, and sensory features. Finally, developmental considerations such as IQ informed all methodological decisions, including who was included in the comparison group and how it was selected (Zigler, 1967, 1969). As such, we elected to equate our participant groups on non-verbal IQ, which has been shown to better reflect autistic intelligence (Courchesne et al., 2019; Dawson et al., 2007) and is in line with best practices in developmental matching (Burack et al., 2004). In this way, differences in auditory processing can more readily be interpreted as reflecting fundamental differences between autistic and non-autistic children rather than differences in cognitive ability.

Methods

Participants

Forty-two participants aged 7–17 years were recruited for this project; of these, 10 autistic (91% male) and 21 TD (43% male) participants completed both the speech-ABR and ERP portions of the study and are included here (see Table 1). All participants spoke English as a first language. At the time of testing, all autistic participants had functional use of language, and none had experienced any loss of language. Caregiver report indicated that five autistic participants had comorbid ADHD diagnoses, but these were not confirmed diagnostically, and no other psychiatric diagnoses were reported. Additionally, four autistic participants were reported to be taking prescription medications (Abilify, Clonidine, Intuniv, Fluoxetine, and Tenex). All TD participants were reported to have spoken their first words early or on time (around 12 months of age). TD participants had no previous psychiatric diagnoses and were excluded if they had a Full Scale IQ (FSIQ) below 80 or a history of epilepsy or of neurological, genetic, psychiatric, or learning disorders. Four of the TD participants had a sibling diagnosed with autism, but caregivers reported no concerns. All participants identified as White and non-Hispanic/Latinx. All participants passed an audiologic evaluation, which included otoscopy, behavioural audiometric threshold evaluation, distortion product otoacoustic emissions (DPOAEs), transient click evoked otoacoustic emissions (TEOAEs), and wideband absorbance. Hearing and cochlear function were considered normal if behavioural thresholds were < 25 dB HL. Lastly, the highest level of caregiver education was collected. Twelve percent of mothers had finished high school, 20% had completed some college, 20% had a Bachelor's degree, and 48% had a Master's degree or higher. For paternal education, 24% had finished high school, 14% had completed some college, 24% had a Bachelor's degree, and 38% held a Master's degree or higher.

Table 1 Demographic information by group

Missing IQ scores for two TD children were replaced with the TD group mean. The participant groups were matched on chronological age (t(29) = 0.125, p = .901) and on the PRI of the WASI-II (t(29) = −0.229, p = .821; Wechsler, 2011). Groups differed on FSIQ (t(29) = 2.861, p = .008) and the Verbal Comprehension Index (VCI; t(29) = 5.258, p < .001), with TD participants scoring significantly higher than their autistic peers (Table 1). Accordingly, VCI and FSIQ were included as covariates in all analyses. Autism diagnoses were confirmed using Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5; American Psychiatric Association, 2013) criteria, the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2; Lord et al., 2016), the Autism Diagnostic Interview—Revised (ADI-R; Rutter et al., 2003), and clinical judgement. Autistic participants were required to score 7 or higher on the ADOS-2 and to meet criteria on the ADI-R.

Procedure

This study was approved by the Institutional Review Board (IRB) at Syracuse University. Caregivers provided written informed consent, and children provided written informed assent to participate. In total, the protocol took approximately four hours and was split into two or more sessions. Participants completed the hearing evaluation and two ABR tasks (i.e., click- and speech-ABR) at the Pediatric Audiology Laboratory. They completed the cognitive assessment and two MMN tasks (i.e., speech and non-speech MMN) at the Center for Autism Research and Electrophysiology (CARE) Laboratory. Questionnaires were completed at both locations. Only results from the speech-processing experiments are reported here.

ABR

ABR Experimental Task

The Intelligent Hearing Systems SmartEP system was used to present the stimuli and record the ABR data. Speech-ABRs were elicited using 40 ms /da/ syllables presented with alternating polarity at a rate of 11.1/s and an intensity of 63 dB nHL; total recording time was approximately 20 min. Stimuli were presented binaurally, and responses were recorded simultaneously on two channels over the left and right hemispheres. The lower forehead electrode (Fz) served as the ground, the high forehead electrode (Fpz) as the active electrode, and the mastoids (M1 and M2) as the reference electrodes. Impedances were maintained below 3 kΩ and within 1 kΩ of each other. Data were amplified with a gain of 100,000 and band-pass filtered from 100 to 3000 Hz.

ABR Data Analysis

Latencies were extracted using Smart EP software Version 5.10 (Smart EP, 2020). Two runs of 1500 sweeps were averaged together, yielding a speech-ABR based on 3000 sweeps. The latencies of waves V, A, and O were picked independently through visual inspection by two experienced doctoral students. Interrater agreement was 96.5% on the presence or absence of a wave and 96.4% on peak latency within 1.5 ms. The speech-ABR waves recorded from two participants were judged too noisy for peaks to be picked and were removed, leaving a final sample of 10 autistic and 19 TD participants for the speech-ABR analyses. Although the speech-ABR stimulus is typically presented only to the right ear, we elected to use binaural presentation because it more closely replicates real-life acoustic environments, is consistent with the EEG paradigm, and accounts for differences in the lateralization of language among autistic individuals (Flagg et al., 2005). Because the stimuli were presented binaurally and responses were recorded over both hemispheres simultaneously, a series of paired t-tests was conducted within each group to determine whether latencies differed by recording site. No significant differences were found for the autistic group (wave V: t(9) = −0.703, p = .500; wave A: t(9) = −0.683, p = .512; wave O: t(9) = −0.144, p = .889) or the TD group (wave V: t(18) = 0.703, p = .491; wave A: t(18) = −0.464, p = .649; wave O: t(18) = 0.858, p = .402). As such, data from the right and left hemispheres were averaged together for the remainder of the analyses to reflect binaural stimulus presentation across both hemispheres.
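The agreement figures and the hemisphere averaging described above amount to simple arithmetic. The hypothetical sketch below assumes that latency agreement was computed only over peaks both raters identified and that an absent wave is coded as None; the text does not state either detail, and the function names are ours.

```python
def presence_agreement(rater_a, rater_b):
    """Share of waves that both raters scored the same way (present vs. absent)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of waves")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def latency_agreement(lat_a, lat_b, tolerance_ms=1.5):
    """Share of jointly picked peaks whose latencies differ by <= tolerance_ms.

    None marks a wave a rater judged absent; such waves are skipped
    (an assumption about how agreement was computed).
    """
    pairs = [(a, b) for a, b in zip(lat_a, lat_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no peaks were picked by both raters")
    return sum(abs(a - b) <= tolerance_ms for a, b in pairs) / len(pairs)


def average_hemispheres(left_ms, right_ms):
    """Average left- and right-channel latencies wave by wave (e.g., V, A, O)."""
    return [(left + right) / 2 for left, right in zip(left_ms, right_ms)]


if __name__ == "__main__":
    print(presence_agreement([True, True, False, True],
                             [True, False, False, True]))  # 0.75
    print(latency_agreement([6.9, 8.1, None, 48.0], [7.2, 10.0, None, 47.0]))
    print(average_hemispheres([7.0, 8.0, 47.0], [7.5, 8.5, 48.0]))  # [7.25, 8.25, 47.5]
```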

Event-Related Potentials (ERPs)

Oddball Paradigm

Stream (Wyble, 2019) and MATLAB (MATLAB, 2010) were used to control stimulus presentation. Participants were presented with a typical oddball paradigm in which one stimulus, the syllable /da/, was presented 80% of the time, and a second stimulus, the syllable /ba/, was interspersed among the standards 20% of the time. Speech sounds were voiced by a native English-speaking female. Syllables were presented in pseudorandom order, with the constraint that two deviants never played consecutively. There were 1000 speech trials; each stimulus lasted 360 ms and was followed by a 240 ms interstimulus interval. Stimuli were presented at 60 dB SPL via two speakers to the left and right of the computer screen, each approximately 45 cm from the participant. To mitigate interference from cognitive ERP components that require active attention (Näätänen et al., 2007), a distraction task was used in which participants watched a silent movie or television show of their choice and were instructed to ignore the auditory stimuli. This task lasted approximately 20 min.
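One way to generate a sequence with these constraints is sketched below in Python, as a hypothetical illustration rather than the Stream/MATLAB code actually used. It assumes exactly 800 standards and 200 deviants and places each deviant in a distinct gap between standards, which guarantees that no two deviants are adjacent.

```python
import random


def oddball_sequence(n_trials=1000, p_deviant=0.2, standard="da", deviant="ba", seed=None):
    """Return a pseudorandom oddball order in which no two deviants are adjacent.

    Each deviant is placed in a distinct gap between standards (including the
    start and end of the run), so a standard always separates two deviants.
    """
    rng = random.Random(seed)
    n_deviants = round(n_trials * p_deviant)
    n_standards = n_trials - n_deviants
    n_gaps = n_standards + 1
    if n_deviants > n_gaps:
        raise ValueError("too many deviants to keep them non-adjacent")
    deviant_gaps = set(rng.sample(range(n_gaps), n_deviants))
    sequence = []
    for gap in range(n_gaps):
        if gap in deviant_gaps:
            sequence.append(deviant)
        if gap < n_standards:  # every gap except the last is followed by a standard
            sequence.append(standard)
    return sequence


if __name__ == "__main__":
    seq = oddball_sequence(seed=42)
    print(len(seq), seq.count("ba"))  # 1000 200
```

Because each valid arrangement corresponds to exactly one choice of gaps, sampling gaps uniformly makes every sequence that satisfies the constraint equally likely, unlike a simple "reshuffle until valid" loop that can be slow for high deviant probabilities.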

ERP Recording

Net Station software from Electrical Geodesics, Inc. (EGI; Electrical Geodesics, 2003) was used to record the ERP data with a Net Amps 300 series amplifier. A high-density 128-channel Geodesic SensorNet cap was fitted to each participant's head using the nasion, inion, and jawbone as landmarks to help standardize fit. To increase the signal-to-noise ratio, electrode impedances were maintained below 50 kΩ. Data were recorded at a sampling rate of 1024 Hz and referenced to the Cz electrode during recording.

ERP Data Analysis

The ERP data were re-referenced to the average of the right and left mastoids (Näätänen et al., 2007). Data were band-pass filtered from 0.1 Hz (high-pass) to 30 Hz (low-pass) using a 2nd-order Butterworth filter (Luck, 2014). Next, the data were segmented into epochs beginning 50 ms before stimulus onset and continuing for 650 ms after. Following segmentation, the data were baseline-corrected using the 50 ms pre-stimulus interval to allow comparison across waveforms. A semi-automatic artefact rejection procedure was then applied in which epochs with voltages exceeding 100 µV were removed. Lastly, the remaining data were visually inspected to identify any outstanding noisy channels. Identified channels were replaced using surface spline interpolation, which uses activity from electrodes across the 3D scalp surface to predict the missing data (Perrin et al., 1987). After interpolation, each participant was required to have at least 200 acceptable standard trials and 50 acceptable deviant trials. The TD group had an average of 663.7 (SD = 102.5) accepted standard trials and 141.7 (SD = 22.6) accepted deviant trials. The autistic group had an average of 568.8 (SD = 192.6) accepted standard trials and 118.6 (SD = 41.2) accepted deviant trials. The number of accepted trials did not differ by group for either the standard (t(28) = 1.458, p = .171) or the deviant (t(28) = 1.650, p = .125).
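The segmentation, baseline correction, and amplitude-based rejection steps can be sketched for a single channel as follows. This is a simplified, hypothetical illustration: it assumes the continuous signal (in µV) has already been re-referenced and band-pass filtered (filtering is not shown), converts the −50 to 650 ms epoch to samples at 1024 Hz by rounding, and rejects an epoch if any baseline-corrected sample exceeds 100 µV in absolute value, which is only one possible reading of the 100 µV criterion.

```python
SFREQ_HZ = 1024  # sampling rate reported in the ERP recording section


def ms_to_samples(ms, sfreq=SFREQ_HZ):
    """Convert a duration in ms to a whole number of samples (rounded)."""
    return int(round(ms * sfreq / 1000.0))


def epoch_and_clean(signal_uv, event_samples, pre_ms=50, post_ms=650,
                    reject_uv=100.0, sfreq=SFREQ_HZ):
    """Cut epochs around stimulus onsets, subtract the pre-stimulus mean,
    and drop epochs with any |sample| > reject_uv. Returns (kept, n_rejected)."""
    pre = ms_to_samples(pre_ms, sfreq)    # 51 samples at 1024 Hz
    post = ms_to_samples(post_ms, sfreq)  # 666 samples at 1024 Hz
    kept, rejected = [], 0
    for onset in event_samples:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > len(signal_uv):
            rejected += 1  # epoch would run off the edge of the recording
            continue
        epoch = signal_uv[start:stop]
        baseline = sum(epoch[:pre]) / pre  # mean of the pre-stimulus interval
        corrected = [v - baseline for v in epoch]
        if max(abs(v) for v in corrected) > reject_uv:
            rejected += 1
            continue
        kept.append(corrected)
    return kept, rejected


if __name__ == "__main__":
    signal = [0.0] * 2000
    signal[1010] = 150.0  # artefact inside the epoch around sample 1000
    kept, rejected = epoch_and_clean(signal, [10, 100, 1000])
    print(len(kept), rejected)  # 1 2
```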

ERP Extraction

The P1 responses to the standard /da/ syllable were averaged, and the MMN was calculated for each participant as a difference wave (deviant /ba/ minus standard /da/; Näätänen et al., 2007). The P1 and the MMN were extracted from three electrode clusters: a frontocentral cluster combining activity from electrodes E5, E6, E7, E12, and E106; a left temporal cluster of electrodes E39, E40, E45, E46, and E50; and a right temporal cluster of electrodes E101, E102, E108, E109, and E115. Grand averages were created by averaging participants' waveforms at each electrode cluster by group. Visual inspection of the ERP waveforms, blind to group, revealed that one participant had excessive noise, and this participant was removed from further analysis. As such, the final sample for all ERP analyses consisted of 18 TD and 10 autistic participants.
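A minimal sketch of the cluster and grand averaging described above follows. The electrode labels are taken from the text; the assumption that a cluster's "combined activity" is the unweighted mean of its electrodes, and all function names, are ours.

```python
CLUSTERS = {
    "frontocentral": ["E5", "E6", "E7", "E12", "E106"],
    "left_temporal": ["E39", "E40", "E45", "E46", "E50"],
    "right_temporal": ["E101", "E102", "E108", "E109", "E115"],
}


def average_waveforms(waveforms):
    """Sample-by-sample mean of equally long waveforms (lists of µV values)."""
    if not waveforms:
        raise ValueError("need at least one waveform")
    if len({len(w) for w in waveforms}) != 1:
        raise ValueError("waveforms must have the same number of samples")
    return [sum(samples) / len(waveforms) for samples in zip(*waveforms)]


def cluster_waveform(erp_by_channel, cluster_name):
    """One participant's ERP averaged over the channels of a named cluster."""
    return average_waveforms([erp_by_channel[ch] for ch in CLUSTERS[cluster_name]])


def grand_average(participant_waveforms):
    """Group grand average: one cluster waveform per participant, averaged."""
    return average_waveforms(participant_waveforms)


if __name__ == "__main__":
    erp = {ch: [1.0, 2.0] for ch in CLUSTERS["frontocentral"]}
    erp["E5"] = [6.0, 2.0]
    print(cluster_waveform(erp, "frontocentral"))       # [2.0, 2.0]
    print(grand_average([[1.0, 3.0], [3.0, 5.0]]))      # [2.0, 4.0]
```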

ERPLAB (Lopez-Calderon & Luck, 2014) was used to extract the latency and amplitude of the P1 and the MMN. For latency, the 30% fractional area latency was used to limit the impact of noise in the data and to capture onset latency (Luck, 2014). Time windows for P1 and MMN latency were 100 ms and 200 ms long, respectively, and began at the point at which the waveform deviated from 0 µV, as identified visually by an experienced doctoral student blind to group. These time windows were verified by an EEG expert, also blind to group. To define the P1 mean amplitude window for each electrode cluster, we first calculated the peak latency of the grand average and then took 30 ms before and after that time point. In line with previous research, the MMN mean amplitude window was 150–250 ms (Näätänen et al., 2007). Additionally, to assess data quality and estimate noise in the ERPs, the analytic standardized measurement error (aSMÊ) was computed for each participant's averaged ERP waveforms (Luck et al., 2021), using the same P1 and MMN mean amplitude time windows. Lastly, the aggregated root mean square of the standardized measurement error (RMS(SMÊ)) was calculated for each ERP by condition and group.
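The P1 window definition and the data-quality metrics can be sketched as below. Based on the definition in Luck et al. (2021), the analytic SME of a mean-amplitude score is taken as the standard deviation of the single-trial mean-amplitude scores divided by the square root of the number of trials, and RMS(SMÊ) as the root mean square of participants' SME values. The peak search range passed to the window function is a hypothetical parameter not reported in the text, and the function names are ours.

```python
import math
from statistics import stdev


def p1_mean_amplitude_window(times_ms, grand_avg_uv, search_start_ms,
                             search_end_ms, half_width_ms=30):
    """Return (start, end) = grand-average P1 peak latency -/+ half_width_ms."""
    candidates = [(t, v) for t, v in zip(times_ms, grand_avg_uv)
                  if search_start_ms <= t <= search_end_ms]
    if not candidates:
        raise ValueError("no samples inside the peak search range")
    peak_ms, _ = max(candidates, key=lambda tv: tv[1])  # P1 is a positive peak
    return peak_ms - half_width_ms, peak_ms + half_width_ms


def asme_mean_amplitude(single_trial_scores_uv):
    """aSME of a mean-amplitude score: SD of single-trial scores / sqrt(N trials)."""
    n = len(single_trial_scores_uv)
    if n < 2:
        raise ValueError("need at least two trials to estimate the SD")
    return stdev(single_trial_scores_uv) / math.sqrt(n)


def rms_sme(sme_values):
    """Root mean square of per-participant SME values (aggregate data quality)."""
    if not sme_values:
        raise ValueError("need at least one SME value")
    return math.sqrt(sum(s * s for s in sme_values) / len(sme_values))


if __name__ == "__main__":
    print(p1_mean_amplitude_window([80, 100, 120, 140], [1.0, 3.0, 5.0, 2.0], 50, 200))  # (90, 150)
    print(asme_mean_amplitude([1.0, 3.0, 5.0, 7.0]))
    print(rms_sme([3.0, 4.0]))
```

Because the SME shrinks with the square root of the trial count, conditions with fewer trials (such as the deviant) are expected to show larger RMS(SMÊ) values, which is the comparison against the total SD made in the Results.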

Questionnaires

The AQ is a 50-item questionnaire used to assess autistic traits and contains five subscales (Social Skills, Attention Switching, Attention to Detail, Communication, and Imagination), with higher scores indicating greater endorsement of autistic traits (Baron-Cohen et al., 2001, 2006). There are three versions of the AQ with consistent item content adapted for developmental level: a caregiver-report Child AQ (Auyeung et al., 2008) was given to participants aged 4–11 years (n = 12), the Adolescent AQ to participants aged 12–15 years (n = 15; Baron-Cohen et al., 2006), and a self-report Adult AQ to participants aged 16 years or older (n = 4; Baron-Cohen et al., 2001). All three forms of the AQ have acceptable psychometric properties, with Cronbach's alpha coefficients ranging from 0.63 to 0.97 across the subscales, which falls in the Moderate to High range (Auyeung et al., 2008; Baron-Cohen et al., 2001, 2006). Additionally, research suggests that the AQ has good convergent validity with autism diagnosis and excellent test–retest reliability (Auyeung et al., 2008; Baron-Cohen et al., 2001, 2006). Scores from the Child AQ were converted to allow comparison across forms: item scores of 0 or 1 were converted to 0, and scores of 2 or 3 were converted to 1, for a total possible score of 50. In line with previous research (Cary et al., 2021; Kadlaskar et al., 2021; Kaplan-Kahn et al., 2021; Ruiz-Martínez et al., 2020) and our hypotheses, the Attention to Detail and Communication subscales were selected for analysis.
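The Child AQ conversion amounts to collapsing a four-point item score into a binary one. The sketch below is hypothetical and assumes item scores are already keyed so that higher values indicate more autistic-like responses; any reverse-scoring step is not described in the text and is not shown.

```python
def dichotomize_child_aq_item(score):
    """Collapse a 0-3 Child AQ item score to the 0/1 scale of the other forms."""
    if score not in (0, 1, 2, 3):
        raise ValueError(f"unexpected item score: {score!r}")
    return 1 if score >= 2 else 0  # 0-1 -> 0, 2-3 -> 1


def converted_child_aq_total(item_scores):
    """Sum of dichotomized item scores; 50 items give a maximum total of 50."""
    return sum(dichotomize_child_aq_item(s) for s in item_scores)


if __name__ == "__main__":
    print([dichotomize_child_aq_item(s) for s in range(4)])  # [0, 0, 1, 1]
    print(converted_child_aq_total([3] * 25 + [1] * 25))     # 25
```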

The SP (Dunn, 2014) is a 125-item caregiver-report questionnaire that assesses sensory processing and behaviours, with higher scores indicating greater endorsement of sensory features. The SP has four quadrants: Seeking, Registration, Avoidance, and Sensitivity. Previous research has also established a Sensory Overresponsivity composite comprising 14 items that examine tactile, auditory, and visual sensitivity (Green et al., 2015; McKernan et al., 2020). The SP has acceptable psychometric properties, with Moderate to High internal consistency as estimated from Cronbach's alpha coefficients ranging from 0.47 to 0.91 across subscales (Dunn, 2014), moderate convergent validity with other validated measures of sensory processing (i.e., the Sensory Processing Measure; Miller-Kuhaneck et al., 2007), and discriminant validity between children with a clinical diagnosis (e.g., Attention Deficit Hyperactivity Disorder, Asperger Syndrome, Fragile X Syndrome) and TD children (Brown et al., 2008). Given the interest in auditory sensory features in autism, the Sensitivity quadrant and the Sensory Overresponsivity composite (Green et al., 2015; McKernan et al., 2020) were selected for analysis.

Statistical Analyses

All statistical analyses were conducted using SPSS Version 27 (IBM Corp, 2020). To characterize group differences, a series of mixed-model analyses of variance (ANOVAs) was run on the speech-ABR, P1, and MMN, with group as the between-subjects variable and FSIQ and VCI as covariates. To assess the relationships between the speech-ABR, P1, MMN, autistic traits, and sensory features, a series of bivariate Pearson correlations was conducted between the latencies of the speech-ABR waves; the 30% fractional area latencies and mean amplitudes of the P1 and MMN; the Attention to Detail and Communication subscales of the AQ (Auyeung et al., 2008; Baron-Cohen et al., 2001, 2006); the Sensitivity quadrant of the SP (Dunn, 2014); and a Sensory Overresponsivity measure based on SP items (Green et al., 2015; McKernan et al., 2020).
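For readers less familiar with the correlation step, the sketch below computes a bivariate Pearson r and the corresponding t statistic (df = n − 2) used to test it against zero. Pairwise deletion of missing values is an assumption made for illustration (the text does not say how missing data were handled), and p-values, which require the t distribution, are left to statistical software such as SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson r with pairwise deletion of missing (None) values.

    Returns (r, n), where n is the number of complete pairs used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n  # undefined if either variable is constant


def t_from_r(r, n):
    """t statistic with df = n - 2 for testing r against zero; requires |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    r, n = pearson_r([1, 2, 3, None, 4], [2, 4, 6, 9, 8])
    print(r, n)  # 1.0 4
    print(t_from_r(0.5, 27))
```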

Results

Descriptive statistics, including the means and standard deviations of the latencies and amplitudes of the speech-ABR, P1, and MMN by group, are listed in Table 2. To further describe waveform morphology, the peak amplitudes and peak latencies of the grand average waveforms by group are provided. Specifically, the autistic group had a P1 that reached a peak of 5.21 µV at 112.8 ms in the frontocentral, 1.56 µV at 142.4 ms in the left temporal, and 2.9 µV at 137.6 ms in the right temporal electrode cluster. The TD group had a P1 that reached a peak of 4.11 µV at 118.4 ms in the frontocentral, 1.21 µV at 116 ms in the left temporal, and 1.66 µV at 133.6 ms in the right temporal electrode cluster. The MMN for the autistic group peaked at −2.07 µV at 280 ms in the frontocentral, −1.29 µV at 292 ms in the left temporal, and −1.62 µV at 276 ms in the right temporal electrode cluster. The TD group had an MMN of −2.12 µV at 220 ms in the frontocentral, −1.01 µV at 156 ms in the left temporal, and −0.65 µV at 216 ms in the right temporal electrode cluster. Grand averages by group are shown in Fig. 1 for the speech-ABR and in Fig. 2 for the P1 and MMN. Additionally, a summary of data quality, as measured by the RMS(SMÊ) for each ERP by condition and group, is provided in Table 3. The RMS(SMÊ) values for the P1 were consistently smaller than the SDTotal, suggesting that the observed differences were minimally affected by measurement error (Luck et al., 2021). For the MMN, the RMS(SMÊ) values for the standard were also consistently smaller than the SDTotal; values in the deviant condition were slightly higher, likely because of fewer trials, but were still generally smaller than the SDTotal.

Table 2 Descriptive statistics of subcortical and cortical neurophysiological data
Fig. 1

The speech-ABR by group with autistic (dashed) and TD (solid) participants

Fig. 2

ERPs by group. (A) EEG channel map with the electrode clusters circled in blue (frontocentral = solid, right temporal = dashed, left temporal = dotted); (B) the P1 to the standard /da/ by cluster; (C) the P1 to the deviant /ba/ by cluster; (D) the MMN by cluster, with autistic (dashed) and TD (solid) participants

Table 3 Root mean square of the standardized measurement error values (RMS(SMÊ)) by condition and group for the mean amplitude of the ERPs

A mixed-model ANOVA was conducted on speech-ABR latency with wave (V, A, O) as the within-subjects factor, group as the between-subjects variable, and FSIQ and VCI as covariates. Because Mauchly’s test indicated a violation of sphericity, Greenhouse–Geisser corrections were applied. Analyses revealed the expected main effect of wave (F(1.526, 38.155) = 3388.462, p < .001, ηp² = .993) with no other significant effects or interactions, including group (F(1,25) = 1.597, p = .218, ηp² = .060), indicating that although latency increased across the speech-ABR waves (V, A, O), latencies did not differ by group.
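The fractional degrees of freedom in this analysis come from multiplying the uncorrected within-subjects dfs by the Greenhouse–Geisser estimate ε. With three waves, F(1.526, 38.155) implies ε ≈ 1.526 / 2 ≈ 0.763 and an error df of about 38.155 / 1.526 ≈ 25 per level, matching the error df of the group test. The sketch below computes Box's ε from a k × k covariance matrix of the repeated measures. It is illustrative only: statistical packages such as SPSS typically derive ε from the model's residual covariance after removing group and covariate effects, which this sketch does not do, and the function names are hypothetical.

```python
from typing import Sequence


def greenhouse_geisser_epsilon(cov: Sequence[Sequence[float]]) -> float:
    """Box / Greenhouse-Geisser epsilon from a k x k covariance matrix of the
    repeated measures (e.g., k = 3 speech-ABR wave latencies).

    epsilon = k^2 (mean_diag - grand_mean)^2 /
              ((k - 1) * (sum s_ij^2 - 2k * sum row_mean_i^2 + k^2 * grand_mean^2))
    Ranges from 1 / (k - 1) (maximal violation) to 1 (sphericity holds).
    """
    k = len(cov)
    mean_diag = sum(cov[i][i] for i in range(k)) / k
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(v * v for row in cov for v in row)
    numerator = (k * (mean_diag - grand_mean)) ** 2
    denominator = (k - 1) * (sum_sq - 2 * k * sum(m * m for m in row_means)
                             + (k * grand_mean) ** 2)
    return numerator / denominator


def corrected_df(epsilon: float, k: int, error_df_per_level: int) -> tuple[float, float]:
    """Scale the uncorrected dfs, (k - 1) and (k - 1) * error_df_per_level, by epsilon."""
    return epsilon * (k - 1), epsilon * (k - 1) * error_df_per_level


# Compound symmetry satisfies sphericity, so epsilon is 1.
spherical = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
# Very unequal variances violate sphericity and pull epsilon toward 1 / (k - 1).
unequal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 10.0]]
eps_spherical = greenhouse_geisser_epsilon(spherical)
eps_unequal = greenhouse_geisser_epsilon(unequal)
```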

Two mixed-model ANOVAs were conducted to examine the latency and amplitude of the P1, with electrode cluster as the within-subjects factor, group as the between-subjects variable, and FSIQ and VCI as covariates. No significant effects were found for P1 latency, but there was a main effect of group for P1 amplitude (F(1,26) = 12.443, p = .002, ηp² = .324), with autistic participants showing a larger P1 than their TD peers. A cluster × group interaction was also found (F(2,52) = 4.933, p = .015, ηp² = .159); Bonferroni-corrected pairwise comparisons indicated that autistic participants had larger amplitudes than TD participants at the frontocentral (Mdiff = 3.194, 95% CI [1.169, 5.220], p = .003) and right temporal electrode clusters (Mdiff = 2.008, 95% CI [1.157, 2.858], p < .001), but not at the left temporal electrode cluster (Mdiff = 0.889, 95% CI [−0.395, 2.173], p = .167).

Two mixed-model ANOVAs were conducted to examine the latency and amplitude of the MMN, with electrode cluster as the within-subjects factor, group as the between-subjects variable, and FSIQ and VCI as covariates. The ANOVA for MMN latency revealed main effects of group (F(1,26) = 10.108, p = .004, ηp² = .280), FSIQ (F(1,26) = 15.604, p = .001, ηp² = .375), and VCI (F(1,26) = 15.366, p = .001, ηp² = .371), with the autistic group showing significantly earlier latencies than the TD group. No significant effects were found for MMN amplitude.

Relationship Between Subcortical, Cortical, and Behavioural Measures

Bivariate Pearson correlations between subcortical, cortical, and behavioural measures were conducted across groups. While all correlations are presented in Table 4, we highlight the following relationships between neurophysiological measures of auditory processing here (Fig. 3). All waves of the speech-ABR (i.e., V, A, O) were correlated with one another. Activity at the electrode clusters was intercorrelated for both P1 latency and P1 amplitude, with similar findings for MMN latency and amplitude. Of the speech-ABR waves, only wave A latency was correlated with MMN amplitude in the right temporal cluster; this correlation was negative, such that a later wave A was associated with a larger (more negative) MMN (r(28) = −0.464, p = .013). Among cortical measures, MMN amplitude was positively correlated with P1 latency (r(29) = 0.503, p = .005) and negatively correlated with P1 amplitude (r(30) = −0.447, p = .013; r(30) = −0.372, p = .043), suggesting that faster and larger P1 responses are associated with a larger MMN.

Table 4 Correlation matrix for speech-ABR, P1, MMN, sensory profile, and autism quotient data
Fig. 3

Correlations between neurophysiological and behavioural data by group with autistic (dashed) and TD (solid) participants

Additionally, several brain-behaviour relationships were found (Table 4 and Fig. 3). Notably, a later wave A latency of the speech-ABR was associated with higher scores on the Communication subscale (r(28) = .477, p = .01). The P1 was correlated with autistic traits and sensory features, such that a larger P1 was associated with higher endorsement of Attention to Detail (r(29) = .518, p = .004), Communication Challenges (r(29) = .417, p = .024), Sensitivity (r(27) = .518, p = .013; r(27) = .563, p = .002), and Sensory Overresponsivity (r(27) = .570, p = .002). Lastly, MMN amplitude was negatively correlated with autistic traits and sensory features, such that a larger MMN was associated with greater Attention to Detail (r(29) = −.596, p = .001) and greater Sensory Overresponsivity (r(27) = −.479, p = .012).

Discussion

The present study used a multimethod approach to provide a preliminary characterization of the subcortical and cortical auditory systems and their relationship to behavioural traits relevant to autism, while considering the impact of developmental matching strategies. To this end, the subcortical speech-ABR and the cortical P1 and MMN were collected in the same sample of autistic individuals and their age- and non-verbal IQ-matched TD peers, along with behavioural measures of autistic traits and sensory features. In general, autistic and TD children had similar speech-ABR latencies to binaurally presented stimuli, whereas autistic children had larger P1 amplitudes and earlier MMN latencies than their TD peers. Additionally, these neurophysiological measures of auditory processing were related to each other and to behavioural ratings of autistic traits and sensory features; in general, larger P1s and MMNs were associated with more autistic traits and greater sensory sensitivities.

Speech-ABR

While autistic participants generally show prolonged waves V, A, and O in the speech-ABR compared to their TD peers (Chen et al., 2019, 2021; El Shennawy et al., 2014; Jones et al., 2020; Ramezani et al., 2019; Russo et al., 2009), we found no group differences in speech-ABR latencies. However, to our knowledge, speech-ABRs have not previously been examined with binaural presentation and an age- and IQ-matched comparison group. These two critical methodological differences may explain the apparent discrepancy between the present study and previous research. First, the absence of group differences with binaural presentation, in contrast to the differences reported with monaural right-ear presentation, is consistent with research showing asymmetrical laterality of click-ABRs (i.e., prolongations in one ear only; Roth et al., 2012) and fMRI data showing more prevalent right lateralization of language among autistic children (Knaus et al., 2010). Thus, while autistic participants may have prolonged speech-ABR latencies to monaural right-ear stimuli, group differences are no longer evident with binaural presentation, which more closely replicates speech processing in the natural environment. Of note, post hoc analyses of the left and right hemispheres of the binaural recordings, conducted separately, revealed no group differences; nonetheless, future research juxtaposing monaural and binaural presentation in autism is warranted. A second explanation for the discrepancy could be that while autistic individuals show prolonged speech-ABR latencies compared to their age-matched peers (Chen et al., 2019, 2021; Russo et al., 2009), this group difference disappears when IQ is accounted for. This pattern of “deficits” no longer being present when developmental factors are considered is in line with a developmental approach (Burack et al., 2011a, 2011b, 2021; Russo et al., 2007, 2021) and has been commonly reported in other neurodevelopmental disorders (Burack et al., 2021), such as Down syndrome (Matsuba et al., 2022) and Fetal Alcohol Spectrum Disorders (FASD; Lane et al., 2014), among others (see Burack et al., 2016). Thus, although the speech-ABR reflects an early, automatic, and fundamental neurological process thought to develop before age 5 (Johnson et al., 2008), a developmental approach that accounts for cognitive ability may nonetheless be relevant.

ERPs

At the cortical level, autistic participants had larger P1 amplitudes and earlier MMN latencies than their TD peers but did not differ in P1 latency or MMN amplitude. While other studies have found generally similar P1 amplitudes between autistic and IQ-matched TD peers (Kadlaskar et al., 2021; Kujala et al., 2010; Lepistö et al., 2006; Whitehouse & Bishop, 2008), the finding that autistic participants had enhanced early perceptual neural responses is in line with previous work from our group (Cary et al., 2021), which focused on different aspects of auditory processing (i.e., habituation and discrimination) and different stimuli (i.e., non-speech sounds). The findings from this study and others (for a review, see O’Connor, 2012) support the basic tenets of the Enhanced Perceptual Functioning (EPF) Model (Mottron et al., 2006), which proposes that fundamental neurological differences lead to an enhancement and precedence of sensory-perceptual processes in autism, which in turn affect higher-order processes such as attention, communication, and cognition (Samson et al., 2012). Taken together, the results of this study suggest that early perceptual processes are enhanced in autistic individuals and that the P1 and MMN may represent neurological underpinnings of behavioural studies demonstrating autistic advantages across sensory modalities, including vision (Hagmann et al., 2016; Kopec et al., 2020; O’Riordan et al., 2001), audition (Bonnel et al., 2010; Jones et al., 2009), and touch (McKernan et al., 2020).

One potential reason for the differences between the results of the present paper and others is the seemingly small but nonetheless crucial choices in ERP measurement. Mean amplitude and fractional area latency may offer valuable alternatives to typical peak-based measurements, as they are less sensitive to noise (Luck, 2014) and allow for homing in on specific aspects of the processing chain (i.e., the onset of perception and discrimination). A second measurement consideration is how and why time windows are selected, as all ERP measures depend entirely on these prespecified boundaries. Narrower time windows are generally preferred, especially for early peaks, because they limit the risk of conflation with other overlapping ERP components (Luck, 2014), but there are currently no standardized guidelines for the time window of any given ERP. As an example of how these methodological details can influence results, one study using mean amplitude in an age-, sex-, and IQ-matched sample found no P1 differences between autistic and TD children when applying a 100 ms time window centered on 130 ms (± 50 ms; Kadlaskar et al., 2021). Although this approach may capture a broader proportion of the waveform, visual inspection of the grand averages suggests that the P1 of the autistic participants returned to baseline sooner than that of the TD group. Averaging over such a wide window may therefore lead to the interpretation that the autistic group had a smaller overall response, rather than capturing group-specific differences in morphology (i.e., the autistic P1 elapsed within a briefer time window than the TD P1). In contrast, by selecting a narrower 60 ms window spanning 30 ms before and after the peak of the grand average, we believe we better captured the transient nature of the P1 across groups. To assess this hypothesis, two post hoc mixed-model ANOVAs were conducted. Consistent with Kadlaskar et al. (2021), when a 100 ms time window was used for the mean amplitude of the P1 without covariates, there were no group differences. However, when FSIQ and VCI were added as covariates with the same 100 ms window, the autistic participants had a significantly larger P1 than their TD peers. Though exploratory, this post hoc finding further emphasizes the importance of accounting for both measurement choices and cognitive and verbal abilities in research with individuals with neurodevelopmental disabilities (see Burack et al., 2016; Russo et al., 2007).
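To make the window argument concrete, the sketch below defines a window of ±30 ms around the most positive grand-average sample within a search range (the 60 ms approach described above) and compares the resulting mean amplitude with a fixed 80–180 ms window (100 ms centered on 130 ms). The waveform is synthetic (a Gaussian peak of 5 µV at 115 ms with a 15 ms SD, sampled at an assumed 1 ms), not data from this study, and the function names and search range are hypothetical.

```python
import math
from typing import Sequence


def peak_centered_window(times_ms: Sequence[float], grand_avg_uv: Sequence[float],
                         search_ms: tuple[float, float],
                         half_width_ms: float = 30.0) -> tuple[float, float]:
    """Window of +/- half_width_ms around the most positive grand-average sample
    inside the search range (a positive-going component such as the P1)."""
    candidates = [(v, t) for t, v in zip(times_ms, grand_avg_uv)
                  if search_ms[0] <= t <= search_ms[1]]
    _, peak_t = max(candidates)
    return peak_t - half_width_ms, peak_t + half_width_ms


def mean_amplitude(times_ms: Sequence[float], waveform_uv: Sequence[float],
                   window_ms: tuple[float, float]) -> float:
    """Average voltage of the samples inside the window (inclusive bounds)."""
    inside = [v for t, v in zip(times_ms, waveform_uv)
              if window_ms[0] <= t <= window_ms[1]]
    return sum(inside) / len(inside)


# Synthetic, illustrative P1 sampled at an assumed 1 ms: 5 uV Gaussian peak
# at 115 ms with a 15 ms SD (not data from the present study).
times = [float(t) for t in range(-100, 400)]
brief_p1 = [5.0 * math.exp(-((t - 115.0) ** 2) / (2 * 15.0 ** 2)) for t in times]

narrow = peak_centered_window(times, brief_p1, search_ms=(50.0, 200.0))  # (85.0, 145.0)
wide = (80.0, 180.0)  # 100 ms window centered on 130 ms
narrow_mean = mean_amplitude(times, brief_p1, narrow)
wide_mean = mean_amplitude(times, brief_p1, wide)
```

Because the wide window includes many near-baseline samples after a brief response has ended, its mean amplitude is smaller even though the underlying peak is identical; a group whose P1 elapses sooner would be penalized more by the wide window.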

Previous research using peak latency generally finds that autistic participants have similar or later MMNs compared to their TD peers (Čeponienė et al., 2003; Green et al., 2020; Jansson-Verkasalo et al., 2003; Kemner et al., 1995; Kuhl et al., 2005; Kujala et al., 2010; Lepistö et al., 2005; though results are more mixed for non-speech sounds; Ferri et al., 2003; Gomot et al., 2002; Kujala et al., 2007), whereas we found earlier-onset MMNs among autistic participants. Here, too, measurement differences could account for this finding. In addition to being less sensitive to noise, fractional area latency is well suited to capturing the latency of ERP components with multiple peaks, such as the MMN (see Fig. 2D), because peak latency can differ substantially depending on which peak is identified (for a review, see Kiesel et al., 2008). Consistent with the results of others, the peak latency of the MMN was later in the grand average of the autistic participants, which affords greater confidence in our data and highlights how findings can differ depending on the measurement tool. Although in need of replication, fractional area latency, or other measures such as fractional peak latency, may offer an alternative, conceptually relevant way to represent neural processing in autism research. Lastly, as more researchers publish estimates of data quality such as SMÊ and adhere to standardized reporting guidelines for EEG research (e.g., Keil et al., 2014; Webb et al., 2015), we may be better able to separate the signal from the noise (i.e., measurement error) and more accurately determine our level of confidence in group differences.
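The sketch below illustrates why peak latency and 30% fractional area latency can diverge for a double-peaked negativity. It is a simplified, sample-based version (no interpolation between samples) that counts only negative-going voltage within the window, in the spirit of the fractional area approach described by Luck (2014); it is not the implementation used in this study. The synthetic waveform (troughs of −1.5 µV at 170 ms and −2.0 µV at 260 ms, each with a 20 ms SD, sampled at an assumed 1 ms), the 100–350 ms window, and the function names are hypothetical.

```python
import math
from typing import Sequence


def fractional_area_latency(times_ms: Sequence[float], waveform_uv: Sequence[float],
                            window_ms: tuple[float, float], fraction: float = 0.3,
                            polarity: int = -1) -> float:
    """First latency at which the running area of the requested polarity reaches
    `fraction` of the total area in the window. polarity=-1 counts only
    negative-going voltage (e.g., the MMN); polarity=+1 only positive-going."""
    areas = [(t, max(polarity * v, 0.0)) for t, v in zip(times_ms, waveform_uv)
             if window_ms[0] <= t <= window_ms[1]]
    total = sum(a for _, a in areas)
    if total == 0.0:
        raise ValueError("no area of the requested polarity in the window")
    running = 0.0
    for t, a in areas:
        running += a
        if running >= fraction * total:
            return t
    return areas[-1][0]  # guards against floating-point rounding in the running sum


def peak_latency(times_ms: Sequence[float], waveform_uv: Sequence[float],
                 window_ms: tuple[float, float], polarity: int = -1) -> float:
    """Latency of the most extreme sample of the requested polarity in the window."""
    return max((polarity * v, t) for t, v in zip(times_ms, waveform_uv)
               if window_ms[0] <= t <= window_ms[1])[1]


def trough(t: float, amp: float, center: float, sd: float) -> float:
    """Negative Gaussian deflection used to build the synthetic waveform."""
    return -amp * math.exp(-((t - center) ** 2) / (2 * sd ** 2))


# Synthetic double-peaked negativity (illustrative only; not study data).
times = [float(t) for t in range(0, 400)]
mmn = [trough(t, 1.5, 170.0, 20.0) + trough(t, 2.0, 260.0, 20.0) for t in times]

window = (100.0, 350.0)
peak = peak_latency(times, mmn, window)
fal30 = fractional_area_latency(times, mmn, window, fraction=0.3)
```

Peak latency lands on the deeper, later trough, whereas 30% of the total negative area is already reached during the earlier trough, so the two measures can yield different conclusions about which group responds "earlier".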

In addition to influencing the subcortical processing of speech sounds, cognitive abilities, including language, may also play a role in the presence or absence of group differences in cortical auditory processing indexed by ERPs. Even when accounting for cognitive ability, we found that verbal skills significantly affected the latency of the MMN, which is consistent with findings that language abilities are associated with the latency of the MMF, the magnetoencephalographic counterpart of the MMN, in autism (Roberts et al., 2011; Matsuzaki et al., 2019). All participants in the present study used fluent speech to communicate and had generally average IQs, and the groups were matched on PRI; nonetheless, the autistic participants had lower verbal IQs than their TD peers (Courchesne et al., 2019; Dawson et al., 2007). This demonstrates the challenge of disentangling the effects of language and cognitive abilities, even in the design and interpretation of research assessing low-level, pre-attentive neural processes (Burack et al., 2016). Future research may consider supplementing analyses with covariates (Jarrold & Brock, 2004); however, to truly parse out the contributions of diagnosis, language, and cognitive abilities, further research with larger sample sizes grounded in a developmental approach is needed.

Brain Behaviour Relationships

Although these analyses are broadly exploratory given the limited sample size, there appeared to be associations between subcortical and cortical responses, autistic traits, and sensory features. First, wave A latency of the speech-ABR was negatively correlated with MMN amplitude, suggesting that later subcortical encoding of the onset of the speech sound was associated with greater cortical auditory discrimination. That is, the later the neural response to the initial frication of the /d/ consonant (Sanfins & Colella-Santos, 2016), the greater the discrimination response between the /da/ and /ba/ speech sounds. Wave A latency of the speech-ABR was also positively correlated with the Communication subscale of the AQ, such that later latencies were associated with more communication challenges reported by children or their caregivers. These findings are in line with Chen et al. (2021), who noted a relationship between subcortical and cortical auditory processing, as well as the same association between wave A latency of the speech-ABR and a behavioural measure of language in autism. This relationship suggests, at least preliminarily, that the neural encoding of speech onset is related to communication in autistic and TD children.

In line with the EPF model (Mottron et al., 2006), faster and larger P1 responses were associated with a larger MMN across groups, suggesting a cortical continuity in auditory processing in which enhanced perception was associated with enhanced discrimination. Although the sample was underpowered to examine this relationship by group, visual inspection of the correlation plot suggests that this pattern may be stronger among autistic participants (Fig. 3). However, consistent with previous research that has found evidence for EPF in TD populations (Kaplan-Kahn et al., 2021), the presence of this relationship across groups (though possibly stronger in autism) may suggest that this processing style can be adopted by neurodiverse and neurotypical individuals alike.

There were also associations between the cortical P1 and MMN measures, autistic traits, and sensory features. Specifically, larger ERP amplitudes were associated with greater Sensitivity and Sensory Overresponsivity, as well as higher levels of Attention to Detail and greater Communication Challenges. That is, larger perceptual and discrimination responses were associated with profiles more similar to the typical autism phenotype, even though two-thirds of the participants were not autistic. These correlations, while preliminary, are also in line with the EPF among both autistic and TD children and suggest that enhancements in early sensory-perceptual processes (perception at the level of the P1 and discrimination at the level of the MMN) are associated with greater endorsement of autistic traits and sensory behaviours. This relationship between larger cortical responses and higher ratings of autistic traits and sensory features has been noted in other studies, including Cary et al. (2021) and Kaplan-Kahn et al. (2021), but warrants further investigation with larger and more diverse samples. Nonetheless, these findings suggest that cortical measures such as ERPs are related to sensory, perceptual, attentional, and cognitive processes and may help elucidate brain-behaviour relationships relevant to autism.

Limitations

There are several limitations to the present study. First, due to constraints imposed by the COVID-19 pandemic, our sample size was small. Based on a preliminary power calculation using G*Power (Faul et al., 2009), future research may seek to recruit at least 46 participants to achieve adequate power for a mixed-model ANOVA, assuming an expected effect size of −0.33 (Chen et al., 2020), an α error probability of .05, and power of .80. Second, we did not exclude TD participants who had a sibling diagnosed with autism, which may have affected the homogeneity of the comparison group and the subsequent behavioural and neurophysiological findings (see Clawson et al., 2017; Maziade et al., 2000; Pisula & Ziegart-Sadowska, 2015 for examples of differences among siblings of autistic individuals). Third, potential acoustic differences between the /da/ and /ba/ speech sounds, such as their formant frequencies, may have affected the MMN by capturing differences in stimulus perception in addition to auditory discrimination. To account for such factors, future research may consider counterbalancing the oddball paradigms (see Schwartz et al., 2018). However, in striving for methodological rigor, researchers should also weigh the pragmatic barriers that accompany long experimental paradigms to ensure inclusivity, representation, and diversity in autism research (Cascio et al., 2021).

In recommending broader representation in autism research, we acknowledge that our sample is not representative of all autistic individuals, as participants had mean IQ scores in the Average range and 100% identified as White and non-Hispanic/Latinx. Given the challenges associated with recruiting larger and more diverse samples, and the importance of ensuring that research is inclusive of all autistic individuals, especially with respect to race, ethnicity, gender, socioeconomic status, cognitive ability, and support needs (see Cascio et al., 2021), future researchers may consider a consortium approach, working collaboratively to bolster sample sizes and increase representation in autism research.

Conclusion

The present study used multiple neurophysiological measures to describe auditory processing in autistic and TD children and found that, compared with their age- and IQ-matched TD peers, autistic children had similar brainstem processing of binaurally presented speech sounds but larger and faster cortical responses. Additionally, in line with the EPF, faster and larger perceptual responses were associated with larger discrimination responses across participants, and possibly more strongly in autism, suggesting enhancements in early, lower-order processes. Lastly, this study provided a novel characterization of the relationship between the speech-ABR and the MMN, and between indices of cortical auditory processing and autistic traits and sensory sensitivity.

The number of studies focused on sensory features in autism has increased since their addition to the DSM-5 in 2013 (American Psychiatric Association, 2013). However, few studies still connect neurological and behavioural levels of analysis. Research on subcortical, cortical, and behavioural relationships can open important avenues for future research on sensory systems. ABRs and ERPs are relatively inexpensive and non-invasive neurophysiological measures that can be derived in the absence of behavioural responses, making them well suited to research with younger individuals and those with greater support needs and creating much-needed opportunities for representation and inclusivity in research. Future research applying the methodology and developmental framework demonstrated here may be valuable in replicating and extending the present findings to examine (1) the speech-ABR and ERP data in the frequency domain, (2) causal relationships in subcortical-cortical auditory processing, and (3) relationships between neurophysiological measures of auditory processing and behaviours as a function of group. This research may inform our understanding of the neural mechanisms that contribute to auditory sensory features and underlie social communication, which may in turn inform the diagnosis and clinical conceptualization of autism and further underscore the centrality of sensory systems in the development of autism.