Abstract
Predicting and organizing patterns of events is important for humans to survive in a dynamically changing world. The motor system has been proposed to be actively, and necessarily, engaged not only in the production but also in the perception of rhythm by organizing hierarchical timing that influences auditory responses. It is not yet well understood how the motor system interacts with the auditory system to perceive and maintain hierarchical structure in time. This study investigated the dynamic interaction between auditory and motor functional sources during the perception and imagination of musical meters. We pursued this using a novel method combining high-density EEG, EMG, and motion capture with independent component analysis to separate motor and auditory activity during meter imagery while robustly controlling against covert movement. We demonstrated that endogenous brain activity in both auditory and motor functional sources reflects the imagination of binary and ternary meters in the absence of corresponding acoustic cues or overt movement at the meter rate. We found clear evidence for the hypothesized motor-to-auditory information flow at the beat rate in all conditions, suggesting a role for top-down influence of the motor system on auditory processing of beat-based rhythms and reflecting an auditory-motor system with tight reciprocal informational coupling. These findings align with and extend a set of motor hypotheses from beat perception to hierarchical meter imagination, adding supporting evidence for the active engagement of the motor system in auditory processing, which may more broadly speak to the neural mechanisms of temporal processing in other human cognitive functions.
SIGNIFICANCE STATEMENT Humans live in a world full of hierarchically structured temporal information, the accurate perception of which is essential for understanding speech and music. Music provides a window into the brain mechanisms of time perception, enabling us to examine how the brain groups musical beats into, for example, a march or waltz. Using a novel paradigm combining measurement of electrical brain activity with data-driven analysis, this study directly investigates motor-auditory connectivity during meter imagination. Findings highlight the importance of the motor system in the active imagination of meter. This study sheds new light on a fundamental form of perception by demonstrating how auditory-motor interaction may support hierarchical timing processing, which may have clinical implications for speech and motor rehabilitation.
Introduction
Perceiving temporally hierarchical events is critical for humans to extract meaning from speech and music. In music, perceiving time as various rhythms, such as a march or waltz, is ubiquitous. This process is known as meter perception, in which the basic units, beats, are organized into meters, nested hierarchical structures of strong and weak positions (Large, 2008). Humans are able to willfully and subjectively impose hierarchical meters on ambiguous rhythms, such that even physically identical series of events can be grouped into either twos or threes, like a march or waltz rhythm (Bolton, 1894; Nozaradan et al., 2011, 2012, 2016; Fujioka et al., 2015). These studies showed that meters can be induced not only by bottom-up physical stimulus features, such as loudness, but also by intrinsically interpretive or imaginative processes; these endogenous processes are the primary focus of this study, which asks how time perception is shaped by endogenous neural activity.
While the underlying mechanisms of how humans organize beats into hierarchical meters are still not clear, numerous studies and theories suggest the importance of motor regions when interpreting rhythmic sounds (for review, see Merchant et al., 2015; Proksch et al., 2020; Cannon and Patel, 2021). The motor cortico-basal ganglia-thalamo-cortical circuit is involved in beat listening tasks even when participants were asked not to move (Grahn and Brett, 2007; Bengtsson et al., 2009; Grahn and Rowe, 2013; Vaquero et al., 2018), and functional connectivity between motor and auditory regions is higher for rhythms with stronger metricality (Chen et al., 2006, 2008; Zatorre et al., 2007; Kung et al., 2013). These studies, together with MEG/EEG studies of beat processing (Iversen et al., 2009; Fujioka et al., 2012), inspired motor theories of beat perception, such as the Action Simulation for Auditory Prediction (ASAP) hypothesis (Patel and Iversen, 2014; see also Schubotz, 2007; Arnal, 2012; Morillon et al., 2015), which propose that the motor system plays an essential role in the active perception of time, influencing temporal expectation by exerting top-down influence on the auditory system to shape the temporal processing of sound events.
Critical to motor theories is not merely that auditory information influences motor regions (indeed, the influence of auditory information in shaping movement is well known) but that there is influence in the reverse direction, of the motor system on auditory processing. Most prior studies have examined connectivity between auditory and motor regions in a nondirected way and thus could not speak to this distinction. Moreover, prior studies have primarily examined simple beat perception, without hierarchical metrical patterns, showing auditory-(sensori)motor phase coupling in the δ band (2-4 Hz) when subjects tapped to beats (Nozaradan et al., 2015) and in the beta band (18-24 Hz) during passive listening to an isochronous rhythm (Fujioka et al., 2012). Using a directed connectivity method, Morillon and Baillet (2017) first observed that temporal attention is reflected in directed phase transfer entropy from the sensorimotor cortex to the associative auditory cortex in the beta band. These studies broadly support the importance of the motor system at the beat level; and while this could parsimoniously be assumed to apply as well to the multiple hierarchical levels present in real-life music or speech, how human brains encode and maintain hierarchical organization at the meter level has not been tested.
In the current study, we recorded high-density EEG while subjects imagined binary and ternary meters without overt movements (with appropriate control conditions). We then directly tested how the motor and auditory system together respond to imagined meters by isolating the two functional sources to observe their individual neural responses and connectivity between them. Specifically, based on the ASAP hypothesis and other motor theories, we predicted that not only auditory but also motor sources will reflect imagined meters, and that directional connectivity will show top-down motor-to-auditory flow in beat listening and, critically, during meter imagination.
Materials and Methods
Ethics statement
The Institutional Review Board office of the University of California–San Diego approved this study. Participants signed informed consent forms to participate. The participants were compensated $15/h for participating in the experiment.
Participants
Twenty-six participants (15 females, 11 males, 0 nonbinary, mean = 23.08 years old) were recruited. All except one (who was ambidextrous) were right-handed. All participants reported having normal hearing and no history of psychological disorders, neurodegenerative diseases, or brain trauma. Of these 26 participants, 6 were excluded for the following reasons: one (the ambidextrous subject) did not finish the task because of technical issues, two could not perform the main task correctly (for more details, see Tapping data to verify imagery), and three had too few usable data after preprocessing. After exclusion, the final sample size was 20. We chose our final sample size to be consistent with recently published work; in particular, the most relevant previous study included 21 subjects (Morillon and Baillet, 2017).
Stimuli
At the beginning of each trial, we presented a double warning tone (two 60 ms, 400 Hz sine tones); a series of unaccented and accented bass drum strokes was then presented 1 s after the warning tone. The drum strokes were modified from the GarageBand SoCal drum kit and were delivered in MATLAB (The MathWorks) using the Psychophysics Toolbox software (Brainard, 1997). Each drum stroke was modified to be 416.67 ms long to include the attack and decay, and strokes were played at a 2.4 Hz rate (i.e., an interonset interval of 416.67 ms). The stimuli were presented using earbud headphones sealed in the ear (Sony MDRXB50AP). In certain phases of a trial, accented (10 dB higher rms level) stimuli were presented every two or three drum strokes, creating a binary or ternary meter. To label the onset time of the stimuli, we created a time-locked square pulse for each drum sound as trigger events. The sound events and trigger events were then combined and output in a wav file. During the experiment, the left channel of the wav file (i.e., sound events) was converted from mono to stereo to play on both left and right channels of the earbuds. The right channel of the wav file (i.e., trigger events) was recorded by the BioSemi Analog Input Box time-locked to the EEG data recording. In this way, we had a precise (within 0.5 ms) record of sound onsets.
Procedure
Participants signed informed consent documents and received experiment instruction. As the purpose of the experiment was to investigate metrical perception and production, participants were required to do a metrical tapping practice session before the setup to ensure they could perform the task by accurately recognizing and tapping binary and ternary meters. All participants passed the practice session. Two participants who had the most difficulty grasping the task did not, however, perform the task correctly in the main experiment, and were excluded from further analysis (for more details, see Tapping data to verify imagery).
The current experiment included two tasks: a localizer task (Fig. 1) and the main task (Fig. 2). In the localizer task, participants performed three subtasks. First, they performed spontaneous tapping at their preferred tempo without hearing any sound stimulus for 3 min (i.e., Tap localizer). They were instructed to tap as stably as possible. Then they listened to unaccented drum stroke stimuli with interonset intervals randomized between 1 and 1.5 s for 3 min (i.e., Listen localizer). Finally, they tapped along with regular drum strokes (interonset interval = 600 ms). They were asked to synchronize with the sound as precisely as possible (i.e., Sync localizer). The Sync localizer is not analyzed or discussed in this study.
In the main task, there were 90 trials. Each trial was divided into four conditions, presented in order: Baseline, Physical Meter, Imagined Meter, and Tap. Trials were preceded by a warning tone. In the Baseline condition, participants were presented with 12 unaccented bass drum strokes played at a 2.4 Hz rate. In the Physical Meter condition, the drum strokes were accented every two or three beats, creating either a binary march or a ternary waltz meter. In the Imagined Meter condition, participants were instructed to subjectively impose the meter from the Physical Meter condition onto unaccented sounds that were identical to the Baseline condition. Finally, in the Tap condition, the sound stopped and they tapped the imagined meter, both the strong and weak beats, for verification.
During the whole experiment, subjects wore sealed earbuds and were comfortably seated in a chair with their right hands resting on a support. They were instructed to stay still during the listening part and tap only with their index finger during the tapping part of each trial.
Experiment design
A within-subject 2 (Meter: binary and ternary) × 4 (Condition: Baseline, Physical Meter, Imagined Meter, and Tap) repeated-measures design was used in this study. Each trial was in either a binary or a ternary meter, with 45 binary trials and 45 ternary trials in total. Each trial, comprising the warning tone, a 1 s gap, and the four conditions (each 5 s long), took 21 s. The intertrial interval was randomized between 0 and 5 s to avoid anticipation of the next trial. The experiment, including the localizer and the main tasks, took ∼45 min to complete. The total experiment, with setup, took ∼2 h to complete.
Data recording
High-density EEG was recorded on a BioSemi Active II system (BioSemi) using a custom montage cap with 205 active electrodes chosen to fit the participant's head. Data were amplified and sampled at 2048 Hz while recording through the BioSemi ActiveTwo AD-box. The sound and tap trigger were recorded through a BioSemi Analog Input Box sent to the EEG amplifier and became a time-locked part of the EEG stream. We first checked offset (<40) and signal quality in BioSemi ActiView, then used Lab Streaming Layer (Christian Kothe, https://github.com/sccn/labstreaminglayer) to output the EEG data stream for LabRecorder. Before EEG recording, the electrode locations were recorded with a 3-D digitizer (Zebris Medical).
Participants tapped with their right index finger on a custom force-sensing button box (FX1901, 50 lb; www.TE.com). The participants were instructed to tap on the center (i.e., the bump) of the button and not to rest on or push the button when they were not required to tap. We monitored during the experiment to ensure that their tapping pattern followed the meter structure (i.e., binary: Strong-Weak-Strong-Weak…; ternary: Strong-Weak-Weak-Strong-Weak-Weak…). The tapping data were recorded through the BioSemi Analog Input Box. The latencies of the moments the finger touched the button were extracted from the data by an in-house script. We attenuated the sound feedback of tapping by using sealed earbuds and putting a layer of soft cloth on the button box. Simultaneously, to monitor that participants did not make unwanted movements during the tasks (especially during imagining), we recorded muscle activity using two surface EMG sensors placed on the first dorsal interosseous muscle on the right hand. Other body parts were also monitored by experimenters in real-time and recorded at 480 Hz using a custom 10 LED marker motion capture configuration (see Fig. 4; PhaseSpace), measuring the movements of head, elbows, left hand, right index finger, knees, and toes.
EEG preprocessing
Throughout the preprocessing and analysis, in-house scripts with EEGLAB version 2019.1 (Delorme and Makeig, 2004) running under MATLAB 2019a (The MathWorks) were used. The code will be publicly accessible on Open Science Framework. The data were downsampled to 512 Hz and high-pass filtered with a cutoff frequency at 0.1 Hz. We performed average referencing on EEG channels, then did artifact removal using artifact subspace reconstruction with the following parameters: Flatline Criterion = −1; Highpass = −1; ChannelCriterion = 0.6; LineNoiseCriterion = −1; BurstCriterion = 5; WindowCriterion = 0.25 (Christian Kothe, https://github.com/sccn/labstreaminglayer) (Chang et al., 2020). Independent component analysis (ICA; infomax binica) was then applied to the data to extract brain EEG sources as well as outside-brain artifact sources, such as blink, eye movement, and facial and neck muscle activation (Bell and Sejnowski, 1995; Makeig et al., 1996). We localized equivalent dipole locations of independent component (IC) scalp maps using the DIPFIT version 3.0 plug-in (Oostenveld et al., 2011). The ICLabel version 0.3.1 plugin was used to generate probabilistic labels based on a machine learning algorithm using response features extensively trained by field experts (Pion-Tonachini et al., 2019). Only ICs with probability >0.4 of being brain ICs went through further analysis. For the frequency analysis and phase analysis, continuous data were epoched into trial conditions (i.e., Baseline, Physical Meter, Imagined Meter, and Tap) using a 0-5 s window without baseline correction. For the directional connectivity analysis, we resampled the data to 128 Hz and epoched with a −0.6 to 5 s window to avoid edge effects.
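The pipeline above was implemented with in-house MATLAB/EEGLAB scripts. As an illustration of the epoching step alone, here is a minimal numpy sketch; the function name, channel count, and onsets are hypothetical, not taken from the authors' code.

```python
import numpy as np

def epoch_continuous(data, onsets_s, fs, tmin=0.0, tmax=5.0):
    """Cut continuous data (channels x samples) into fixed windows around event onsets.

    No baseline correction is applied, matching the frequency/phase analyses;
    a wider window (e.g., -0.6 to 5 s) would be used to avoid edge effects,
    as in the connectivity analysis.
    """
    start = int(round(tmin * fs))
    stop = int(round(tmax * fs))
    epochs = []
    for onset in onsets_s:
        s0 = int(round(onset * fs)) + start
        s1 = int(round(onset * fs)) + stop
        if s0 >= 0 and s1 <= data.shape[1]:   # drop epochs that run off the record
            epochs.append(data[:, s0:s1])
    return np.stack(epochs)                   # trials x channels x samples

# hypothetical example: 205-channel data at 512 Hz, condition onsets at 10 s and 40 s
fs = 512
data = np.random.randn(205, 60 * fs)
epochs = epoch_continuous(data, [10.0, 40.0], fs)   # 0-5 s windows
```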
Data analysis
Identification of motor and auditory ICs using localizer trials
We applied data-driven ICA to the Listen and Tap localizer data to identify, in each subject, the ICs that best captured primary auditory and motor neural activity. The reason for using ICA is to overcome the inherent mixing of cortical sources in the recorded EEG channels; ICA addresses this mixing problem by decomposing the recorded EEG into a collection of maximally independent components (Bell and Sejnowski, 1995; Makeig et al., 1996). While traditionally a single channel, such as Cz, might be examined for auditory responses, in a sensory-motor paradigm this channel would be contaminated by motor activity. Auditory and motor ICs were identified by selecting the ICs that explained the most variance of the sound- and movement-related evoked responses in localizer trials, using the following procedures. After applying logistic infomax ICA (i.e., binica via the runica function), we used ICLabel to identify and exclude artifactual, nonbrain ICs (threshold > 0.4). After back-projecting all identified brain ICs to the EEG channels, we assessed which of the putative brain ICs accounted for the most variance in the auditory and motor ERPs computed across all channels in the Listen and Tap localizer trials, using the pvaf (percent variance accounted for) measure: [pvaf_i = 100 − 100 × mean(var(data − back_proj_i))/mean(var(data))], where i indexes the IC number.
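The pvaf formula above can be sketched directly in numpy; the toy two-source mixture below is hypothetical and serves only to show how the measure behaves (back-projecting all ICs recovers the data exactly, so pvaf is 100).

```python
import numpy as np

def pvaf(data, back_proj):
    """Percent variance accounted for by an IC's back-projection:
    pvaf_i = 100 - 100 * mean(var(data - back_proj_i)) / mean(var(data)),
    with variance taken over time within each channel, then averaged over channels."""
    num = np.mean(np.var(data - back_proj, axis=1))
    den = np.mean(np.var(data, axis=1))
    return 100.0 - 100.0 * num / den

# toy demonstration with a known 2-source mixture (hypothetical mixing matrix)
rng = np.random.default_rng(0)
S = rng.standard_normal((2, 5000))       # two independent sources
A = rng.standard_normal((16, 2))         # 16-channel mixing matrix
data = A @ S
proj_ic1 = A[:, [0]] @ S[[0], :]         # back-projection of IC 1 alone
proj_all = A @ S                         # back-projection of all ICs

pvaf_ic1 = pvaf(data, proj_ic1)          # between 0 and 100
pvaf_all = pvaf(data, proj_all)          # exactly 100: the residual is zero
```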
For both sound and tap events, we used an epoch window from −300 to 500 ms relative to event onset, with a baseline from −100 to −50 ms. For the auditory ICs, we chose the IC with the highest pvaf of the N1-P2 complex (50-250 ms time-locked to the sound onset) in our Listen localizer trials. The N1-P2 complex was chosen because it is a typical auditory-evoked response widely found in auditory-relevant tasks (Näätänen and Picton, 1987; Korzyukov et al., 1999). For the motor ICs, we selected the ICs with the highest pvaf of the movement-related potential (−50 to 100 ms time-locked to the tap onset) (Gerloff et al., 1998; Pollok et al., 2003; Shibasaki and Hallett, 2006). Because participants tapped with the right hand, we expected a left-hemisphere lateralization in the topography of our motor ICs, as observed in other studies (Krause et al., 2012; Ross et al., 2018).
It is important to note that, although consistent with localized cortical activity, the precision of localization is limited and an IC may include multiple highly temporally dependent processes, and thus each “motor” IC could potentially involve multiple processes, including temporal prediction, motor planning, movement execution and sensory feedback if they were precisely correlated in time and spatially overlapping. The same logic could be applied to auditory ICs as well given that previous research has found that at least six anatomic sources together contribute to auditory N1 ERPs (Näätänen and Picton, 1987). ICA accomplishes the dissection of auditory from motor activity, but distinguishing the multiple functions within the motor and auditory ICs is beyond the scope of this current study.
Frequency analysis
We conducted frequency transforms on the tapping data and the EEG signals in the main task to detect characteristic frequencies related to beat and meter, measuring peaks at the beat frequency (i.e., 2.4 Hz) and meter frequency (i.e., 0.8 and 1.2 Hz for ternary and binary meters). We used the frequency analysis of tapping performances in the main task to verify that participants paid attention during the task and could imagine and tap the beats with the correct metrical organization. Time-series tapping data (a continuous trace of tapping pressure as measured by a force sensor; see Fig. 4a, red) were averaged across trials, then converted to the frequency domain (FFT; 5 s data).
EEG time series were first averaged across epochs to enhance the signal-to-noise ratio by reducing trial-by-trial noise contributed by task-irrelevant activity. We then transformed the averaged time series for each meter and trial condition (5 s) to the frequency domain using a discrete Fourier transform (Frigo and Johnson, 1998). The output was the amplitude spectrum (μV) ranging from 0 to 256 Hz with a frequency resolution of 0.2 Hz. We focused on the beat frequency (i.e., 2.4 Hz) in both binary and ternary conditions, as well as the meter frequencies in the binary (i.e., 1.2 Hz) and ternary (i.e., 0.8 Hz) meters, to assess the presence of the appropriate meter-related peak for each experiment condition and meter. Standard baseline-subtraction peak-detection methods (e.g., Nozaradan et al., 2011) can verify that there are peaks at 1.2 and 0.8 Hz in the binary and ternary conditions, but because of the closeness of these peaks relative to the frequency resolution imposed by the short trial length (i.e., 5 s in our study vs 32 s in Nozaradan et al., 2011), such measures are misleadingly influenced by neighboring task-driven activity. Instead, to more directly highlight the contrast of interest at the core of this project, we subtracted the amplitude at 0.8 Hz from the amplitude at 1.2 Hz for each combination of meter and IC. This binary-ternary metric acts as our indicator of metrical processing, contrasting the strength of the meter response expected for one condition with the meter response for the other. The measure will be larger if the neural response to the binary meter predominates and smaller if the response to the ternary meter predominates.
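The amplitude spectrum and the binary-ternary metric can be sketched in a few lines of numpy. The signal below is hypothetical (a synthetic trial average containing beat and binary-meter components), and the 0.2 Hz bin spacing follows from the 5 s epoch length.

```python
import numpy as np

fs = 512
T = 5.0                                   # 5 s epochs -> 0.2 Hz frequency resolution
n = int(fs * T)
t = np.arange(n) / fs

# hypothetical trial-averaged response with beat (2.4 Hz) and binary-meter (1.2 Hz) parts
signal = 1.0 * np.sin(2 * np.pi * 2.4 * t) + 0.5 * np.sin(2 * np.pi * 1.2 * t)

amp = np.abs(np.fft.rfft(signal)) * 2 / n  # single-sided amplitude spectrum
freqs = np.fft.rfftfreq(n, 1 / fs)         # bins spaced 0.2 Hz apart

def amp_at(f_hz):
    """Amplitude at the bin containing f_hz (bins are multiples of 0.2 Hz)."""
    return amp[int(round(f_hz / 0.2))]

# binary-ternary metric: positive if the binary (1.2 Hz) response predominates
metric = amp_at(1.2) - amp_at(0.8)
```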
Phase synchronization between sound and brain signals (nondirectional connectivity measurement)
The synchronization between sound and EEG signals was measured by calculating the phase-locking value (PLV) between the sound envelope and both auditory and motor ICs. All signals were filtered between 2 and 3 Hz (FIR filter calculated by the MATLAB function fir1, order 1024). PLV was computed following Lachaux et al. (1999) as PLV = |(1/N) Σ_{t=1}^{N} exp(j[φ₁(t) − φ₂(t)])|, where φ₁ and φ₂ are the instantaneous phases of the two signals and N is the number of samples.
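A numpy-only sketch of this computation is shown below (the FFT-based Hilbert transform stands in for scipy.signal.hilbert; the narrowband test signals are hypothetical). A constant phase lag yields PLV near 1, while a drifting phase yields a low PLV.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based Hilbert transform: zero negative frequencies, double positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.fft.ifft(X * h)

def plv(x, y):
    """Phase-locking value (Lachaux et al., 1999):
    PLV = |(1/N) * sum_t exp(j * (phi_x(t) - phi_y(t)))|."""
    phi_x = np.angle(analytic_signal(x))
    phi_y = np.angle(analytic_signal(y))
    return np.abs(np.mean(np.exp(1j * (phi_x - phi_y))))

# hypothetical narrowband signals near the 2.4 Hz beat rate
fs, T = 128, 5
t = np.arange(fs * T) / fs
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 2.4 * t)
y_locked = np.sin(2 * np.pi * 2.4 * t + 0.7)  # constant phase lag -> PLV near 1
y_drift = np.sin(2 * np.pi * 2.4 * t + np.cumsum(rng.standard_normal(fs * T)))
plv_locked = plv(x, y_locked)
plv_drift = plv(x, y_drift)
```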
Information flow analysis (directional connectivity measurement)
The information flow analysis was conducted using the EEGLAB plugin SIFT version 1.52. EEG data were downsampled to 128 Hz. We then normalized the signal across trials at each time point (so that the ensemble mean equals 0 and the variance equals 1) to ensure local stationarity. We also detrended the signal with a segment window of 1 s and a step size of 0.1 s. The key parameters we used for fitting the multivariate autoregressive model were model order (p = 30), window length (1 s), and window step (0.1 s). The target frequency of interest is the beat frequency (i.e., 2-3 Hz). The segment window length determines whether the frequency of interest can be resolved (i.e., how many cycles of the oscillation are included), while the model order influences the frequency resolution by determining how many spectral peaks can be modeled (Schlögl and Supp, 2006). As a rule of thumb, a model order of 30 can fit 15 distinct frequency peaks. The model order of 30 was chosen to provide sufficient degrees of freedom to model the frequencies of interest as well as additional frequencies present in the brain response; it is the simplest model that allowed us to observe connectivity at the meter and beat frequencies using simulated data. A segment window of 1 s enables us to analyze the neural signals down to 1 Hz without violating signal stationarity. We used the Vieira-Morf lattice algorithm to fit the multivariate autoregressive model. The selected parameters yield a parameter-to-data-points ratio of ∼0.02, which indicates a sufficient amount of data to fit the model (Schlögl and Supp, 2006; Korzeniewska et al., 2008). The direct directed transfer function (dDTF08), the product of the full-frequency directed transfer function and the partial coherence to remove spurious links in a multivariate design, was used as the index of effective connectivity (Korzeniewska et al., 2003, 2008).
This connectivity measure is based on the information flow and does not imply a direct, unmediated, auditory-motor functional or anatomic connection.
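The dDTF08 computed by SIFT multiplies the full-frequency DTF by the partial coherence. As a simplified illustration of how directed influence is read off the VAR transfer matrix, the sketch below computes the plain (non-partialized) DTF for a hypothetical bivariate VAR(1) model in which source 1 drives source 2; the coefficients are invented for the example, not taken from the study.

```python
import numpy as np

def transfer_matrix(A_lags, f, fs):
    """H(f) = (I - sum_k A_k * e^{-i 2 pi f k / fs})^{-1} for a VAR model."""
    m = A_lags[0].shape[0]
    Af = np.eye(m, dtype=complex)
    for k, A_k in enumerate(A_lags, start=1):
        Af -= A_k * np.exp(-2j * np.pi * f * k / fs)
    return np.linalg.inv(Af)

def dtf(A_lags, f, fs):
    """Directed transfer function: row-normalized |H_ij(f)|, reading flow j -> i."""
    H = np.abs(transfer_matrix(A_lags, f, fs))
    return H / np.sqrt((H ** 2).sum(axis=1, keepdims=True))

# hypothetical bivariate VAR(1): source 1 drives source 2, no reverse influence
fs = 128
A1 = np.array([[0.5, 0.0],
               [0.8, 0.5]])
D = dtf([A1], f=2.4, fs=fs)     # evaluated at the beat frequency
flow_1_to_2 = D[1, 0]           # strong: source 1 influences source 2
flow_2_to_1 = D[0, 1]           # zero: no coupling in that direction
```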
Statistical analysis
Each dependent measure (frequency spectrum, PLV, and information flow) was analyzed in repeated-measures ANOVA with different foci.
For the frequency analysis, we focused on the difference between meter frequencies (1.2-0.8 Hz). The independent variables for the ANOVA were the two meters (binary, ternary) and the four conditions (Baseline, Physical Meter, Imagined Meter, and Tap). We further analyzed whether there was a peak in the corresponding meter conditions using a one-sample t test. For the PLV, we focused on the beat frequency (2-3 Hz), with a focus on the comparison between stimulus-auditory and motor-auditory coupling. An ANOVA was performed on the same independent variables. For the information flow, we focused on the directed connectivity at the beat frequency (2-3 Hz), analyzing the effect of the two directionalities (auditory-to-motor flow, motor-to-auditory flow) and the four conditions (Baseline, Physical Meter, Imagined Meter, and Tap). In cases where there were sphericity violations, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity. We conducted post hoc pairwise comparisons whose p values were adjusted using the Bonferroni correction method (marked as pb).
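The two corrections used here are standard and can be sketched directly; the subjects-by-conditions data and the uncorrected p values below are hypothetical. Greenhouse-Geisser epsilon multiplies the ANOVA degrees of freedom, and Bonferroni multiplies each p value by the number of comparisons.

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon from a subjects x conditions matrix:
    epsilon = tr(CSC)^2 / ((k - 1) * sum((CSC)^2)), where S is the sample
    covariance of the k conditions and C = I - J/k double-centers it.
    epsilon = 1 under sphericity; df are multiplied by epsilon when it is < 1."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)
    C = np.eye(k) - np.ones((k, k)) / k
    S_dc = C @ S @ C
    return np.trace(S_dc) ** 2 / ((k - 1) * np.sum(S_dc ** 2))

def bonferroni(p_values):
    """Bonferroni-adjusted p values: p_b = min(1, m * p)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

# hypothetical data: 20 subjects x 4 conditions; independent equal-variance
# conditions are (approximately) spherical, so epsilon should be near 1
rng = np.random.default_rng(2)
eps = gg_epsilon(rng.standard_normal((20, 4)))
p_adj = bonferroni([0.01, 0.2, 0.4])   # hypothetical uncorrected p values
```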
Hypothesis and predictions
We made predictions from the perspective of ASAP and other motor theories, which propose that the motor system plays a necessary role in beat perception. Specifically, we hypothesized that both auditory and motor ICs will reflect imagined meters, and that there will be information flow not only from auditory to motor but also from motor to auditory during meter imagination. The frequency analysis is our first step in verifying this hypothesis: we contrast two conditions (Baseline vs Imagined Meter) that have identical stimulus input but differ in imagined meter. We predict neural responses at the meter frequency in the Imagined Meter condition, but not in the Baseline condition, in both auditory and motor ICs. We also predict stronger neural responses at the beat frequency in the Imagined Meter condition than in the Baseline condition in both ICs, since the maintenance of the meter frequency may result in larger amplitude at the beat frequency, due either to harmonics of the meter oscillation or to a general upregulation of top-down control to actively maintain the rhythm across metrical levels.
An important question is whether the neural response in auditory ICs during the Imagined Meter condition is influenced by the motor system (as opposed to being generated entirely within the auditory system). The presence of meter responses in the motor system during imagining alone does not demonstrate this, so our nondirectional and directional connectivity analyses were designed to further investigate this question. Our nondirectional connectivity analysis contrasts phase locking of the auditory ICs with the stimulus on the one hand and with the motor ICs on the other, providing a measure of the relative correlation of auditory activity with the stimulus (bottom-up) and motor activity (top-down). We predict that the relative balance will depend on condition, with auditory ICs phase-locked more to external stimuli in the Physical Meter condition and phase-locked more with motor ICs in the Imagined Meter condition. Finally, for the directional connectivity, based on ASAP's central hypothesis that motor-to-auditory influence is present and necessary during beat perception, we predict directional motor-to-auditory causal flow in all beat-perception conditions (i.e., Baseline, Physical Meter, and Imagined Meter), and we further predict stronger motor-to-auditory flow than auditory-to-motor flow, especially in the Imagined Meter condition compared with the other conditions, because of the need to stabilize the metrical percept in the absence of physical meter cues.
Results
Tapping data to verify imagery
In the last segment of each trial, participants were asked to tap out their imagined pattern of strong and weak beats as a verification of the correct tempo and meter of imagery. The frequency transform of their tapping data shows that most participants tapped at the 2.4 Hz beat rate, the 1.2 Hz binary meter rate, and the 0.8 Hz ternary meter rate, consistent with correct imagery (Fig. 3). We analyzed the peaks nearest to the meter rate (i.e., 0.8 and 1.2 Hz) in each meter condition across participants. In the binary condition, the mean of the peak frequency around 1.2 Hz was 1.200 Hz. In the ternary condition, the mean of the peak frequency around 0.8 Hz was 0.808 Hz. Tapping at the correct rate and meter is essential for our frequency analysis methods; therefore, participants who tapped at the wrong tempo, suggesting improper imagery, were excluded if their peaks were beyond 1.5 interquartile ranges around the meter frequency mean. One participant was excluded from both meter conditions, while a second was excluded for the binary condition (Fig. 3, pink lines).
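The exclusion rule above (fences of 1.5 interquartile ranges around the group mean) can be sketched as follows; the per-participant peak frequencies below are hypothetical, with one participant tapping at the wrong tempo.

```python
import numpy as np

def tempo_outliers(peak_freqs, whisker=1.5):
    """Flag participants whose tapping peak frequency lies beyond 1.5
    interquartile ranges around the group mean (the rule described above)."""
    peak_freqs = np.asarray(peak_freqs)
    q1, q3 = np.percentile(peak_freqs, [25, 75])
    half_width = whisker * (q3 - q1)
    center = peak_freqs.mean()
    return np.abs(peak_freqs - center) > half_width

# hypothetical binary-condition peaks: 19 participants near the 1.2 Hz meter
# rate and one who tapped at the wrong tempo (0.8 Hz)
peaks = np.append(np.linspace(1.15, 1.25, 19), 0.80)
excluded = tempo_outliers(peaks)   # flags only the 0.8 Hz participant
```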
EMG and motion capture data to confirm no overt movement during meter imagining
We analyzed EMG and motion capture data to confirm that participants did not move rhythmically in any condition other than the tapping condition. This is important, as subtle, even unintentional movements at the meter rate could masquerade as a truly "imagined" meter response; to be certain that the imagined meter response relates to endogenous processes and not movement, it is critical to verify the absence of movement objectively, and not simply trust that participants were capable of following instructions not to move. Figure 4 summarizes the results of the movement analysis, showing EMG and finger position for the right (tapping) index finger. Time series of the tapping finger EMG envelope and motion capture of finger movement (Fig. 4b) confirmed that periodic, meter-related rhythmic muscle activity and finger movements occurred only in the Tap condition (purple curves). Frequency analysis confirmed that beat and meter power was not present in any of the listening conditions. The only finger movement present in the Imagined Meter condition was a finger lift preparatory to the following Tap condition (visible as slowly ascending yellow lines in Fig. 4b, and as the slightly higher yellow bar relative to the blue bar for marker 6, the right index finger, in Fig. 4d); there was no rehearsal or shadowing of the meter. To rule out the presence of other covert body motions, such as head bobbing and foot tapping during listening conditions, Figure 4c, d shows a comprehensive statistical analysis of rhythmic movement across all 10 body markers, again confirming that no significant power was evident at the beat (2.4 Hz) or meter rate (averaged across 0.8 and 1.2 Hz) by paired t tests between the Baseline and the Imagined Meter conditions (pb > 0.05). The Imagined Meter condition is slightly higher than the Baseline condition for marker 6 in Figure 4d, which is due to the preparatory finger lift before the following Tap condition.
Neither meter nor beat components were found in the first 4.5 s (i.e., before the finger lift) of the Imagined Meter condition. In sum, we objectively demonstrated that no unwanted movements occurred during the task, confirming that our participants followed instructions and, critically, that the activity of motor ICs in meter perception and imagination cannot be explained by covert or overt body movements.
Identification of auditory and motor ICs from localizer trials
Auditory and motor responses were modeled as the best-fitting ICs to the Listen and Tap localizer trials. To extract the auditory ICs, 150 trials were extracted from the Listen localizer. After preprocessing, an average of 145.85 (SD = 7.78) trials per participant (of the original 150) were available for analysis of the auditory ERP. The average percent variance accounted for (pvaf) of auditory ICs across our final sample was 40.20% (SD = 16.38%). For the motor ICs, an average of 352.40 (SD = 122.89) taps were available for analysis of the movement-evoked response in the Tap localizer task. The average pvaf of motor ICs across our final sample was 36.17% (SD = 18.77%). In 2 of 20 subjects, the same IC yielded the highest pvaf for both the sound- and movement-related potentials; for these subjects, we manually selected the auditory and motor ICs based on the best fit to scalp maps, ERPs, and spectra. Such a situation may have arisen because of individual differences in cortical geometry, which may prevent ICA from converging on functionally distinct components. The averaged scalp maps and the ERPs of auditory and motor ICs across all subjects are shown in Figure 5a, b. The two clusters showed distinct patterns: an N1-P2 complex for the auditory cluster (Näätänen and Picton, 1987; Korzyukov et al., 1999) and a movement-evoked potential for the motor cluster (Gerloff et al., 1998; Pollok et al., 2003; Shibasaki and Hallett, 2006). To further validate our identification of auditory and motor ICs, the unlabeled components were subjected to unsupervised k-means clustering based on their dipole locations and scalp maps. We did not include any time/frequency-domain metric in this clustering process to avoid circularity in the later inferential statistics (Kriegeskorte et al., 2009). The results demonstrated that 37 of 40 ICs were classified the same as with our pvaf method (accuracy = 92.5%).
Auditory and motor responses at the meter and beat frequencies
Our frequency analyses sought to investigate the neural signatures of imagined meters by examining the meter-frequency amplitude in auditory and motor responses and comparing it between the Baseline and the Imagined Meter conditions. This highlights the effect of meter imagination by comparing brain responses to identical, unaccented stimuli without and with metrical imagination. The frequency spectra from 0.5 to 3 Hz for both meter conditions and both ICs are shown in Figure 6, with a bar chart visualizing the results of the statistical analysis of the window of interest in Figure 7.
Responses at the meter frequencies
To investigate physical and imagined meter representations at 0.8 and 1.2 Hz (i.e., whether there are peaks at the corresponding meter frequencies), we subtracted the spectral amplitude at 0.8 Hz from the amplitude at 1.2 Hz to provide a measure of metrical discrimination that is positive for binary meter responses and negative for ternary. If there were no peaks at the meter frequencies in a given condition, this measure should be close to zero once the noise level (e.g., 1/f dynamics) is accounted for. Overall Meter and Meter-by-Condition interaction effects were tested with an ANOVA, where we predicted a significant main effect of Meter and a significant interaction between Meter and Condition.
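This discrimination measure can be sketched as follows (a schematic NumPy illustration under an assumed sampling rate and single-trial signal layout, not the study's exact spectral pipeline):

```python
import numpy as np

def meter_index(signal, fs):
    """Spectral amplitude at 1.2 Hz minus at 0.8 Hz for a 1-D time series.

    With a 2.4 Hz beat, a binary meter (every 2nd beat) produces energy at
    1.2 Hz and a ternary meter (every 3rd beat) at 0.8 Hz, so positive
    values indicate a binary response and negative values a ternary one.
    """
    n = len(signal)
    amps = np.abs(np.fft.rfft(signal)) / n          # amplitude spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    a_binary = amps[np.argmin(np.abs(freqs - 1.2))]
    a_ternary = amps[np.argmin(np.abs(freqs - 0.8))]
    return a_binary - a_ternary
```

In practice each bin's amplitude would first be noise-corrected (e.g., by subtracting the mean of neighboring bins) so that a spectrum with no meter peak yields values near zero, as assumed in the analysis above.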
In auditory ICs, there was a significant main effect of Meter (F(1,19) = 24.540, p < 0.001, η2 = 0.564), with larger amplitudes in binary compared with ternary meter. There was also a significant interaction effect (F(3,57) = 5.395, p = 0.002, η2 = 0.221). Pairwise comparison showed a significantly larger amplitude in binary than ternary meter in the Physical Meter (t(19) = 3.386, pb = 0.003, d = 0.972), Imagined Meter (t(19) = 2.247, pb = 0.037, d = 0.720), and the Tap conditions (t(19) = 3.868, pb = 0.001, d = 1.174), but not in the Baseline condition (t(19) = 0.147, pb = 0.884, d = 0.048).
In motor ICs, there was a significant main effect of Meter (F(1,19) = 29.530, p < 0.001, η2 = 0.609), with more positive amplitudes in binary compared with ternary meter. There was also a significant interaction effect (F(1.797,34.147) = 5.456, p = 0.011, η2 = 0.223). Pairwise comparison showed a significantly larger amplitude in binary than ternary meter in the Physical Meter (t(19) = 3.262, pb = 0.004, d = 1.177), Imagined Meter (t(19) = 3.876, pb = 0.001, d = 1.210), and Tap conditions (t(19) = 4.866, pb < 0.001, d = 1.250), but not in the Baseline condition (t(19) = 0.037, pb = 0.968, d = 0.012).
These results highlighted that both auditory and motor ICs reflect the imagined binary and ternary meters by contrasting identical stimuli without and with metrical imagination (i.e., the Baseline vs the Imagined Meter condition), suggesting the endogenous generation of meter timing in both auditory and motor regions.
Response at the beat frequency
For the analysis at the 2.4 Hz beat frequency, we focused on the main effect of Condition, attending to the patterns of auditory and motor ICs with respect to the listening and tapping tasks. We did not expect a main effect of Meter, nor an interaction between Condition and Meter. The amplitude at the beat frequency (2.4 Hz) was significantly >0 in all conditions based on one-sample t tests (pb < 0.0001).
In auditory ICs, there was a significant main effect of Condition (F(2.258,42.901) = 6.589, p = 0.002, η2 = 0.257), with marginally larger amplitude in the Physical Meter compared with the Baseline (t(19) = 2.818, pb = 0.066, d = 0.692) and Tap conditions (t(19) = 3.249, pb = 0.025, d = 0.897), and a larger amplitude in the Imagined Meter compared with the Baseline condition (t(19) = 3.257, pb = 0.025, d = 0.587). Although not expected, there was a significant main effect of Meter (F(1,19) = 5.221, p = 0.034, η2 = 0.216), with slightly higher amplitude in binary compared with ternary meter. There was no significant interaction effect. In motor ICs, we also observed a significant main effect of Condition (F(1.866,35.449) = 5.757, p = 0.008, η2 = 0.233), with higher amplitudes in the Tap compared with the Baseline condition (t(19) = 3.492, pb = 0.015, d = 0.989) and marginally higher amplitude in the Physical Meter compared with the Baseline condition (t(19) = 2.864, pb = 0.060, d = 0.541).
These results showed that our selected auditory and motor ICs respond the most to the corresponding task types (i.e., auditory IC peaked at Physical Meter; motor IC peaked at Tap). More importantly, although the explicit sound stimuli are identical in the Baseline and Imagined Meter conditions, only the Imagined Meter condition showed meter responses. Furthermore, the beat responses in the Imagined Meter condition were also distinct from the Baseline condition, and more similar to the ones in the Physical Meter condition (see beat responses at 2.4 Hz in Fig. 6). These findings demonstrate an effect of intrinsic imagination on neural responses, creating endogenous organization at the meter rate as well as modulating stimulus response at the beat rate.
Nondirectional connectivity analysis among sound, auditory, and motor responses
To more closely test whether the brain signals follow the external stimuli across time, we calculated PLV between the sound envelope and the selected motor and auditory ICs across time at 2-3 Hz (Fig. 8).
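PLV quantifies how consistent the phase difference between two signals is across trials. A minimal NumPy sketch is given below; the FFT-masked narrowband phase extraction is an illustrative stand-in for whatever bandpass filtering the actual pipeline used, and the function names are our own:

```python
import numpy as np

def band_phase(x, fs, band=(2.0, 3.0)):
    """Instantaneous phase of trials x time data within a frequency band,
    via an FFT-masked analytic signal (illustrative narrowband filter)."""
    n = x.shape[-1]
    spec = np.fft.fft(x, axis=-1)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    keep = (freqs >= band[0]) & (freqs <= band[1])   # positive-frequency band
    analytic = np.fft.ifft(np.where(keep, 2.0 * spec, 0.0), axis=-1)
    return np.angle(analytic)

def plv(x, y, fs, band=(2.0, 3.0)):
    """Phase-locking value across trials, per time sample (1 = perfect locking)."""
    dphi = band_phase(x, fs, band) - band_phase(y, fs, band)
    return np.abs(np.mean(np.exp(1j * dphi), axis=0))
```

Two signals with a fixed phase lag yield PLV near 1; signals whose relative phase varies randomly from trial to trial yield PLV near zero, which is what makes the measure sensitive to stimulus-brain and brain-brain coupling without requiring equal amplitudes.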
Sound envelope and brain signal coupling
Consistent with the frequency spectrum at the beat frequency, the coupling between sound stimuli and auditory ICs showed a significant main effect of Condition (F(3,57) = 3.141, p = 0.032, η2 = 0.142), with the highest PLV in the Physical Meter compared with the Baseline condition (t(19) = 3.126, pb = 0.033, d = 0.806). There was no main effect of Meter nor an interaction effect. For completeness, we also analyzed the coupling between sound stimuli and motor ICs. There was a significant main effect of Condition (F(3,57) = 11.980, p < 0.001, η2 = 0.387). Across the conditions, PLV in the Tap condition was significantly higher compared with the Baseline (t(19) = 4.152, pb = 0.003, d = 1.363), Physical Meter (t(19) = 4.033, pb = 0.004, d = 1.336), and the Imagined Meter conditions (t(19) = 3.166, pb = 0.031, d = 1.080). There was no interaction effect. Consistent with the responses at the beat rate (see Response at the beat frequency), the PLV between the sound envelope and the selected motor and auditory ICs at 2-3 Hz also showed that auditory and motor ICs respond the most to the corresponding listening task (i.e., the Physical Meter condition) and tapping task (i.e., the Tap condition).
Auditory-motor IC coupling
The nondirectional coupling between auditory and motor ICs had a significant main effect of Condition (F(3,57) = 5.363, p = 0.003, η2 = 0.220), with the highest PLV in the Tap compared with the Baseline condition (t(19) = 3.532, pb = 0.013, d = 1.179). There was no main effect of Meter nor an interaction effect. This finding indicated strong nondirectional communication between motor and auditory ICs during the Tap condition, which requires both auditory inputs and motor outputs.
Information flow between auditory IC and motor IC
We analyzed the direction of information flow (auditory to motor vs motor to auditory) across conditions at the beat frequency (i.e., δ band, 2-3 Hz). We averaged the data across the two meters to increase the sample size for testing the Direction-by-Condition effects, since the comparison between binary and ternary meters is not the focus of this study (Figs. 9, 10). For thoroughness, we also analyzed the causal flow in the alpha band (8-12 Hz), which likewise showed visible connectivity across time.
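The study's directional measure was derived from a multivariate autoregressive model (see Discussion). As a schematic illustration of how directional flow can be estimated, and not the exact estimator used here, a minimal pairwise time-domain Granger causality compares how well one signal's past predicts another beyond that signal's own past:

```python
import numpy as np

def granger(x, y, p=5):
    """Granger causality x -> y: log ratio of residual sums of squares of
    an AR(p) model for y without vs with past values of x. Values >= 0;
    larger values mean past x helps predict y beyond y's own past."""
    n = len(y)
    target = y[p:]
    own = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    both = np.column_stack([own] +
                           [x[p - k:n - k] for k in range(1, p + 1)])

    def rss(design):
        design = np.column_stack([np.ones(len(target)), design])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return resid @ resid

    return np.log(rss(own) / rss(both))
```

Comparing granger(auditory, motor) with granger(motor, auditory) per condition gives the kind of asymmetry tested below; reliably fitting such models at the slower 0.8-1.2 Hz meter rates would require substantially longer data segments.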
At the beat rate, auditory-to-motor and motor-to-auditory flows were significantly >0 in all conditions and directions (pb < 0.001). There was a significant main effect of Condition (F(1.271,24.142) = 5.424, p = 0.022, η2 = 0.222), with a significantly stronger information flow in the Physical Meter compared with the Baseline condition (t(19) = 3.482, pb = 0.015, d = 0.498) and a marginally stronger information flow in the Tap compared with the Baseline condition (t(19) = 2.732, pb = 0.080, d = 0.768). There was no main effect of Direction nor an interaction effect. Interestingly, pairwise comparison showed a marginally significant result such that a stronger information flow from motor to auditory ICs than from auditory to motor ICs was seen only in the Imagined Meter condition (t(19) = 1.957, pb = 0.065, d = 0.198), but not in the other three conditions. In the alpha band, there were neither significant main effects nor an interaction effect, and no pairwise differences were observed across the experimental conditions.
In sum, the directional connectivity analysis showed that not only auditory-to-motor but also motor-to-auditory information flow is present at the beat rate in all conditions, demonstrating a tight reciprocal informational coupling during rhythm listening, imagination, and production. Bidirectional connectivity was equivalent in the Imagined Meter and Physical Meter conditions, suggesting that similar motor-to-auditory communication is present during both meter imagination/maintenance and meter perception. Importantly, this is consistent with previous studies hypothesizing a top-down modulation of auditory regions by motor regions during beat processing. During meter imagery, top-down information flow from motor to auditory ICs was marginally stronger than bottom-up flow from auditory to motor ICs, which could suggest that top-down flow is relatively more important during imagery. Interestingly, the strongest connectivity was found in the Tap condition, consistent with our PLV results, suggesting that our participants may rely on strong auditory-motor interactions to actively maintain the metrical pattern during tapping without a driving stimulus.
Discussion
Principal findings of the current study
The goal of this study was to understand how humans process hierarchically structured temporal information in music rhythm, testing the hypothesis that it involves the motor system. Using a meter imagination paradigm and a novel method to separate auditory and motor source activities with high-density EEG and ICA, we found evidence for motor involvement as hypothesized: First, endogenous neural responses to imagined meters were present in both auditory and motor sources in the absence of corresponding acoustic cues. Second, motor-to-auditory information flow was found at the beat rate in all listening conditions without overt movements. These findings confirm and extend predictions of motor theories of rhythm perception, suggesting that the motor system actively maintains hierarchical information and exerts a top-down influence on auditory processing and metrical imagery of rhythms.
Neural substrates of meter imagination
Auditory and motor source activities and their functional connectivity were contrasted between two conditions with identical stimuli but varying in the presence of an imagined meter: In the Baseline, the stimuli were presented before cueing a specific meter; in the Imagined Meter, the participants were instructed to imagine a previously cued binary or ternary meter. Our indicator for metrical processing was spectral activity at the meter rate, which is not physically present in the input signal. No meter response was observed in the nonmetrical Baseline in either auditory or motor ICs. In contrast, the meter response was clearly present during meter imagining in the Imagined Meter, demonstrating the endogenous generation of meter timing in both auditory and motor ICs. The spectral power findings extend prior studies using similar methods (e.g., Nozaradan et al., 2011) by separating auditory and motor activity using a unimodal localizer design and ICA, a critical precondition for investigating the auditory-motor interactions underlying meter imagining.
The presence of a meter response during meter imagining in both ICs suggests that meter imagining involves the motor system, but a critical question is whether the auditory response to the imagined meter is influenced by the motor system. Directional connectivity analysis showed not only highly significant auditory-to-motor connectivity but also equally significant motor-to-auditory flow at the beat rate in all conditions, demonstrating the presence of bidirectional auditory-motor causal connectivity during listening, imagination, and production of rhythms. The main effect of experimental condition and post hoc pairwise tests revealed that overall connectivity (combining across both auditory-motor and motor-auditory directions) was highest in the Physical Meter and was significantly greater than in the Baseline. Overall connectivity in the Imagined Meter was not significantly different from that observed in the Physical Meter, although it was also not significantly different from the Baseline. Although there was no main effect of connectivity direction nor an interaction effect, we used pairwise comparison to test our proposed motor hypothesis corollary that motor-to-auditory flow would be greater during imagined meters than physically realized meters, and found marginal significance (p = 0.065) only in the Imagined Meter.
Our findings suggest that motor-auditory interaction is essential in hierarchical meter imagery, which may apply to meaningful nested structure of imagined speech as well (Tian and Poeppel, 2012; Proix et al., 2021). We analyzed connectivity at the beat rate, assuming that the beat rate is a reasonable proxy for the importance of motor to auditory flow, since it turned out that we did not have sufficient data to robustly fit a multivariate autoregressive model capable of capturing causal flow at the meter frequencies. Future studies will need to collect more data to have the power to describe the meter rate connectivity, which would have been more conclusive regarding meter imagery. Future studies will also examine and extend our findings to other brain areas involved in meter processing (Cannon and Patel, 2021). The observed causal flow in our study was greatest around the beat rate (2-3 Hz), within the δ band, which is proposed to be a top-down temporal constraint of cognitive processes of rhythmic patterns (Morillon et al., 2019). We did not, however, observe β oscillations and connectivity in the β range as in previous studies (Iversen et al., 2009; Fujioka et al., 2012, 2015; Morillon and Baillet, 2017), leaving open the question of beta band modulation during meter imagery.
Motor-auditory interaction to physically accented explicit meters
While the Imagined Meter–Baseline contrast in frequency analysis highlights the internal imaginative maintenance of an established, implicit meter, the Physical Meter–Baseline contrast examines the response to explicit meter. In addition to an expected meter response in the auditory IC, the motor ICs also showed a significant meter response, although no overt movements were required in the Physical Meter (their absence was confirmed via analysis of EMG and motion capture). There was no difference between the Physical Meter and the Imagined Meter at either the beat or the meter rate, consistent with Iversen et al. (2009) and Vlek et al. (2011), but in a more general population not restricted to musicians, showing that imagining the meters increases the meter responses in a similar way as when the meter is physically presented. Directional connectivity was significantly stronger in both directions in the Physical Meter than the Baseline, consistent with prior findings that stronger metricality elicits stronger (nondirectional) auditory-motor functional connectivity (Zatorre et al., 2007; Chen et al., 2008; Kung et al., 2013).
One ancillary finding is that neural responses to the binary meter were stronger than to the ternary meter in all metrical conditions (i.e., Physical Meter, Imagined Meter, and Tap), matching previous findings (Pablos Martin et al., 2007; Celma-Miralles et al., 2021). This may be because of an advantage of binary meter in perception and sensorimotor synchronization (Celma-Miralles et al., 2021; for behavioral evidence, see also Creel, 2012, 2020) or because of a lower signal-to-noise ratio in the ternary condition, which had fewer accented positions than the binary condition. This can be addressed in future studies by using longer trials.
On the role of the motor system during meter perception and imagination: revisiting motor hypotheses and other alternative hypotheses
Our results are in line with the motor system role during beat perception posited by the ASAP and other motor hypotheses (Schubotz, 2007; Patel and Iversen, 2014; Morillon et al., 2015; Cannon and Patel, 2021). They extend it in three ways: First, current motor hypotheses do not make explicit predictions about how hierarchically structured meters (as opposed to single-level beats) might be represented in the interaction between auditory and motor regions. We found stronger bidirectional auditory-motor connectivity in the Physical Meter than the Baseline, consistent with prior fMRI findings of stronger nondirectional connectivity for more strongly metrical rhythms (Chen et al., 2006). Second, we found marginally stronger motor-to-auditory flow in the Imagined Meter compared with the Physical Meter, consistent with previously found stronger activations of the putamen during beat maintenance, which requires actively regenerating and predicting the beats, compared with beat finding (Grahn and Rowe, 2013). Finally, covert movement is a potential confound for this type of study, and has generally only been dealt with by instructing experimental participants not to move, which is far from definitive. Notably, this study explicitly and objectively verified the absence of motor activity during meter imagining by thorough analyses of a finger force-sensor, EMG, and 10 whole-body motion capture markers. Thus, we can be confident that the endogenous activity at the meter rate in the Imagined Meter was not simply because of covert movement. Ideally, such verification would become standard.
While a unified auditory-motor account of metrical imagery is supported by the evidence, in principle other modality-specific mechanisms could be involved in metrical imagery, for example: (1) auditory echoic memory of accented sounds from the Physical Meter and (2) motor preparation. Auditory echoic memory predicts a gradually decreasing meter-frequency amplitude in auditory ICs during the 5 s of the Imagined Meter since memory of acoustic features has been found to decline across a few seconds (Cowan et al., 1997; Snyder and Weintraub, 2013; Cheng and Creel, 2020). Such a decline was not observed (see Fig. 7). A purely auditory mechanism would also not predict the observed meter activity in motor ICs, nor motor-to-auditory connectivity. Motor preparation during rhythm listening is known to affect motor activity (Chen et al., 2008) and would predict, as observed, meter response in the motor ICs. However, it is not clear how the observed meter activity in the auditory ICs would be explained. Our findings thus appear most parsimoniously explained by an auditory-motor interaction view of meter imagery.
In conclusion, our study demonstrated that the endogenous oscillations in both auditory and motor functional sources reflected the imagination of binary and ternary meters in the absence of corresponding acoustic cues or overt movements. We found clear evidence for motor-to-auditory information flow at the beat rate in all conditions, suggesting a top-down influence of the motor system on auditory processing of beat-based rhythms and reflecting an auditory-motor system with tight reciprocal informational coupling. These findings align with and further extend the ASAP hypothesis and other motor theories from beat perception to meter imagination, adding supporting evidence for active sensing in auditory processing.
Footnotes
This work was supported by National Science Foundation BCS1460885.
The authors declare no competing financial interests.
- Correspondence should be addressed to Tzu-Han Zoe Cheng at tzcheng{at}ucsd.edu