Neural development of networks for audiovisual speech comprehension
Introduction
In naturalistic situations, such as conversation, the incoming auditory speech stream is accompanied by information from the face, particularly from the lips, mouth, and eyes of the speaker. This visual information has been shown to enhance speech comprehension in both children and adults (Binnie et al., 1974, Dodd, 1979, MacLeod and Summerfield, 1987, Massaro, 1984, Massaro et al., 1986, Ross et al., 2007, Sumby and Pollack, 1954, Summerfield, 1979). Although sensitivity to visual speech information appears early in development (Burnham and Dodd, 2004, Kuhl and Meltzoff, 1982, Patterson and Werker, 2003, Rosenblum et al., 1997, Teinonen et al., 2008, Weikum et al., 2007), there is evidence that it continues to develop throughout childhood (Desjardins and Werker, 2004, Hockley and Polka, 1994, Massaro et al., 1986, McGurk and MacDonald, 1976, Sekiyama and Burnham, 2008, van Linden and Vroomen, 2008). With respect to neurobiology, recent research suggests that audiovisual speech comprehension in adults is mediated by a neural network that incorporates primary sensory regions as well as posterior inferior frontal gyrus and ventral premotor cortex (IFGOp/PMv), supramarginal gyrus (SMG), posterior superior temporal gyrus (STGp), planum temporale (PTe), and the posterior superior temporal sulcus (STSp), and involves strong effective connectivity among these regions (Bernstein et al., 2008, Callan et al., 2004, Callan et al., 2003, Calvert and Campbell, 2003, Calvert et al., 2000, Jones and Callan, 2003, Miller and D’Esposito, 2005, Ojanen et al., 2005, Pekkola et al., 2006, Sekiyama et al., 2003, Skipper et al., 2007, Skipper et al., 2005, Wright et al., 2003). The extent to which the mechanism for audiovisual speech comprehension in the child compares to the adult case is unknown; in particular, it is unclear whether the neurobiological substrate in the developing brain incorporates additional regions or different patterns of effective connectivity. 
In this paper, we examine the developmental mechanisms that give rise to a mature system for audiovisual speech comprehension, focusing on how the interactions among brain regions involved in the production and perception of speech change with age.
Empirical evidence suggests that speech perception in children is less influenced by visual speech information than in adults. For example, studies assessing the development of audiovisual speech perception using incongruent “McGurk” stimuli report an increase in the influence of visual speech information with age, with development perhaps continuing even as late as the 11th year (Hockley and Polka, 1994, Massaro, 1984, Massaro et al., 1986, McGurk and MacDonald, 1976, Sekiyama and Burnham, 2008, Wightman et al., 2006). Several factors likely contribute to the neural development of audiovisual speech, including both general factors (e.g., development of selective attention, increasing myelination, and synaptic pruning) and more specific factors (e.g., learning of oral motor patterns for speech). Here we focus on the child’s increasing personal experience perceiving and producing speech, in an effort to gain insight into how children integrate audiovisual information during everyday verbal communication.
There is evidence to suggest that audiovisual speech integration is a skill that is acquired by experience listening to and observing speech over an extended period of time. In adults, for example, the amount of experience with a second language affects audiovisual processing in that language: Native French speakers with only beginning and intermediate skills in English are less sensitive to visual cues indicating a particular English consonant than either monolingual English speakers or more advanced French/English bilinguals (Werker, Frost, & McGurk, 1992). The role of experience in audiovisual integration gains further support from the presence of cross-linguistic differences. In one such example, Japanese adults are less influenced by visual information than English-speaking adults (Sekiyama and Burnham, 2008, Sekiyama and Tohkura, 1993), a difference that begins to emerge between 6 and 8 years of age (Sekiyama & Burnham, 2008). These data suggest that developmental differences in audiovisual speech integration are moderated by everyday perceptual experience with language.
Experience with speech production also contributes to the development of audiovisual speech comprehension. For example, children with articulatory difficulties report the auditory component of incongruent audiovisual syllables more often than do children without articulatory difficulties, who more often report the fused percept or the visual component alone (e.g., when presented with an auditory /pa/ and a visual /ka/, children with articulatory difficulties tend to report hearing /pa/ rather than the fused percept /ta/ or the visual /ka/; Desjardins, Rogers, & Werker, 1997, but see Dodd, McIntosh, Erdener, & Burnham, 2008). In addition, both groups of children are less likely than adults to integrate the auditory and visual information into a fused percept, or to perceive the visual component alone. Further, children with cochlear implants who produce more intelligible speech demonstrate an improved ability to use visual speech information (Bergeson et al., 2005, Lachs et al., 2001). Taken together, these findings suggest a relationship between speech production skill and audiovisual speech perception.
The development of audiovisual speech comprehension thus appears to involve mechanisms that relate visual speech information to articulatory speech representations, both of which are acquired through experience with one’s native language (cf. Desjardins et al., 1997, Kuhl and Meltzoff, 1982, Kuhl and Meltzoff, 1984, Sekiyama and Burnham, 2008). Specifically, we propose that as children experience the auditory, somatosensory, and motor consequences of produced speech sounds in their own speech and in others’ speech, they develop a mapping system between sensory and motor output. This mapping allows for these components of the audiovisual speech signal to have a “predictive value” for each of the other components. For example, several authors have suggested that motor-speech representations constrain the interpretation of the incoming auditory signal (Callan et al., 2004, Skipper et al., 2005, Skipper et al., 2006, Skipper et al., 2007, van Wassenhove et al., 2005, Wilson and Iacoboni, 2006). In one model, visible articulatory movements of the speaker’s lips and mouth invoke articulatory representations of the listener that could have generated the observed speech movements (Skipper et al., 2007, van Wassenhove et al., 2005). These representations, based in prior articulatory experience, provide a set of possible phonetic targets to constrain the final interpretation of the speech sound (i.e., the visual information provides a “forward model” of the speech sound). Such motor-speech models draw on the listener’s articulatory repertoire, and we argue that, because adults have more experience producing and perceiving speech than children, they have more precise predictors of the target speech sound.
As mentioned at the outset, the neural substrate of audiovisual speech perception consists of a widespread network of interconnected cortical regions. In general, brain networks develop through increasing integration among the component regions that define the network (Church et al., 2009, Fair et al., 2007a, Fair et al., 2007b, Karunanayaka et al., 2007). The primary objective of the present study was to characterize this developmental change for audiovisual speech comprehension. To do so, we used structural equation modeling (SEM) to assess differences between adults and children in effective connectivity among left hemisphere brain regions important for language production and perception. Physiological studies have suggested that interactions among left IFGOp/PMv, SMG, STGp, PTe, and STSp support audiovisual speech perception (see Campbell, 2008, for review). In particular, the development of audiovisual speech might depend on interactions between inferior frontal/ventral premotor regions and posterior temporal/inferior parietal regions. This pathway has been postulated to help relate motor (articulatory) and sensory (auditory and somatosensory) information about the identity of the speech target (Callan et al., 2004, Skipper et al., 2005, Skipper et al., 2006, Skipper et al., 2007, van Wassenhove et al., 2005, Wilson and Iacoboni, 2006). Because adults have more experience both perceiving and producing speech, their sensory and motor repertoires are richer and will have greater predictive value. Thus, we predict significant age differences in effective connectivity for audiovisual speech between inferior frontal/ventral premotor regions and posterior temporal and inferior parietal regions.
In the current study, we used functional magnetic resonance imaging (fMRI) and SEM to study 24 adults and nine children during auditory-alone and audiovisual story comprehension. We compared effective connectivity between children and adults across the network, with particular attention to connectivity between temporal/parietal and frontal regions.
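The group comparison at the heart of this design can be illustrated with a minimal sketch. In a recursive path model, each directed connection reduces to a regression weight, so a single-predictor path coefficient can be estimated by ordinary least squares per group; actual SEM software (e.g., AMOS, cited in the references) fits all paths simultaneously by maximum likelihood over the covariance structure. All variable names and numbers below are hypothetical, for illustration only.

```python
import numpy as np

def path_coefficient(source, target):
    """OLS estimate of the directed path source -> target.
    For a recursive model with a single predictor, this equals
    the SEM path coefficient (illustrative simplification)."""
    s = source - source.mean()
    t = target - target.mean()
    return float(s @ t / (s @ s))

# Hypothetical ROI time series: the "child" group is generated with a
# weaker STGp -> IFGOp/PMv influence than the "adult" group.
rng = np.random.default_rng(0)
n = 500
stgp_adult = rng.standard_normal(n)
ifg_adult = 0.8 * stgp_adult + 0.2 * rng.standard_normal(n)
stgp_child = rng.standard_normal(n)
ifg_child = 0.3 * stgp_child + 0.2 * rng.standard_normal(n)

b_adult = path_coefficient(stgp_adult, ifg_adult)  # recovers ~0.8
b_child = path_coefficient(stgp_child, ifg_child)  # recovers ~0.3
```

A group difference in effective connectivity would then correspond to testing whether such coefficients differ reliably between the stacked adult and child models, rather than comparing two point estimates as done here.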
Participants
Twenty-four adults (12 females, M age = 23.0 years, SD = 5.6 years) and nine children (7 females, range = 8–11 years, M age = 9.5 years, SD = 0.9 years) participated. Eight years was the youngest age in the available cohort, and in previous studies, development of audiovisual speech perception has been shown to occur in this age range, with few age differences beyond age 11 (Hockley and Polka, 1994, Massaro et al., 1986, McGurk and MacDonald, 1976, Sekiyama and Burnham, 2008, Wightman et al., 2006). Only one
Signal-to-noise ratio
Simulations indicated that in the current design, the minimum SNR needed to detect a signal change of 0.5% was 54, and that needed to detect a signal change of 1% was 27 (see Supplementary materials). We analyzed the mean SNR across participants from 58 cortical and subcortical anatomical ROIs. In the regions examined, mean SNR ranged from a low of 13.8 (SD = 7.1) in the right temporal pole (a region of high susceptibility artifact) to a high of 134.3 (SD = 25.6) in the right superior precentral
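The two thresholds reported above (SNR 54 for a 0.5% signal change, SNR 27 for a 1% change) imply that, for this particular design, the minimum detectable SNR scales inversely with effect size (54 × 0.5% = 27 × 1% ≈ 0.27). A sketch of that relationship, with the constant inferred from the two reported values and valid only for this design:

```python
def required_snr(signal_change_pct, k=0.27):
    """Minimum SNR needed to detect a given percent signal change,
    assuming the inverse relationship implied by the two reported
    thresholds. k is a design-specific constant, not a general value."""
    return k / (signal_change_pct / 100.0)

print(required_snr(0.5))  # ≈ 54
print(required_snr(1.0))  # ≈ 27
```

On this reading, regions such as the right temporal pole (mean SNR 13.8) fall below the threshold for detecting even a 1% signal change, whereas the highest-SNR regions comfortably exceed both thresholds.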
Discussion
We investigated age-related differences in the neurobiological substrates of audiovisual speech using a network modeling approach. We had expected differences in sensitivity to visual speech information manifested by differences in network connectivity among brain regions important for audiovisual speech comprehension. Consistent with prior network-level investigations of the development of story comprehension (Karunanayaka et al., 2007), clear differences were found in the functional
Summary
In summary, we used a network modeling approach to examine how the development of audiovisual speech comprehension is reflected by changes in the interactions among brain regions involved in both speech perception and speech production. The analyses we report demonstrated that in children and adults, audiovisual speech comprehension activated a similar fronto-temporo-parietal network of brain regions. Age-related differences in audiovisual speech comprehension were primarily reflected by
Acknowledgments
We thank Michael Andric, E. Elinor Chen, Susan Goldin-Meadow, Uri Hasson, Peter Huttenlocher, Susan Levine, Nameeta Lobo, Robert Lyons, Arika Okrent, Shannon Pruden, Anjali Raja, Jeremy Skipper, and Pascale Tremblay. This research was supported by funding from the National Institutes of Health (F32DC008909 to A.S.D, R01NS54942 to A.S., and P01HD040605 to S.L.S).
References (135)
Structural equation modelling: Adjudging model fit
Personality and Individual Differences (2007)
Quantified acoustic-optical speech signal incongruity identifies cortical sites of audiovisual speech processing
Brain Research (2008)
An fMRI investigation of syllable sequence production
NeuroImage (2006)
The feasibility of a common stereotactic space for children and adults in fMRI studies of development
NeuroImage (2002)
Phonetic perceptual identification by native- and second-language speakers differentially activates brain regions involved with acoustic phonetic processing and those involved with articulatory–auditory/orosensory internal models
NeuroImage (2004)
Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex
Current Biology (2000)
Cortical substrates for the perception of face actions: An fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning)
Cognitive Brain Research (2001)
Assembling and encoding word representations: fMRI subsequent memory effects implicate a role for phonological control
Neuropsychologia (2003)
Parametric analysis of fMRI data using linear systems methods
NeuroImage (1997)
Cortical surface-based analysis: I. Segmentation and surface reconstruction
NeuroImage (1999)
When less means more: Deactivations during encoding that predict subsequent memory
NeuroImage
An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest
NeuroImage
An exploration of why preschoolers perform differently than do adults in audiovisual speech perception tasks
Journal of Experimental Child Psychology
Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system
NeuroImage
Thresholding of statistical maps in functional neuroimaging using the false discovery rate
NeuroImage
Cortical interactions underlying the production of speech sounds
Journal of Communication Disorders
Neural modeling and imaging of the cortical interactions underlying syllable production
Brain and Language
Abstract coding of audiovisual speech: Beyond sensory representation
Neuron
Improving the analysis, storage and sharing of neuroimaging data using relational databases and distributed computing
NeuroImage
Validating cluster size inference: Random field and permutation methods
NeuroImage
Comparison of functional activation foci in children and adults using a common stereotactic space
NeuroImage
Age-related connectivity changes in fMRI data from children listening to stories
NeuroImage
The intermodal representation of speech in infants
Infant Behavior and Development
Developmental changes in visual and auditory contributions to speech perception
Journal of Experimental Child Psychology
Structural modeling of functional visual pathways mapped with 2-deoxyglucose: Effects of patterned light and foot shock
Brain Research
The essential role of premotor cortex in speech perception
Current Biology
Articulatory–acoustic relationships during vocal tract growth for French vowels: Analysis of real data and simulations with an articulatory model
Journal of Phonetics
Processing audiovisual speech in Broca’s area
NeuroImage
The assessment and analysis of handedness: The Edinburgh inventory
Neuropsychologia
A comparison of bound and unbound audio-visual information processing in the human cerebral cortex
Cognitive Brain Research
Mapping brain maturation and cognitive development during adolescence
Trends in Cognitive Sciences
A mediating role of the premotor cortex in phoneme segmentation
Brain & Language
Early experience alters brain function and structure
Pediatrics
AMOS: Analysis of moment structures
The American Statistician
Simplified intersubject averaging on the cortical surface using SUMA
Human Brain Mapping
Controlling the false discovery rate: A practical and powerful approach to multiple testing
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Development of audiovisual comprehension skills in prelingually deaf children with cochlear implants
Ear and Hearing
Visual activation and audiovisual interactions in the auditory cortex during speech perception: Intracranial recordings in humans
The Journal of Neuroscience
Distributed synaptic modification in neural networks induced by patterned stimulation
Nature
Auditory and visual contributions to the perception of consonants
Journal of Speech and Hearing Research
Use of visual information in speech perception: Evidence for a visual rate effect both with and without a McGurk effect
Perception and Psychophysics
Auditory–visual speech integration by prelinguistic infants: Perception of an emergent consonant in the McGurk effect
Developmental Psychobiology
Practical aspects of conducting large-scale functional magnetic resonance imaging studies in children
Journal of Child Neurology
Neural processes involved with perception of non-native durational contrasts
NeuroReport
Neural processes underlying perceptual enhancement by visual speech gestures
NeuroReport
An auditory-feedback based neural network model of speech production that is robust to developmental changes in the size and shape of the articulatory system
Journal of Speech, Language, and Hearing Research
Reading speech from still and moving faces: The neural substrates of visible speech
Journal of Cognitive Neuroscience
The processing of audio-visual speech: Empirical and neural bases
Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences
Control networks in paediatric Tourette syndrome show immature and anomalous patterns of functional connectivity
Brain
Locally weighted regression: An approach to regression analysis by local fitting
Journal of the American Statistical Association