Early multisensory interactions affect the competition among multiple visual objects
Research Highlights
► Multisensory stimuli are more noticeable among a background of visual stimuli. ► Spatially uninformative sounds drive visuo-spatial shifts of attention. ► Multisensory integration processes result in attentional capture. ► Attentional capture of multisensory stimuli is largely automatic. ► The early multisensory interaction was correlated with behavioral search benefits.
Introduction
Many studies have reported that information from different sensory modalities interacts (Alais and Burr, 2004, Calvert et al., 2000, Giard and Peronnet, 1999, Hershenson, 1962, Macaluso et al., 2000, McGurk and MacDonald, 1976, Molholm et al., 2002, Schroeder and Foxe, 2005, Shipley, 1964). For instance, a single visual event is perceived as being brighter when accompanied by an auditory signal than when presented in isolation (Stein et al., 1996). These and other results have provided evidence for the notion that multisensory integration enhances signal clarity and/or reduces stimulus ambiguity (see e.g. Chen and Yeh, 2009, Olivers and Van der Burg, 2008, Vroomen and De Gelder, 2000). One drawback of the majority of studies to date, however, is that they examine interactions among single events at a time (i.e. a single visual event in combination with a single auditory event), thus leaving open the question of how multisensory interactions can aid in resolving the competition between multiple stimuli (Spence, 2007).
One exception is provided by a recent study of ours (Van der Burg et al., 2008b) in which participants searched for a horizontal or vertical line segment presented among many distractor lines of various orientations. Throughout each experimental trial random subsets of items changed color. Search through these displays is difficult but improves dramatically when a spatially uninformative tone is concurrently presented with the color change of the target line. This “pip and pop” effect, as we have dubbed it, indicates that a synchronized auditory event can affect the competition among multiple visual items. Follow-up studies demonstrated that the pip and pop effect is not due to increases in alertness or top-down temporal cueing (Van der Burg et al., 2008a, Van der Burg et al., 2008b). We have proposed that the auditory signal enhances the neural response of the synchronized visual event at an early sensory level of processing (Stein et al., 1996), though in our prior study we could not provide direct evidence for this hypothesis.
Here we report an electrophysiological study in humans that supports the idea that early multisensory integration underlies the pip and pop effect, bolstering the claim that rapid and automatic audiovisual integration leads to the subjective experience of the synchronized visual event popping out of the background. Participants were asked to search for a horizontal or vertical target line among irrelevant diagonal lines. At pseudo-random intervals subsets of lines changed orientation. This set-up allowed us to independently control the moment of display onset and the moment of target appearance within that display. The target was placed at a lateral location in the lower visual field so as to take advantage of the known lateralization of brain potentials related to the deployment of attention. The target was absent at the beginning of the trial, and at a specific moment one distractor line changed into the target line (i.e. a horizontal or vertical line segment). Participants were required to identify the target orientation by pressing one of two keys. Our behavioral measure was identification accuracy. Fig. 1a presents an example search display (see on-line Mov. 1 for a video clip of a trial).
The current experiment was designed in such a way that the relation between multisensory integration and attention could be studied from two different perspectives. First, ERPs elicited by auditory (A) and visual (V) signals presented in isolation could be summed and compared with ERPs elicited by audiovisual (AV) stimuli, in which the sound coincided with the visual target (i.e. investigating the difference between AV and [A + V] ERPs), without specifically considering the lateral position of the visual stimuli. Earlier ERP work has shown that this additive model can reveal early latency (~ 40 ms) multisensory processes (Giard and Peronnet, 1999, Mishra et al., 2007, Molholm et al., 2002, Talsma et al., 2007), reflecting changes early in the visual processing sequence (Eckert et al., 2008, Martuzzi et al., 2007, Zangenehpour and Zatorre, 2010). The present study is thus designed to determine whether such early modulations underlie the pip and pop effect.
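The additive-model logic described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the arrays, channel count, and sampling rate are hypothetical, standing in for trial-averaged ERP data.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500                                  # sampling rate in Hz (hypothetical)
t = np.arange(-0.1, 0.5, 1 / fs)          # epoch from -100 to 500 ms around onset
n_channels = 64                           # hypothetical electrode count

# Hypothetical trial-averaged ERPs (channels x timepoints) per condition
erp_A = rng.normal(size=(n_channels, t.size))    # auditory-alone ERP
erp_V = rng.normal(size=(n_channels, t.size))    # visual-alone ERP
erp_AV = rng.normal(size=(n_channels, t.size))   # audiovisual ERP

# Additive model: a nonzero AV - (A + V) difference indexes a multisensory
# interaction, i.e. a response beyond the sum of the unisensory responses.
interaction = erp_AV - (erp_A + erp_V)

# Mean interaction amplitude in the early 50-60 ms window discussed in the text
win = (t >= 0.050) & (t <= 0.060)
early_effect = interaction[:, win].mean(axis=1)  # one value per channel
```

In practice such a difference wave would be computed from jittered, artifact-corrected averages and tested against zero across participants, but the arithmetic of the additive model is exactly this subtraction.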
To assess the possibility that a sound automatically interacts with a coinciding visual stimulus, regardless of its relevance (Van der Burg et al., 2008a, Van der Burg et al., 2008b), we not only included a condition in which the visual part of the audiovisual stimulus was the target (i.e. an AVtarget), but also a condition in which the visual part of the audiovisual stimulus was an irrelevant distractor (AVdistractor). Here the tone was synchronized (and ERPs time-locked) to a visual distractor stimulus. If the pip and pop effect occurs in an automatic fashion, then early modulations are expected here as well as in the AVtarget condition. Note that the different conditions were presented in blocks to ensure that participants ignored the auditory event in the AVdistractor condition. Any interactions could thus be attributed to automatic integration of multisensory information. Fig. 1b illustrates the temporal dynamics of the trial types employed in our study.
Second, in addition to analyzing early multisensory modulations, the presentation of visual events at lateral locations in the visual field allowed for the analysis of lateralized ERPs associated with sensory enhancement, the deployment of attention, and activation of visual short-term memory (VSTM). To index changes in these cognitive processes we looked at a series of well-documented ERP components: the lateral P1 (90–120 ms post-stimulus), the N2pc (175–300 ms post-stimulus), and the contralateral negative slow wave (CNSW; 300+ ms post-stimulus).
Multimodal research with the lateralized P1 has associated increases in the amplitude of this component with modulations of sensory strength and priority (McDonald et al., 2005, Störmer et al., 2009). In McDonald et al. (2005), for example, the perceived order of lateralized visual events was affected by the location of a preceding auditory signal. Visual events that were presented at the same location as auditory events were perceived as occurring earlier in time, and this behavioral effect was associated with an increase in the amplitude of the lateral P1. These results lead us to expect a corresponding increase in the lateral P1 in the present study if the auditory signal in fact modulates the strength of a synchronized visual event (though it is important to note that, in contrast to McDonald et al. (2005), in the present study the sound itself was not lateralized).
The N2pc was originally linked to visuo-spatial selective attention in a series of visual search studies (Luck and Hillyard, 1994, Luck et al., 2000). The component is evident as an increase in negative ERP amplitude at posterior electrodes located over cortical brain areas contralateral to an attended visual object. In the present study, we expect to observe an N2pc if the synchronized visual event captures attention. Moreover, if the early audiovisual integration and subsequent change in visual processing occurs automatically, similar P1 and N2pc modulations should be observed in the AVdistractor and AVtarget conditions.
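As a rough illustration of how such a lateralized component is quantified, the N2pc is conventionally computed as the contralateral-minus-ipsilateral difference at a posterior electrode pair. The sketch below uses hypothetical waveforms and the PO7/PO8 pair as an assumed example; the specific electrodes and data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 500                                 # sampling rate in Hz (hypothetical)
t = np.arange(-0.1, 0.5, 1 / fs)         # epoch from -100 to 500 ms around onset

# Hypothetical trial-averaged ERPs at a posterior pair (PO7 left, PO8 right),
# split by the hemifield in which the target appeared.
po7_left_target = rng.normal(size=t.size)    # PO7, target left  -> ipsilateral
po7_right_target = rng.normal(size=t.size)   # PO7, target right -> contralateral
po8_left_target = rng.normal(size=t.size)    # PO8, target left  -> contralateral
po8_right_target = rng.normal(size=t.size)   # PO8, target right -> ipsilateral

# N2pc difference wave: contralateral average minus ipsilateral average.
contra = (po7_right_target + po8_left_target) / 2
ipsi = (po7_left_target + po8_right_target) / 2
n2pc = contra - ipsi

# Mean amplitude in the 175-300 ms window used in the text; an attended
# lateral object shows up as a negative-going deflection here.
win = (t >= 0.175) & (t <= 0.300)
n2pc_amplitude = n2pc[win].mean()
```

The same contralateral-minus-ipsilateral subtraction, taken over a later window (300+ ms), would index the CNSW discussed below.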
The CNSW has been associated with visual short-term memory (VSTM; Klaver et al., 1999). This lateralized component is present during memorization and is reflected as a slow negative posterior ERP wave developing over the hemisphere contralateral to the memorized item. In our experimental task, VSTM was presumably active in AVtarget trials, as participants were required to eventually respond based on target characteristics, but we did not expect the recruitment of VSTM resources in the AVdistractor condition. With this in mind we expected to find a larger CNSW in the AVtarget condition.
Finally, we expect to observe a greater P3 component in the AVtarget condition compared to the other conditions (Sutton et al., 1965). We distinguish the P3 from co-occurring CNSW in that the P3 is a central component, while the CNSW has a lateral topography. The P3 has been associated with awareness of stimuli, and/or updating of working memory (Nieuwenhuis et al., 2005, Sutton et al., 1965).
To foreshadow, our results show that spatially uninformative auditory signals start to affect the competition among multiple visual objects at an early sensory stage (50–60 ms after stimulus onset), driving an early positive modulation contralateral to the visual event (~ 80 ms). These effects are followed by enhanced N2pc, contralateral negative slow wave (CNSW; Klaver et al., 1999), and P3 components, reflecting attentional orienting, visual short-term memory, and active cognitive processing.
Section snippets
Participants
Fourteen volunteers participated in the experiment (7 females, mean age 23.8 years; range 18–33 years). Participants were either paid € 10 an hour or received course credits. All participants gave written informed consent for their participation. Experimental procedures adhered to the Declaration of Helsinki.
Design and stimuli
Two grids filled with randomly oriented line elements were presented in the lower left and right visual fields (see Fig. 1a). A trial lasted approximately 4 seconds, and during this period a
Behavioral data
Target detection performance differed significantly for the different stimulus conditions, F4,56 = 31.1, P < 0.0001. As is clear from Fig. 2, the auditory signal led to strong benefits for the detection of the synchronized visual target in the AVtarget condition, resulting in significantly improved accuracy (90.5%) compared to all other conditions (average 75.8%; all individual comparisons, ts13 > 5.9, all Ps < 0.0001). As in previous studies (Ngo and Spence, 2010, Van der Burg et al., 2010b, Van der
Discussion
This study investigated the neural mechanisms involved in shifts of attention driven by multisensory events. The main findings were that spatially uninformative auditory signals affect the competition among multiple visual objects (as indicated by behavioral measures), and that this is related to an early audiovisual interaction starting 50–60 ms after stimulus onset over parieto-occipital electrodes. Our electrophysiological results reveal a sequence of events in the information processing
Summary and conclusion
The present study employed a visual search paradigm, using complex and dynamically changing visual displays and brief auditory events to show how sounds can affect the competition among visual stimuli. We found a systems-level cascading sequence of events that started with an early latency multisensory integration effect. This integration effect ultimately led to a rapid shift of attention to a visual stimulus that was presented in temporal synchrony with the occurrence of a transient auditory
Acknowledgments
We wish to thank Micah Murray and an anonymous reviewer for their suggestion to correlate the early AV effect with behavior in the pip and pop task.
DT and CNLO contributed equally to this project and should both be considered as second author. This research was supported by a Dutch Technology Foundation STW grant (07079), a division of NWO and the Technology Program of the Ministry of Economic Affairs (to JT), and a NWO-VIDI grant 452-06-007 (to CNLO).
References
- et al. (2004). The ventriloquism effect results from near-optimal bimodal integration. Curr. Biol.
- et al. (2005). Audiovisual integration of speech falters under attention demands. Curr. Biol.
- et al. (2000). Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Curr. Biol.
- et al. (2007). Prestimulus oscillations predict visual perception performance between and within subjects. NeuroImage.
- et al. (2008). Dissociation of the N2pc and sustained posterior contralateral negativity in a choice response task. Brain Res.
- et al. (2006). Ultra-rapid object detection with saccadic eye movements: visual processing speed revisited. Vis. Res.
- et al. (2003). Electrophysiological indicators of phonetic and non-phonetic multisensory interactions during audiovisual speech perception. Cogn. Brain Res.
- et al. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends Neurosci.
- et al. (2003). Evidence of a visual-to-auditory cross-modal sensory gating phenomenon as reflected by the human P50 event-related brain potential modulation. Neurosci. Lett.
- et al. (2004). Modulations of 'late' event-related brain potentials in humans by dynamic audiovisual speech stimuli. Neurosci. Lett.
- Event-related potential studies of attention. Trends Cogn. Sci.
- Searching for "the top" in top-down control. Neuron.
- Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Brain Res. Cogn. Brain Res.
- Bleeping you out of the blink: sound saves vision from oblivion. Brain Res.
- Multisensory contributions to low-level, 'unisensory' processing. Curr. Opin. Neurobiol.
- Look who's talking: the deployment of visuo-spatial attention during multisensory speech processing under noisy environmental conditions. NeuroImage.
- The multifaceted interplay between attention and multisensory integration. Trends Cogn. Sci.
- An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Cogn. Brain Res.
- Poke and pop: tactile-visual synchrony increases visual saliency. Neurosci. Lett.
- Audiovisual semantic interference and attention: evidence from the attentional blink paradigm. Acta Psychol.
- Crossmodal recruitment of primary visual cortex following brief exposure to bimodal audiovisual stimuli. Neuropsychologia.
- Attention to touch weakens audiovisual speech integration. Exp. Brain Res.
- The ventriloquist effect does not depend on the direction of deliberate visual attention. Percept. Psychophys.
- Bimodal speech: early suppressive visual effects in human auditory cortex. Eur. J. Neurosci.
- The spread of attention across modalities and space in a multisensory object. Proc. Natl Acad. Sci.
- Neural processes underlying perceptual enhancement by visual speech gestures. Cogn. Brain Res.
- Neural processes underlying perceptual enhancement by visual speech gestures. NeuroReport.
- Reading speech from still and moving faces: the neural substrates of visible speech. J. Cogn. Neurosci.
- Auditory-visual multisensory interactions in humans: timing, topography, directionality, and sources. J. Neurosci.
- Catch the moment: multisensory enhancement of rapid visual events by sound. Exp. Brain Res.
- Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature.
- A cross-modal system linking primary auditory and visual cortices: evidence from intrinsic fMRI connectivity analysis. Hum. Brain Mapp.
- Spatial attention can modulate audiovisual integration at multiple cortical and subcortical areas. Eur. J. Neurosci.
- Auditory-visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J. Cogn. Neurosci.
- Combined spatial and temporal imaging of brain activity during visual selective attention in humans. Nature.
- Reaction time as a measure of intersensory facilitation. J. Exp. Psychol.