NeuroImage

Volume 55, Issue 3, 1 April 2011, Pages 1208–1218

Early multisensory interactions affect the competition among multiple visual objects

https://doi.org/10.1016/j.neuroimage.2010.12.068

Abstract

In dynamic cluttered environments, audition and vision may benefit from each other in determining what deserves further attention and what does not. We investigated the underlying neural mechanisms responsible for attentional guidance by audiovisual stimuli in such an environment. Event-related potentials (ERPs) were measured during visual search through dynamic displays consisting of line elements that randomly changed orientation. Search accuracy improved when a target orientation change was synchronized with an auditory signal as compared to when the auditory signal was absent or synchronized with a distractor orientation change. The ERP data show that behavioral benefits were related to an early multisensory interaction over left parieto-occipital cortex (50–60 ms post-stimulus onset), which was followed by an early positive modulation (80–100 ms) over occipital and temporal areas contralateral to the audiovisual event, an enhanced N2pc (210–250 ms), and a contralateral negative slow wave (CNSW). The early multisensory interaction was correlated with behavioral search benefits, indicating that participants with a strong multisensory interaction benefited the most from the synchronized auditory signal. We suggest that an auditory signal enhances the neural response to a synchronized visual event, which increases the chances of selection in a multiple object environment.

Research Highlights

► Multisensory stimuli are more noticeable against a background of visual stimuli.
► Spatially uninformative sounds drive visuo-spatial shifts of attention.
► Multisensory integration processes result in attentional capture.
► Attentional capture by multisensory stimuli is largely automatic.
► The early multisensory interaction was correlated with behavioral search benefits.

Introduction

Many studies have reported that information from different sensory modalities interacts (Alais and Burr, 2004, Calvert et al., 2000, Giard and Peronnet, 1999, Hershenson, 1962, Macaluso et al., 2000, McGurk and MacDonald, 1976, Molholm et al., 2002, Schroeder and Foxe, 2005, Shipley, 1964). For instance, a single visual event is perceived as being brighter when accompanied by an auditory signal than when presented in isolation (Stein et al., 1996). These and other results have provided evidence for the notion that multisensory integration enhances signal clarity and/or reduces stimulus ambiguity (see e.g. Chen and Yeh, 2009, Olivers and Van der Burg, 2008, Vroomen and De Gelder, 2000). One drawback of the majority of studies to date, however, is that they examine interactions among single events (i.e. a single visual event in combination with a single auditory event), thus leaving open the question of how multisensory interactions can help resolve the competition between multiple stimuli (Spence, 2007).

One exception is provided by a recent study of ours (Van der Burg et al., 2008b) in which participants searched for a horizontal or vertical line segment presented among many distractor lines of various orientations. Throughout each experimental trial, random subsets of items changed color. Search through these displays is difficult but improves dramatically when a spatially uninformative tone is presented concurrently with the color change of the target line. This “pip and pop” effect, as we have dubbed it, indicates that a synchronized auditory event can affect the competition among multiple visual items. Follow-up studies demonstrated that the pip and pop effect is not due to increases in alertness or top-down temporal cueing (Van der Burg et al., 2008a, Van der Burg et al., 2008b). We have proposed that the auditory signal enhances the neural response of the synchronized visual event at an early sensory level of processing (Stein et al., 1996), though in our prior study we could not provide direct evidence for this hypothesis.

Here we report an electrophysiological study in humans that supports the idea that early multisensory integration underlies the pip and pop effect, bolstering the claim that rapid and automatic audiovisual integration leads to the subjective experience of the synchronized visual event popping out of the background. Participants were asked to search for a horizontal or vertical target line among irrelevant diagonal lines. At pseudo-random intervals subsets of lines changed orientation. This set-up allowed us to independently control the moment of display onset and the moment of target appearance within that display. The target was placed at a lateral location in the lower visual field so as to take advantage of the known lateralization of brain potentials related to the deployment of attention. The target was absent at the beginning of the trial, and at a specific moment one distractor line changed into the target line (i.e. a horizontal or vertical line segment). Participants were required to identify the target orientation by pressing one of two keys. Our behavioral measure was identification accuracy. Fig. 1a presents an example search display (see on-line Mov. 1 for a video clip of a trial).

The current experiment was designed in such a way that the relation between multisensory integration and attention could be studied from two different perspectives. First, ERPs elicited by auditory (A) and visual (V) signals presented in isolation could be summated and compared to those ERPs elicited by audiovisual (AV) stimuli, in which the sound coincided with the visual target (i.e. investigating the difference between AV and [A + V] ERPs), without specifically considering the lateral position of the visual stimuli. Earlier ERP work has shown that this additive model can reveal early latency (~ 40 ms) multisensory processes (Giard and Peronnet, 1999, Mishra et al., 2007, Molholm et al., 2002, Talsma et al., 2007), reflecting changes early in the visual processing sequence (Eckert et al., 2008, Martuzzi et al., 2007, Zangenehpour and Zatorre, 2010). The present study is thus designed to determine whether such early modulations underlie the pip and pop effect.
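To make the additive-model logic concrete, the sketch below is a minimal illustration in Python/NumPy: it computes the AV − (A + V) residual and its mean amplitude in an early time window. The placeholder data, the assumed 64-channel montage, and the 512 Hz sampling rate are illustrative assumptions, not the study's actual recordings or analysis pipeline.

```python
import numpy as np

# Placeholder subject-averaged ERPs, shape (n_channels, n_times);
# real data would come from an EEG pipeline such as MNE-Python.
fs = 512.0                                   # assumed sampling rate (Hz)
times = np.arange(-0.1, 0.5, 1.0 / fs)       # -100 to 500 ms around onset
erp_av = np.random.randn(64, times.size)     # audiovisual condition
erp_a  = np.random.randn(64, times.size)     # auditory-alone condition
erp_v  = np.random.randn(64, times.size)     # visual-alone condition

# Additive model: a nonzero residual AV - (A + V) indicates a
# nonlinear multisensory interaction rather than mere summation.
interaction = erp_av - (erp_a + erp_v)

# Mean residual amplitude in the early 50-60 ms window discussed above.
win = (times >= 0.050) & (times <= 0.060)
early_effect = interaction[:, win].mean(axis=1)   # one value per channel
```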

To assess the possibility that a sound automatically interacts with a coinciding visual stimulus, regardless of its relevance (Van der Burg et al., 2008a, Van der Burg et al., 2008b), we not only included a condition in which the visual part of the audiovisual stimulus was the target (the AVtarget condition), but also a condition in which the visual part of the audiovisual stimulus was an irrelevant distractor (the AVdistractor condition). Here the tone was synchronized (and ERPs time-locked) to a visual distractor stimulus. If the pip and pop effect occurs in an automatic fashion, then early modulations are expected here as well as in the AVtarget condition. Note that the different conditions were presented in blocks to ensure that participants ignored the auditory event in the AVdistractor condition. This way, any interactions could be attributed to automatic integration of multisensory information. Fig. 1b illustrates the temporal dynamics of the trial types employed in our study.

Second, in addition to analyzing early multisensory modulations, the presentation of visual events at lateral locations in the visual field allowed for the analysis of lateralized ERPs associated with sensory enhancement, the deployment of attention, and the activation of visual short-term memory (VSTM). To index changes in these cognitive processes we examined a series of well-documented ERP components: the lateral P1 (90–120 ms post-stimulus), the N2pc (175–300 ms post-stimulus), and the contralateral negative slow wave (CNSW; 300+ ms post-stimulus).

Multimodal research with the lateralized P1 has associated increases in the amplitude of this component with modulations of sensory strength and priority (McDonald et al., 2005, Störmer et al., 2009). In McDonald et al. (2005), for example, the perceived order of lateralized visual events was affected by the location of a preceding auditory signal. Visual events that were presented at the same location as auditory events were perceived as occurring earlier in time, and this behavioral effect was associated with an increase in the amplitude of the lateral P1. These results lead us to expect a corresponding increase in the lateral P1 in the present study if the auditory signal in fact modulates the strength of a synchronized visual event (though it is important to note that, in contrast to McDonald et al. (2005), in the present study the sound itself was not lateralized).

The N2pc was originally linked to visuo-spatial selective attention in a series of visual search studies (Luck and Hillyard, 1994, Luck et al., 2000). The component is evident as an increase in negative ERP amplitude at posterior electrodes located over cortical brain areas contralateral to an attended visual object. In the present study, we expect to observe an N2pc if the synchronized visual event captures attention. Moreover, if the early audiovisual integration and subsequent change in visual processing occurs automatically, similar P1 and N2pc modulations should be observed in the AVdistractor and AVtarget conditions.
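As an illustration of how lateralized components such as the N2pc are typically quantified, the following sketch computes the contralateral-minus-ipsilateral difference for a posterior electrode pair. The placeholder data, the PO7/PO8 pair, and the sampling rate are assumptions based on common practice; the 210–250 ms window is the one reported in the Abstract.

```python
import numpy as np

# Placeholder ERPs at a posterior electrode pair (assumed PO7 = left
# hemisphere, PO8 = right), averaged separately by target hemifield.
fs = 512.0
times = np.arange(-0.1, 0.5, 1.0 / fs)
po7_target_left  = np.random.randn(times.size)
po7_target_right = np.random.randn(times.size)
po8_target_left  = np.random.randn(times.size)
po8_target_right = np.random.randn(times.size)

# Contralateral = electrode opposite the target hemifield,
# ipsilateral = electrode on the same side; average over both cases.
contra = (po8_target_left + po7_target_right) / 2.0
ipsi   = (po7_target_left + po8_target_right) / 2.0

# N2pc: contralateral-minus-ipsilateral negativity, measured here as
# the mean difference in the 210-250 ms window.
n2pc_win = (times >= 0.210) & (times <= 0.250)
n2pc_amplitude = (contra - ipsi)[n2pc_win].mean()
```

The same contralateral-minus-ipsilateral logic, applied in the earlier P1 window or in a late (300+ ms) window, would index the lateral P1 and CNSW effects discussed in the surrounding paragraphs.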

The CNSW has been associated with visual short-term memory (VSTM; Klaver et al., 1999). This lateralized component is present during memorization and is reflected in a slow negative posterior ERP wave that develops over the hemisphere contralateral to the memorized item. In our experimental task, VSTM was presumably active in AVtarget trials, as participants were required to eventually respond based on target characteristics, but we did not expect the recruitment of VSTM resources in the AVdistractor condition. With this in mind we expected to find a larger CNSW in the AVtarget condition.

Finally, we expect to observe a greater P3 component in the AVtarget condition compared to the other conditions (Sutton et al., 1965). We distinguish the P3 from co-occurring CNSW in that the P3 is a central component, while the CNSW has a lateral topography. The P3 has been associated with awareness of stimuli, and/or updating of working memory (Nieuwenhuis et al., 2005, Sutton et al., 1965).

To foreshadow, our results show that spatially uninformative auditory signals start to affect the competition among multiple visual objects at an early sensory stage (50–60 ms after stimulus onset), driving an early positive modulation contralateral to the visual event (~ 80 ms). These effects are followed by enhanced N2pc, contralateral negative slow wave (CNSW; Klaver et al., 1999), and P3 components, reflecting the orienting of attention, visual short-term memory, and active cognitive processing, respectively.


Participants

Fourteen volunteers participated in the experiment (7 females; mean age 23.8 years, range 18–33 years). Participants were either paid €10 an hour or received course credits. All participants gave written informed consent for their participation. Experimental procedures adhered to the Declaration of Helsinki.

Design and stimuli

Two grids filled with randomly oriented line elements were presented in the lower left and right visual fields (see Fig. 1a). A trial lasted approximately 4 seconds, and during this period a

Behavioral data

Target detection performance differed significantly for the different stimulus conditions, F(4,56) = 31.1, P < 0.0001. As is clear from Fig. 2, the auditory signal led to strong benefits for the detection of the synchronized visual target in the AVtarget condition, resulting in significantly improved accuracy (90.5%) compared to all other conditions (average 75.8%; all individual comparisons, t(13) > 5.9, all Ps < 0.0001). As in previous studies (Ngo and Spence, 2010, Van der Burg et al., 2010b, Van der
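The design behind this F statistic (5 conditions × 14 participants, giving F(4, 56)) corresponds to a one-way repeated-measures ANOVA, as in the minimal sketch below. The accuracy values are placeholders, only AVtarget and AVdistractor are condition labels named in the text (the rest are assumptions), and statsmodels' AnovaRM is just one of several tools for this test.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Placeholder accuracy data: 14 participants x 5 stimulus conditions.
rng = np.random.default_rng(0)
conditions = ["AVtarget", "AVdistractor", "cond3", "cond4", "cond5"]
df = pd.DataFrame({
    "subject":   np.repeat(np.arange(14), len(conditions)),
    "condition": np.tile(conditions, 14),
    "accuracy":  rng.uniform(0.70, 0.95, 14 * len(conditions)),
})

# With 5 within-subject conditions and 14 subjects, the condition
# effect is tested on F(4, 56), matching the degrees of freedom above.
print(AnovaRM(df, depvar="accuracy", subject="subject",
              within=["condition"]).fit())
```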

Discussion

This study investigated the neural mechanisms involved in shifts of attention driven by multisensory events. The main findings were that spatially uninformative auditory signals affect the competition among multiple visual objects (as indicated by behavioral measures), and that this is related to an early audiovisual interaction starting 50–60 ms after stimulus onset over parieto-occipital electrodes. Our electrophysiological results reveal a sequence of events in the information processing
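The brain–behavior link reported here (a stronger early AV interaction predicting a larger search benefit) can be illustrated with a short sketch. All values below are placeholders, and the two per-subject measures are assumed operationalizations: the mean early AV − (A + V) amplitude over left parieto-occipital electrodes, and the AVtarget accuracy advantage over the remaining conditions.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder per-subject measures for the 14 participants.
rng = np.random.default_rng(1)
early_interaction = rng.normal(0.5, 0.2, 14)    # microvolts, illustrative
search_benefit    = rng.normal(0.15, 0.05, 14)  # proportion, illustrative

# If early integration drives the behavioral benefit, the two
# measures should correlate positively across participants.
r, p = pearsonr(early_interaction, search_benefit)
print(f"r(12) = {r:.2f}, p = {p:.3f}")
```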

Summary and conclusion

The present study employed a visual search paradigm, using complex and dynamically changing visual displays and brief auditory events to show how sounds can affect the competition among visual stimuli. We found a systems-level cascading sequence of events that started with an early latency multisensory integration effect. This integration effect ultimately led to a rapid shift of attention to a visual stimulus that was presented in temporal synchrony with the occurrence of a transient auditory

Acknowledgments

We wish to thank Micah Murray and an anonymous reviewer for their suggestion to correlate the early AV effect with behavior in the pip and pop task.

DT and CNLO contributed equally to this project and should both be considered as second author. This research was supported by a grant (07079) from the Dutch Technology Foundation STW, a division of NWO and the Technology Program of the Ministry of Economic Affairs (to JT), and by an NWO-VIDI grant 452-06-007 (to CNLO).

References (73)

  • S.J. Luck et al. Event-related potential studies of attention. Trends Cogn. Sci. (2000)
  • B.T. Miller et al. Searching for "the top" in top-down control. Neuron (2005)
  • S. Molholm et al. Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Brain Res. Cogn. Brain Res. (2002)
  • C.N.L. Olivers et al. Bleeping you out of the blink: sound saves vision from oblivion. Brain Res. (2008)
  • C.E. Schroeder et al. Multisensory contributions to low-level, 'unisensory' processing. Curr. Opin. Neurobiol. (2005)
  • D. Senkowski et al. Look who's talking: the deployment of visuo-spatial attention during multisensory speech processing under noisy environmental conditions. NeuroImage (2008)
  • D. Talsma et al. The multifaceted interplay between attention and multisensory integration. Trends Cogn. Sci. (2010)
  • W.A. Teder-Sälejärvi et al. An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Cogn. Brain Res. (2002)
  • E. Van der Burg et al. Poke and pop: tactile-visual synchrony increases visual saliency. Neurosci. Lett. (2009)
  • E. Van der Burg et al. Audiovisual semantic interference and attention: evidence from the attentional blink paradigm. Acta Psychol. (2010)
  • S. Zangenehpour et al. Crossmodal recruitment of primary visual cortex following brief exposure to bimodal audiovisual stimuli. Neuropsychologia (2010)
  • A. Alsius et al. Attention to touch weakens audiovisual speech integration. Exp. Brain Res. (2007)
  • P. Bertelson et al. The ventriloquist effect does not depend on the direction of deliberate visual attention. Percept. Psychophys. (2000)
  • J. Besle et al. Bimodal speech: early suppressive visual effects in human auditory cortex. Eur. J. Neurosci. (2004)
  • L. Busse et al. The spread of attention across modalities and space in a multisensory object. Proc. Natl. Acad. Sci. (2005)
  • D.E. Callan et al. Neural processes underlying perceptual enhancement by visual speech gestures. Cogn. Brain Res. (2003)
  • D.E. Callan et al. Neural processes underlying perceptual enhancement by visual speech gestures. NeuroReport (2003)
  • G. Calvert et al. Reading speech from still and moving faces: the neural substrates of visible speech. J. Cogn. Neurosci. (2003)
  • C. Cappe et al. Auditory-visual multisensory interactions in humans: timing, topography, directionality, and sources. J. Neurosci. (2010)
  • Y.C. Chen et al. Catch the moment: multisensory enhancement of rapid visual events by sound. Exp. Brain Res. (2009)
  • J. Driver. Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature (1996)
  • M.A. Eckert et al. A cross-modal system linking primary auditory and visual cortices: evidence from intrinsic fMRI connectivity analysis. Hum. Brain Mapp. (2008)
  • S.L. Fairhall et al. Spatial attention can modulate audiovisual integration at multiple cortical and subcortical areas. Eur. J. Neurosci. (2009)
  • M.H. Giard et al. Auditory-visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J. Cogn. Neurosci. (1999)
  • H.J. Heinze et al. Combined spatial and temporal imaging of brain activity during visual selective attention in humans. Nature (1994)
  • M. Hershenson. Reaction time as a measure of intersensory facilitation. J. Exp. Psychol. (1962)