Abstract
Pooling and synthesizing signals across different senses often enhances responses to the event from which they are derived. Here, we examine whether multisensory response enhancements are attributable to a redundant target effect (two stimuli rather than one) or if there is some special quality inherent in the combination of cues from different senses. To test these possibilities, the performance of animals in localizing and detecting spatiotemporally concordant visual and auditory stimuli was examined when these stimuli were presented individually (visual or auditory) or in cross-modal (visual–auditory) and within-modal (visual–visual, auditory–auditory) combinations. Performance enhancements proved to be far greater for combinations of cross-modal than within-modal stimuli and support the idea that the behavioral products derived from multisensory integration are not attributable to simple target redundancy. One likely explanation is that whereas cross-modal signals offer statistically independent samples of the environment, within-modal signals can exhibit substantial covariance, and consequently multisensory integration can yield more substantial error reduction than unisensory integration.
Introduction
The brain's ability to integrate information derived from multiple senses is remarkable given that each sense transduces a different form of environmental energy. Nevertheless, the products of this synthesis are readily apparent in the multisensory responses of superior colliculus (SC) neurons, a midbrain structure involved in the control of orientation behavior and often used as a model to explore multisensory integration (Stein and Meredith, 1993). Cross-modal stimuli that are spatially and temporally coincident evoke responses from these neurons that are more robust than those evoked by the individual component stimuli (Meredith and Stein, 1983; Wallace et al., 1996, 1998; Jiang et al., 2001; Perrault et al., 2005; Stanford et al., 2005; Rowland et al., 2007a,b). Behaviorally, these spatiotemporally coincident cross-modal stimuli yield enhancements in the detection and localization of external events (Stein et al., 1988, 1989; Wilkinson et al., 1996; Jiang et al., 2002; Burnett et al., 2004). Similar multisensory enhancements have been observed in a number of brain regions, behaviors, and species using a variety of experimental techniques (Calvert et al., 2004; Stanford and Stein, 2007; Driver and Noesselt, 2008; Stein and Stanford, 2008).
Often implicit in the appreciation of multisensory integration is the belief that its underlying computation is different from that engaged during unisensory integration. The reasoning is straightforward: because different senses are not contaminated by common noise sources (they generate independent estimates), their synthesis should yield response products exceeding those produced by integrating information from within the same sense (Ernst and Banks, 2002; Rowland et al., 2007b). However, an alternate assumption is that combinations of within- and cross-modal stimuli would yield equivalent products because both represent a simple redundant target effect (RTE) regardless of input statistics (Miller, 1982; Gondan et al., 2005; Lippert et al., 2007; Leo et al., 2008; Sinnett et al., 2008).
It is only recently that this issue has been explored systematically. Alvarado et al. (2007) compared the products of multisensory and unisensory integration in cat SC neurons by presenting them with pairs of cross-modal and within-modal stimuli. They reported that these processes yielded different responses with distinct underlying neural computations. Weakly effective cross-modal stimuli produced responses that were statistically greater than either of the responses to the component stimuli and often engaged an underlying superadditive computation. More effective cross-modal stimuli resulted in proportionately less enhancement and in additive computations. In contrast, within-modal stimuli rarely produced enhanced responses and generally engaged subadditive computations. In rare circumstances, very weakly effective stimuli did yield additive interactions (albeit, rarely reaching the criterion for enhancement) but rapidly transitioned to subadditivity as stimulus effectiveness increased.
These neural data suggest that multisensory and unisensory integration would also yield very different behavioral products (see also Ernst and Banks, 2002). The present experiments were designed to evaluate this possibility in a detection and localization paradigm (Stein et al., 1989). The results obtained here also revealed substantial differences between the impact of multisensory and unisensory integration and suggest that a simple RTE model is insufficient to explain the data.
Materials and Methods
All procedures were conducted in accordance with the Guide for the Care and Use of Laboratory Animals (National Institutes of Health publication 86-23) and an approved Institutional Animal Care and Use Committee protocol at Wake Forest University School of Medicine, an Association for Assessment and Accreditation of Laboratory Animal Care-accredited institution. The apparatus, training, and testing procedures were similar to those described previously (Stein et al., 1988, 1989; Wilkinson et al., 1996; Jiang et al., 2002; Burnett et al., 2004).
Apparatus.
Four adult male cats (4–6 kg) were trained in a spatial localization task within a 90-cm-diameter perimetry apparatus (Fig. 1). The apparatus contained stimulus complexes of light-emitting diodes (LEDs) and speakers placed at 15° intervals to the left (from −90°) and right (to +90°) of a central fixation point (0°). Each stimulus complex consisted of two horizontally displaced (4 cm) LEDs (Lumex Opto/Components; model 67-1102-ND) located 4 cm beneath two horizontally displaced speakers (Panasonic; model 4D02C0). This displacement ensured that all stimulus pairings were likely to fall within the receptive fields of many of the same multisensory SC neurons (Stein and Meredith, 1993). As a general convention, we reference the leftmost stimuli within each individual stimulus complex with a “1” and the rightmost stimuli with a “2” (e.g., V1 is the leftmost visual stimulus). The perimetry apparatus was housed in a sound-attenuating chamber (Industrial Acoustics Company) with a constant 22 dB background noise. Stimuli were controlled with custom software, triggered by the experimenter, and unless otherwise stated, consisted of brief (40 ms) LED illuminations or bursts of broadband noise from the speakers. During the testing phase (see below), stimulus intensities were reduced to levels undetectable to the experimenter. The experimenter was informed of the target location (i.e., the stimulus complex) and of the stimulus combination on any given trial after the categorical response location was judged: the animal was either within 5° (4 cm) of the target or outside of this range. This proved to be an extremely easy judgment to make: animals very rarely went between stimulus complexes (each separated by 15°) because all stimulus complexes were always visible to them. When making errors, they almost always either went to a wrong location or failed to respond (“no-go”). Rare circumstances in which the animal began to approach the target, but then returned to the starting point, were scored as no-go errors.
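The correspondence between the angular and linear tolerances quoted above (5° and 4 cm) can be checked with a short arc-length calculation. The sketch below assumes that the stimulus complexes lie on the perimeter of the 90-cm-diameter apparatus (radius 45 cm) and that angles are measured from its center; the function and variable names are illustrative and are not part of the original methods.

```python
import math

# Assumption: stimuli sit on the perimeter of the 90-cm-diameter apparatus,
# so the radius measured from the central start position is 45 cm.
RADIUS_CM = 90.0 / 2.0


def arc_length_cm(angle_deg, radius_cm=RADIUS_CM):
    """Arc length (cm) subtended by angle_deg at the given radius."""
    return math.radians(angle_deg) * radius_cm


def angle_deg_for_arc(arc_cm, radius_cm=RADIUS_CM):
    """Angle (degrees) subtended by an arc of arc_cm at the given radius."""
    return math.degrees(arc_cm / radius_cm)


tolerance_cm = arc_length_cm(5.0)            # about 3.9 cm, i.e. roughly 4 cm
spacing_cm = arc_length_cm(15.0)             # about 11.8 cm between complexes
led_separation_deg = angle_deg_for_arc(4.0)  # about 5.1 degrees
```

Under these assumptions, the 5° scoring window and the 4 cm criterion agree to within about a millimeter, and adjacent stimulus complexes are roughly three times farther apart than the scoring window is wide.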
Training.
Responses were shaped using a small food reward (175 mg kibble; Hill's Science Diet). Each animal was first trained to stand at the center of the arena (the start position) and fixate directly ahead at 0°, with the experimenter providing gentle head restraint. The animal was then required to orient toward and approach (within 4 cm) a briefly illuminated (40 ms) but highly visible LED (3 foot-candles) at that perimetric location. Its nose had to be displaced no more than 5° to the left or right of this location within 3 s of stimulus onset to receive a reward. After mastering this task, the number of possible target LED locations was expanded to include increasingly more eccentric locations (these were randomly interleaved between ±45° and always involved the left LED in each stimulus complex). Given that 15° separated the perimetric locations, the animal had to choose among seven locations: three locations on the left (−45°, −30°, −15°), the center location (0°), and three locations on the right (+15°, +30°, +45°). Once the animal mastered the visual localization task, it was trained on the auditory localization task using the same general procedure (the auditory stimulus was a 40 ms, 60 dB sound-pressure level A-weighted broadband noise). At this time, “catch” trials (no stimulus) were introduced, during which the animal was required to remain at the start position looking directly ahead to receive a reward. Training was completed when animals would accurately approach the stimulus on at least 85 of 100 trials at all seven locations and have <15% erroneous responses to catch trials.
Testing.
To equilibrate the effectiveness of the stimuli among the different perimetric locations, the intensity of the modality-specific stimulus (visual or auditory) was reduced at each location to a level eliciting ∼25% correct responses. There was one stimulus/trial/location, but the stimulus could be an individual modality-specific stimulus (visual or auditory) or a pair of within-modal (visual–visual, auditory–auditory) or cross-modal (visual–auditory) stimuli. In addition, catch trials (in which no stimulus was presented) were included. All presentations of stimulus pairs were spatially and temporally coincident, unless otherwise noted (see below). All animals were tested with ∼150 trials per day (i.e., until satiety) and 5 d per week. To ensure a reasonable number of trials per stimulus condition each day, cross-modal (visual–auditory) versus within-modal (visual–visual or auditory–auditory) comparisons were divided into two experiments separated by ∼9 months. The first experiment compared performance in response to within-modal visual (V1V2) and cross-modal (V1A1) stimuli, and the second experiment compared performance in response to within-modal auditory (A1A2) and cross-modal (V1A1) stimuli. Because animals were run to satiety each day using random trial selection, there were often unequal numbers of trials/day/location. Because the random selection procedure was used during all days of testing, there were some locations that exceeded the minimum number of trials.
Pairs of visual stimuli consisted of 2 LEDs separated by 5°. The paired auditory stimuli were also separated by 5°, but to ensure that the two sounds were distinguishable in the second experiment, the sound (see above) was presented first to the left speaker (30 ms duration) and 15 ms later to the right speaker (35 ms duration). Thus, they overlapped for 15 ms. For this reason, the stimuli used in Experiment 2 lasted 50 ms as opposed to 40 ms as in Experiment 1. The two sounds were distinguishable to human listeners and had differing impacts on the animals' behavioral responses (see Results).
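The timing of the paired auditory stimulus can be verified with simple interval arithmetic. The following sketch is only an illustration; the helper name and the (onset, duration) tuple layout are assumptions introduced here.

```python
def overlap_and_span(first, second):
    """Return (overlap_ms, total_span_ms) for two (onset_ms, duration_ms) pairs."""
    start1, end1 = first[0], first[0] + first[1]
    start2, end2 = second[0], second[0] + second[1]
    # Overlap is the shared interval, clipped at zero when the sounds do not meet.
    overlap = max(0, min(end1, end2) - max(start1, start2))
    # Span runs from the earliest onset to the latest offset.
    span = max(end1, end2) - min(start1, start2)
    return overlap, span


left_speaker = (0, 30)    # onset at 0 ms, 30 ms duration
right_speaker = (15, 35)  # onset 15 ms later, 35 ms duration
overlap_ms, span_ms = overlap_and_span(left_speaker, right_speaker)
```

With the durations given in the text, the two sounds overlap for 15 ms and the compound stimulus spans 50 ms, matching the values reported for Experiment 2.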
Within each experiment, all five stimulus conditions were randomly interleaved: (1) a single visual stimulus (V1); (2) a single auditory stimulus (A1 or A2); (3) a spatiotemporally coincident visual–auditory stimulus pair (V1A1); (4) a spatiotemporally coincident within-modal stimulus pair (visual, V1V2 or auditory, A1A2); or (5) a catch trial. All stimulus locations were randomly selected. Orientation and approach to each target stimulus (see above) and maintaining fixation during a catch trial were rewarded. Because V1 and V2 were identical, only one of them (V1) was used to obtain an index of unisensory visual performance and the intensity level of V2 was matched to V1 during the visual within-modal stimulus combinations (V1V2).
Control experiment.
The physical limitations of the apparatus displaced visual–visual (V1V2) and auditory–auditory (A1A2) stimuli by 4 cm in the horizontal dimension, whereas visual–auditory stimuli (V1A1) were at the same azimuthal position, one above the other. To examine whether the directionality of displacement had a significant effect on the results, additional tests were conducted in which the visual and auditory stimuli were diagonally displaced (V1A2). Three animals were tested for 6–7 d for a total of 1133–1244 trials per animal. We found that diagonally displaced cross-modal stimulus pairs produced response enhancements (see below) equivalent to those found for vertically displaced stimulus pairs.
Data analysis.
The outcome of each trial (correct or incorrect orientation/approach, or No-Go) was scored by the researcher and later grouped according to location and stimulus type. Comparisons were made between groups using standard statistical techniques (quantification of mean accuracy/error, χ2 tests for significant differences between response categories). The critical comparisons included the following.
(1) Localization accuracy (percentage correct responses) of individual stimuli, cross-modal stimulus pairs, and within-modal stimulus pairs. These included tests for significant differences in localization performance for cross-modal, within-modal, and single-stimulus conditions. Response enhancements associated with visual–auditory, visual–visual, and auditory–auditory stimulus pairings were evaluated by comparing the incidence of correct responses generated by these stimuli to the maximum incidence of correct responses generated by a single stimulus at each location (i.e., performance generated by the best single stimulus, visual or auditory). Differences in localization performance were also expressed as percentages.
(2) The incidence of each type of trial outcome for each stimulus condition and percentage differences between cross-modal, within-modal, and single-stimulus conditions.
(3) The relationship between the percentage response enhancements associated with cross-modal and within-modal stimulus pairs and the best single-stimulus performance at each location. These variables are typically inversely related; that is, lower accuracy in localizing single stimuli is typically correlated with greater enhancements when other stimuli are added (the principle of inverse effectiveness; Meredith and Stein, 1986; Stein and Meredith, 1993; Stanford et al., 2005). These relationships were fit separately with regression lines that were then compared with an F test.
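The enhancement measure described in comparison (1) can be sketched as a percentage change relative to the best single-stimulus performance. The exact formula is not spelled out in the text, so the form below is an assumption based on that description, and all accuracy values in the example are hypothetical rather than data from the study.

```python
def percent_enhancement(combined_correct, single_correct):
    """Percentage enhancement of a stimulus pair over the best single stimulus.

    combined_correct: percent correct for the stimulus pair at one location.
    single_correct: iterable of percent-correct values for the single stimuli.
    """
    best_single = max(single_correct)
    if best_single <= 0:
        raise ValueError("best single-stimulus accuracy must be positive")
    return 100.0 * (combined_correct - best_single) / best_single


# Hypothetical accuracies (percent correct) at one location.
visual_alone, auditory_alone = 25.0, 27.0
cross_modal = percent_enhancement(60.0, [visual_alone, auditory_alone])
within_modal = percent_enhancement(40.0, [visual_alone, auditory_alone])
```

Referencing the best single stimulus, rather than the mean of the two, is the conservative choice: a pair is credited only for the performance it adds beyond what its most effective component achieves alone.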
Results
All animals learned to localize and approach the targets at all perimetric positions. Differences in the speed with which different animals (n = 4) reached criterion performance were noted, but these were minor and in keeping with interanimal variations noted in previous studies (Stein et al., 1988, 1989; Wilkinson et al., 1996; Jiang et al., 2002; Burnett et al., 2004). Performance patterns for each stimulus condition during testing were highly consistent across all animals, and thus the data were pooled for the general analysis.
Experiment 1: multisensory (visual–auditory) versus unisensory (visual–visual) integration
The results for each stimulus condition were highly consistent across stimulus locations because intensities were intentionally adjusted to equilibrate localization accuracy of the individual visual and auditory stimuli. The data were collapsed across all animals and locations to obtain group means, which were weighted by the differing numbers of trials (see Materials and Methods). The mean localization accuracies for single visual stimuli (V1 or V2) and single auditory stimuli (A1) across stimulus locations were 25 and 27%, respectively. The addition of a second, spatially and temporally concordant stimulus enhanced the overall response accuracy, but enhancement in response to the cross-modal (V1A1) stimulus pair was significantly greater than enhancement to the within-modal (V1V2) stimulus pair at each location (Fig. 2A). The mean multisensory (V1A1) enhancement in localization performance, pooling data across locations, was 137% (by location, the range was 94–168%), whereas the mean unisensory (V1V2) enhancement was only 49% (by location, the range was 31–79%).
It is also important to note that cross-modal stimulus pairs evoked significantly fewer No-Go errors (56% less; p < 0.05) and localization errors (29% less; p < 0.05) (Fig. 2B). In contrast, within-modal stimuli significantly reduced only No-Go errors (29% less; p < 0.05) but not the incidence of incorrect localizations (9% more; nonsignificant). Thus, whereas unisensory integration appeared to make responses more likely, it did not make them more accurate, as did multisensory integration. Indeed, the data show that the positive impact of multisensory integration on performance averaged more than 2.8 times that of unisensory integration.
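A comparison of trial outcomes between two stimulus conditions of the kind reported here can be sketched as a Pearson χ2 test on a 2 × 3 table (correct, wrong location, No-Go). The exact form of the tests used in the study is not specified, so this is only one plausible version, and the counts below are hypothetical. For a 2 × 3 table the test has 2 degrees of freedom, for which the χ2 tail probability has the closed form exp(−χ2/2).

```python
import math


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    Assumes every column has a nonzero total.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of categories")
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        column = a + b
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat, len(row_a) - 1


# Hypothetical outcome counts: [correct, wrong location, No-Go].
single_stimulus = [25, 25, 50]
cross_modal_pair = [60, 18, 22]

chi2, dof = chi_square_2xk(single_stimulus, cross_modal_pair)
# Closed-form tail probability, valid only when dof == 2.
p_value = math.exp(-chi2 / 2.0)
```

A general-purpose statistics library would normally be used for tables with other degrees of freedom; the closed form is used here only to keep the sketch dependency-free.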
Performance enhancements for both cross-modal and within-modal stimuli were inversely related to the best single-stimulus accuracy at each location; that is, they were both consistent with the principle of inverse effectiveness (Fig. 2C). However, whereas the regression fits of these trends for enhancements to each stimulus pair had similar slopes, their intercepts were very different, and the lines were consistently displaced by ∼100 percentage points. The effectiveness of cross-modal stimuli (V1A1) was consistently higher than that of within-modal stimuli (V1V2).
Experiment 2: multisensory (visual–auditory) versus unisensory (auditory–auditory) integration
The results were similar to those obtained in Experiment 1. The mean localization accuracy for the single visual stimulus (V1) across locations was 25%, and the accuracies for the two auditory stimuli (A1 and A2) were 27 and 26%, respectively. Both cross-modal (V1A1) and within-modal (A1A2) stimulus combinations yielded enhanced responses, but again, the cross-modal combination yielded more than 2.9 times the performance enhancement (pooling data across locations, the mean enhancement was 141%; by location, the range was 106–177%) of the within-modal combination (pooling data across locations, the mean enhancement was 49%; by location, the range was 33–69%) (Fig. 3A).
Also, cross-modal stimuli significantly reduced both No-Go (60% less; p < 0.05) and incorrect location (25% less; p < 0.05) errors, whereas within-modal stimuli significantly decreased only No-Go errors (28% less; p < 0.05). Within-modal auditory stimuli had virtually no effect on the incidence of incorrect localizations (2% less; nonsignificant); thus, they made responses more likely but, unlike the cross-modal stimuli, they did not make initiated responses more accurate (Fig. 3B).
Finally, both multisensory (cross-modal, V1A1) and unisensory (within-modal, A1A2) performance enhancements were consistent with inverse effectiveness (Fig. 3C). However, as in Experiment 1, the regression fits to the inverse effectiveness trends had similar slopes but very different intercepts and were consistently displaced by ∼100 percentage points.
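The pattern described here, similar slopes with intercepts displaced by about 100 percentage points, can be illustrated with a small regression comparison. The sketch below fits a line to each condition and then applies an F test of whether a single common line fits both data sets as well as two separate lines. This is one common way to compare regression lines and is an assumption about the form of the test; the data points are hypothetical and were chosen only to reproduce the qualitative pattern.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, residual_ss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def f_test_common_line(x1, y1, x2, y2):
    """F test of one pooled line against two separate lines."""
    rss_separate = fit_line(x1, y1)[2] + fit_line(x2, y2)[2]
    rss_pooled = fit_line(x1 + x2, y1 + y2)[2]
    df_num = 2                        # extra slope and extra intercept
    df_den = len(x1) + len(x2) - 4    # two slopes and two intercepts fitted
    f_stat = ((rss_pooled - rss_separate) / df_num) / (rss_separate / df_den)
    # Closed-form tail probability of the F distribution, valid for df_num == 2.
    p_value = (1.0 + df_num * f_stat / df_den) ** (-df_den / 2.0)
    return f_stat, df_num, df_den, p_value


# Hypothetical data: best single-stimulus accuracy (%) vs. enhancement (%).
best_single = [20.0, 25.0, 30.0, 35.0]
cross_enh = [172.0, 148.0, 128.0, 112.0]
within_enh = [69.0, 51.0, 31.0, 9.0]

slope_c, intercept_c, _ = fit_line(best_single, cross_enh)
slope_w, intercept_w, _ = fit_line(best_single, within_enh)
f_stat, df_num, df_den, p_value = f_test_common_line(
    best_single, cross_enh, best_single, within_enh
)
```

In this hypothetical example both fits have a slope of −4 and the intercepts differ by 100 percentage points, so the pooled line fits far worse than the separate lines and the F statistic is large.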
The data from the two experiments show that multiple coincident stimuli evoked more accurate detection and localization responses than did a single stimulus (i.e., two is better than one). However, the magnitude of this performance enhancement was strongly tied to whether the stimuli were derived from the same or different sensory modalities. Coincident cross-modal stimuli evoked strong enhancements in localization performance, making localization responses not only more likely but also more accurate. In contrast, coincident within-modal stimuli evoked only weak enhancements and, while making localization responses more likely, did not make them any more accurate. The magnitudes of these enhancements in localization performance were inversely related to the localization accuracy of the individual component stimuli. However, in the range studied, multisensory integration enhanced behavioral performance on average 2.85 times more than did unisensory integration. The data are highly consistent across animals and sensory modalities; that is, the impacts of visual–visual and auditory–auditory stimuli were similar when the effectiveness of their component stimuli was similar (Fig. 4). Furthermore, the magnitude of the enhancements to the visual–auditory stimulus pairs obtained in Experiment 2 was not significantly different from that obtained in Experiment 1, despite the 9 month interval between them (Fig. 4).
Discussion
Enhancement in behavioral performance consequent to integrating information across sensory modalities is sometimes interpreted as attributable to RTE; that is, the improvements occur because these stimuli are multiple and redundant (i.e., equally informative individually). Indeed, speeded reaction time is often seen in response to combinations of within-modal (Marzi et al., 1996; Murray et al., 2001; Savazzi and Marzi, 2002, 2004; Schröter et al., 2007) and cross-modal (Hughes et al., 1994; Frens et al., 1995; Goldring et al., 1996; Giard and Peronnet, 1999; Forster et al., 2002; Amlôt et al., 2003; Diederich and Colonius, 2004; Sakata et al., 2004; Teder-Sälejärvi et al., 2005; Senkowski et al., 2006; Hecht et al., 2008a,b) stimuli. If this interpretation is correct, then redundant targets from the same sensory modality (e.g., two visual or two auditory stimuli) should have equivalent effects. However, the data in the present experiments indicate that they do not. Multiple stimuli from the same sensory modality only marginally enhanced localization compared with cross-modal stimulus combinations. Although they made localization responses somewhat more likely, they did not make them any more accurate.
These behavioral observations closely parallel recent physiological findings from single neurons in the SC. Alvarado et al. (2007) found that SC neurons responded to pairs of adjacent visual stimuli with only a modest increase (not statistically significant) in their mean number of evoked impulses, numbers that were far below those predicted by summing responses to the individual component stimuli. In contrast, SC neurons showed highly significant response enhancements to cross-modal (visual–auditory) stimulus pairs, response enhancements that were often greater than, or equal to, the sum of responses to the component stimuli. These physiological and behavioral observations are consistent with suggestions that multisensory and unisensory integration engage different underlying computations.
As noted by Ernst and Banks (2002), these differences can be understood in the context of a probabilistic (Bayesian) model of integration. In a Bayesian model of spatial locations (Rowland et al., 2007a), each stimulus is assumed to generate a sensory report of its location that may be accurate or randomly inaccurate, depending on the fidelity of the sensory system. These reports are filtered by previous expectations such that, if two or more stimuli co-occur, they belong to the same event and thus should have the same location. The action of the filter is to fuse the sensory reports, generating an estimate of location that is a compromise between them. Sensory reports that are inaccurate in the same direction (e.g., both biased to the left or both to the right) do not generate more accurate estimates when fused because they do not contradict the previous expectation. However, sensory reports that are inaccurate in opposite directions (e.g., one biased left and the other one biased right) are fused to generate estimates that are more consistent with the previous expectation that the stimulus locations should be the same. These circumstances produce estimated stimulus locations between the two incorrect sensory reports, which are closer to the actual stimulus location than the flanking sensory reports.
Cross-modal stimulus combinations represent multiple stimuli that are conveyed by different forms of energy and transduced by independent sensory systems. Consequently, the sensory reports evoked by two cross-modal stimuli are independent of one another and equally likely to be inaccurate in the same or different directions. Instances where the sensory reports are inaccurate in different directions yield more accurate estimates (by the above logic) and thus enhance localization responses. In contrast, multiple within-modal stimuli that travel by the same medium and are transduced by the same sensory system can be influenced by the same noise sources. Consequently, the sensory reports they generate are likely to covary substantially, that is, be more likely to be inaccurate in the same direction than in different directions. The lower incidence of sensory reports that are inaccurate in opposite directions reduces the incidence where estimates are improved through their fusion. The intuitive nature of this result is apparent in the consideration of the most extreme case: two sensory reports that covary 100% of the time are obviously no better than one alone. Thus, although not pursued quantitatively here, the Bayesian model can help explain how different products result when information is integrated across or within a sensory modality as a consequence of the relative statistics of the stimuli.
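The effect of covariance on fusion can be made concrete with a deliberately simplified sketch. It is not the Bayesian model of Rowland et al. (2007a); it assumes two equally reliable sensory reports with zero-mean, jointly Gaussian errors of variance sigma2 and correlation rho, fused by simple averaging. Under those assumptions the variance of the fused estimate is sigma2(1 + rho)/2, and the probability that the two errors point in opposite directions is 1/2 − arcsin(rho)/π. The function names are illustrative.

```python
import math


def fused_variance(sigma2, rho):
    """Variance of the average of two equal-variance reports with correlation rho."""
    return sigma2 * (1.0 + rho) / 2.0


def prob_opposite_errors(rho):
    """Probability that two zero-mean jointly Gaussian errors have opposite signs."""
    return 0.5 - math.asin(rho) / math.pi


# Independent reports (cross-modal case in this simplified picture).
independent_var = fused_variance(1.0, 0.0)
independent_opposite = prob_opposite_errors(0.0)

# Partially correlated reports (within-modal case in this simplified picture).
correlated_var = fused_variance(1.0, 0.5)
correlated_opposite = prob_opposite_errors(0.5)

# Perfectly correlated reports: fusion is no better than a single report.
identical_var = fused_variance(1.0, 1.0)
identical_opposite = prob_opposite_errors(1.0)
```

In this sketch independent reports halve the error variance and err in opposite directions half of the time, whereas reports with a correlation of 0.5 reduce the variance by only a quarter and err in opposite directions one third of the time. With a correlation of 1 there is no reduction at all, which corresponds to the extreme case described above.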
Whether the Bayesian model or some other model is used to explain the current results (see also Rowland et al., 2007b), the data suggest that there is something inherently different about multisensory and unisensory integration that is evident in a simple detection/localization task. Whether these differences would be equally evident in tasks requiring the use of higher-order cognitive processes remains to be determined.
Footnotes
- This research was supported by National Institutes of Health Grant EY016716. We thank Nancy London for technical assistance.
- Correspondence should be addressed to Dr. Barry E. Stein, Department of Neurobiology and Anatomy, Wake Forest University School of Medicine, Winston-Salem, NC 27157. bestein{at}wfubmc.edu