Researchers have theorized that evolutionary pressure should have shaped mechanisms to ensure that emotional stimuli are rapidly detected, promote learning, and are remembered well (for review, see Phelps & LeDoux, 2005). This theory has spurred a great deal of research suggesting that emotional stimuli (Anderson, 2005), and particularly negative or threatening stimuli (Fox et al., 2000; Öhman, Flykt, & Esteves, 2001), experience processing advantages relative to nonemotional stimuli (for review, see Vuilleumier, 2005).

There is consensus that these emotional effects on cognitive processing are mediated by activation of the amygdala (Armony, Servan-Schreiber, Cohen, & LeDoux, 1997; LeDoux, 2000). Subcortical projections to the amygdala may allow for the rapid evaluation of the emotional relevance of stimuli (Morris, DeGelder, Weiskrantz, & Dolan, 2001; Pasley, Mayes, & Schultz, 2004; Vuilleumier, Armony, Driver, & Dolan, 2003). Although these subcortical projections may allow for a rapid emotional response, that response most likely precedes the conscious identification of the emotional stimulus. That is, the amygdala might signal that a stimulus is negative before the viewer consciously recognizes whether the stimulus is a spider or an angry face. Conscious recognition of the identity of the emotional stimulus requires cortical processing of the stimulus within the ventral visual processing stream (Mishkin, Ungerleider, & Macko, 1983; Ungerleider & Haxby, 1994).

Researchers have, however, posited that rapid amygdala activation is an early process that helps ensure that emotional stimuli benefit from additional cortical processing, thereby ensuring that these emotionally charged stimuli reach conscious awareness.

There are two mechanisms by which amygdala activity may support the rapid identification of emotionally charged stimuli. First, detection of the emotional valence by the amygdala may influence attention mechanisms, resulting in a rapid shift of attention toward the emotional stimulus. Second, it has been proposed that detection by the amygdala can increase vigilance (Armony et al., 1997) and, via direct connections from the amygdala to perceptual cortices (Amaral, Behniea, & Kelly, 2003), may increase the gain of those perceptual units responsible for processing the emotional stimulus (Phelps & LeDoux, 2005).

The attention-based theory suggests that emotional stimuli will be identified quickly because they are prioritized during the competition for attention. Thus, when multiple objects compete for attention, emotional objects will be selected and identified before nonemotional objects. By contrast, the gain-based hypothesis suggests that emotional stimuli may be identified more rapidly even when there is no competition for attention. That is, the increased gain of early perceptual units that process emotional stimuli may produce more efficient identification of the emotional stimulus, even when only one stimulus is presented at a time and there is therefore no competition for attention.

The attention-based theory has been the focus of a great deal of research (see Frischen, Eastwood, & Smilek, 2008, and Yiend, 2009, for reviews), but there has been little research investigating the gain-based theory's prediction that emotional stimuli are identified rapidly even when there is no competition for the allocation of attention. Although both imaging (Morris et al., 1998) and behavioral data (Phelps, Ling, & Carrasco, 2006) suggest that amygdala activation is related to increased perceptual processing, neither study has investigated whether this increase is associated with more rapid identification of the negative stimulus. For example, Morris et al. (1998) found that the amount of amygdala activation caused by the presentation of a fearful face was correlated with neural responses in extrastriate visual cortices. However, the slow temporal resolution of fMRI makes it difficult to determine whether this increased activation was also associated with more rapid identification. Similarly, Phelps et al. found that the contrast required to discriminate the orientation of a tilted Gabor patch was lower following a nonspatially informative fearful cue than a neutral cue. Although this finding suggests that emotional stimuli can increase visual sensitivity, the time course of such an improvement was not assessed. In addition, subsequent research (Bocanegra & Zeelenberg, 2009) found that the detection benefit reported by Phelps et al. occurs only for low spatial frequency gratings. For high spatial frequency gratings there was, instead, a cost associated with a fearful precue. Thus, it is unclear whether this increase in sensitivity, even if accomplished rapidly, would support the more rapid identification of more complex real-world objects and scenes.

To our knowledge, only one study has directly addressed the time course required to identify emotional scenes when only one scene was presented at a time. Maljkovic and Martini (2005) presented a stream of negative and positive emotional photographs using rapid serial visual presentation (RSVP) and then immediately tested participants’ recognition memory for the pictures. Collapsing across emotional valence, they found that more arousing images required less time to encode into memory. More interestingly, collapsing across arousal, they found that negative images were less likely to be remembered than positive images when presentation durations were very short (< 200 ms). Although this pattern reversed (better memory for negative images) with longer presentation durations (> 400 ms), the initial decrement in memory performance for negative images is inconsistent with the proposal that negative emotional images may be identified rapidly.

Maljkovic and Martini’s (2005) result has not, however, been replicated. In addition, the method of presenting images in an RSVP stream may have contributed to their finding. In their streams, each image was immediately replaced by the next. At extremely rapid presentation rates, this situation may have resulted in a second to-be-identified image appearing before the processing of the identity of the first image had been completed. Even though the stimuli appeared one at a time, at short stimulus durations, the identification processes of multiple images may have overlapped in time, producing a competition for attentional resources. If so, the poor performance with negative images may have reflected a rapid shift of resources away from the negative images toward the positive images. This shift of resources could be accounted for by an attentional bias toward positive stimuli (Frewen, Dozois, Joanisse, & Neufeld, 2008) or by a tendency to rapidly deploy resources to negative stimuli but to quickly remove them (Rinck & Becker, 2006).

In the present article, we investigated whether the emotional valence of a photograph has an impact on the time required to initially identify the image when there is little or no competition for resources during identification. This approach minimizes the impact that processes guiding attentional allocation may have on image identification and directly investigates how the increased vigilance and perceptual processing associated with emotional images (i.e., the gain-based theory) influence the speed of identification. Our first experiment was similar to Maljkovic and Martini's (2005), except that each of our images was presented briefly, masked, and then followed by a sizable delay before another image was presented. In all, 1,900 ms elapsed between the onset of one image and the onset of the next, providing more than enough time to complete the identification of an image before a second competing image was presented. If their results were due to competition for processing resources when multiple pictures appeared too closely in time, we would expect the effect to reverse in our experiment. If, however, their result reflected an initial encoding deficit for negative images, we would expect to replicate their finding.

Experiment 1

Method

Participants

Twenty-six Michigan State University undergraduates (18–27 years old) with normal or corrected-to-normal vision participated in the experiment for course credit.

Stimuli

The experiment used 72 photographs from the International Affective Picture System (IAPS) set (Lang, Bradley, & Cuthbert, 2005). The IAPS images have been extensively rated for both valence and arousal using a nine-point Likert-type scale. These ratings were used to select 24 negative, 24 neutral, and 24 positive images that fulfilled the following criteria. First, we wanted to equate the arousal level of the negative and positive images. To do so, we had to avoid the most negative images from the IAPS since they tend to be more highly arousing than the positive images. We also attempted to avoid pictures displaying close-ups of people’s faces since there is evidence for a processing advantage for happy faces (Amir, Elias, Klumpp, & Przeworski, 2003; Juth, Lundqvist, Karlsson, & Öhman, 2005; Kirita & Endo, 1995; Leppänen & Hietanen, 2004). Within these constraints, we wanted to maximize the valence difference between our stimuli. Given these constraints, our negative images had valence ratings that ranged from 2.8 to 4 (M = 3.4, SD = .29). The neutral images had valence ratings that ranged from 5 to 6 (M = 5.6, SD = .23). The positive images had valence ratings that ranged from 7 to 8 (M = 7.3, SD = .25). The valence and arousal distributions for the stimuli chosen for this experiment are summarized in Fig. 1. All images subtended 10 × 14 degrees of visual angle and were presented on a white surround.
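Purely as an illustration of this selection logic, the constraints above can be expressed as a filter over the normative ratings. The sketch below assumes a hypothetical norms file with columns picture, valence, and arousal; the published set was also screened by hand (e.g., to exclude close-up faces), which no simple filter captures.

```python
import pandas as pd

# Hypothetical table of normative IAPS ratings (one row per picture).
iaps = pd.read_csv("iaps_norms.csv")

# Valence windows reported above for each stimulus category.
negative = iaps[iaps["valence"].between(2.8, 4.0)]
neutral = iaps[iaps["valence"].between(5.0, 6.0)]
positive = iaps[iaps["valence"].between(7.0, 8.0)]

# One way to equate arousal: restrict negative and positive candidates to the
# overlapping arousal range before sampling 24 images from each pool.
low = max(negative["arousal"].min(), positive["arousal"].min())
high = min(negative["arousal"].max(), positive["arousal"].max())
negative = negative[negative["arousal"].between(low, high)].sample(24, random_state=0)
positive = positive[positive["arousal"].between(low, high)].sample(24, random_state=0)
neutral = neutral.sample(24, random_state=0)
```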

Fig. 1 A scatterplot of the normative valence and arousal ratings for the IAPS images used in the experiments demonstrates that the images differ substantially in their valence, but not their arousal

Procedure

Participants were informed that they would view a slideshow in which images would be presented very briefly and then masked by a colored checkerboard. Their task was to attempt to remember the briefly presented images for a memory test that would occur after the slideshow. Participants then sat at a computer on which the experiment was run. The experiment was programmed in E-Prime Professional and displayed on a CRT monitor with a resolution of 1,024 × 768 running at 100 Hz.

The experiment consisted of a slideshow presentation of 36 images followed by a memory test in which all 72 images were presented. The following sequence of events cycled during the slideshow (see Fig. 2). The word “Ready” appeared on screen for 1,000 ms, followed by a fixation cross for another 500 ms. Then an image was presented for 60 ms, followed by a mask for 340 ms. After the mask, the entire process repeated. Thus, 1,840 ms elapsed between the offset of one image and the onset of the next (1,900 ms from onset to onset). This process cycled until all 36 images had been presented. The 36 images comprised 12 images from each valence condition, and the sequence of image presentation was randomized. In addition, there were two versions of the slideshow, so that the 36 images seen by one participant served as the unseen foils during the memory test for another participant.
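For clarity, the trial timing works out as follows (a trivial check of the durations given above):

```python
# Event durations for one slideshow cycle in Experiment 1 (from the text).
READY_MS, FIXATION_MS, IMAGE_MS, MASK_MS = 1000, 500, 60, 340

# Blank interval from the offset of one image to the onset of the next.
offset_to_onset = MASK_MS + READY_MS + FIXATION_MS  # = 1840 ms
# Stimulus onset asynchrony between successive images.
onset_to_onset = IMAGE_MS + offset_to_onset         # = 1900 ms

assert (offset_to_onset, onset_to_onset) == (1840, 1900)
```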

Fig. 2 A schematic of the method used to present the slideshow of emotional images in Experiment 1. Each image was presented briefly (60 ms) and was followed by a mask, but substantial time (1,900 ms) elapsed between the onsets of successive pictures

Following the slideshow, participants were shown all 72 images (36 seen images and 36 unseen foils) one at a time. Again, the order of presentation was randomized. The image remained on the screen until the participant pressed one of two keys. The “y” key indicated that the participant had seen the image during the slideshow and the “n” key indicated that the participant had not seen the image before.

Results

For each participant, the percentage of hits and false alarms was calculated for each stimulus valence. Figure 3 presents the mean corrected hit rate (hits – false alarms) for each valence condition (see Footnote 1). A within-subjects ANOVA confirmed that performance varied as a function of valence condition, F(2, 50) = 5.41, p = .007. Contrasts comparing the three valence conditions found that positive images were remembered better than negative images, F(1, 25) = 12.27, p = .002. Neutral images were remembered marginally better than negative images, F(1, 25) = 3.36, p = .08, and did not differ significantly from positive images, F(1, 25) = 2.09, p = .16.
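For readers who want to reproduce this style of analysis, a minimal sketch follows. It is not the authors' analysis code; the file name and the columns subject, valence, hit_rate, and fa_rate are assumptions about how the data might be organized.

```python
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per participant x valence condition.
df = pd.read_csv("exp1_memory.csv")
df["corrected_hits"] = df["hit_rate"] - df["fa_rate"]  # hits - false alarms

# One-way within-subjects ANOVA across the three valence conditions.
print(AnovaRM(df, depvar="corrected_hits", subject="subject",
              within=["valence"]).fit())

# A single-df contrast (e.g., positive vs. negative) as a paired comparison;
# for two conditions, F equals t squared.
wide = df.pivot(index="subject", columns="valence", values="corrected_hits")
print(stats.ttest_rel(wide["positive"], wide["negative"]))
```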

Fig. 3 The mean corrected hits (hits – false alarms) for Experiment 1. Error bars represent the within-subjects standard errors of the means (Loftus & Masson, 1994)

Discussion

When presentation time was extremely limited, participants’ memory was more accurate for positive than for negative images. In fact, negative images produced the poorest memory. This pattern of results is consistent with that of Maljkovic and Martini (2005); however, we found this deficit even though we provided ample time between image onsets (1,900 ms), so that the initial identity processing of successive images should not have overlapped in time.

One can draw a number of conclusions based on these data. First, the fact that emotional valence of these briefly presented stimuli had an impact on memory performance suggests that the valence was rapidly detected and influenced subsequent processing. This finding is generally consistent with the claim that the emotional valence of a stimulus is rapidly analyzed (Eimer & Holmes, 2002; Esslen, Pascual-Marqui, Hell, Kochi, & Lehmann, 2004). Second, the finding that memory was worst for negative images is inconsistent with the view that this early emotional evaluation supports the rapid identification of the scene (Phelps et al., 2006). Instead, it appears that the rapid evaluation of the stimulus as negative actually hindered one’s ability to successfully identify and store the image.

In addition, because we equated the arousal level of our negative and positive images, this difference was likely due to valence rather than arousal. To investigate this issue more thoroughly, we calculated the percentage correct across participants for each image and then correlated each image’s memory rate with both the arousal level and the valence level of the image. Memory for an image was significantly correlated with image valence, r(72) = .36, p = .002, but not with image arousal, r(72) = .006, p = .96.
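A sketch of that item-level analysis, again with hypothetical names (a table with one row per image and columns accuracy, valence, and arousal):

```python
import pandas as pd
from scipy import stats

# Hypothetical item-level file: per-image memory accuracy plus IAPS norms.
items = pd.read_csv("exp1_item_accuracy.csv")

r_val, p_val = stats.pearsonr(items["accuracy"], items["valence"])
r_aro, p_aro = stats.pearsonr(items["accuracy"], items["arousal"])
print(f"valence: r = {r_val:.2f}, p = {p_val:.3f}")  # reported: r = .36, p = .002
print(f"arousal: r = {r_aro:.3f}, p = {p_aro:.2f}")  # reported: r = .006, p = .96
```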

We severely limited the presentation time of the images and therefore believe that the effects are most likely due to valence influencing the amount of time required to initially identify and represent the image. However, because we used a memory task, the results might instead reflect some other aspect of that task, such as memory consolidation or retrieval. To further investigate whether an image’s valence influences initial encoding time rather than later memory processes, in Experiment 2 we sought to replicate the effect with a task that did not require memory.

Experiment 2

In Experiment 2, we presented two images simultaneously and followed each with a mask. On some trials, both images were identical, and on other trials, the images were of different scenes. On a trial-by-trial basis, participants were asked to determine whether the two images were the same picture or different pictures. Performing this same/different task on a trial-by-trial basis reduced the need for consolidation and retrieval and allowed us to isolate the impact of valence on initial encoding time.

Method

Participants

Twenty-two participants with normal or corrected-to-normal vision participated for course credit.

Procedure

Participants were shown two simultaneous photographs that were followed by a mask. One image appeared just above the fixation point, whereas the other appeared just below fixation. In half of the trials, the two images were identical; in the remaining half, the two images were different. Once the images were presented, the participant pressed one button to indicate that the two images were identical and a different button to indicate that they were different. Auditory feedback was given on a trial-by-trial basis.

The experiment used a 2 × 3 × 5 repeated measures design, with two levels of trial type (identical-image trials, different-image trials), three levels of image valence (positive, neutral, and negative), and five levels of stimulus presentation duration (30 ms, 40 ms, 50 ms, 60 ms, 70 ms). There were 10 trials in each of the 30 cells of the experiment, for a total of 300 trials. These trial types were randomly interleaved.

Half of the trials were identical-image trials. Of these, one-third of trials presented a positive photograph, one-third presented a neutral photo, and one-third presented a negative photo. For different-image trials, one image was either positive (one-third), negative (one-third), or neutral (one-third), and the other image was always a neutral image. Each of these pairings occurred an equal number of times at each of the five stimulus presentation durations.

The neutral images in different-image trials were selected at random from a set of 36 neutral images. Positive, neutral, and negative images for the identical-image trials were randomly selected from sets of 12 images of each valence. For the different-image trials, one image was randomly selected from these same sets of 12 images and was paired with a neutral image randomly selected from a second set of 36 unique neutral images. Thus, a given image from the sets of 12 images could appear multiple times and across multiple conditions.
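The factorial structure lends itself to a fully crossed trial list. The following sketch builds the 300 trials under the constraints just described; the image-assignment step (sampling from the 12-image valence sets and the extra neutral set) is omitted, and all names are illustrative.

```python
import itertools
import random

DURATIONS_MS = [30, 40, 50, 60, 70]
VALENCES = ["positive", "neutral", "negative"]
TRIAL_TYPES = ["identical", "different"]

# 2 trial types x 3 valences x 5 durations x 10 repetitions = 300 trials.
trials = [
    {"trial_type": ttype, "valence": val, "duration_ms": dur}
    for ttype, val, dur in itertools.product(TRIAL_TYPES, VALENCES, DURATIONS_MS)
    for _ in range(10)
]
random.shuffle(trials)  # the trial types were randomly interleaved
assert len(trials) == 300
```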

Results

Figure 4 presents the corrected hit rate (hits – false alarms) for detecting that the two images were the same as a function of the time that the images were displayed. A 3 × 5 repeated measures ANOVA with three levels of emotion (negative, neutral, and positive) and five levels of presentation duration (30, 40, 50, 60, and 70 ms) confirmed main effects of both presentation duration, F(4, 84) = 33.324, p < .001, and emotion, F(2, 42) = 4.82, p = .01. The two factors did not interact, F(8, 168) = 1.14, p = .34. As expected, detection rates increased as presentation duration increased. In addition, planned contrasts revealed that people made fewer correct detections when the two images were negative than when they were positive, F(1, 21) = 10.75, p = .004. Performance with the neutral images was intermediate and did not differ significantly from either the positive, F(1, 21) = 3.24, p = .086, or the negative, F(1, 21) = 1.02, p = .323, image conditions.

Fig. 4 Data from Experiment 2, in which participants indicated whether two simultaneously presented images were the same or different. The figure presents the mean percentages of corrected same responses (hits – false alarms [FAs]) as a function of stimulus presentation duration. Separate lines represent stimuli of different valences. Error bars represent the within-subjects standard errors of the means (Loftus & Masson, 1994)

Discussion

In Experiment 2, the task was changed from a memory task to one in which participants detected whether or not two simultaneously presented images were identical to one another. This task reduced the need for memory processes such as consolidation and retrieval, thereby isolating the influence of initial encoding time on performance. Consistent with Experiment 1, when encoding time was limited, participants more accurately identified identical positive images than identical negative images, providing additional evidence that negative scenes take longer to initially identify than positive scenes. In addition, the failure to find a significant Presentation Duration × Emotion interaction suggests that this effect was relatively constant across the durations inspected.

Experiment 3

Although the methods in Experiments 1 and 2 were designed to minimize competition for attentional resources, it is still possible that attentional effects played a role in those results. The third experiment was designed to test whether one of two potential attentional accounts could explain why performance was worse with negative than with positive images in Experiments 1 and 2. One possible explanation posits that people rapidly and reflexively shift attention away from negative images. For instance, in the previous experiments, participants may have covertly shifted attention away from the negative image (e.g., to the side of the computer monitor), thereby interrupting the initial encoding of the negative images. Although this explanation may be at odds with claims that attention rapidly and reflexively shifts toward potentially negative images (Bradley et al., 1997; Eastwood, Smilek, & Merikle, 2001; Koster, Crombez, Van Damme, Verschuere, & De Houwer, 2004; Öhman et al., 2001), there is some evidence that an attentional bias toward negative images is robust only when one tests populations selected for high anxiety; among nonselected populations and populations selected for low anxiety, there is some evidence for a bias away from negative information (Bar-Haim, Lamy, Lee, Bakermans-Kranenburg, & van Ijzendoorn, 2007; Becker & Detweiler-Bedell, 2009; Frewen et al., 2008). Thus, it is possible that a reflexive attentional shift away from the negative images impeded their identification and storage. Note, however, that this explanation suggests that the encoding deficit should be selective to the negative image and may in fact increase the likelihood of encoding other items that appear simultaneously with the negative image.

By contrast, a second attention-based hypothesis suggests that a negative image may produce a temporary global interruption in processing. This hypothesis is based on Corbetta and colleagues’ model distinguishing between ventral and dorsal attentional networks (Corbetta, Patel, & Shulman, 2008; Corbetta & Shulman, 2002). According to this model, sustained attention is maintained by a dorsal attentional network that allows one to voluntarily attend to task-relevant objects and locations. If, however, something potentially relevant unexpectedly appears, the ventral network is activated and interrupts the ongoing dorsally mediated processing. During this brief interruption, one might expect a global disruption of ongoing processing. If the mere presence of a negative image activates this ventral system, it might cause such a global interruption and thus also lower the encoding of other objects that appear simultaneously with the negative item.

To test these hypotheses, we ran a third experiment. The task was identical to the memory task in Experiment 1, but each slide during the slideshow of to-be-remembered images consisted of two simultaneously presented scenes. One of these images was always neutral, whereas the other could be neutral, positive, or negative. Of key interest was whether memory for the neutral item would vary as a function of the item with which it appeared. Note that the two attention-based hypotheses make opposite predictions. If there is a rapid shift of attention away from the negative image, we would expect more attention to be paid to a neutral image that appeared with a negative item, and thus better memory for these neutral images. By contrast, if the mere presence of a negative image creates a global interruption in processing, we would expect poorer memory for the neutral images that appeared with a negative image.

Method

Participants

Fifty-two Michigan State University undergraduates (18–27 years old) with normal or corrected-to-normal vision participated in the experiment for course credit.

Procedure

The procedure was similar to that in Experiment 1, with a few notable exceptions. First, two images were presented (one above fixation and one below fixation) during each slide of the slideshow. One of these images was always neutral, whereas the other image was neutral, negative, or positive. There were again 36 slides during the slideshow, now consisting of a total of 72 images to be remembered. The order of image presentation and whether each image was presented above or below fixation was randomized.

The negative and positive stimuli were the same as those used in Experiment 1. However, this experiment required many more neutral stimuli than Experiment 1: 84 neutral images in total. Twelve of these images were used as fillers that appeared only in the slides containing two neutral images, and memory for these images was never tested. The remaining 72 neutral images were split into six lists of 12 images each. For a given participant, one of these lists was paired with negative images during the slideshow, a second list was paired with positive images, and a third list was paired with the neutral filler images. Across participants, the pairing of these lists with particular conditions was counterbalanced. After viewing the slideshow, participants viewed a total of 120 single images presented at fixation. For each image, participants indicated whether or not the image had been seen during the slideshow. The 120 images comprised 24 negative (12 seen and 12 foils), 24 positive (12 seen and 12 foils), and 72 neutral images (12 seen with negative images, 12 seen with positive images, 12 seen with neutral images, and 36 foils).
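One plausible implementation of this counterbalancing is a simple rotation of the six neutral lists across participants. This is a sketch of the logic only; the exact rotation scheme used is an assumption.

```python
# Six lists of 12 neutral images each (hypothetical labels).
NEUTRAL_LISTS = [f"list_{i}" for i in range(1, 7)]
CONDITIONS = ["paired_with_negative", "paired_with_positive", "paired_with_neutral"]

def lists_for(participant: int) -> dict:
    """Rotate the list-to-condition assignment by participant number."""
    offset = participant % len(NEUTRAL_LISTS)
    rotated = NEUTRAL_LISTS[offset:] + NEUTRAL_LISTS[:offset]
    assignment = dict(zip(CONDITIONS, rotated[:3]))  # studied neutral lists
    assignment["memory_test_foils"] = rotated[3:]    # 3 x 12 = 36 neutral foils
    return assignment

print(lists_for(0))
print(lists_for(1))
```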

Results

To investigate whether the overall valence effect from the prior experiments replicated, we first compared memory performance for the negative, neutral, and positive images. The mean percentages of corrected hits (hits – false alarms) for the emotional images in each valence condition are presented in Fig. 5a. Consistent with the prior experiments, a within-subjects ANOVA on these data with three levels of stimulus valence was significant, F(2, 102) = 7.90, p = .001: Recognition memory for the positive images was better than memory for the negative images, F(1, 51) = 12.06, p = .001, whereas the neutral and negative conditions did not differ, F < 1. However, unlike in the previous experiments, performance was also better for the positive than for the neutral images, F(1, 51) = 9.03, p = .003.

Fig. 5 Data from Experiment 3. a Corrected hits (hits – false alarms [FAs]) for the three emotional image conditions. b Corrected hits (hits – false alarms) for the neutral images as a function of the valence of the images with which they were originally presented. Error bars represent the within-subjects standard errors of the means (Loftus & Masson, 1994)

The main question of interest in Experiment 3 was whether the ability to recognize a neutral image varied as a function of the emotion of the image with which it first appeared. Figure 5b presents the corrected hit data for the neutral images, broken down by the type of image with which each neutral image appeared during the slideshow. As is clear from the figure, the image with which a neutral image was paired had no effect on one’s ability to recognize it, F(2, 102) < 1.

Discussion

These results replicate the findings from Experiments 1 and 2 that positive images are better remembered than negative images when initial encoding time is severely restricted. This replication supports the conclusion that the emotional valence of an item is detected very rapidly and can influence subsequent processing. Again, however, we found that negative images led to poorer recognition memory than positive images, suggesting that negative emotional valence hinders rather than helps the rapid identification of scene contents.

More importantly, we found that the valence of the emotional image had no influence on the processing of a simultaneously presented neutral image. The fact that memory for neutral items did not differ as a function of the valence of the items with which they appeared provides evidence against an attentional explanation based on a rapid shift of attention away from negative items. If such a rapid shift occurred, one would have expected more attention to be focused on the neutral items that appeared with the negative items, and thus a recognition memory benefit for those neutral items. No such benefit was observed. If, instead, the mere presence of a negative image stimulated the ventral attention network, causing a momentary interruption in processing, one would have expected worse performance for the neutral images that appeared with negative images. No such deficit occurred. Instead, the results suggest that the rate of information encoding of the neutral image was unaffected by the valence of the item that appeared simultaneously with it. This pattern of findings provides evidence against both attention-based accounts of the recognition memory deficits for negative images.

In short, we continue to find a deficit in the recognition memory for negative scenes, but fail to find evidence that this deficit is related to attention.

General Discussion

Across three experiments, we found evidence that the time required to initially identify an image depends on its emotional valence. When encoding time was limited, people were more successful at identifying positive than negative images. These differences appeared whether one tested identity encoding via a recognition memory test (Experiments 1 and 3) or by an immediate judgment about whether two simultaneously presented images were identical or not (Experiment 2). Finally, in all experiments, the positive and negative stimuli were matched on arousal, suggesting that the effect was truly due to emotional valence rather than to differences in arousal.

In general, this finding is consistent with Maljkovic and Martini’s (2005) finding of poorer encoding for negative than for positive pictures when exposure durations were extremely short. The conclusion that the specific content of a complex image is identified more slowly for negative images is also consistent with a number of findings from studies that presented face stimuli and had people make speeded decisions about aspects of the faces unrelated to their valence. For instance, participants take longer to count the number of facial features in displays of negative than of positive schematic faces (Eastwood, Smilek, & Merikle, 2003), to respond to the gender of negative than of positive faces (Purcell, Stewart, & Skov, 1998), and to respond to the color of a negative than of a positive face (White, 1996). The findings are also broadly consistent with a number of studies that have found people to be faster and more accurate at identifying happy faces than negative (fearful, angry, sad, or disgusted) faces (Juth et al., 2005; Kirita & Endo, 1995; Leppänen & Hietanen, 2004; Palermo & Coltheart, 2004). For instance, Calvo and Nummenmaa (2008) concluded that happy faces were identified more rapidly and required less effort to encode than negative faces. The experiments presented here suggest that this positivity advantage may extend to nonfacial picture stimuli as well.

Thus, our finding adds to a growing body of literature suggesting that negative stimuli are identified more slowly than positive stimuli. These findings are at odds with the claim that the increased vigilance and perceptual processing associated with negative stimuli should lead to more rapid identification of those stimuli (Phelps & LeDoux, 2005; Phelps et al., 2006). Instead, they suggest that the detection of an image as negative produces changes in processing that impede the identification of the image’s specific content.

Why might negative images take longer to identify than positive images? At this point, we have not isolated the mechanism responsible for this difference in processing efficiency. In Experiment 3, we investigated possible attentional explanations for this effect. However, we found no evidence that this effect could be explained by either a rapid shift of attention away from negative images or by a global interruption in processing due to negative images. As such, we can rule out these attentional explanations for the effects we observed, but cannot definitively identify the cause of the effect. A number of candidate explanations are worthy of consideration.

A mundane explanation is that systematic differences in low-level image characteristics or image complexity between our image sets were responsible for the effect. We have several pieces of evidence suggesting that this interpretation is not correct. First, we ran a follow-up to Experiment 1 in which participants (n = 30) viewed inverted images during both the initial slideshow and the memory test (see Footnote 2). Inverting the images should have interfered with the ability to rapidly extract the emotional valence of the image (Calvo, Nummenmaa, & Hyönä, 2008; Maurer, Le Grand, & Mondloch, 2002), but should not have altered low-level visual characteristics such as image brightness, image color, or the distribution of spatial frequencies. Inverting the images abolished the memory differences as a function of emotional valence, F(2, 58) < 1, suggesting that low-level differences in the images were not responsible for the differences we observed. Second, we had a new set of participants (n = 14) rate the complexity of each of the 72 emotional images on a seven-point Likert-type scale. Image complexity ratings did not differ across the emotional sets of images, F(2, 26) = 2.27, p = .12. We also computed the average percentage correct for each image in Experiment 1 (across participants) and then hierarchically regressed this variable onto the image complexity ratings (Step 1) and the IAPS standardized image valence ratings (Step 2). Image complexity accounted for a significant proportion of the variance in percentage correct, β = −.392, R² = .154, F(1, 70) = 12.74, p = .001, with better performance for the less complex images. More importantly, image valence added significantly to the prediction of percentage correct, above and beyond image complexity, β = .295, ΔR² = .084, F(1, 69) = 7.60, p = .007. Taken together, the finding that the valence-based effect disappears for inverted images and that valence is related to performance even after accounting for complexity ratings gives us some confidence that these results are not simply the result of image confounds.
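A sketch of that two-step (hierarchical) regression using statsmodels; the item-level file and the columns accuracy, complexity, and valence are assumptions:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical item-level file: per-image accuracy, complexity rating, valence.
items = pd.read_csv("exp1_item_accuracy.csv")

step1 = smf.ols("accuracy ~ complexity", data=items).fit()            # Step 1
step2 = smf.ols("accuracy ~ complexity + valence", data=items).fit()  # Step 2

# R-squared change from Step 1 to Step 2, with its F test.
f_change, p_change, _ = step2.compare_f_test(step1)
print(f"Step 1 R^2 = {step1.rsquared:.3f}")
print(f"R^2 change = {step2.rsquared - step1.rsquared:.3f}, "
      f"F = {f_change:.2f}, p = {p_change:.3f}")
```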

Instead, we believe that the detection of the valence of the stimuli is having an impact on processing, leading to less efficient processing of negative than of positive stimuli (see Footnote 3). Recent evidence suggesting that negative images precipitate a shift in the balance between the magnocellular and parvocellular channels, favoring the magnocellular channel (Bocanegra & Zeelenberg, 2009), provides one potential mechanism for the slower identification of negative images. The magnocellular pathway is the primary input to the dorsal processing stream (DeYoe & Van Essen, 1988; Maunsell, Nealey, & DePriest, 1990), which is involved in spatial perception and perception for action (Goodale & Milner, 1992), but does not support the conscious recognition of objects (Mishkin, Ungerleider, & Macko, 1983; Ungerleider & Haxby, 1994). By contrast, the parvocellular pathway is the primary input to the ventral processing stream (DeYoe & Van Essen, 1988; Van Essen, Anderson, & Felleman, 1992, but see Nealey & Maunsell, 1994), which is involved in object recognition (Mishkin et al., 1983; Ungerleider & Haxby, 1994). The shift in balance between these two systems may represent a trade-off emphasizing perception for action over perception for recognition, thereby delaying identification. Applied to the real world, this trade-off may allow one to avoid stepping on a threatening object before identifying whether it is a spider or a snake.

Of course, the explanation above is speculative, and other explanations of the effect are possible. More research will be needed to identify the exact mechanisms responsible for the slower identification of negative relative to positive images that we report here. At this point, however, we can conclude that the detection of an image as negative impedes rather than assists the rapid identification of that image’s identity.

Finally, additional research elucidating the mechanism for this effect may have important implications for understanding how emotional stimuli influence processing. For instance, the need for additional time to initially identify negative images might explain why negative images tend to hold attention longer than positive images (Amir et al., 2003; Fox, Russo, & Dutton, 2002; Koster et al., 2004). In addition, the finding that negative images require more time to identify may suggest that the increased cortical activity for negative stimuli found with fMRI (Lang et al., 1998; LeDoux, 2000; Phelps & LeDoux, 2005; Sergerie, Chochol, & Armony, 2008; Vuilleumier, 2005) and ERP (Olofsson, Nordin, Sequeira, & Polich, 2008) reflects the increased time and effort required to complete identification rather than a more extensive processing of those stimuli.