During rapid serial visual presentation (RSVP) of natural scenes, perceptual processing is faced with a veritable onslaught of sensory information that demands rapid resolution. Nonetheless, a number of behavioral studies have determined that rapidly presented images are indeed perceived but quickly forgotten (Potter, 1975; Potter, Staub, Rado, & O’Connor, 2002). When perceptual processing is measured using electrophysiology during RSVP, previous studies have found that emotional, as compared with neutral, scenes prompt a larger occipito-temporal negativity starting around 150 ms after picture onset (e.g., Junghöfer, Bradley, Elbert, & Lang, 2001; Peyk, Schupp, Elbert, & Junghöfer, 2008; Schupp, Junghöfer, Weike, & Hamm, 2003), suggesting that this may be a neural signature of enhanced perceptual processing. Consistent with this, emotional pictures presented during RSVP are also recognized better on a later recognition test than are neutral pictures (Versace, Bradley, & Lang, 2010). Visual images of natural scenes, however, also vary widely in low-level perceptual properties that might also affect the ease of perceptual processing, especially when presented at rapid speeds. Among these are perceptual complexity and the presence of people. Therefore, in the present study, we assessed the extent to which complexity, content, and emotional arousal of natural scenes affected perceptual processing as measured using dense array EEG.

According to Gestalt psychologists, one of the major principles in perceptual organization is segmentation of an image into figure and ground (Palmer, 1999). When the high perceptual loads associated with the rapidly changing visual scenes in RSVP are processed, it is reasonable that images of lower perceptual complexity can be segregated more easily into coherent patterns, with these relatively simpler figure–ground pictures seeming to pop out of the fleeting array. Thus, we presented simple figure–ground compositions, as well as more complex scenes, using RSVP and measured brain potentials, with the expectation that enhanced occipital negativity might be found for simple figure–ground scenes if this neural event indexes enhanced perceptual processing. Moreover, we also presented emotional and neutral pictures that were either figure–ground compositions or more complex scenes, allowing us to determine the contribution of both factors to ease of perceptual processing, as indexed by occipital negativity.

A third variable that might influence perceptual processing at rapid rates is the extent to which a picture includes people. Due to their social significance (Adolphs, 1999), as well as human expertise in processing faces and people (Tarr & Gauthier, 2000), facilitated processing of faces has been reliably demonstrated (Carmel & Bentin, 2002). For instance, when faces are presented within an array of other objects, faces are quickly detected (Hershler & Hochstein, 2006), and when nonfacial stimuli need to be detected within a stimulus array, response times are prolonged when a face is present as a distracting item (Langton, Law, Burton, & Schweinberger, 2008). Moreover, when brain potentials have been measured during face processing, an early negative component (N170) that is maximal over posterior sensors is found that is similar in timing to the component modulated during RSVP, suggesting that differences between emotional and neutral stimuli might be due to the higher proportion of people in emotional, as compared with neutral, images (Colden, Bruder, & Manstead, 2008).

Thus, in the present study, using RSVP (330-ms picture duration, no intertrial interval), we presented pictures that varied in terms of perceptual composition (figure–ground, scene), presence of people (people, no people), and emotional arousal (emotional, neutral). We measured brain potentials with a focus on assessing the amplitude of occipital negativity previously found during RSVP. To the extent that specific perceptual features facilitate identification and recognition at rapid speeds, we expect that simple figure–ground compositions will prompt heightened occipital negativity, as compared with more complex scenes, and that pictures depicting people might also show more occipital negativity than pictures without people. To the extent that emotional pictures elicit enhanced processing that does not reflect perceptual or face-related features, we expect to replicate earlier findings of enhanced negativity when emotional, as compared with neutral, pictures are viewed.

Method

Participants

Twenty-seven females (mean age, 18.8 years; range, 18–21 years; 3 left-handed) with normal vision participated as part of a general psychology course requirement at the University of Florida. All participants provided written informed consent prior to participation for the study approved by the University of Florida Institutional Review Board. Data from 2 participants were excluded from analysis due to excessive artifacts.

Materials and task

Stimuli were 128 color pictures (see Footnote 1) selected from the IAPS (Lang, Bradley, & Cuthbert, 2008) that covaried in terms of three variables: (1) picture composition (figure–ground, scene), (2) content (people, no people), and (3) affective content (emotionally arousing, neutral). Using these 128 images, eight sets (see Table 1) of 16 pictures were constructed such that the pictures in each set depicted one level of each of the three variables (e.g., figure–ground, emotional, people). Normative ratings of emotional arousal from the IAPS (Lang et al., 2008) and normative ratings of perceptual composition (0 = figure–ground, 9 = scene; Bradley, Houbova, Miccoli, Costa, & Lang, 2011) were matched across sets (see Table 1). For sets that contained emotionally arousing pictures, half of the pictures depicted pleasant contents (e.g., erotica, sports, adventure), and the other half were unpleasant pictures (e.g., mutilations, threat, violence).

Table 1 Ratings of emotional arousal and perceptual composition for each of the eight sets of pictures in which perceptual composition, emotion, and content were covaried

Pictures were presented in blocks in which two of the eight possible sets were alternated, such that, in each block, the two sets of pictures differed for only one of the three variables. Table 2 lists the 12 possible block types resulting from all combinations of sets. For instance, one block presented pictures in a figure–ground/emotional/people set intermixed with a scene/emotional/people set, in which the pictures differed only in perceptual composition (i.e., figure–ground compositions vs. more complex scenes). In a second block type, the same pictures (i.e., figure–ground/emotional/people) were presented intermixed with pictures that differed only in content (i.e., figure–ground/emotional/no people). Finally, in yet another block, the same figure–ground/emotional/people pictures were alternated with pictures that differed only in emotional arousal (i.e., figure–ground/neutral/people).

Table 2 Description of the 12 different types of blocks that result when two sets of pictures are presented in each block that differ on only one of the varied factors (i.e., composition, content, emotion)

Across all of the blocks, each of the eight sets of pictures was presented 3 times. Within each block, 32 different stimuli (two sets of 16 pictures) were presented, with each of the 32 pictures presented 4 times with the restriction that a picture was repeated only once all other pictures from the set had been presented. Each picture was presented for 330 ms, immediately followed by the next picture; each block (128 trials) lasted for about 42 s, and participants passively viewed the pictures. The 12 blocks (1,536 trials in total) were separated by 8-s interblock intervals (blank screen), and the order of presentation of the blocks was counterbalanced such that each of the 12 types (see Table 2) was presented in each serial position equally often across participants.
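The counts given above follow directly from the factorial structure of the design. The following minimal sketch enumerates the eight picture sets, pairs those that differ on exactly one factor, and recomputes the trial counts and block duration. The variable names and level labels are illustrative and are not taken from the study materials.

```python
from itertools import combinations, product

# Factor levels as described in the text; the labels are illustrative.
FACTORS = {
    "composition": ("figure-ground", "scene"),
    "content": ("people", "no people"),
    "emotion": ("emotional", "neutral"),
}

# The eight picture sets: one level of each of the three factors.
picture_sets = list(product(*FACTORS.values()))


def n_differences(set_a, set_b):
    """Number of factors on which two picture sets differ."""
    return sum(a != b for a, b in zip(set_a, set_b))


# A block alternates two sets that differ on exactly one factor.
block_types = [
    (set_a, set_b)
    for set_a, set_b in combinations(picture_sets, 2)
    if n_differences(set_a, set_b) == 1
]

# Number of block types in which each picture set takes part.
blocks_per_set = {
    s: sum(s in pair for pair in block_types) for s in picture_sets
}

PICTURES_PER_SET = 16
REPETITIONS_PER_BLOCK = 4
PICTURE_DURATION_S = 0.330

trials_per_block = 2 * PICTURES_PER_SET * REPETITIONS_PER_BLOCK
block_duration_s = trials_per_block * PICTURE_DURATION_S
total_trials = trials_per_block * len(block_types)
presentations_per_picture = (
    REPETITIONS_PER_BLOCK * blocks_per_set[picture_sets[0]]
)

print("block types:", len(block_types))
print("trials per block:", trials_per_block)
print("block duration (s):", round(block_duration_s, 2))
print("total trials:", total_trials)
print("presentations per picture:", presentations_per_picture)
```

Under these assumptions, the enumeration yields 12 block types, with each set appearing in 3 of them, 128 trials (about 42 s) per block, 1,536 trials overall, and 12 presentations of each picture.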

Data collection and analysis

The EEG was recorded continuously using a 128-channel geodesic sensor net (EGI, Eugene, OR) referenced against the vertex sensor (sampling rate, 250 Hz; bandpass, 0.1- to 100-Hz half-power cutoff frequencies; 6 dB/octave rolloff). Impedances were kept below 30 kΩ, as recommended by the manufacturer. Data were digitally filtered offline with a low-pass of 30 Hz (zero-phase shift Butterworth filter, 12 dB/octave). Bad channels (with poor signals throughout the whole recording—e.g., due to broken sensors) were interpolated using spherical splines. On average, 4.48 channels (range: 0–8) were interpolated per recording. Ocular and cardiac artifacts were removed using BESA software (MEGIS software, Munich, Germany) as described by Ille, Berg, and Scherg (2002), and the data were then converted to the average reference. For each subject and picture set, separate averages were created. As each picture set appeared in three different blocks, data were averaged across the blocks for a specific picture set such that each of the eight averages contained the ERPs of a specific stimulus set (e.g., figure–ground/emotional/people). On average, 82.6 % of epochs were retained for analysis after exclusion of bad trials, defined as when the difference between the maximum and the minimum amplitudes on a channel exceeded a threshold of 100 μV.
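The trial exclusion criterion can be illustrated with a short sketch. This is not the BESA-based pipeline used in the study; it only shows a peak-to-peak check against the 100-μV threshold. Treating an epoch as bad when any single channel exceeds the threshold is an assumption, as are the function names and the toy data.

```python
def peak_to_peak(samples):
    """Difference between the maximum and minimum amplitude (in microvolts)."""
    return max(samples) - min(samples)


def is_bad_epoch(epoch, threshold_uv=100.0):
    """Flag an epoch if any channel exceeds the peak-to-peak threshold.

    `epoch` is a list of channels, each a list of amplitudes in microvolts.
    Rejecting on *any* channel is an assumption made for this sketch.
    """
    return any(peak_to_peak(channel) > threshold_uv for channel in epoch)


def retained_fraction(epochs, threshold_uv=100.0):
    """Proportion of epochs that survive the peak-to-peak criterion."""
    kept = [e for e in epochs if not is_bad_epoch(e, threshold_uv)]
    return len(kept) / len(epochs)


# Toy data: four epochs, two channels, three samples each (hypothetical values).
toy_epochs = [
    [[0.0, 10.0, -5.0], [2.0, 3.0, 1.0]],    # range 15 and 2: kept
    [[0.0, 120.0, -5.0], [2.0, 3.0, 1.0]],   # range 125: rejected
    [[-60.0, 50.0, 0.0], [0.0, 0.0, 0.0]],   # range 110: rejected
    [[-40.0, 40.0, 0.0], [0.0, 1.0, 0.0]],   # range 80: kept
]

print(retained_fraction(toy_epochs))
```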

For statistical purposes, ERP data were averaged over a cluster of 19 posterior sensors (see Fig. 1). Since there is no baseline period in RSVP, occipital negativity was quantified as the difference between the mean amplitude from 116 to 140 ms post-stimulus-onset (P1) and the mean amplitude from 180 to 280 ms post-stimulus-onset. Repeated measures ANOVA of the amplitude of occipital negativity included the variables perceptual composition (figure–ground, scene), content (people, no people), and emotion (emotionally arousing, neutral). For illustration purposes, waveform plots of the data use a 0- to 48-ms post-stimulus-onset baseline.
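The baseline-free amplitude measure can be sketched as follows, assuming a waveform sampled at 250 Hz with sample 0 at picture onset. The sign convention (later window minus P1 window, so that more negative values indicate larger occipital negativity) is an assumption, as are the function names and the toy waveforms.

```python
SAMPLING_RATE_HZ = 250  # one sample every 4 ms, as in the recording


def cluster_average(sensor_waveforms):
    """Pointwise mean across the sensors of a cluster."""
    n = len(sensor_waveforms)
    return [sum(samples) / n for samples in zip(*sensor_waveforms)]


def window_mean(waveform, start_ms, end_ms, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Mean amplitude in [start_ms, end_ms], inclusive; sample 0 is onset."""
    step_ms = 1000 / sampling_rate_hz
    first = round(start_ms / step_ms)
    last = round(end_ms / step_ms)
    segment = waveform[first:last + 1]
    return sum(segment) / len(segment)


def occipital_negativity(waveform):
    """Mean amplitude at 180-280 ms minus mean amplitude at 116-140 ms (P1).

    The direction of the subtraction is an assumption of this sketch.
    """
    p1 = window_mean(waveform, 116, 140)
    late = window_mean(waveform, 180, 280)
    return late - p1


# Toy waveform covering one 330-ms picture (83 samples at 250 Hz).
base = [0.0] * 83
for i in range(29, 36):   # 116-140 ms
    base[i] = 2.0
for i in range(45, 71):   # 180-280 ms
    base[i] = -1.0

# Two hypothetical sensors; the second has three times the amplitude.
toy_cluster = cluster_average([base, [3.0 * v for v in base]])
toy_negativity = occipital_negativity(toy_cluster)
print(toy_negativity)
```

Because the measure is a difference between two windows of the same waveform, any constant offset in the signal cancels out, which is why no prestimulus baseline is needed.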

Fig. 1

ERP waveforms (averaged across a cluster of 19 sensors highlighted in the sensor layout) during RSVP for each of the eight different picture sets formed by covarying composition (figure–ground [FG], solid lines; scene, dashed lines), emotion (emotional [Emot], filled; neutral [Neu], open), and content (people [Peo], circles; no people [NoPeo], triangles). The most pronounced negativity occurs for figure–ground compositions that depict people in emotional contexts

Onset latency and topographical distributions were also compared by computing difference waves between the two levels of each factor (e.g., figure–ground minus scene). Onset latency was analyzed using a jackknife procedure in which 25 grand averages were computed, each time omitting the data from 1 participant. Onset latency was determined for each difference waveform as the point at which 50 % of the peak amplitude was reached (for details, see Kiesel, Miller, Jolicoeur, & Brisson, 2008; Luck et al., 2009), and an ANOVA was used to assess effects of each of the three factors, with adjustment of the degrees of freedom (Kiesel et al., 2008).
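The jackknife scoring of onset latency can be sketched as below. The sketch computes leave-one-out grand averages and takes the first sample at which the difference wave reaches 50 % of its most negative peak. It omits the interpolation between samples and the subsequent correction of the test statistic, and the function names and toy data are hypothetical.

```python
def grand_average(waveforms):
    """Pointwise mean across equally long waveforms."""
    n = len(waveforms)
    return [sum(samples) / n for samples in zip(*waveforms)]


def leave_one_out_averages(subject_waveforms):
    """One grand average per participant, each omitting that participant."""
    return [
        grand_average(subject_waveforms[:i] + subject_waveforms[i + 1:])
        for i in range(len(subject_waveforms))
    ]


def onset_latency_ms(diff_wave, fraction=0.5, step_ms=4.0):
    """Time of the first sample reaching `fraction` of the negative peak.

    Returns None if the difference wave never goes negative. Using the
    first sample at or beyond the criterion (no interpolation) is a
    simplification made for this sketch.
    """
    peak = min(diff_wave)
    if peak >= 0:
        return None
    criterion = fraction * peak
    for i, value in enumerate(diff_wave):
        if value <= criterion:
            return i * step_ms
    return None


# Toy difference waves for three hypothetical participants.
subjects = [
    [0.0, -1.0, -2.0, -3.0, -4.0, -3.0, -2.0],
    [0.0, 0.0, -1.0, -2.0, -3.0, -4.0, -3.0],
    [0.0, -1.0, -2.0, -3.0, -4.0, -3.0, -2.0],
]

latencies = [onset_latency_ms(w) for w in leave_one_out_averages(subjects)]
print(latencies)
```

With 25 participants, the same procedure yields 25 leave-one-out grand averages per difference wave, matching the number reported above.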

Topographical maps of the difference potentials were compared pairwise by computing global map dissimilarity (Lehmann & Skrandies, 1980), defined as the root mean square of the difference between two normalized (divided by global field power) maps. This index varies between 0 (maps have identical voltage distribution) and 2 (maps have inverse voltage distributions). Statistical significance of the global map dissimilarity values was determined by nonparametric randomization tests, in which an empirical distribution of global map dissimilarity values was created by randomly assigning the single-subject maps to experimental conditions and calculating the global map dissimilarity value for the resulting group-average ERPs (10,000 permutations). The value from the actual group-average ERPs was then compared with this distribution (see Murray, Brunet, & Michel, 2008, for details). Topographies were tested at each of the time points in which the global field power of the difference wave was maximal.
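A minimal sketch of the dissimilarity index and its randomization test follows. Maps are average-referenced and scaled by global field power before the root mean square difference is taken. Shuffling condition labels within each participant is an assumption about how the random assignment was carried out, and all names and toy values are hypothetical.

```python
import math
import random


def normalize(scalp_map):
    """Average-reference a map and divide it by its global field power."""
    mean = sum(scalp_map) / len(scalp_map)
    centered = [v - mean for v in scalp_map]
    gfp = math.sqrt(sum(v * v for v in centered) / len(centered))
    return [v / gfp for v in centered]


def diss(map_a, map_b):
    """Global map dissimilarity: 0 for identical, 2 for inverted maps."""
    a, b = normalize(map_a), normalize(map_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def group_map(maps):
    """Pointwise mean of single-subject maps."""
    n = len(maps)
    return [sum(values) / n for values in zip(*maps)]


def diss_randomization_test(maps_a, maps_b, n_perm=10000, seed=0):
    """Return the observed DISS and its randomization p value.

    maps_a[i] and maps_b[i] are the two maps of participant i. Labels are
    swapped at random within participants (an assumption of this sketch).
    """
    rng = random.Random(seed)
    observed = diss(group_map(maps_a), group_map(maps_b))
    at_least_as_large = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for a, b in zip(maps_a, maps_b):
            if rng.random() < 0.5:
                a, b = b, a
            perm_a.append(a)
            perm_b.append(b)
        if diss(group_map(perm_a), group_map(perm_b)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_perm


# Toy data: three hypothetical participants, four sensors, two conditions.
toy_a = [[1.0, -1.0, 2.0, -2.0], [1.2, -0.8, 1.9, -2.3], [0.9, -1.1, 2.2, -2.0]]
toy_b = [[2.0, -2.0, 1.0, -1.0], [2.1, -1.9, 0.8, -1.0], [1.8, -2.2, 1.1, -0.7]]

observed_diss, p_value = diss_randomization_test(toy_a, toy_b, n_perm=1000)
print(observed_diss, p_value)
```

Because both maps are scaled to unit global field power, the index reflects differences in the shape of the voltage distribution rather than differences in overall strength.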

Picture analysis

An analysis focusing on ERPs as a function of picture, rather than participant, was undertaken by creating a mean waveform across the 12 presentations of each of the 128 pictures for each participant and then averaging across participants, resulting in an evoked potential for each picture. This picture analysis assessed the influence of the manipulated dimensions on occipital negativity using regression analysis. In these analyses, ratings of emotional arousal (from the IAPS; Lang et al., 2008) were used to index the emotionality of each picture; previous studies have determined that a number of measures of emotional engagement, including skin conductance, pupil dilation, the amplitude of a centro-parietal late positive potential, and BOLD changes in visual sensory and limbic cortex covary with ratings of emotional arousal (see Lang & Bradley, 2010, for an overview), regardless of hedonic valence (e.g., pleasantness). Moreover, pictures rated lowest in emotional arousal are those that are rated as neutral in hedonic valence, allowing these ratings to continuously code the emotional arousal of a specific picture (Lang, 2010). Relatedly, normative ratings of perceptual composition (Bradley et al., 2011) were used to continuously index the perceptual composition of each picture; content (people, no people) was coded as a nominal variable in these item analyses.

Results

Figures 1 and 2 illustrate the RSVP waveforms averaged across occipital sensors. Occipital negativity was significantly modulated by each of the three manipulated variables: Pictures with figure–ground composition evoked more negative-going potentials than did complex scenes [composition: F(1, 24) = 85.91, p < .001, ηp² = .78; see Fig. 2, left], and pictures depicting people elicited enhanced negativity, as compared with pictures that did not include people [content: F(1, 24) = 46.78, p < .001, ηp² = .66; see Fig. 2, middle]. Replicating previous research, emotional, as compared with neutral, pictures also elicited enhanced negativity [emotion: F(1, 24) = 8.38, p < .01, ηp² = .26; see Fig. 2, right].

Fig. 2

Averaged ERP waveforms illustrating the main effects of varying perceptual composition (figure–ground and scene), content (people and no people), and emotion (emotional and neutral pictures) during RSVP

A number of significant two-way interactions qualified these main effects. A significant interaction of composition and content, F(1, 24) = 21.38, p < .001, ηp² = .47, was followed up by simple main effects tests that indicated that occipital negativity was larger for figure–ground pictures, as compared with scenes, regardless of whether pictures depicted people [composition: F(1, 24) = 106, p < .001] or not [composition: F(1, 24) = 14.4, p < .005]. On the other hand, pictures that depicted people evoked more negativity than did those without people only for simple figure–ground compositions [content: F(1, 24) = 42.20, p < .001]. Relatedly, although the interaction of composition and emotion was only marginal, F(1, 24) = 3.72, p < .07 (see Table 3), simple main effects again indicated that occipital negativity was larger for figure–ground pictures, as compared with scenes, regardless of whether pictures were emotionally arousing [composition: F(1, 24) = 46, p < .001] or neutral [composition: F(1, 24) = 87, p < .001]. On the other hand, for simple figure–ground compositions, in which occipital negativity was already quite enhanced, emotional content did not further modulate negativity, F < 1, while for scenes, negativity was enhanced for emotional, as compared with neutral, pictures [emotion: F(1, 24) = 18.9, p < .001].

Table 3 Mean amplitude (μV) of early occipital negativity for the interaction effects (with standard errors in parentheses)

An interaction of content and emotion, F(1, 24) = 27.97, p < .001, ηp² = .54, indicated that emotional pictures elicited enhanced negativity, as compared with neutral pictures for pictures that included people [emotion: F(1, 24) = 25.14, p < .001], whereas the difference was in the opposite direction for pictures that did not include people [emotion: F(1, 24) = 9.38, p < .005]. Moreover, whereas there was no difference in negativity whether neutral pictures depicted people or not, F < 1, pictures of emotional people elicited enhanced negativity, as compared with emotional pictures that did not include people [content: F(1, 24) = 46.18, p < .001].
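As a consistency check, partial eta squared can be recovered from each reported F value and its degrees of freedom using the standard identity F × df1 / (F × df1 + df2). The sketch below applies it to the effects for which an effect size is reported; the dictionary keys are descriptive labels chosen for this example.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported in the text, all with (1, 24) degrees of freedom.
reported_f = {
    "composition": 85.91,
    "content": 46.78,
    "emotion": 8.38,
    "composition x content": 21.38,
    "content x emotion": 27.97,
}

effect_sizes = {
    name: round(partial_eta_squared(f, 1, 24), 2)
    for name, f in reported_f.items()
}

for name, value in effect_sizes.items():
    print(f"{name}: {value:.2f}")
```

The recomputed values (.78, .66, .26, .47, and .54) agree with the effect sizes reported above.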

Onset latency

Average onset latency varied across conditions, Fcorr(2, 48) = 7.23, p < .003, with an estimated onset of 184 ms for differences due to perceptual composition, 175 ms for differences due to content (i.e., presence of people), and 212 ms for emotional differences. Pairwise comparisons indicated that onset latency as a function of perceptual composition and the presence of people did not differ from each other (composition vs. content: p > .28) but that both of these differences began earlier than effects due to emotional arousal (composition vs. emotion, p < .03; content vs. emotion, p < .002).

Topographies

Figure 3 illustrates the similar topography of occipital negativity in the difference scalp maps for composition, content, and emotional arousal, as well as in bilateral occipital current sources in the current source density maps. Pairwise comparisons of the topographical distributions using global map dissimilarity (DISS) indicated that there were no significant differences in spatial voltage distribution as a function of picture composition, content, or emotion (composition vs. content, DISS = .50, p = .12; composition vs. emotion, DISS = .33, p = .35; content vs. emotion, DISS = .45, p = .19).

Fig. 3

(a) Scalp potential maps (back view) for the difference between the two levels of each factor (composition, left; content, middle; emotion, right) at the time point where global field power was largest. (b) Current source density maps (back view) of the difference scalp potentials shown in panel a

Fig. 4

Picture analysis. Top panel: Scatterplots of the relationship between amplitude of early occipital negativity and rated perceptual composition of emotional (left) and neutral (right) pictures for people (black circles) and objects (gray squares). The lines represent the linear regression for people (black lines) and objects (gray lines). Bottom panel: Scatterplots of the relationship between amplitude of the early occipital negativity and rated arousal of figure–ground pictures (black circles) and scenes (gray squares) for people (left) and objects (right). The lines represent the linear regression for figure–ground pictures (black lines) and scenes (gray lines)

Item analysis

In the picture analysis, stepwise regression (in which effects due to composition, content, or emotion were entered on the basis of F values) agreed with the subject analyses in indicating that each of the three manipulated variables accounted for significant variance in occipital negativity. Perceptual composition accounted for the most variance (21 %), F(1, 126) = 33.57, p < .001, followed by content (6.7 %), F(1, 125) = 11.66, p < .001, and then emotional arousal (4.2 %), F(1, 124) = 7.84, p < .01. Figure 4 illustrates these relationships: When pictures are ordered by rated composition (top panel), occipital negativity is enhanced for simple figure–ground pictures, as compared with complex scenes, regardless of whether pictures are emotional (top left) or neutral (top right), with slightly stronger relationships for pictures that include people. When pictures are ordered by rated emotional arousal (bottom panel), on the other hand, the most arousing pictures are associated with the most enhanced negativity for pictures that include people (bottom left), whether these are simple figure–ground compositions or more complex scenes. For pictures that do not include people, there are no systematic differences in occipital negativity as a function of rated emotional arousal (bottom right).
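The relation between the variance percentages and the F values at each step can be illustrated with the standard F-change formula for adding a single predictor. The sketch assumes that the reported percentages are the increments in R² at each step; because those percentages are rounded, the recomputed values (roughly 33.5, 11.6, and 7.6) only approximate the reported F values.

```python
def f_change(delta_r2, r2_after_step, df_residual):
    """F test for the R-squared increment of one added predictor."""
    return delta_r2 / ((1.0 - r2_after_step) / df_residual)


# (predictor, reported share of variance, residual df after entry)
steps = [
    ("composition", 0.21, 126),
    ("content", 0.067, 125),
    ("emotional arousal", 0.042, 124),
]

cumulative_r2 = 0.0
f_by_step = {}
for name, delta_r2, df_residual in steps:
    cumulative_r2 += delta_r2
    f_by_step[name] = f_change(delta_r2, cumulative_r2, df_residual)

for name, f_value in f_by_step.items():
    print(f"{name}: F = {f_value:.2f}")
print(f"total variance explained: {cumulative_r2:.3f}")
```

Under this reading, the three predictors together account for about 32 % of the variance in occipital negativity across pictures.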

Discussion

During rapid serial visual presentation, specific properties of natural scenes facilitated perceptual processing as indexed by enhanced negativity (180–280 ms) over occipital sensors. Perceptually simple figure–ground compositions were associated with enhanced occipital negativity, as compared with more complex scenes, as were pictures depicting people, as compared with those that did not, and pictures of emotionally arousing, as compared with neutral, scenes. When occipital negativity was computed individually for each picture presented using RSVP, regression analysis provided further information regarding the relative impact of specific features in facilitating recognition at rapid rates, with perceptual composition accounting for the most variance and emotional arousal accounting for the least. Rather than suggesting that processing any picture with emotional content is facilitated, the data instead suggest that specific perceptual features in images that depict people in pleasant and aversive situations are associated with enhanced RSVP processing. Because these pictures typically include multiple exemplars of erotica and violence, it is likely that features related to naked and/or injured body parts may mediate perceptual facilitation, as indexed by occipital negativity.

Enhanced negativity was found for figure–ground compositions, as compared with more complex scenes, regardless of whether these depicted emotional or neutral content and whether they included people or not, confirming that figure–ground composition is a key variable facilitating the ease of perceptual processing during RSVP of natural scenes. And, consistent with hypotheses and data showing that humans are experts at face/person detection (Bentin, Allison, Puce, Perez, & McCarthy, 1996; Kanwisher & Yovel, 2006), probably mediated by both experience and social relevance, simple figure–ground pictures that depicted people were associated with enhanced perceptual processing, as compared with pictures that did not include people.

A number of studies have reported an occipito-temporal negativity that is maximal around 170 ms following the onset of simple faces, as compared with objects (e.g., Bentin et al., 1996; Itier & Taylor, 2004), raising the issue of whether perceptual facilitation during RSVP of natural scenes primarily reflects enhanced face processing. As compared with the potentials measured during RSVP, however, the typical N170 has a more temporal topography and a much clearer peak (perhaps resulting from the typically slower presentation rate; Rossion & Jacques, 2008). Moreover, in the present study, pictures without people also prompted enhanced occipital negativity if these were simple figure–ground perceptual compositions, and the presence of people did not facilitate processing of complex scenes, making it difficult to attribute differences in perceptual processing during RSVP solely to the mere presence of people in a picture.

Whether a picture depicted emotional or neutral scenes accounted for the least variance in occipital negativity, and this effect was constrained to specific contexts, with the largest negativity found for simple figure–ground pictures that included people in emotional contexts. Previous studies have clearly determined that emotional engagement when pictures of natural scenes are viewed is strongest for pictures that depict people in sexual, violent, and other highly arousing emotional contexts (Bradley, Codispoti, Cuthbert, & Lang, 2001; Lang & Bradley, 2010; Schupp et al., 2004; Weinberg & Hajcak, 2010) and, furthermore, that an electrophysiological index of enhanced processing, the late positive potential, is of greatest amplitude when emotionally arousing pictures that are simple figure–ground compositions are viewed (Bradley, Hamby, Löw, & Lang, 2007; Nordström & Wiens, 2012). Perceptual enhancement for highly arousing emotional scenes during RSVP suggests that, in addition to perceptual composition and the presence of faces, some emotional features show perceptual facilitation.

It is of course possible that this subset of pictures also differs along some unspecified physical dimension(s), which remains an avenue for future investigation. Indeed, quantifying and equating information in emotional and neutral images is challenging not only because the variables potentially affecting perceptual processing are many, but also because not all of the critical factors are yet understood. Previous studies assessing effects of specific picture content on early brain potentials, however, have reported that pictures of erotica prompt the largest early enhancement, when compared with other affective contents (Schupp et al., 2007; Weinberg & Hajcak, 2010). Assuming that effects of emotional content on early perceptual identification are mediated by the presence of specific perceptual features that activate representations linked to the subcortical appetitive and defensive systems that direct emotional reactivity (Lang & Bradley, 2010), one task for future research is to identify these features and determine their contribution to perceptual processing.

Moreover, whereas perceptual composition reliably affects perceptual processing at both fast and slow rates of presentation (Bradley et al., 2007; Wiens, Sand, & Olofsson, 2011), effects of emotion are more variable. Thus, whereas emotional content affected occipital negativity in the present RSVP study, emotion had no effect on these brain potentials in a study using a slower (6 s) presentation rate in which perceptual composition was again experimentally controlled (Bradley et al., 2007). Finding that variables differentially modulate brain potentials as a function of presentation rate is not surprising, given the sheer amount and variety of information presented per unit of time during RSVP, which poses challenges for perceptual processing and, presumably, prompts overlapping (as well as offset) potentials. And, although specific effects of emotional content on the amplitude of occipital negativity were found during RSVP in the present study, latency analyses suggested that this facilitation may have a later onset (but same topographical distribution), as compared with effects due to perceptual composition or the presence of people. One interpretation of this finding is that the features facilitating perceptual processing for emotional images may not be as fundamental as figure–ground segmentation or associated with very high expertise and familiarity in processing human faces but, instead, require more time for resolution.

Taken together, the present study illustrates a multifactorial approach to determining the contribution of specific factors to visual perception of natural scenes. When covarying perceptual composition, content, and emotion of natural scenes presented using RSVP, we found that all three variables contribute to the amplitude of occipital negativity (180–280 ms), supporting the inference that ease of perceptual processing at rapid rates is facilitated by figure–ground composition, the presence of people, and, to a lesser extent, emotional content. The data are consistent with a visual-processing stream for natural scenes in which figure–ground segmentation and the presence of people show the earliest facilitation in perceptual processing, with specific features in pictures of people in emotional contexts facilitating perception somewhat later. From a conceptual viewpoint, the data suggest that early perceptual processing of emotional images probably does not reflect enhanced selective attention to emotional cues in general but, rather, that some perceptual features associated with specific emotional contexts are perceived more easily, reflecting familiarity, priming, or, possibly, preparedness. Identifying these features and the mechanism of their action remains for future research.