Introduction

How do we perceive the coherent visual world? Regarding object perception, both parallel and hierarchical structures in the visual cortex are involved in considerable preattentive processing, such as figure-ground segregation and visual grouping (Grossberg, Mingolla, & Ross, 1994). On the other hand, attention also plays an important role in resolving competitions and selecting limited numbers of objects for further processing with limited resources (Desimone & Duncan, 1995). The present study focused on the temporal aspects of the interrelationships between preattentive and attentive processing. Objects/groups can be treated as units through attention-spreading mechanisms, in which attention to a feature or location of an object obligatorily spreads over all features or locations that belong to that object (for reviews, see Driver & Baylis, 1998; Scholl, 2001), and these mechanisms are time-consuming processes (Roelfsema, 2006; Schoenfeld, Hopf, Merkel, Heinz, & Hillyard, 2014). The individuation and identification involved in object-file formation may occur at specific times or have temporal limits (Wutz & Melcher, 2013).

Previous studies of event-related potentials (ERPs) have provided insight into the time course and neural bases of object-based attention. A seminal study with overlapping surfaces formed by moving dots showed that the P1 (90–120 ms) and N1 (160–240 ms) components elicited by a change in the direction of motion were suppressed for unattended surfaces (Valdes-Sosa, Bobes, Rodriguez, & Pinilla, 1998), while such object-based modulations occurred in both the earlier C1 (75–110 ms) and the subsequent N1 (170–220 ms) components when surfaces were cued exogenously (Khoe, Mitchell, Reynolds, & Hillyard, 2005). In the latter study, C1 clearly involves early visual processing, while N1 most likely involves intermediate processing, given its position in the temporal sequence. Regarding spatial attention-spreading, previous studies have generally used displays that included two rectangles or separate objects, and a posterior N1 at around 140–180 ms in response to unilateral probes was enlarged at unattended locations when they were presented on cued or attended objects, as compared to unattended objects (He, Fan, Zhou, & Chen, 2004; Martínez et al., 2006; Martínez, Ramanathan, Foxe, Javitt, & Hillyard, 2007; Martínez, Teder-Sälejärvi, & Hillyard, 2007). These results, combined with those obtained by neuroimaging methods, suggest that the object-based N1 attention effect may reflect the enhanced neural activation of object or figure representation in the lateral occipital cortex (LOC; Martínez et al., 2006; Martínez et al., 2007; Martínez et al., 2007). Thus, early and intermediate stages of visual-cortical processing appear to be involved in object-based attention. However, since these previous studies examined ERPs in response to probes that were superimposed upon previewed or existing stable objects, the ERP modulations may reflect re-activation of object representations after the initial encoding or identification driven by object appearance.

The present study focuses on attentional processes engaged by emergent objects or object onsets. In natural visual scenes, objects may appear at any moment, even when they are stable, due to movements of our eyes and body. Such objects may require rapid identification and further stages of processing. Previously, we examined object-based spatial attention processes by using a sustained-focal attention task with bilateral stimuli (Kasai, 2010; Kasai, Moriya, & Hirano, 2011; Kasai & Takeya, 2012; Takeya & Kasai, 2014). While the participants attended to the left or right visual field, grouped or ungrouped bilateral objects were presented in random order, and the direction of the participant’s attention was indexed by posterior ERP lateralization effects comparing recording sites that were contralateral versus ipsilateral with respect to the attended visual field. ERPs in response to frequent standard stimuli, rather than infrequent targets, were mainly analyzed. A robust P1 attention effect was observed irrespective of grouping, while the N1 attention effect was decreased for grouped stimuli as compared to ungrouped stimuli. These N1 modulations are similar to those observed in previous studies for unilateral probes, suggesting a common object/group-based spatial selection mechanism across emergent and stable objects. However, later ERP lateralization effects were also sensitive to grouping manipulations. The attention effects in the N2 latency range at around 300–400 ms for connected or similar stimuli were more positive than those for separated or dissimilar stimuli, respectively (Kasai, 2010; Kasai et al., 2011), and those in the P2 latency range (200–250 ms) were more positive for separated stimuli, although this result was observed only when another object was located between them (Kasai & Takeya, 2012).

Previous ERP studies of focal spatial attention, rather than object-based spatial attention, have identified lateralized attention effects (enhanced contralateral P1 positivity) when subjects attended to one side of sequentially presented bilateral stimuli, which were letter or letter-like stimuli, rather than typical objects or geometric shapes (Heinze, Luck, Mangun, & Hillyard, 1990; Heinze et al., 1994; Luck, Heinze, Mangun, & Hillyard, 1990; but see Woldorff, Liotti, Seabolt, Busse, Lancaster, & Fox, 2002). The P1 attention effect has been proposed to reflect a gain control or filtering mechanism for incoming sensory signals at the level of the extrastriate visual cortices (Heinze et al., 1994). P1 amplitude modulations may reflect the breadth of the attentional focus toward to-be-attended locations and may be modulated by the perceptual load of the task (Mangun, Hopfinger, Kussmaul, Fletcher, & Heinze, 1997). Importantly, an N1 attention effect has not been found in studies with lateralized attention to bilateral stimuli, which contrasts with the general findings for unilateral probes or those for bilateral objects in our previous studies. The results regarding later attention effects have been mixed. Luck et al. (1990) found that the posterior P2 (175–225 ms) in response to letter arrays was enlarged at contralateral versus ipsilateral sites, an effect that was distinguished from the P1 effect by a principal component analysis. Woldorff et al. (2002) found an N2 lateralization effect (230–280 ms) for checkerboards, which may reflect re-entrance of attention-enhanced activation in the extrastriate visual cortex. Thus, ERPs reveal the existence of multiple stages of processing in spatial selection, although the nature of these processing stages remains unclear.

The present study was principally motivated by a desire to explore object-selective attention using different types of stimuli. We have previously used simple, filled geometric shapes (Kasai, 2010; Kasai et al., 2011; Kasai & Takeya, 2012; Takeya & Kasai, 2014), which may involve neural activations within object-selective regions or figural enhancement in relatively late or intermediate visual cortical processing (Poort, Raudies, Wannig, Lamme, Neumann, & Roelfsema, 2012). In contrast, the letters or checkerboards used in previous studies may not involve the same type of enhancement. The present study used unfilled line objects that had no perceptual surfaces, and their connectedness and target discriminability were manipulated (Fig. 1). When bilateral stimuli are connected by a line having the same color, they may unambiguously become a single perceptual object, since unified connectedness or lack of feature discontinuity is the most primitive grouping factor (Palmer, 2003; but see Takeya & Kasai, 2014). If lateralized ERP attention effects are associated with the selection of object representations with figural enhancement, they should be minimized for line drawings regardless of connectedness or task difficulty. This paradigm also enables us to explore the recently addressed question of whether perceptual load or focusing attention can override object-based attention-spreading, indications for which have been mixed (Cosman & Vecera, 2012; Ho & Atchley, 2009).

Fig. 1 Stimuli (above) and stimulus sequence (below) used in the present study

Methods

Participants

Fourteen volunteers (eight females), aged 19–30 years (mean = 22.4 years), participated in this experiment. All had normal or corrected-to-normal vision and were right-handed. Written informed consent was obtained from each participant after the nature of the study had been explained fully.

Stimuli and procedure

Stimuli were black line drawings against a gray background (Fig. 1), displayed on a Hitachi CRT monitor, controlled by PsyScope on a personal computer (Macintosh G3) with a PsyScope button box (Cohen, MacWhinney, Flatt, & Provost, 1993). The viewing distance was 70 cm, and a central fixation cross that extended across a visual angle of 0.4° × 0.4° was presented throughout the experiment. Bilateral stimuli consisted of two rectangles, which were displayed horizontally 3.7° to the left and right (to the inner edge of the rectangles) and 1.7° above the fixation. Each rectangle extended across 1.7° × 1.7°, and each had a gap in its upper or lower edge, which faced in opposite directions. The width of the gap varied according to the type of stimulus; 0.7° for standards and 0.4° for targets in the hard condition. The targets did not have gaps in the easy condition. The bilateral rectangles were connected by a line in the connected condition, while there was no connecting line in the separated condition. The line width both for the rectangles and the connecting line was 0.8°.
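For readers who wish to reproduce the display geometry, the visual angles given above can be converted to on-screen sizes at the stated 70-cm viewing distance. The following is a minimal sketch, not the software used in the study; the function name `visual_angle_to_cm` is hypothetical, and the conversion assumes a flat screen perpendicular to the line of sight with the measured extent centred on that line.

```python
import math

VIEWING_DISTANCE_CM = 70.0  # viewing distance stated in the text


def visual_angle_to_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent that subtends angle_deg at distance_cm.

    Assumes the extent is centred on the line of sight (hypothetical helper).
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Sizes reported in the text, in degrees of visual angle.
sizes_deg = {
    "fixation cross": 0.4,
    "rectangle side": 1.7,
    "gap (standard)": 0.7,
    "gap (hard target)": 0.4,
}

for label, deg in sizes_deg.items():
    cm = visual_angle_to_cm(deg, VIEWING_DISTANCE_CM)
    print(f"{label}: {deg} deg is about {cm:.2f} cm")
```

For small angles the result is close to distance × angle in radians, so a 1.7° rectangle side corresponds to roughly 2.1 cm on the screen.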

Bilateral stimuli consisted of either two standard stimuli (75 % of the trials) or one standard and one target in the separate hemi-fields (25 % of trials). The bilateral stimuli were presented for 100 ms, and the inter-stimulus interval (offset to onset) was varied randomly between 300 and 650 ms (eight steps, rectangular distribution). While the ERPs in response to successive stimuli overlapped, due to the short ISI, this overlap was consistent across conditions because of the random order of stimulus presentation.
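The trial structure described above can be sketched as a simple sequence generator. This is an illustration only, not the PsyScope script used in the experiment. Several details are assumptions that the text does not specify: the eight ISI values are taken to be equally spaced (300, 350, ..., 650 ms), the 25 % target proportion is enforced exactly within each block, and target side and connectedness are drawn with equal probability. The names `make_block` and `ISI_STEPS_MS` are hypothetical.

```python
import random

STIMULUS_DURATION_MS = 100               # stated in the text
ISI_STEPS_MS = list(range(300, 651, 50))  # assumption: eight equally spaced steps
TARGET_PROPORTION = 0.25                  # 25 % of trials contain a target


def make_block(n_trials=100, seed=None):
    """Return a randomized list of trial descriptions for one block."""
    rng = random.Random(seed)
    n_targets = round(n_trials * TARGET_PROPORTION)
    kinds = ["target"] * n_targets + ["standard"] * (n_trials - n_targets)
    rng.shuffle(kinds)

    trials = []
    for kind in kinds:
        trials.append({
            "kind": kind,
            # Side of the target rectangle; None when both are standards.
            "target_side": rng.choice(["left", "right"]) if kind == "target" else None,
            # Assumption: connected and separated displays are equiprobable.
            "connected": rng.choice([True, False]),
            # Offset-to-onset interval drawn from a rectangular distribution.
            "isi_ms": rng.choice(ISI_STEPS_MS),
        })
    return trials


block = make_block(n_trials=100, seed=1)
mean_soa = sum(STIMULUS_DURATION_MS + t["isi_ms"] for t in block) / len(block)
print(f"{len(block)} trials, mean onset-to-onset interval about {mean_soa:.0f} ms")
```

With these assumed steps, onset-to-onset intervals range from 400 to 750 ms, which is why ERPs to successive stimuli overlap within the 800-ms post-stimulus epoch.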

The participant was seated in a reclining chair in a sound- and electrically shielded room and instructed to attend to either the left or right hemi-field during the blocks and to press a button with the right thumb in response to the current target (i.e., closed square in the easy condition; square with a small gap in the hard condition) presented in the attended field as accurately and quickly as possible. It was emphasized that they had to maintain fixation and to try not to move their eyes during the block. The whole experiment consisted of two sessions according to task difficulty, and each session included attend-left and attend-right conditions, each of which consisted of 12 blocks (100 trials each). The attend-left and attend-right blocks were alternated, and the initial task condition and visual field to be attended were counterbalanced across the participants. The experiment started with one to two practice blocks for each task and attention condition to stabilize task performance and eye movements.

Recordings and analyses

Electroencephalograms (EEG) were recorded using an electrocap (Neuroscan) with 25 Ag-AgCl electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, Oz, O2, PO7, PO3, POz, PO4, and PO8 according to the International 10–20 System), which were referenced to the nose. Blinks and horizontal eye movements were monitored with electrodes at the outer canthi of the eyes [horizontal electrooculogram (EOG)] and Fp2 and below the right eye (vertical EOG). The impedance of the electrodes was kept below 10 kOhm. EEGs were filtered with a bandpass of 0.1–30 Hz and sampled at 200 Hz.

Behavioral performance was measured, including the percentage of correct target detections (hits) and RTs for hits. Responses were scored as correct if they occurred within 200–1,000 ms after a target was presented in the attended location. Responses to other stimuli were classified as false alarms (FAs). The behavioral measures were subjected to repeated-measures analysis of variance (ANOVA): the factors considered were task (easy, hard), stimulus (separated, connected) and attention condition (attend left, attend right), and stimulus type (standard, unattended target) for FAs.
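The scoring rule above (a response counts as a hit if it falls 200–1,000 ms after a target in the attended location; any other response is a false alarm) can be sketched as follows. This is a simplified illustration under stated assumptions: each button press is matched to at most one attended target, and false alarms are only counted, not attributed to a stimulus type as in the reported analysis. The function `score_responses` and the trial dictionary layout are hypothetical.

```python
def score_responses(trials, press_times_ms, attended_side, window_ms=(200, 1000)):
    """Score button presses against attended-side targets.

    trials: list of dicts with 'onset_ms' and 'target_side' (None, 'left', 'right').
    press_times_ms: times of button presses on the same clock as 'onset_ms'.
    """
    lo, hi = window_ms
    targets = [t for t in trials if t["target_side"] == attended_side]
    unused = sorted(press_times_ms)
    hit_rts = []

    for target in targets:
        for press in unused:
            rt = press - target["onset_ms"]
            if lo <= rt <= hi:
                hit_rts.append(rt)
                unused.remove(press)  # each press can satisfy only one target
                break

    hit_rate = 100.0 * len(hit_rts) / len(targets) if targets else float("nan")
    mean_rt = sum(hit_rts) / len(hit_rts) if hit_rts else float("nan")
    # Presses not matched to an attended target are counted as false alarms.
    return {"hit_rate": hit_rate, "mean_rt_ms": mean_rt, "false_alarms": len(unused)}


example_trials = [
    {"onset_ms": 0, "target_side": None},        # two standards
    {"onset_ms": 500, "target_side": "left"},    # attended target
    {"onset_ms": 1000, "target_side": "right"},  # unattended target
    {"onset_ms": 1500, "target_side": "left"},   # attended target, missed
]
result = score_responses(example_trials, [950, 1400], attended_side="left")
print(result)
```

In this toy example the press at 950 ms is a hit (RT 450 ms), whereas the press at 1,400 ms follows the unattended target and is counted as a false alarm.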

ERPs were averaged separately for each stimulus type, connectedness, and attention condition. Averaging epochs were 1,000 ms, starting 200 ms before the onset of the stimulus and ending 800 ms post-stimulus, while correcting for differences in the 200-ms pre-stimulus baseline. Automatic artifact rejection was applied to eliminate epochs contaminated above 75 μV, and epochs with incorrect responses were also excluded. It should be noted that behavioral responses and their associated ERPs may overlap and contaminate the ERPs in response to subsequent stimuli in the current design. While it would have been ideal to exclude all epochs that contained responses, this overlap should be equivalent for the different conditions being compared and should not affect the main ERP comparisons. The overall average percentage of trials remaining after artifact rejection was 83.5 %, and there were no large differences across conditions.
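The epoching, baseline correction, and artifact rejection steps described above can be sketched for a single channel as follows. This is not the authors' analysis software. It assumes that the 75-μV criterion is applied to the absolute baseline-corrected amplitude at any sample within the epoch, which the text does not state explicitly; the function names are hypothetical.

```python
SAMPLING_RATE_HZ = 200   # stated in the text
PRE_MS, POST_MS = 200, 800
REJECT_UV = 75.0         # assumption: absolute amplitude threshold after baseline correction


def ms_to_samples(ms):
    return int(round(ms * SAMPLING_RATE_HZ / 1000))


def extract_epoch(signal_uv, onset_sample):
    """Cut a -200 to +800 ms epoch and subtract the pre-stimulus mean."""
    n_pre, n_post = ms_to_samples(PRE_MS), ms_to_samples(POST_MS)
    start, stop = onset_sample - n_pre, onset_sample + n_post
    if start < 0 or stop > len(signal_uv):
        return None  # epoch would run past the recording
    epoch = signal_uv[start:stop]
    baseline = sum(epoch[:n_pre]) / n_pre
    return [value - baseline for value in epoch]


def is_clean(epoch_uv):
    return max(abs(value) for value in epoch_uv) <= REJECT_UV


def average_epochs(signal_uv, onset_samples):
    """Average all complete, artifact-free epochs; return (erp, n_kept)."""
    epochs = [extract_epoch(signal_uv, s) for s in onset_samples]
    kept = [e for e in epochs if e is not None and is_clean(e)]
    if not kept:
        return None, 0
    erp = [sum(samples) / len(kept) for samples in zip(*kept)]
    return erp, len(kept)


# Toy recording: constant 10 uV offset, a small deflection, and one large artifact.
signal = [10.0] * 600
signal[150] = 30.0    # 20 uV above baseline, retained
signal[450] = 200.0   # 190 uV above baseline, rejected
erp, n_kept = average_epochs(signal, onset_samples=[100, 400])
print(n_kept, len(erp))
```

At 200 Hz, each epoch contains 200 samples (40 pre-stimulus and 160 post-stimulus), so one sample corresponds to 5 ms.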

Analyses were focused on ERPs in response to standards. ERPs at occipito-temporal sites (PO7, PO8) were quantified as mean amplitudes over latency windows of 100–140 ms (post-stimulus) for P1, 140–180 ms for N1, 200–250 ms for P2, and 300–350 ms for N2. These component measurements were subjected to repeated-measures ANOVA with factors of task (easy, hard), stimulus (separated, connected), laterality of the electrode sites relative to the attended visual field (ipsilateral, contralateral), and the attended visual field (left, right). If the interaction between stimulus and laterality was statistically significant, difference ERPs (contralateral vs. ipsilateral) were calculated to clarify the attention effects. Partial eta squared (ηp²) is reported as a measure of the effect size.
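The component measures and the contralateral-minus-ipsilateral attention effect can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used here: waveforms are taken to be sampled at 200 Hz with a 200-ms pre-stimulus interval, both edges of each latency window are included, and the effect is collapsed across attend-left and attend-right conditions for simplicity (the ANOVA above keeps attended visual field as a factor). PO7 is treated as the left-hemisphere site and PO8 as the right-hemisphere site; the function names are hypothetical.

```python
SAMPLING_RATE_HZ = 200
PRE_MS = 200  # epoch starts 200 ms before stimulus onset
WINDOWS_MS = {"P1": (100, 140), "N1": (140, 180), "P2": (200, 250), "N2": (300, 350)}


def mean_amplitude(erp_uv, window_ms):
    """Mean amplitude of an averaged waveform over a post-stimulus window."""
    start_ms, end_ms = window_ms
    first = (start_ms + PRE_MS) * SAMPLING_RATE_HZ // 1000
    last = (end_ms + PRE_MS) * SAMPLING_RATE_HZ // 1000
    segment = erp_uv[first:last + 1]  # assumption: both window edges included
    return sum(segment) / len(segment)


def attention_effects(erps):
    """Contralateral minus ipsilateral mean amplitude for each component.

    erps maps (attended_side, electrode) to an averaged waveform,
    e.g. ('left', 'PO8') is the contralateral site when attending left.
    """
    effects = {}
    for name, window in WINDOWS_MS.items():
        contra = (mean_amplitude(erps[("left", "PO8")], window)
                  + mean_amplitude(erps[("right", "PO7")], window)) / 2
        ipsi = (mean_amplitude(erps[("left", "PO7")], window)
                + mean_amplitude(erps[("right", "PO8")], window)) / 2
        effects[name] = contra - ipsi
    return effects


# Toy waveforms: contralateral sites are uniformly 1 uV more positive.
toy_erps = {
    ("left", "PO8"): [2.0] * 200,
    ("right", "PO7"): [2.0] * 200,
    ("left", "PO7"): [1.0] * 200,
    ("right", "PO8"): [1.0] * 200,
}
effects = attention_effects(toy_erps)
print(effects)
```

A positive value indicates greater positivity (or less negativity) at sites contralateral to the attended visual field.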

Results

Behavioral data

Table 1 summarizes the behavioral data. Hit rates for the easy condition were higher than those for the hard condition, which was reflected by a main effect of task [F(1, 13) = 33.0, P < 0.0001]. For FA rates, there was a significant interaction between task and type [F(1, 13) = 6.9, P = 0.021], which reflects the finding that FAs for unattended targets were greater than those for the standards only in the hard condition [F(1, 13) = 5.3, P < 0.038]. There were no other statistically significant effects for hits and FAs.

Table 1 Summary of behavioral data

RTs for the easy condition were faster than those for the hard condition, which was reflected by a main effect of task [F(1, 13) = 69.6, P < 0.00001]. There was also a main effect of stimulus [F(1, 13) = 22.2, P = 0.0004], which indicated that RTs for the separated condition were faster than those for the connected condition. Furthermore, there was a significant interaction between task and stimulus [F(1, 13) = 8.6, P = 0.012], which indicates that the stimulus effect for the hard condition was greater than that for the easy condition, although the stimulus effects for both the hard and easy conditions were significant [F(1, 13) = 4.8, P = 0.047; F(1, 13) = 22.8, P = 0.0004].

Electrophysiological data

In the grand-averaged ERPs in response to standard stimuli, several spatial attention effects were revealed as significant differences between ERPs recorded at electrode sites ipsilateral and contralateral to the task-relevant visual field, which were most prominent over occipito-temporal brain areas (Fig. 2). The mean amplitudes of attention effects are shown in Fig. 3, and Table 2 summarizes the statistical results of omnibus ANOVAs for the standard ERPs.

Fig. 2 a Grand-average event-related potentials (ERPs) in response to standards at occipito-temporal electrodes (PO7, PO8). Left two columns: ERPs at sites contralateral and ipsilateral to the attended visual field are overlaid, with the left and right sites collapsed. Right column: difference waves for the spatial attention effect, i.e., ERPs at ipsilateral sites were subtracted from those at contralateral sites. b Scalp distributions of spatial attention effects: ERPs in the attend-right condition were subtracted from those in the attend-left condition. White-line circles: occipito-temporal electrode sites (PO7, PO8)

Fig. 3 Mean amplitudes of the contralateral minus ipsilateral attention effects. Error bars: standard errors of the mean. 95 % confidence intervals were calculated according to Loftus and Masson (1994)

Table 2 Results of omnibus ANOVAs. All significant values are shown

P1 (100–140 ms) had a greater amplitude for the hard condition than for the easy condition (Fig. 2), which was reflected by the main effect of task [F(1, 13) = 7.1, P = 0.019, ηp² = 0.35]. The P1 amplitude was also greater at sites contralateral to the attended visual field as compared to ipsilateral sites, as reflected by the main effect of laterality [F(1, 13) = 28.3, P = 0.0001, ηp² = 0.68]. The interaction of laterality and attended visual field indicates that the laterality effects were greater for the attend-left condition than for the attend-right condition [F(1, 13) = 5.7, P = 0.033, ηp² = 0.30]. Furthermore, the laterality effects were enlarged for the hard condition relative to the easy condition (see also Fig. 3), as reflected in a significant interaction between task and laterality [F(1, 13) = 5.3, P = 0.039, ηp² = 0.29]. In contrast to P1, N1 (140–180 ms) showed no apparent attention or task effects. Judging from the waveform trends and scalp topographies (Fig. 2b), the significant main effects of task and laterality in this latency range may be due to an overlap with the P1 effects [F(1, 13) = 9.8, P = 0.008, ηp² = 0.43; F(1, 13) = 4.7, P = 0.049, ηp² = 0.27]. There was no effect of stimulus (connected vs. separated) on either the P1 or the N1.

The P2 component (200–250 ms) had a greater amplitude for connected stimuli than for separated stimuli, as reflected by the stimulus effect [F(1, 13) = 27.5, P = 0.0002, ηp² = 0.69], and for contralateral sites than for ipsilateral sites, as reflected by the laterality effect [F(1, 13) = 4.9, P = 0.046, ηp² = 0.27]. Importantly, the attention effect (contralateral vs. ipsilateral) was greater for the separated condition than for the connected condition, which was reflected by a significant interaction of stimulus and laterality [F(1, 13) = 21.7, P = 0.0005, ηp² = 0.63]. This object-based attention effect was reliable for both the easy and hard conditions, which was reflected by significant stimulus effects for the difference (contralateral minus ipsilateral) ERPs [F(1, 13) = 19.3, P = 0.0007, ηp² = 0.60; F(1, 13) = 14.5, P = 0.002, ηp² = 0.53].

In the later N2 range (300–350 ms), the laterality or attention effect shifted toward a more positive direction for the easy condition than for the hard condition, as reflected by a significant interaction of task and laterality [F(1, 13) = 5.3, P = 0.039, ηp² = 0.29]. The task effect on laterality varied with the stimulus, as indicated by the three-way interaction among task, stimulus, and laterality [F(1, 13) = 5.0, P = 0.044, ηp² = 0.28]. In post-hoc tests, the attention effect in the connected condition was more positive for the easy condition than for the hard condition [F(1, 13) = 8.0, P = 0.014, ηp² = 0.38]. Furthermore, the task effect on laterality was limited to the attend-right condition, which was indicated by a three-way interaction among task, laterality, and attended visual field [F(1, 13) = 8.2, P = 0.013, ηp² = 0.39]. Post-hoc tests showed that the task × laterality effect was significant only for the attend-right condition [F(1, 13) = 5.6, P = 0.035, ηp² = 0.30].
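As a consistency check on the effect sizes reported in this section, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity: partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to several of the values reported above; the function name is hypothetical, and small discrepancies of about 0.01 are expected because the F values are rounded to one decimal place.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, reported F with df = 1, 13, reported partial eta squared)
reported = [
    ("P1 task", 7.1, 0.35),
    ("P1 laterality", 28.3, 0.68),
    ("P2 stimulus", 27.5, 0.69),
    ("P2 stimulus x laterality", 21.7, 0.63),
    ("N2 task x laterality", 5.3, 0.29),
]

differences = []
for label, f_value, eta_reported in reported:
    eta = partial_eta_squared(f_value, 1, 13)
    differences.append(abs(eta - eta_reported))
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.2f}")
```

The computed values agree with the reported ones to within rounding, which supports reading the extracted effect sizes as partial eta squared with 1 and 13 degrees of freedom.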

Discussion

This study explored the links between object perception and multiple stages of processing in spatial selection, by recording lateralized attention effects of posterior ERPs in response to bilateral unfilled line objects. The results showed that the lateralized P1 attention effect was enlarged in the hard condition, which suggests the successful manipulation of attentional focus, whereas no lateralized N1 attention effect was evident. In contrast, P2 (200–250 ms) was clearly enlarged at scalp sites contralateral, rather than ipsilateral, to the attended side of separated objects, as compared to connected objects. The N2 attention effect (300–350 ms) was shifted positively for connected objects in the easy condition.

The present lack of an N1 effect stands in contrast with our previous studies, which found attention-spreading effects in the N1 latency range (Kasai, 2010; Kasai et al., 2011; Kasai & Takeya, 2012; Takeya & Kasai, 2014). The general stimulus configurations and tasks in the current study were similar to those in the previous studies; the critical difference was that unfilled line stimuli, rather than geometric shapes with filled surfaces, were used. On the other hand, the pattern of results, i.e., enlarged P1 contralateral to the attended side of the bilateral stimuli without any N1 modulation, was similar to that in the study of Heinze et al. (1994), which used letters as stimuli. These findings suggest that the lateralized N1 attention effect observed in previous studies is associated with the selection of a particular type of object that contains surfaces or figural enhancement against the background. This notion is consistent with the N1 object-based effects in response to unilateral probes for stable objects; these effects were localized to the LOC (Martínez et al., 2006; Martínez et al., 2007; Martínez et al., 2007), which may be a critical brain region for figural perception (Flevaris, Martínez, & Hillyard, 2013; Pitts, Martínez, Brewer, & Hillyard, 2011). Although an object-based N1 attention effect was also found for line drawings (He et al., 2004), spatial regions surrounded by closed lines could be filled-in to form perceptual surfaces.

The P2 attention effect may be associated with the selection of higher object representations, as compared with the N1 effect. However, it may be observed only when objects are clearly separated from each other. In our previous studies, the P2 attention effect may not have been found because filled objects can be grouped according to similarities in surface properties (Kasai, 2010; Kasai et al., 2011; Takeya & Kasai, 2014), whereas a P2 effect was observed when another object interrupted the grouping between bilateral stimuli (Kasai & Takeya, 2012). Luck et al. (1990) also found a similar P2 attention effect for letters that were aligned in hemi-circles across the left and right hemi-fields, which raises the question of whether the existence of multiple elements within each field reduces grouping of bilateral stimuli. Interestingly, in the present study, we also found that the P2 amplitude overall was increased for connected objects. This is not likely to be due to a simple change in luminance or contrast caused by the addition of a connecting bar, since we previously observed similar amplitude enhancement under a physically controlled condition (Kasai, 2010, Exp. 2). One possible explanation is that P2 or the P2 attention effect is associated with the amount of attentional resources deployed to selected or individuated objects at a stage where objects/groups are encoded as units. According to the biased competition model (Desimone & Duncan, 1995), more than two objects in the visual scene may compete for limited resources according to receptive field properties in the visual cortex, while a single object may be a winner that takes all available resources.

Although the P2 attention effect involves a top-down modulation of visual processing and varies with connectivity, it was independent of the manipulation of perceptual task difficulty or the breadth of attentional focus. In contrast, lateralized potentials in the N2 latency range appear to reflect a “tug-of-war” between bottom-up attention-spreading and top-down focal attention. If the N2 attention effect is a real-time measure of the lateralized allocation of attention direction similar to the N2pc (N2-posterior-contralateral) observed in visual search tasks (Woodman & Luck, 1999), the present N2 results would indicate that attention was guided to the opposite side of the connected objects when the task requirement to attend to the task-relevant side was low and attentional resources were available. The object-based N2 attention effect may be associated with grouping on the basis of task-relevant feature dimensions, since it was observed for shape similarity, but not for achromatic-color similarity, in a shape discrimination task (Kasai et al., 2011). It is likely that object-based attention-spreading can be modulated at relatively late stages of processing in feature-based attention, although it remains an open question whether perceptual load modulates the intermediate stages of processing in object-based attention, as reflected by the N1 component.

With respect to behavioral performance, RTs were delayed for connected or grouped objects, indicating an object-based attention effect; i.e., grouping may have broadened the attentional focus to discriminate target features at to-be-attended sides, as in our previous studies (Kasai, 2010; Kasai et al., 2011; Kasai & Takeya, 2012). However, the current RT difference was greater in the hard condition, although the N2 attention effects indicate greater attentional guidance in the easy condition. The pattern of RT results may be due to unexpected stimulus–response compatibility: since the task was to detect closed squares in the easy condition, connectedness or the absence of gaps may have facilitated a response in this condition. However, target features by themselves may not be able to explain the connectedness effects on behavioral performance and ERPs (see also Kasai & Takeya, 2012). The present results show that multiple stages of processing contribute to behavioral outputs.

In conclusion, the present study demonstrates that emergent objects engage multiple stages of processing during spatial selection, the latencies of which may be associated with a temporal sequence of perceptual-object formation. Specifically, an intermediate stage of processing indicated by N1 may engage in the sensory activation of contiguous spatial regions that represent perceptual objects or groups. The subsequent stage of processing indicated by P2 may engage in the individuation of more abstract perceptual object representations. The latter effect has not been observed, however, in ERP studies that used stable, continuously presented objects (He et al., 2004; Martínez et al., 2006; Martínez et al., 2007; Martínez et al., 2007). These contrasting findings thus show a need to distinguish emergent objects from stable objects in studies of object-based attention. In addition, different stages of selection were engaged for filled versus unfilled objects. Although behavioral object-based attention effects may be objective measures of perceptual grouping or “objects” (Watson & Kramer, 1999), and object-based effects were also found for mere lines (Avrahami, 1999), the underlying mechanisms may not necessarily be the same across different types of objects. The present study suggests that different types of attentional operations can be initiated according to the strength of grouping, which may vary with object context or the existence of surfaces. Furthermore, since almost all objects in a natural setting have surfaces, it would be important to understand how our visual system adapts to the artificial paper-based visual environment that is so prevalent in modern society.