Introduction

Inhibition of return (IOR) refers to delayed responses to stimuli that have been previously attended. IOR is frequently studied with Posner’s cue-target paradigm (Posner & Cohen, 1984), in which increased response times (RTs) to cued as compared to uncued targets (i.e., inhibitory cueing effects; ICEs) are typically observed when the cue-target onset asynchrony (CTOA) is long enough (for a review, see Klein, 2000). IOR is observed behaviorally when the delay between the onset of two stimuli presented at the same location exceeds approximately 200 ms, and the effect can last up to 3 seconds (Samuel & Kat, 2003).

Although early IOR studies primarily used manual response tasks, it has commonly been proposed that IOR is generated in the sensory to motor processing stream within the oculomotor system (Rafal, Calabresi, Brennan, & Sciolto, 1989; Taylor & Klein, 1998, 2000). An early neuropsychological study by Posner, Rafal, Choate, and Vaughan (1985) provided important evidence on the neural basis of IOR by investigating behavioral IOR effects in patients with progressive supranuclear palsy (PSP), a neurodegenerative disease that affects the superior colliculus (SC) and in which patients lose the ability to make voluntary saccadic eye movements. Using a detection task that required a manual response to targets, the study found that behavioral IOR effects were absent in patients with PSP but were present in the normal control group when cues and targets were presented on the vertical meridian (i.e., above or below fixation) with a CTOA of 1,000 ms. A similar study with saccadic responses to targets, later conducted on a patient with bilateral lesions of the SC, likewise did not observe IOR (Sereno, Briand, Amador, & Szapiel, 2006), suggesting that the SC plays a role in the manifestation of behavioral IOR.

Contrary to earlier studies that speculated whether IOR was either an output- or input-based effect, there is increasing evidence that there are multiple ICEs that can contribute to the overall behavioral inhibition observed in spatial-cueing tasks (Chica, Taylor, Lupiáñez, & Klein, 2010b; Hilchey, Hashish, et al., 2014a; Lim, Eng, Janssen, & Satel, 2018; Pratt & Neggers, 2008; Taylor & Klein, 2000; Zhang & Zhang, 2011), each being generated in different neural areas and affecting different pathways (e.g., input or output). Taylor and Klein (2000) proposed that there are two different ICEs, which also were termed input- and output-based IOR that depend on, respectively, the SC and cortex (Lupiáñez, Klein, & Bartolomeo, 2006; for a review, see Klein & Redden, 2018) and that these different mechanisms could each contribute to behaviorally observed inhibition under different conditions.

Multiple mechanisms underlying behavioral inhibition

Early theories of IOR proposed that the primary distinction between input- and output-based IOR is whether the oculomotor system is suppressed or not (Kingstone & Pratt, 1999; Taylor & Klein, 2000; for a review, see Klein & Redden, 2018). In this framework, when eye movements are actively inhibited, input-based IOR is thought to be generated, whereas output-based IOR is generated only when the eyes are free to move, in a mutually exclusive manner. Although some researchers have alternatively suggested that overall behavioral inhibition is entirely input-based (Dukewich & Boehnke, 2008), there is considerable evidence for a separate output-based mechanism (Kingstone & Pratt, 1999; Taylor & Klein, 2000; for a review, see Klein & Hilchey, 2012), as well as evidence that these input- and output-based ICEs are additive rather than mutually exclusive, depending on experimental conditions (Lim et al., 2018; Satel, Hilchey, Wang, Story, & Klein, 2013a; Satel & Wang, 2012).

In addition to the input- and output-based ICEs discussed above, a third mechanism that also can contribute to overall behavioral inhibition in spatial cueing tasks is a low-level neural mechanism, often called sensory adaptation (also referred to as visual adaptation or short-term depression; Hilchey, Klein, & Satel, 2014b; Kohn, 2007; Satel, Wang, Trappenberg, & Klein, 2011). Sensory adaptation arises whenever there is repeated stimulation of the same synapses along the same pathway (Boehnke et al., 2011; Dorris, Klein, Everling, & Munoz, 2002; Fecteau & Munoz, 2005). For example, in spatial cueing paradigms, when the same spatial location is stimulated by both the cue and the target, visual neurons in the superficial layers of the SC (sSC) respond less vigorously to targets appearing at previously cued compared with previously uncued locations. In other words, sensory adaptation slows RTs to previously stimulated locations by reducing the strength of subsequent sensory input. However, it is important to note that neurophysiological evidence (Fecteau & Munoz, 2005) and computational simulations (Lim et al., 2018; Satel, Fard, Wang, & Trappenberg, 2014a; Satel et al., 2011; Wang, Satel, & Klein, 2012) suggest that sensory adaptation along this pathway disperses within approximately 600 ms after the first stimulus (i.e., the cue).

More recently, using microsaccades as a measure of covert attention, Hafed (2013) provided an interesting account of the possible link between early input-based ICEs and microsaccades (tiny saccades that occur during fixation), suggesting that inhibition observed in saccades (or “large saccades”) is a direct outcome of the facilitation and inhibition of pre-microsaccadic build-up activity. Using a computational model with intrinsic microsaccadic rhythm dynamics, Tian, Yoshida, and Hafed (2016) demonstrated that microsaccades that are consistently biased away from cue onset locations in a time range where sensory adaptation is thought to be active (i.e., at around 250 ms post-cue; Dorris et al., 2002; Fecteau & Munoz, 2005; Engbert & Kliegl, 2003; Galfano, Betta, & Turatto, 2004) are associated with the generation of early inhibitory effects. Importantly, such early ICEs can be observed in both saccades and microsaccades, and there is evidence suggesting that they both share similar mechanisms in the SC (Engbert, 2012; Hafed & Krauzlis, 2012; Hafed, Lovejoy, & Krauzlis, 2013).

To summarize, behaviorally exhibited ICEs can be attributed to multiple neural mechanisms that are each generated under specific conditions. These mechanisms also differ from one another in terms of the stages of processing that are affected (i.e., input or output end of the processing continuum) and the time periods during which they are activated. In terms of their processes, sensory adaptation and other input-based ICEs bias attention by attenuating subsequent exogenous activity at a previously attended (exogenously) location, whereas output-based IOR inhibits oculomotor responses to any previously attended location. Despite these mechanisms biasing orienting in different ways, it is believed that they are each operating in the service of novelty seeking because of their nature of biasing attention against returning to previously attended locations (Klein & Redden, 2018).

Due to the additive nature of cueing effects, multiple inhibitory mechanisms can each contribute to overall behavioral ICEs under different conditions, making behavioral dissociation between the effects difficult (Lim et al., 2018; Lim, Eng, Osborne, Janssen, & Satel, 2019). To obtain neurophysiological evidence of multiple distinct inhibitory effects in humans, the neural correlates of inhibitory cueing mechanisms can be further examined by means of electrophysiological techniques, such as event-related potentials (ERPs) generated either exogenously by sensory stimuli or endogenously by internal cognitive or motor processes (Norcia, Appelbaum, Ales, Cottereau, & Rossion, 2015).

Event-related potentials

At short CTOAs of less than 250 ms, faster RTs to cued compared with uncued targets are typically observed (Lupiáñez, Milliken, Solano, Weaver, & Tipper, 2001; Posner & Cohen, 1984). This opposite effect to ICEs is known as a facilitatory cueing effect (FCE) and has been associated with the enhancement of an early sensory ERP component, the P1, for cued compared with uncued target trials at very short CTOAs (e.g., 100 ms; Doallo et al., 2004; Hopfinger & Mangun, 1998) as a result of transient facilitation of cortical visual processing for stimuli occurring at the location where attention has been recently directed.

To assess quantitatively the relationship between P1 modulation and cueing effects, Satel et al. (2013a) performed a correlation analysis between behavioral ICEs and P1 modulation effects across 19 published experiments that used CTOAs of greater than 500 ms and found that P1 reductions were associated with increased behavioral ICEs (r = –0.60, p < 0.05). More recently, a similar correlation analysis of 23 published experiments also revealed a negative correlation (r = –0.75, p < 0.05) driven by both P1 enhancements and P1 reductions associated with FCEs and ICEs, respectively (Martín-Arévalo, Chica, & Lupiáñez, 2016). These results suggest that the P1 component is an electrophysiological marker for behavioral cueing effects when the oculomotor system is suppressed. Contradictory findings, however, also have been reported where behavioral ICEs have been observed without P1 modulations (Hopfinger & Mangun, 2001; McDonald, Ward, & Kiehl, 1999; Prime & Ward, 2006; Satel, Hilchey, Wang, Reiss, & Klein, 2014b; Satel, Wang, Hilchey, & Klein, 2012). It is important to note that most ERP studies that have found P1 effects did not use CTOAs greater than 1,200 ms (for a review, see Martín-Arévalo et al., 2016) and that ERP studies that used CTOAs of 1,500 ms (Gutiérrez-Domínguez et al., 2014) and 2,000 ms (Satel et al., 2012) have found behavioral ICEs without P1 effects and thus concluded that the P1 component is not a reliable electrophysiological marker of behavioral ICEs.

The N2pc component has also been associated with the allocation of covert attention in visual space and can be used to track rapid shifts of attention in space (Hickey, McDonald, & Theeuwes, 2006; Woodman & Luck, 1999). This component is measured by comparing contralateral and ipsilateral differences between about 180 and 350 ms after stimulus onset, as a result of an enhanced negativity contralateral to the visual field where the target is presented as compared to the ipsilateral visual field (Hopf et al., 2000; Luck & Ford, 1998; Luck, Girelli, McDermott, & Ford, 1997). It has been suggested that the N2pc may reflect both target processing and distractor suppression (Hickey et al., 2006). More recently, it has been linked to target-enhancement among distractors rather than active suppression of distractors (Jolicœur, Sessa, Dell’Acqua, & Robitaille, 2006; Kiss, Van Velzen, & Eimer, 2008; Mazza, Turatto, & Caramazza, 2009). Studies also have found a relation between N2pc reduction and behavioral ICEs (Martín-Arévalo, Chica, & Lupiáñez, 2014; McDonald, Hickey, Green, & Whitman, 2009; Yang, Yao, Ding, Qi, & Lei, 2012), suggesting that ICEs are a result of cued targets being suppressed/delayed from being selected. More evidence is, however, required before one can come to the conclusion that the N2pc component is a reliable electrophysiological marker of ICEs.

Steady-state visual evoked potentials

Although numerous studies have investigated the ERP consequences of IOR, there is no consensus about which ERP components (e.g., P1, N2pc) are related to the mechanisms underlying ICEs (for a review, see Martín-Arévalo et al., 2016; Satel, Wilson, & Klein, 2019). As an alternative to ERPs, steady-state visual evoked potentials (SSVEPs) are commonly used to investigate visual information processing (Regan, 1968) and allow for attention to be measured continuously (Morgan, Hansen, Hillyard, & Posner, 1996). Compared with ERPs, SSVEPs are easier to quantify (Luck, 2014) and less prone to artifacts (Regan, 1989), making them a valuable tool for measuring attention in paradigms, such as the spatial cueing task (Li et al., 2017).

The SSVEP is a resonance phenomenon in the brain that arises when viewing a flickering visual stimulus. It is an electrophysiological signal evoked by periodic stimulation that reliably shares the same frequency as the stimulus, arising from occipital areas (Di Russo et al., 2007; Müller, Teder, & Hillyard, 1997). Evidence from electrophysiological and neuroimaging studies has also shown that the SSVEP response can be observed over areas other than occipital cortex, including parietal, temporal, frontal, and prefrontal areas (Di Russo et al., 2007; Ding, Sperling, & Srinivasan, 2006; Pastor, Valencia, Artieda, Alegre, & Masdeu, 2007); however, stronger correlations are found over frontal and occipital areas (Xu et al., 2013; Zhang, Xu, Guo, & Yao, 2013).

Narrow frequency bands (i.e., a narrow range of frequencies; e.g., ±0.1 Hz) centered around the stimulus frequency are commonly used to measure SSVEPs (Nunez, Nunez, & Srinivasan, 2017). For instance, if the frequency of a flickering stimulus is 12 Hz, SSVEPs evoked by the stimulus can be measured using frequency bands between 11.9 and 12.1 Hz. The narrow-band and high signal-to-noise ratio characteristics of SSVEPs can be leveraged to minimize the problem of EEG artifacts, such as muscle movements, that are typically broadband (i.e., a wide range of frequencies), allowing the stimulus related neural activity to be segregated from artifacts and spontaneous brain activity (Regan, 1989; Srinivasan, Bibi, & Nunez, 2006).
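The narrow-band logic described above can be illustrated with a minimal sketch. It is not the analysis used in this study: the signal is synthetic, the 3 Hz component is only an idealized stand-in for a slow artifact (real artifacts are broadband), and the helper names `band_edges` and `amplitude_at` are hypothetical.

```python
import math

def band_edges(stim_hz, half_width_hz=0.1):
    """Return the (low, high) edges of a narrow band centered on the stimulus frequency."""
    return (stim_hz - half_width_hz, stim_hz + half_width_hz)

def amplitude_at(signal, fs_hz, freq_hz):
    """Amplitude of the sinusoid at freq_hz, estimated with a single-frequency DFT."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * k / fs_hz) for k, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * k / fs_hz) for k, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

# Synthetic 10 s recording (assumed values): a 12 Hz response of amplitude 1.0
# plus a much larger 3 Hz component standing in for a slow artifact.
fs_hz = 250
n_samples = fs_hz * 10
eeg = [
    1.0 * math.sin(2 * math.pi * 12 * k / fs_hz)
    + 5.0 * math.sin(2 * math.pi * 3 * k / fs_hz)
    for k in range(n_samples)
]

low_hz, high_hz = band_edges(12.0)            # band around a 12 Hz flicker
ssvep_amp = amplitude_at(eeg, fs_hz, 12.0)    # amplitude at the tagged frequency
artifact_amp = amplitude_at(eeg, fs_hz, 3.0)  # amplitude of the slow component
print(low_hz, high_hz, round(ssvep_amp, 3), round(artifact_amp, 3))
```

Because the 10 s window contains a whole number of cycles of both components, the estimate at 12 Hz is not contaminated by the larger 3 Hz component. A 10 s window also gives a frequency resolution of 0.1 Hz, which matches the ±0.1 Hz band width used as the example in the text.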

Adrian and Matthews (1934) first described the SSVEP, with this technique having been widely applied in visual science research since its discovery (for review, see Norcia et al., 2015). Studies have found that even weak stimulation intensities, such as monitor refresh flicker, can evoke SSVEPs, up to at least 75 Hz, at which point the flickering is no longer consciously perceivable (Herrmann, Mecklinger, & Pfeifer, 1999; Lyskov, Ponomarev, Sandström, Mild, & Medvedev, 1998). Leveraging the high signal-to-noise ratios of SSVEPs, multiple stimuli presented simultaneously can each be frequency-tagged at a different flicker frequency, allowing researchers to monitor the focus of spatial attention among multiple stimuli.

Visual attention can be shifted to a peripheral location in space without moving one’s eyes, and this unobservable attentional orienting is called covert orienting. Overt orienting, on the other hand, requires explicit eye movements toward the target location and can be monitored using an eye-tracker (Posner, 1980; Wright & Ward, 2008). Both covert and overt attention have been found to modulate SSVEP amplitude (Müller & Hübner, 2002; Müller, Teder-Sälejärvi, & Hillyard, 1998b), with overt attention modulating SSVEPs more strongly than covert attention (Walter, Quigley, Andersen, & Mueller, 2012). Behavioral experiments, even with an eye-tracker, often face the challenge of monitoring covert spatial attention not associated with eye movements. There often is no way to tell whether participants are directing their attention to the stimulus that they are currently looking at, as fixation is not always correlated with where we “look” with our mind’s eye (for a review, see Hoffman, 1998). Recently, given that both microsaccades and saccades are likely regulated at the level of the SC (Hafed, Goffart, & Krauzlis, 2009; Rolfs, Kliegl, & Engbert, 2008), microsaccades have been correlated with shifts of covert attention (Engbert & Kliegl, 2003; Hafed & Clark, 2002; Laubrock, Engbert, & Kliegl, 2005). However, there is no consensus on the reliability of using microsaccades to measure covert attention (Horowitz, Fencsik, Fine, Yurgenson, & Wolfe, 2007a; Horowitz, Fine, Fencsik, Yurgenson, & Wolfe, 2007b; Laubrock, Engbert, Rolfs, & Kliegl, 2007; Laubrock, Kliegl, Rolfs, & Engbert, 2010).

When investigating the processes underlying ICEs, most studies have been limited to behavioral measures, making it difficult to elucidate the time course of such attentional biases towards uncued locations caused by the inhibition of sensory processes. By tagging multiple stimuli with different frequencies, one can measure the effects of orienting spatial attention both covertly and overtly with frequency-coded SSVEPs by extracting the amplitudes associated with the different stimuli.

Morgan et al. (1996) first demonstrated how covert spatial attention can be assessed among multiple elements by means of SSVEPs. The task required participants to maintain fixation at a central cross flanked by two peripheral boxes that flickered at 8.6 Hz in one hemifield and 12 Hz in the other. A central arrow cue was then presented, and participants were instructed to attend covertly to the peripheral box indicated by the cue. A series of letters (letters A to K) and a single digit (5) were presented in the two peripheral boxes. Participants were required to indicate (with a key press) the presence of the digit at the cued location. It was found that SSVEP amplitude was larger at the attended location. In other words, when the peripheral box flickered at 12 Hz in the attended hemifield, there was an enhancement of the SSVEP response at 12 Hz relative to 8.6 Hz (unattended hemifield) and vice versa. These results suggest that SSVEP amplitude is strongly modulated by selective spatial attention. These findings were further supported by a study that also used functional magnetic resonance imaging (fMRI), demonstrating that the attentional modulation of SSVEP signals can be localized in the fusiform and lateral occipito-temporal cortex (Hillyard et al., 1997). Other studies also have found that SSVEPs can be used to detect sustained covert attention in terms of attended frequency-tagged stimuli eliciting larger SSVEP amplitudes (Belmonte, 1998; Müller & Hübner, 2002; Müller, Teder-Sälejärvi et al., 1998b).

IOR and SSVEPs

Using frequency tagging to investigate the allocation of spatial attention, Robertson, Watamura, and Wilbourn (2012) conducted a study on young infants that observed a behavioral phenomenon similar to IOR (i.e., a temporary bias away from covertly attended locations followed by a redirection of attention to a new location). This study used three horizontally arranged yellow toy ducks as stimulus objects to encourage visual attention in young infants. Each of these three stimulus objects had lights attached to them that collectively flickered at different rates (8, 10, and 12 Hz). Infants were free to look anywhere during the task, and ocular movements were coded manually from the corneal reflections of the stimulus recorded using a monochrome camera, without an eye-tracker. The SSVEP findings showed a transient enhancement of SSVEP amplitude to fixated targets as early as 1,500 ms before gaze shifts to the next target, together with a decline of SSVEP amplitude to subsequent fixation targets. These SSVEP modulations lasted for approximately 1,000 ms (or until 500 ms before gaze shift), and opposite effects were subsequently observed—an increase of SSVEP amplitude driven by the next fixation targets and a decrease driven by fixated targets. However, the IOR findings of Robertson et al. (2012) were not conclusively demonstrated due to the lack of behavioral evidence (i.e., reaction times). Also, unlike the spatial cueing paradigm, in which the covert spatial attention of participants is manipulated using exogenous cues, the task used by Robertson et al. (2012) did not have a cueing condition.

More recently, Li et al. (2017) investigated IOR using the SSVEP technique in a traditional cueing paradigm with manual responses to targets, in which 8 and 20 Hz flickering backgrounds were displayed at the two possible stimulus locations. Using a CTOA of 1,400 ms, the study observed a transient enhancement in mean SSVEP amplitude (cued = 0.15 μV; uncued = –0.02 μV) ipsilateral to cues displayed on the 20 Hz flickering background in the window of 170-180 ms post-cue, followed by a reduction (cued = −0.12 μV; uncued = –0.02 μV) in the window of 200-800 ms post-cue. These results suggest that mechanisms underlying behaviorally observed IOR, distinct from endogenous attention (Morgan et al., 1996), can modulate SSVEP amplitude.

However, the IOR-modulated SSVEP effect observed in Li et al. (2017) should have been present at both 8 and 20 Hz, not only at 20 Hz, because it has been demonstrated that although SSVEPs are present in the 12-25 Hz and 25-50 Hz ranges (Herrmann, 2001), they are strongest in the alpha range (approximately 8-12 Hz; Herrmann, 2001; Pastor, Artieda, Arbizu, Valencia, & Masdeu, 2003), with maximal responses at about 10 Hz for luminance flickers (Fawcett, Barnes, Hillebrand, & Singh, 2004; Regan, 1966; Srinivasan et al., 2006). The absence of SSVEP modulation at their 8 Hz frequency could potentially be due to the eye movements of participants not being monitored by means of an eye-tracker during the task, which has been found to be a serious confounding factor that could lead to inconsistent IOR results (Chica, Klein, Rafal, & Hopfinger, 2010a; Rafal et al., 1989). It is unclear whether this limitation had an impact on the SSVEP responses, but there were inconsistent findings between hemispheres. Specifically, Li et al. (2017) found SSVEPs in one hemisphere, but not the other, and subsequent analyses were performed on only the left posterior occipital electrodes (PO3, PO5, and PO7).

Given that IOR effects have been observed with manual and saccadic responses to peripheral targets (Briand, Larrison, & Sereno, 2000; Taylor & Klein, 1998, 2000), similar post-cue SSVEP modulations should be present when saccadic responses to targets are required instead of manual responses. In the present study, we therefore aimed to extend the findings of Li et al. (2017) by using stimulus frequencies within a single frequency band (i.e., the alpha range), incorporating eye tracking into the design, and adding a distinct saccadic response task.

Present study

The purpose of the present study was to investigate the temporal characteristics of input- and output-based ICEs using SSVEPs. Cues and targets were exogenous, peripheral stimuli to ensure that the ICEs observed were large (Hilchey, Klein, et al., 2014b; Lim et al., 2018; Taylor & Klein, 2000). A relatively long CTOA of 1,800 ms was used to ensure adequate time after the cues for SSVEP analyses, as well as to ensure that sensory adaptation had completely dispersed and was not influencing behavioral results at the time of target onset. Finally, separate blocks of trials were incorporated that required either manual or saccadic responses to targets to dissociate input-based (when the oculomotor system was actively suppressed during manual response blocks) and output-based (saccadic response blocks) ICEs.

We expected that SSVEP modulations early in the cue-target interval would reveal the time course of sensory adaptation under both response conditions, because both conditions included repeated stimulation of the same pathway. Based on previous findings (Dorris et al., 2002; Fecteau & Munoz, 2005; Lim et al., 2018; Satel et al., 2014a), the SSVEP modulation associated with sensory adaptation was predicted to begin at around 250 ms and end at around 600 ms post-cue, regardless of the activation state of the oculomotor system. Given that the later input-based ICE also is thought to be caused by a reduction in sensory cell responsiveness due to repeated stimulation, we expected to observe SSVEP modulations in the manual response task throughout the cue-target interval. However, because the output-based, oculomotor-generated ICE is not thought to affect the input sensitivity of neurons, we did not expect to find differences between cued and uncued SSVEP responses in the saccadic response task. That is, we expected that input-based IOR, once generated (only in the manual response blocks), would modulate SSVEP amplitudes late in the CTOA period, but no modulation was expected to occur as a result of output-based IOR generation (in the saccadic response blocks).

Method

Participants

Forty students (33 females and 7 males, mean age 22 years, range 18-31 years) from the University of Nottingham Malaysia participated in the experiment. All participants were naive to the hypotheses of the experiment and reported normal or corrected-to-normal vision with no neurological or psychiatric disorders. Participants completed a 1-hour session for course credit or monetary compensation (RM15). The experiment was approved by the Science and Engineering Research Ethics Committee of the University of Nottingham Malaysia.

Stimuli and apparatus

All participants were tested in a dimly lit room. Stimuli were presented on a 24-inch BenQ (Taipei, Taiwan) gaming monitor with a refresh rate of 60 Hz (i.e., 16.67 ms per frame) and a screen resolution of 1,600 × 900 pixels. Participants’ heads were positioned approximately 57 cm away from the screen. Stimuli were drawn using MATLAB (MathWorks, Natick, MA) with the Psychophysics toolbox extension (Kleiner et al., 2007) running on a Windows 7 (Microsoft, Redmond, WA) computer that was connected to an EyeLink 1000 Plus (SR Research, Ottawa, ON, Canada) eye-tracking system host computer. The desktop-mounted eye-tracker was used to monitor participants’ eye position during trials with a sampling rate of 500 Hz. Drift correction was performed at the beginning of each block of trials.

Electroencephalograms (EEGs) also were recorded during the experiment using elasticized 32-channel HydroCel Geodesic Sensor Nets containing 32 silver chloride-plated carbon pellet electrodes, each embedded in a sponge and set inside a plastic pedestal (Electrical Geodesics, Eugene, OR). Sensor nets of different sizes were used depending on the head circumference of participants. EEG recordings were referenced to the vertex (Cz) and amplified by Net Amps 300 (digitized sampling rate 250 Hz; impedance less than 50 kΩ).

Design and procedure

The experimental protocol used in this experiment was a traditional cue-target paradigm. Stimuli were displayed against a black background on a computer monitor. A white fixation cross was displayed at the center of the screen with two white placeholders (each measuring 4.5° × 4.5° of visual angle) separated center to center by 7.7° of visual angle, all displayed along the horizontal meridian.

The two placeholders were superimposed on white filled squares that flickered at 8.6 and 12 Hz. In other words, the white squares flickered on and off at a rate of 8.6 Hz in one visual field and 12 Hz in the other (Fig. 1). Combinations of flicker frequency locations (i.e., 8.6 Hz on the left and 12 Hz on the right, or 12 Hz on the left and 8.6 Hz on the right) were counterbalanced across trials within each condition.
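How such flicker rates map onto a 60 Hz display can be sketched as follows. The text does not state how the flicker was implemented, so the frame-locked scheme below (one on/off cycle spanning a whole number of refresh frames) is an assumption, and the function names are hypothetical.

```python
REFRESH_HZ = 60  # monitor refresh rate reported in the text

def frames_per_cycle(target_hz, refresh_hz=REFRESH_HZ):
    """Nearest whole number of refresh frames spanned by one on/off flicker cycle."""
    return round(refresh_hz / target_hz)

def realized_hz(target_hz, refresh_hz=REFRESH_HZ):
    """Flicker frequency actually produced when a cycle is locked to whole frames."""
    return refresh_hz / frames_per_cycle(target_hz, refresh_hz)

for target_hz in (8.6, 12.0):
    print(target_hz, frames_per_cycle(target_hz), round(realized_hz(target_hz), 2))
```

Under this assumption, a 12 Hz flicker spans exactly 5 frames per cycle, whereas a nominal 8.6 Hz flicker spans 7 frames per cycle, which gives roughly 8.57 Hz.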

Fig. 1. Example of the experimental paradigm. In the condition shown, square backgrounds flickered at 8.6 Hz in the left field and 12 Hz in the right, with a cue on the right side superimposed over the 12 Hz flickering background

The border width of both placeholders was set to 1 pixel initially and later increased to 20 pixels when a placeholder was selected as a cue. Target stimuli could appear at either of the peripheral locations as a white filled circle (peripheral targets) measuring 2.4° in diameter of visual angle.

The experiment had a 2 (Response Modality) × 2 (Cue Frequency) × 2 (Cueing) design with all variables manipulated within participants. The following factors were used: Response Modality (with 2 levels: saccade and manual), Cue Frequency (with 2 levels: 8.6 Hz and 12 Hz), and Cueing (with 2 levels: cued and uncued). All participants were tested with 4 blocks of 60 trials (i.e., a total of 240 trials per participant) preceded by 24 practice trials, with rest breaks provided between blocks (i.e., a total of three breaks per participant). A fifth of the trials (i.e., a total of 48 trials) were catch trials, in which no target was presented, occurring randomly in the sequence. There were two separate blocks for each Response Modality condition, but Cue Frequency and Cueing were intermixed within each of the blocks. The order of blocks was counterbalanced across participants.
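The trial counts above can be checked with a short sketch. The even split of target trials across the eight design cells is an assumption made for illustration and is not stated in the text.

```python
from itertools import product

BLOCKS = 4
TRIALS_PER_BLOCK = 60
CATCH_FRACTION = 1 / 5  # "a fifth of the trials"

total_trials = BLOCKS * TRIALS_PER_BLOCK
catch_trials = round(total_trials * CATCH_FRACTION)
target_trials = total_trials - catch_trials

# The 2 x 2 x 2 within-participant design described in the text.
cells = list(product(("saccade", "manual"), (8.6, 12.0), ("cued", "uncued")))

# Assumption: target trials are divided evenly across the design cells.
trials_per_cell = target_trials // len(cells)

print(total_trials, catch_trials, target_trials, len(cells), trials_per_cell)
```

With these numbers, 192 of the 240 trials contain a target, which would leave 24 target trials per cell if the split is even.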

All participants gave their written informed consent, and verbal instructions were given by the experimenter. Before the beginning of the experiment, the participant’s head was fixed to the recording systems by means of a chin-rest, followed by a 5-point calibration and validation procedure to ensure that the precision of the eye tracking was within one degree of visual angle. The experiment lasted on average about an hour. Each session ended with a debriefing in which the IOR phenomenon and the objectives of the study were explained.

The intertrial interval (ITI) was randomized between 1,500 ms and 2,000 ms, during which no stimulus (including flickering background) was presented except for a white fixation point at the center of the screen. Participants were allowed to blink their eyes as desired during this period. A CTOA of 1,800 ms (or an interstimulus interval [ISI] of 1,700 ms) was used to ensure the time course of SSVEP cueing effects would be adequately captured over a reasonable post-cue duration. Participants began trials by fixating on a central cross for 1,200 ms, followed by a to-be-ignored uninformative peripheral cue that was presented for 100 ms. Targets appeared 1,800 ms after cue onset as a circle displayed in either the same placeholder as the cue (cued) or the opposite placeholder (uncued) for 3,000 ms or until a response was made (Fig. 2).
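The timing relationships in this paragraph can be written out as a minimal sketch. Converting durations to refresh frames assumes that stimulus durations were locked to the 60 Hz display, which the text does not state. The constant and function names are hypothetical.

```python
FIXATION_MS = 1200    # initial fixation period
CUE_MS = 100          # cue duration
ISI_MS = 1700         # cue offset to target onset
TARGET_MAX_MS = 3000  # target stays on until a response or this limit
REFRESH_HZ = 60       # monitor refresh rate reported in the text

# The CTOA is measured from cue onset, so it is the cue duration plus the ISI.
ctoa_ms = CUE_MS + ISI_MS

# Target onset relative to the start of the fixation period.
target_onset_ms = FIXATION_MS + ctoa_ms

def ms_to_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Number of refresh frames closest to a duration (assumes frame-locked timing)."""
    return round(duration_ms * refresh_hz / 1000)

print(ctoa_ms, target_onset_ms, ms_to_frames(CUE_MS), ms_to_frames(ISI_MS))
```

Each of the reported durations corresponds to a whole number of frames at 60 Hz (72, 6, and 102 frames for the fixation, cue, and ISI periods, respectively).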

Fig. 2. Experimental design used, proceeding temporally from top to bottom. Peripheral cues were presented as a visible amplification of placeholders. (A) For regular trials, after an ISI of 1,700 ms, peripheral targets were presented as visible filled circles. Participants were instructed to ignore cues and make either a saccadic or manual response to targets. (B) For catch trials, participants were required to maintain fixation at the center for 2,900 ms after cue offset

At the beginning of each block of trials, participants were instructed to make either saccadic or manual localization responses to the location indicated by the target object as quickly and accurately as possible. The criterion for a successful saccadic response was that the saccade landed within 3° of visual angle of the target location. On catch trials, participants were required to maintain fixation at the center for 2,900 ms after cue offset. Whenever the participant’s gaze position deviated by more than 3° of visual angle from the central point during the fixation period, the trial was abruptly terminated and recycled randomly among the remaining trials.

EEG pre-processing

All electrodes were referenced to the vertex (Cz) during the recording and were average re-referenced offline. EEG was filtered offline by using a bandpass filter of 1-100 Hz. Bad channels were removed based on flatline, channel, and line noise criteria using the Clean Rawdata EEGLAB plug-in (Delorme & Makeig, 2004). Channels were rejected if they showed flatline signals lasting for more than 5 seconds, a correlation below 0.75 with their reconstruction based on neighboring channels, or noise relative to signal of more than 4 standard deviations. Removed channels (0-4 per participant; 1.8 on average) were then interpolated using spherical spline interpolation, and the data were re-referenced to the average.
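The three rejection criteria can be expressed as a simple decision rule. The sketch below is not the Clean Rawdata implementation: it assumes the per-channel summary statistics have already been computed, reads the thresholds as "longer than 5 s", "correlation below 0.75", and "more than 4 standard deviations", and uses made-up channel labels and values.

```python
FLATLINE_MAX_S = 5.0      # longest tolerated flatline, in seconds
MIN_NEIGHBOR_CORR = 0.75  # minimum correlation with the neighbor-based reconstruction
MAX_NOISE_SD = 4.0        # maximum noise-to-signal level, in standard deviations

def is_bad_channel(flatline_s, neighbor_corr, noise_sd):
    """Flag a channel that violates any one of the three criteria."""
    return (
        flatline_s > FLATLINE_MAX_S
        or neighbor_corr < MIN_NEIGHBOR_CORR
        or noise_sd > MAX_NOISE_SD
    )

# Hypothetical summary statistics: (flatline seconds, neighbor correlation, noise SD).
channels = {
    "A": (0.0, 0.92, 1.1),  # passes all three criteria
    "B": (7.5, 0.90, 0.8),  # flatline too long
    "C": (0.0, 0.60, 1.0),  # poorly correlated with its reconstruction
    "D": (0.0, 0.88, 4.6),  # too noisy
}

bad_channels = [name for name, stats in channels.items() if is_bad_channel(*stats)]
print(bad_channels)
```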

After average re-referencing, EEG data were segmented into periods of 3,300 ms: from 1,300 ms before the onset of the cue to 2,000 ms post-cue. Epochs time-locked to cue onset in which an eye blink occurred, or fixation deviated from center during the fixation period, were discarded. These discarded epochs were also the trials that were terminated programmatically during data collection and recycled randomly and therefore did not affect the total number of trials completed. No additional artifact removal methods (e.g., ICA or artifact rejection) were used, because SSVEPs are not sensitive to low-frequency artifacts, such as eye or body movements (Wu, 2016).
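The epoch window can be illustrated with a minimal sketch of cue-locked segmentation at the 250 Hz sampling rate reported above. The recording here is a list of placeholder integers standing in for one EEG channel, and `extract_epoch` is a hypothetical helper rather than the function used in the analysis.

```python
FS_HZ = 250     # EEG sampling rate reported in the text
PRE_MS = 1300   # epoch start relative to cue onset (before)
POST_MS = 2000  # epoch end relative to cue onset (after)

def ms_to_samples(duration_ms, fs_hz=FS_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return round(duration_ms * fs_hz / 1000)

def extract_epoch(recording, cue_sample):
    """Slice one cue-locked epoch; return None if it would run past the recording."""
    start = cue_sample - ms_to_samples(PRE_MS)
    stop = cue_sample + ms_to_samples(POST_MS)
    if start < 0 or stop > len(recording):
        return None
    return recording[start:stop]

# Placeholder data: sample values equal their own index, so positions are easy to check.
recording = list(range(2000))
epoch = extract_epoch(recording, cue_sample=1000)
print(len(epoch), epoch[0], epoch[ms_to_samples(PRE_MS)])
```

A 3,300 ms epoch at 250 Hz contains 825 samples, with cue onset at sample index 325 within the epoch.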

To determine the electrode sites for SSVEP analysis, power spectral density (PSD) analyses were conducted on 17 electrode sites around the frontal (F7, F3, F4, F8, Fz), central (C3, C4), fronto-central (FCz), temporal (T7, T8), parietal (P3, P4, P7, P8, Pz), and occipital (O1, O2, Oz) regions (Fig. 3). PSDs were calculated using Welch’s power spectral density estimate in MATLAB using the EEGLAB software toolbox (Delorme & Makeig, 2004). Based on the outcome, the averaged EEG signals from electrode sites O1 and O2, where the SSVEP amplitudes were overall highest, were chosen for further analyses.
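
For readers unfamiliar with Welch's method, the sketch below writes out its steps (overlapping segments, tapering, averaged periodograms) in Python with NumPy. It is not the MATLAB/EEGLAB routine used in the study; the Hann window, 50% overlap, 5-s segment length, and the simulated signal are assumptions chosen for illustration.

```python
import numpy as np


def welch_psd(x, fs, nperseg):
    """One-sided Welch PSD with a Hann window and 50% overlap."""
    x = np.asarray(x, dtype=float)
    step = nperseg // 2
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)
    segment_psds = []
    for start in range(0, x.size - nperseg + 1, step):
        seg = x[start:start + nperseg]
        seg = (seg - seg.mean()) * window          # remove the mean, then taper
        segment_psds.append(np.abs(np.fft.rfft(seg)) ** 2 / scale)
    psd = np.mean(segment_psds, axis=0)
    # Double all bins except DC (and Nyquist when nperseg is even).
    last = -1 if nperseg % 2 == 0 else None
    psd[1:last] *= 2
    return np.fft.rfftfreq(nperseg, d=1 / fs), psd


# Simulated occipital signal: two flicker responses plus white noise.
fs = 250
rng = np.random.default_rng(1)
t = np.arange(20 * fs) / fs
signal = (2.0 * np.sin(2 * np.pi * 8.6 * t)
          + 3.0 * np.sin(2 * np.pi * 12.0 * t)
          + rng.standard_normal(t.size))

freqs, psd = welch_psd(signal, fs, nperseg=5 * fs)   # 0.2-Hz resolution
peak_freq = freqs[np.argmax(psd)]
print(round(float(peak_freq), 1))
```

A 5-s segment gives 0.2-Hz frequency resolution, so both 8.6 Hz and 12 Hz fall exactly on a frequency bin in this example.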

Fig. 3.

(A) Topographical distribution of maximum SSVEP amplitudes for each flicker frequency, across all other conditions, from 17 electrode sites – frontal (F7, F3, F4, F8), central (C3, C4), fronto-central (FCz), temporal (T7, T8), parietal (P3, P4, P7, P8, Pz), and occipital (O1, O2, Oz). (B) Averaged power spectral density (PSD) of O1 and O2 electrode sites showing spectral peaks at the corresponding stimulus frequencies

SSVEP amplitudes were extracted from the EEG signal by means of complex demodulation (Müller, Andersen, & Keil, 2008; Regan, 1989). Complex demodulation has been widely used in SSVEP analyses in cognitive neuroscience (Kashiwase, Matsumiya, Kuriki, & Shioiri, 2012; Müller et al., 2008; Müller & Hübner, 2002) since the technique was outlined by Regan (1989). Complex demodulation provides a continuous power function over time at a specific frequency, allowing the easy visualization of changes in the amplitude of the SSVEP signals over time, and has higher time resolution compared to spectral analysis. The time series of the SSVEP component, X(t), is described by:

$$ X(t)=A(t)\ \cos \left(\omega t+P(t)\right), $$

where t is the time, A(t) and P(t) represent temporal changes in amplitude and phase of the signal, respectively, and ω denotes the temporal frequency of the flickering stimulus used to evoke SSVEPs (i.e., 8.6 and 12 Hz in this study). The procedure then extracts the SSVEP amplitude by multiplying the complex exponential function, $e^{-i\omega t}$, by the EEG data, E(t), described as follows:

$$ E(t)=A(t)\ \cos \left(\omega t+P(t)\right)+N(t), $$

where N(t) represents the noise component in all frequencies except for the flicker frequency, ω. We used center frequencies of 8.6 and 12 Hz, respectively, in the complex exponential function, $e^{-i\omega t}$, and then applied a 2-Hz low-pass filter to remove the noise component, N(t), and to smooth the extracted SSVEP amplitude. Next, SSVEP amplitude was corrected according to its baseline, which was defined as mean amplitude from 50 ms before cue onset to 50 ms after cue onset. Separate grand averages were then computed for each combination of response modality, cue frequency, and cueing, for a total of eight grand averages.
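
The demodulation and baseline steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the type of the 2-Hz low-pass filter is not reported, so an FFT-based filter that zeroes all bins above the cutoff is assumed here; the function names, the simulated signal, and the cue time of 1.3 s are hypothetical.

```python
import numpy as np


def complex_demodulate(eeg, fs, center_hz, lowpass_hz=2.0):
    """Return the amplitude envelope A(t) of `eeg` at `center_hz`."""
    eeg = np.asarray(eeg, dtype=float)
    t = np.arange(eeg.size) / fs
    # Multiply by exp(-i*omega*t), with omega = 2*pi*f, to shift the
    # component of interest down to 0 Hz.
    shifted = eeg * np.exp(-1j * 2 * np.pi * center_hz * t)
    # Low-pass filter (assumed here: zero all FFT bins above the cutoff).
    spectrum = np.fft.fft(shifted)
    freqs = np.fft.fftfreq(eeg.size, d=1 / fs)
    spectrum[np.abs(freqs) > lowpass_hz] = 0
    lowpassed = np.fft.ifft(spectrum)
    # The shift leaves A(t)/2 near 0 Hz, so multiply by 2 to recover A(t).
    return 2 * np.abs(lowpassed)


def baseline_correct(amplitude, fs, cue_s, half_width_s=0.05):
    """Subtract the mean amplitude within +/- 50 ms of cue onset."""
    t = np.arange(amplitude.size) / fs
    mask = np.abs(t - cue_s) <= half_width_s
    return amplitude - amplitude[mask].mean()


# Simulated epoch: a slowly modulated 12-Hz response plus an 8.6-Hz
# component that plays the role of N(t) when demodulating at 12 Hz.
fs = 250
t = np.arange(5 * fs) / fs
true_amp = 3 + 0.5 * np.cos(2 * np.pi * 0.4 * t)
eeg = (true_amp * np.cos(2 * np.pi * 12 * t + 0.7)
       + 2 * np.cos(2 * np.pi * 8.6 * t))

amp_12 = complex_demodulate(eeg, fs, center_hz=12)
amp_12_bc = baseline_correct(amp_12, fs, cue_s=1.3)
print(np.allclose(amp_12, true_amp, atol=1e-6))
```

In this idealized example every component falls exactly on an FFT bin, so the envelope is recovered without edge effects; with real EEG, estimates near the start and end of each epoch are distorted by the filter.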

Results

Behavioral

All participants scored at least 98.96% accuracy (mean accuracy was 99.85%). Trials with RTs more than 2.5 median absolute deviations below the median, calculated per condition, were classified as anticipatory responses. Likewise, slow outliers were defined as trials with RTs more than 2.5 median absolute deviations above the median. After excluding incorrect trials (0.15% of all trials), anticipatory responses (0.61%), and slow outliers (6.00%), statistical analyses were performed on the remaining 93.24% of trials. Difference scores (cued minus uncued) of RTs for each condition are presented in Table 1. All reported t-tests are one-tailed, paired-samples, planned comparisons.
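
A median-absolute-deviation (MAD) criterion of this kind can be sketched as below. The function name and the RT values are hypothetical, and the unscaled MAD is assumed; some conventions multiply the MAD by 1.4826 before applying the threshold, and the text does not say which variant was used.

```python
import numpy as np


def mad_outlier_flags(rts, k=2.5):
    """Flag RTs more than k MADs below (fast) or above (slow) the median."""
    rts = np.asarray(rts, dtype=float)
    median = np.median(rts)
    mad = np.median(np.abs(rts - median))   # unscaled MAD (assumption)
    fast = rts < median - k * mad
    slow = rts > median + k * mad
    return fast, slow


# Hypothetical RTs (ms) from one condition of one participant.
rts = np.array([350, 360, 365, 370, 372, 380, 385, 390, 120, 900])
fast, slow = mad_outlier_flags(rts)
kept = rts[~(fast | slow)]
print(rts[fast], rts[slow], kept.size)
```

Here the median is 371 ms and the MAD is 12.5 ms, so RTs outside 339.75-402.25 ms are flagged: 120 ms as anticipatory and 900 ms as a slow outlier.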

Table 1. Mean RTs (in ms) with SDs in parentheses, cueing effects (cued RT - uncued RT), and error rates for each Response Modality × Cue Frequency × Cueing condition

The effect of Cue Frequency on RTs was examined first. All correct RTs were submitted to a 2 (Response Modality) × 2 (Cue Frequency) × 2 (Cueing) repeated-measures ANOVA, which yielded no main effect of Cue Frequency, F(1, 39) = 1.03, p = 0.316, MSE = 223.33, η² = 0.03. There were significant main effects of Response Modality [F(1, 39) = 111.85, p < 0.001, MSE = 3912.42, η² = 0.74], as a result of faster RTs in the saccadic response condition (M = 359.78) than in the manual response condition (M = 432.22); and Cueing [F(1, 39) = 24.54, p < 0.001, MSE = 380.62, η² = 0.39], with RTs being slower in the cued condition (M = 401.49) than in the uncued condition (M = 391.65). Furthermore, no two-way interactions were found between Cueing and either Cue Frequency [F(1, 39) = 0.55, p = 0.461, MSE = 177.02, η² = 0.01] or Response Modality [F(1, 39) = 0.71, p = 0.405, MSE = 186.68, η² = 0.02], suggesting that neither response modality nor cue frequency affected the behavioral ICEs observed. There also was no three-way interaction between Response Modality, Cue Frequency, and Cueing, F(1, 39) = 3.13, p = 0.085, MSE = 153.60, η² = 0.07, indicating that the difference between cued and uncued RTs did not depend on the combination of response modality and cue frequency.

Planned one-tailed paired-samples t-tests (Bonferroni corrected, in which the p values were multiplied by the number of comparisons) revealed significant ICEs for saccadic responses with cue frequencies of 8.6 Hz [ICE = 15.65 ms; t(39) = 4.04, p < 0.001] and 12 Hz [ICE = 8.53 ms; t(39) = 3.63, p = 0.002]. Marginally significant and significant ICEs were found for manual responses with cue frequencies of 8.6 Hz [ICE = 8.17 ms; t(39) = 2.06, p = 0.092] and 12 Hz [ICE = 10.87 ms; t(39) = 3.69, p = 0.001], respectively.
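
The correction described above amounts to multiplying each uncorrected p-value by the number of comparisons and capping the result at 1. A minimal sketch follows; the function name and the uncorrected p-values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected one-tailed p-values for four planned comparisons.
uncorrected = [0.0001, 0.0005, 0.023, 0.4]
corrected = bonferroni(uncorrected)
significant = [p < 0.05 for p in corrected]
print(corrected, significant)
```

With four comparisons, an uncorrected p of 0.023 becomes 0.092 and no longer falls below 0.05, which illustrates how a comparison can shift from significant to marginal after correction.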

SSVEPs

Based on the pre-analysis depicted in Fig. 3, electrode sites O1 and O2 were selected for further SSVEP analyses. Transient responses (also known as “ringing”) caused by phase shifts are typically observed at the beginning and end of the computed SSVEP amplitudes when using complex demodulation (Hayano et al., 2017; Wilhelm, Grossman, & Roth, 2005). This ringing occurs because the procedure involves applying a low-pass filter to remove the noise component, N(t); as a result, the amplitude and frequency estimates are lost for a period equal to half the length of the low-pass filter at the beginning and end of the signal.

To determine the appropriate time window to measure the difference between cued and uncued SSVEPs, grand average SSVEPs for each combination of response modality and cueing were submitted to two-tailed t-tests at each successive time point (i.e., in steps of 4 ms, given the 250-Hz sampling rate), with a false discovery rate (FDR; Benjamini & Hochberg, 1995) corrected significance level of p < 0.05 for multiple comparisons.
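
The Benjamini-Hochberg step-up procedure applied to a series of per-time-point p-values can be sketched as follows. The function name and the four example p-values are hypothetical; in the analysis described above there would be one p-value per 4-ms sample.

```python
import numpy as np


def fdr_bh(p_values, alpha=0.05):
    """Benjamini-Hochberg procedure; returns a boolean 'reject' mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: reject every hypothesis up to the largest rank
        # whose sorted p-value is at or below its threshold.
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject


# Hypothetical p-values from four consecutive time points.
p_per_timepoint = [0.03, 0.02, 0.035, 0.9]
reject = fdr_bh(p_per_timepoint)
print(reject)  # [ True  True  True False]
```

The example shows the step-up property: 0.02 and 0.03 exceed their own rank thresholds (0.0125 and 0.025), but they are still rejected because the third-ranked value, 0.035, falls below its threshold of 0.0375.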

Based on these per time point t-test results, a post-cue window of 100-500 ms was used to extract the SSVEP amplitudes per condition, which were submitted to a 2 (Response Modality) × 2 (Cue Frequency) × 2 (Cueing) repeated-measures ANOVA. No main effect of Response Modality [F(1, 39) = 3.12, p = 0.085, MSE = 0.06, η² = 0.07] or Cue Frequency [F(1, 39) = 4.01, p = 0.052, MSE = 0.24, η² = 0.09] was found; however, there was a main effect of Cueing [F(1, 39) = 26.15, p < 0.001, MSE = 0.08, η² = 0.40], as a result of lower SSVEP amplitudes in the cued condition (M = –0.27) than in the uncued condition (M = –0.10). Furthermore, Cue Frequency had no two-way interactions with Response Modality [F(1, 39) = 0.00, p = 0.980, MSE = 0.04, η² < 0.01] or Cueing [F(1, 39) = 0.17, p = 0.681, MSE = 0.05, η² < 0.01], and similar to behavioral results, no two-way interaction was found between Response Modality and Cueing [F(1, 39) = 0.07, p = 0.799, MSE = 0.02, η² < 0.01]. There also was no three-way interaction between Response Modality, Cue Frequency, and Cueing [F(1, 39) = 0.02, p = 0.879, MSE = 0.06, η² < 0.01]. These results indicate that SSVEP responses and cueing effects were not influenced by response modality or cue frequency. Hence, for ease of comparison, SSVEPs were collapsed across Cue Frequency in Fig. 4.

Fig. 4.

Time-domain SSVEP amplitudes over all subjects in response to flickering stimuli at cued (blue solid lines) and uncued (red dashed lines) locations. Grey regions denote point-by-point significant differences in SSVEP amplitudes between cued and uncued conditions (paired-samples t-tests, FDR-corrected p < 0.05). (A) Saccadic response task. (B) Manual localization response task. (C) Topographical distribution of SSVEP amplitudes for each flicker frequency and cue location across the post-cue window of 100-500 ms

The SSVEP amplitudes were then averaged across all time points within the 100-500 ms post-cue interval, and one-tailed paired-samples t-tests (Bonferroni corrected, in which the p-values were multiplied by the number of comparisons) were conducted to further examine the main effect of Cueing, revealing significant differences between cued and uncued SSVEPs for each combination of cue frequency and response modality: 8.6 Hz, saccadic [t(39) = –2.95, p = 0.011]; 12 Hz, saccadic [t(39) = –3.96, p = 0.001]; 8.6 Hz, manual [t(39) = –2.46, p = 0.037]; 12 Hz, manual [t(39) = –4.10, p < 0.001]. Collapsed across Cue Frequency, SSVEP cueing effects remained significant for both the saccadic [t(39) = –4.85, p < 0.001] and manual tasks [t(39) = –4.37, p < 0.001]. In sum, in the interval from 100 to 500 ms post-cue, SSVEPs were larger when evoked by flickering stimuli at uncued locations than by flickering stimuli at cued locations, regardless of response modality (Table 2).

Table 2. Mean SSVEP amplitudes with SDs in parentheses and SSVEP cueing effects (cued amplitude – uncued amplitude) for each Response Modality × Cue Frequency × Cueing condition

In addition to post-cue SSVEP analyses, we also analyzed post-target SSVEPs using a post-target window of 0-100 ms (i.e., a post-cue window of 1,800-1,900 ms, given that the CTOA was 1,800 ms) and found no significant difference in SSVEP amplitudes between target-present and target-absent locations [F(1, 39) = 3.89, p = 0.056, MSE = 0.03, η² = 0.09]. No two-way or three-way interactions were found among Response Modality, Cue Frequency, and Cueing conditions (p > 0.1).

Discussion

The effects of spatial attention on SSVEPs have been well demonstrated in numerous studies by comparing responses to attended and unattended stimuli (Morgan et al., 1996; Müller, Picton et al., 1998a). More importantly, the magnitude of attentional modulation of an SSVEP signal has been found to scale with the amount of attention paid to the spatial location (Toffanin, de Jong, Johnson, & Martens, 2009). The present study investigated the time course of IOR, and other ICEs, using the SSVEP technique. Primarily, we were interested in comparing post-cue SSVEPs on cued and uncued trials to determine whether the generation of different ICEs modulated SSVEP signals. ICEs are commonly investigated using a traditional spatial cueing paradigm with a cue stimulus presented in the peripheral visual field followed by a target stimulus that appears at the same or opposite location as the cue. In the present study, the two possible cue locations were each presented with a white square background that flickered at either 8.6 or 12 Hz. Topographical and PSD analyses showed that O1 and O2 electrode sites had the highest SSVEP amplitudes at the corresponding stimulus frequencies (i.e., 8.6 and 12 Hz), demonstrating that the SSVEPs observed were driven by the flickering backgrounds.

SSVEP modulations

The grand average of SSVEP amplitudes per condition showed an overall reduction in amplitude at cued locations (referred to here as an SSVEP modulation) regardless of response modality and stimulus frequency. Two-tailed t-tests per time point revealed that reductions in SSVEP amplitude were significant (p < 0.05) in the 100-500 ms post-cue time range. These SSVEP results are in reasonably good agreement with a previous SSVEP study on IOR (Li et al., 2017), in which a similar SSVEP modulation was observed in the window of 200-800 ms post-cue onset, with manual responses to targets and, presumably, an actively suppressed oculomotor system (note that eye tracking was not used in this previous study).

Because the present study used a CTOA of 1,800 ms, the behavioral performance of participants in the time window in which the SSVEP ICE was found (i.e., 100-500 ms) could not be examined. However, previous behavioral studies (Hilchey, Klein, et al., 2014b; Lim et al., 2018; Lim et al., 2019, Exps. 2 and 5), using a range of CTOAs from 250 to 1,000 ms in a similar paradigm (with eye movements to targets), have revealed behavioral performance in that time range. In Lim et al. (2018), ICEs were found as early as 250 ms post-cue onset (ICE = 14.93) up to 1,000 ms (ICE = 35.07). It was hypothesized (and simulated using a DNF model) that these ICEs were due to sensory adaptation at short CTOAs of around 100-750 ms and were due to output-based, oculomotor IOR at long CTOAs of around 550 ms and greater (see also Satel, Story, Hilchey, Wang, & Klein, 2013b; Satel et al., 2011). As predicted, reductions in SSVEP amplitudes were observed at cued, relative to uncued, locations early in the cue-target interval, which falls within the active period of sensory adaptation. Although the time window of the SSVEP modulation (i.e., 100-500 ms) found in the present study is shorter and earlier than that reported by Li et al. (2017), both fall within the active period of this early, input-based ICE (i.e., sensory adaptation). We also had predicted that SSVEP modulations would be observed later in the CTOA, but only in the manual response condition when the oculomotor system was actively suppressed, due to the hypothesized input-based nature of the ICE generated in this condition. However, no cue-elicited SSVEP modulations were observed with either manual or saccadic responses late in the CTOA, suggesting that neither of these ICEs actually affects early sensory processing.
Putting these results together, the reduction in SSVEP amplitudes we observed in the present study is likely to be an electrophysiological marker of sensory adaptation, even though this ICE had dispersed by the time of target onset and so would not have affected the observed behavioral performance.

Sensory adaptation and SSVEPs

As observed in monkey neurophysiological data (Fecteau, Bell, Dorris, & Munoz, 2005), sensory adaptation is an early, low-level, input-based after-effect that reduces the strength of subsequent inputs at previously stimulated locations. Electrical microstimulation of the SC showed that the iSC was not less sensitive to repeated electrical stimulation (Dorris et al., 2002), suggesting that sensory adaptation does not take place in the iSC but in the sSC, or other earlier sensory areas. Because the sSC receives input mainly from the early visual areas, such as V1 and the retina (Lui, Gregory, Blanks, & Giolli, 1995; Pollack & Hickey, 1979), sensory adaptation could occur as early as in the primary visual cortex (V1) and later propagate throughout the entire oculomotor network (Fecteau et al., 2005). Sensory adaptation could arise from an even earlier stage, as a result of diminished neural responses of V1 neurons to subsequent stimuli (e.g., flickering backgrounds), for example, as observed with microsaccades occurring during cue onset (referred to as microsaccadic suppression; Hass & Horwitz, 2011; see also Kagan, Gur, & Snodderly, 2008; Leopold & Logothetis, 1998).

Kohn (2007) defined visual adaptation as a reduction in the effectiveness of a stimulus to elicit a neuronal response, in agreement with the input attenuation effects observed in the monkey SC (Fecteau et al., 2005), and proposed that visual adaptation can affect sensory processing areas in the cortex and subcortex. Although SSVEPs are not sensitive to neuronal responses in the SC, the source or sources of the adaptation effect on SSVEPs can be inferred based on the visual pathway in which the formation of SSVEPs occurs. Based on fMRI and source modelling studies, the major source of SSVEP signals has been found to be in V1, with further sources in V3A (mid-occipital), V4 (ventral occipital), and V5 (middle-temporal; Bayram et al., 2011; Di Russo et al., 2007; Hillyard et al., 1997; Müller et al., 2006; Saenz, Buracas, & Boynton, 2002). It is possible that the SSVEP modulations observed in our study were caused by sensory adaptation that had propagated through to V5 or visual areas earlier than V5 as a result of reduced visual input activity from cued locations.

To put the matter in a slightly different light, aside from single cell recordings (Boehnke et al., 2011; Dorris et al., 2002; Fecteau & Munoz, 2005), there is evidence that such early input-based ICEs may be reflected in microsaccades as well (Hafed, 2013; Hafed & Krauzlis, 2012). Specifically, Tian et al. (2016) demonstrated that the enhancement of target-related activity at very short CTOAs (e.g., 100 ms; Doallo et al., 2004; Hopfinger & Mangun, 1998) and suppression at slightly longer CTOAs (less than around 600 ms) is synchronized with post-cue microsaccades and the magnitude of behavioral inhibition is also related to the number of microsaccades interrupted by target onset due to transient visual responses—i.e., the lower the frequency of interruptions, the larger the ICE observed. According to these studies, the enhancement and suppression phenomena, albeit happening at the level of the SC, are thought to be reflected in behavioral ICEs as a result of stimulus (i.e., any stimulus, regardless of whether it is a cue or a target) onset reflexively resetting the microsaccadic rhythm, causing microsaccades to become biased away from the cue onset location. There is, therefore, a strong link between covert attention and microsaccades, and the SSVEP modulations observed here could be an outcome of pre-microsaccadic rhythm being interrupted by cue onset. However, in the present study, flickering stimuli were present throughout trials, and it is not known how that could have affected the microsaccadic rhythms. It remains unclear whether the ICEs observed in saccades and microsaccades reflect a single process (Hafed, Chen, & Tian, 2015) or multiple related processes (Kliegl, Rolfs, Laubrock, & Engbert, 2009; Laubrock et al., 2010).

Limitations

Another SSVEP study with infants found that SSVEP amplitudes to the target of an upcoming saccadic eye movement begin to increase as early as 500 ms before shifts of gaze (Robertson et al., 2012), suggesting that covert attention is directed to the location of the next gaze shift before gaze onset. Other, non-SSVEP, studies also have found evidence of endogenous saccades being preceded by shifts of covert attention (Deubel, 2008; Godijn & Pratt, 2002; Peterson, Kramer, & Irwin, 2004). However, no post-target SSVEP attention effect was found in the present study.

One possibility is that the absence of post-target SSVEP attention effects in our study was due to the methodological limitations of the paradigm used, in which a trial ended immediately after a response to the target had been made. The period from target onset until the end of the trial (~300 ms) may have been too short for such endogenous attention effects to be reflected in the SSVEP signals. In short, we found no difference in SSVEP effects between response modalities, which is likely attributable to the limited exposure to flickering stimuli after target onset. Future studies should consider retaining stimuli on screen for a brief period (e.g., 1,000 ms) after a response to the detection task is received from the participant. Alternatively, it could simply be that neither late input- nor output-based ICEs have any effect on SSVEPs, suggesting that both mechanisms arise late in processing.

In addition, we did not observe SSVEP modulations beyond ~500 ms post-cue in either the manual response or saccadic response task, although behavioral responses demonstrated that ICEs were observed in both experimental conditions. Although these behavioral effects were statistically equivalent, we nonetheless presume that an input-based ICE was generated in trials with manual responses (because the oculomotor system was actively suppressed) and an output-based ICE (i.e., traditionally defined as IOR/oculomotor IOR) was generated in trials with saccadic responses. Our results suggest that the SSVEP modulation we observed in both conditions in the 100-500 ms time range is reflective of early sensory adaptation, which would be present in both conditions during this time range, given that both the SSVEP modulation and sensory adaptation had dispersed by 600 ms post-cue.

Conclusions

To summarize, we found a reduction in SSVEP amplitudes at cued locations (SSVEP modulations), which we believe can be attributed to early behavioral inhibition arising from early sensory adaptation that is generated along the input pathway. However, it is unlikely that the behavioral ICEs observed here were reflective of this neural mechanism, given that the SSVEP modulation was no longer evident after 550 ms post-cue onset and that sensory adaptation is thought to disperse by about 600 ms post-cue onset. SSVEP modulations were not observed later in the CTOAs in either response modality condition, suggesting that neither of these neural mechanisms actually affects the early sensory processing measured with SSVEPs. These findings provide further electrophysiological evidence that there is more than one mechanism that contributes to overall behavioral inhibition, and that these inhibitory mechanisms can each have different time courses, manifest along different pathways, and arise under different conditions. In line with this view, inhibitory mechanisms must be better understood and defined before making simplifying assumptions that all behavioral effects observed are derived from a single mechanism termed IOR. Future experimental designs using the spatial cueing task should take into account the temporal properties (e.g., early or late), effects (e.g., output or input pathways), and causes (e.g., oculomotor activation) of each known inhibitory mechanism to best uncover and dissociate their neural underpinnings and behavioral effects.