Salient stimuli capture attention and action

Kerzel, Dirk; Schönhammer, Josef

doi:10.3758/s13414-013-0512-3

Published: 06 August 2013

Volume 75, pages 1633–1643, (2013)

Attention, Perception, & Psychophysics

Abstract

Reaction times in a visual search task increase when an irrelevant but salient stimulus is presented. Recently, the hypothesis that the increase in reaction times was due to attentional capture by the salient distractor has been disputed. We devised a task in which a search display was shown after observers had initiated a reaching movement toward a touch screen. In a display of vertical bars, observers had to touch the oblique target while ignoring a salient color singleton. Because the hand was moving when the display appeared, reach trajectories revealed the current selection for action. We observed that salient but irrelevant stimuli changed the reach trajectory at the same time as the target was selected, about 270 ms after movement onset. The change in direction was corrected after another 160 ms. In a second experiment, we compared manual selection of color and orientation targets and observed that selection occurred earlier for color than for orientation targets. Salient stimuli support faster selection than do less salient stimuli. Under the assumption that attentional selection for action and perception are based on a common mechanism, our results suggest that attention is indeed captured by salient stimuli.

Introduction

Selection of visual information is achieved by attentional prioritization of visual stimuli and may be controlled in different ways. Attentional selection is said to be top-down when it reflects expectations and goals of the observer, and it is said to be bottom-up if it reflects the saliency of the stimuli (reviewed by Theeuwes, 2010). We will briefly present the contingent capture and the additional singleton paradigms, which are believed to provide evidence for top-down and bottom-up control, respectively. However, it should be noted that the two paradigms confound bottom-up and top-down factors to some degree and also involve mechanisms, such as intertrial priming, that defy the theoretical dichotomy (Awh, Belopolsky, & Theeuwes, 2012).

The contingent attentional capture paradigm consists of a target display that is preceded by a cue display (e.g., Folk, Remington, & Johnston, 1992). Observers search the target display for a colored item among white distractors (i.e., a color target), or they look for a single white item (i.e., an onset target). Color targets and onset targets are shown in separate blocks of trials, resulting in an attentional set for color or onset, respectively. Likewise, the cue display contains either a color cue or an onset cue, varied over blocks of trials. It was observed that reaction times (RTs) in a speeded discrimination task are shorter when the cue is shown at the same location as the target, but only when the cue characteristics match the target characteristics (i.e., onset cue/onset target or color cue/color target). In sum, only cues that match the current attentional set capture attention, which is strong evidence for top-down control. However, subsequent studies showed that the repetition of target features contributes to the pattern of results (e.g., Ansorge & Horstmann, 2007). This contribution does not support the notion of top-down control, because trial history is not easily classified as bottom-up or top-down (Awh et al., 2012).

In the additional singleton paradigm developed by Theeuwes (1991), observers search for a shape singleton in a circular array of items and perform a discrimination task. In one variant of the paradigm, the target and distractors change roles randomly from trial to trial (Theeuwes, 1991). That is, a diamond target among circle distractors may be followed by a circle target among diamond distractors, and vice versa. On half of the trials, a salient color singleton is shown at a nontarget location, which increases RTs. Presumably, attention is attracted to the salient color distractor before moving to the less salient shape singleton. However, the conclusion that attention was involuntarily attracted by the salient element and the notion of bottom-up control were subsequently challenged.

Bacon and Egeth (1994) introduced a slight change to the additional singleton paradigm by adding another shape to the array, whereby the target lost its status as shape singleton. After adding the shape, attentional capture was abolished. Bacon and Egeth concluded that attentional capture occurred only when observers were in singleton detection mode, which may induce selection of singletons along the wrong dimension (i.e., color instead of shape). In contrast, when a particular feature was looked for, no capture occurred. However, Theeuwes (1992) found that attentional capture persisted even when the target feature was fixed in a block of trials, instead of changing randomly from trial to trial. In this case, observers did not have to detect the singleton shape but could look for a particular feature in a block of trials (i.e., a feature search).

Furthermore, Folk and Remington (1998) accounted for increases in RTs with an additional singleton by nonspatial filtering costs. According to Treisman, Kahneman, and Burkell (1983), “any object on which attention could, but should not, be focused must be excluded at a cost, in time or in accuracy” (p. 530). Filtering costs arise even for objects that are highly dissimilar from the target and are, therefore, not attended (Kahneman, Treisman, & Burkell, 1983; Treisman et al., 1983). Folk and Remington showed that nonmatching cues in the contingent capture paradigm increased RTs, as compared with a condition with neutral cues. At the same time, RTs with nonmatching cues were not different at cued and uncued locations, suggesting that attention had not been captured. These results confirm the idea that it takes time to exclude nonmatching cues even if they do not attract attention.

Importantly, nonspatial filtering costs predict that the distance between target and distractor should not affect RTs. In contrast, if attention was captured by the salient distractor, the cost should be larger for distractors that are farther away from the target, because attention has to travel a longer distance. Becker (2007) concluded that increases in RTs caused by distractors were mostly consistent with nonspatial filtering costs. Further evidence against attentional capture by salient distractors comes from studies investigating the role of practice. In these studies, two groups of participants were shown different displays in the training phase but the same displays in the subsequent test phase. After training with displays inducing feature search mode, a salient distractor in the test phase did not affect performance, whereas it did after a training phase inducing singleton detection mode (Leber & Egeth, 2006; Zehetleitner, Goschy, & Müller, 2012). Even more surprisingly, the resistance to distraction of participants who performed in feature search mode disappeared when the test phase contained a distractor that they had not encountered during training (Zehetleitner et al., 2012), pointing to the important influence of training and prior exposure in the additional singleton paradigm.

Furthermore, electrophysiological measures of attentional deployment, such as the N2pc, have been used to investigate the control of attention. The N2pc is a negativity contralateral to the attended stimulus that occurs 200–300 ms after stimulus onset (Eimer, 1996; Luck & Hillyard, 1994). Importantly, the N2pc is a spatial measure of the focusing of visuospatial attention because the negativity shows which hemifield was attended. In contrast, differences in RT could be due to shifts of spatial attention or to other processes, such as nonspatial filtering. Hickey, McDonald, and Theeuwes (2006) reported an N2pc in response to an irrelevant color singleton, which supports the notion of attentional capture by salient stimuli. In their experiment, target and distractor shapes changed roles randomly from trial to trial. In subsequent research, the target shape was fixed, allowing for feature search, which abolished the N2pc to the irrelevant element (Burra & Kerzel, 2013; Schubö, Schröger, Meinecke, & Müller, 2007; Töllner, Müller, & Zehetleitner, 2011; Wykowska & Schubö, 2010, 2011). The absence of an N2pc to irrelevant distractors was confirmed in studies using the contingent capture paradigm, where no N2pc occurred to nonmatching cues (Ansorge, Kiss, Worschech, & Eimer, 2011; Lien, Ruthruff, Goodin, & Remington, 2008).

In the present study, we reexamined attentional selection in a feature search using a novel approach that is based on action execution (Song & Nakayama, 2009). The main purpose of attention outside the laboratory may be to select targets for goal-directed action (Allport, 1987; Neumann, 1987). In the General Discussion section, we will argue that perception, as investigated by the manual discrimination tasks in attentional and contingent capture paradigms, shares a common attentional mechanism with manual reaching action. We therefore looked for changes in goal-directed action attributable to the attentional selection of salient stimuli. In our task, the search display was presented after participants had initiated a reaching movement toward the screen, forcing them to do the search with the hand in flight (see also Chapman et al., 2010a, 2010b). We recorded reaching trajectories and measured deviations toward the irrelevant distractor. If the distractor was selected for goal-directed action, the hand should move toward its location. Importantly, our measure allows for a continuous and spatial measure of attentional selection, quite similar to the N2pc. In related research, it was observed that reaching movements deviated toward salient elements that were presented at potential target locations before the target appeared (Wood et al., 2011), suggesting that salient elements during our visual search task may have a similar effect.

We presented a matrix of white vertical bars containing an orientation singleton (see Fig. 1a). Participants were asked to touch the orientation singleton. The orientation singleton was always tilted to the left, allowing for feature search. A red vertical bar was the irrelevant but salient color singleton that was presented on 50% of the trials. Because feature search was possible (i.e., fixed target), we consider our experiments a conservative test of the attentional capture hypothesis. For instance, the electrophysiological marker of attentional deployment, the N2pc, was absent for irrelevant distractors when feature search was possible, and the increase in RTs was much smaller (Burra & Kerzel, 2013; Lamy & Yashar, 2008; Pinto, Olivers, & Theeuwes, 2005).

Experiment 1

Method

Participants

Sixteen psychology students at the University of Geneva participated for class credit. They reported normal or corrected-to-normal vision. The study was approved by the ethics committee of the Faculty of Psychology and Educational Sciences, and informed consent was given before the experiment started.

Apparatus and stimuli

The 3-D coordinates of manual movements were recorded at a sample frequency of 150 Hz by means of a marker positioned on the nail of the right index finger (CMS20S, Zebris Medical GmbH, Isny im Allgäu, Germany). The stimuli were presented on a 21-in. CRT monitor (85 Hz, 1,280 × 1,024 pixels) equipped with a touch interface (IntelliTouch, Elo Touchsystems, Menlo Park, CA) at a distance of ~65 cm from the participant. On the screen, 1 cm corresponded to ~0.88° of visual angle. Reaching responses were initiated by pressing the “arrow down” key on a computer keyboard. The central fixation mark was 39 cm in front of and 27 cm above the index finger on the start key. The center-to-center distance between stimuli was 2.6 cm vertically and 1.5 cm horizontally. There were 13 columns and 9 rows. The central element was replaced by a black fixation disk 0.6 cm in diameter. The bars had a width of 0.4 cm and a height of 1.4 cm. The target and distractor elements were placed on a circle with a radius of 6 cm. There were six possible target/distractor positions, which are shown in Fig. 1b. Adjacent positions were separated by 60° of rotation, starting at the 3 o’clock position. Target and distractor were separated by 120° of rotation, which resulted in a distance of 10.4 cm. The context elements and target were white (110 cd/m²), and the colored bar was red (CIE 1931: x = 0.612, y = 0.338, l = 20.5 cd/m²), on a gray background (54 cd/m²). The combined color and luminance difference made the color singleton very salient. The orientation singleton was a bar tilted to the left by 45° of rotation.
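The geometric values reported above can be checked with a short calculation. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the visual angle is computed for an extent viewed straight ahead, and the direction in which positions are counted from 3 o'clock is an assumption (the text does not specify it).

```python
import math

VIEWING_DISTANCE_CM = 65.0  # approximate viewing distance given in the text
RADIUS_CM = 6.0             # radius of the circle holding target/distractor positions


def visual_angle_deg(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (deg) of an extent of size_cm viewed straight ahead."""
    return math.degrees(math.atan(size_cm / distance_cm))


def position_on_circle(index, radius_cm=RADIUS_CM):
    """(x, y) in cm of position index (0..5), in 60 deg steps from 3 o'clock.

    Counting counterclockwise is an assumption made for illustration only.
    """
    angle = math.radians(60.0 * index)
    return (radius_cm * math.cos(angle), radius_cm * math.sin(angle))


def chord_cm(separation_deg, radius_cm=RADIUS_CM):
    """Straight-line distance between two positions separated by separation_deg."""
    return 2.0 * radius_cm * math.sin(math.radians(separation_deg) / 2.0)


print(round(visual_angle_deg(1.0), 2))  # degrees of visual angle per cm on the screen
print(round(chord_cm(120.0), 1))        # target-distractor distance in cm
```

Under these assumptions, 1 cm on the screen comes out at about 0.88° and the 120° separation at about 10.4 cm, in line with the values stated in the text.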

Procedure

Participants started a trial by pressing the “arrow down” key, which triggered the appearance of the fixation mark (see Fig. 1a). After a random interval (uniform distribution) between 0.5 and 1 s, a change in the size of the fixation point (from 0.3° to 0.6°) and a beep prompted participants to lift the finger. When participants lifted the finger, the search display was shown, and observers reached for the tilted element. The time of key release was taken as the RT, and the interval between key release and contact with the touch screen as movement time (MT). RTs had to be between 100 and 500 ms, and MTs had to be less than 600 ms. The short allowable MT forced observers to start moving toward the screen right after key release. Doing the visual search task without moving toward the screen inevitably resulted in time-out errors. The main purpose of the strict MT limit was to avoid a “wait-and-search” strategy, but the limit was not unrealistically difficult: Feedback about slow movements was given on 11% of the trials.
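The timing criteria can be expressed as a simple per-trial check. This is only an illustrative sketch: the function name, the event timestamps, and the error labels are hypothetical, and RT is assumed to be measured from the go signal (size change and beep) to key release.

```python
RT_MIN_MS, RT_MAX_MS = 100.0, 500.0  # allowed RT window from the text
MT_MAX_MS = 600.0                    # MTs had to be less than this value


def classify_trial(go_signal_ms, key_release_ms, screen_touch_ms):
    """Return (rt, mt, errors) for one trial.

    The timestamps and the error labels are hypothetical; only the
    numeric limits are taken from the text.
    """
    rt = key_release_ms - go_signal_ms
    mt = screen_touch_ms - key_release_ms
    errors = []
    if rt < RT_MIN_MS:
        errors.append("anticipation")
    elif rt > RT_MAX_MS:
        errors.append("late start")
    if mt >= MT_MAX_MS:
        errors.append("slow movement")
    return rt, mt, errors


# A trial with an RT of 250 ms and an MT of 560 ms passes both criteria.
print(classify_trial(0.0, 250.0, 810.0))
```

Because the two criteria are checked independently, a single trial can carry more than one error label.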

Furthermore, the ultrasound microphones did not capture the ultrasound pulse from the markers if the finger was turned away from the microphone or when the movement was very rapid, as in jerky movements. Observers were therefore instructed to move smoothly and continuously toward the target position after key release. The visual search task was to be performed while the movement was on-going. Visual error feedback was given at the end of the trial.

There were six target/distractor positions (cf. Fig. 1b), and all positions were equally likely. On half of the trials, no distractor was shown. On the other half, the distractor was shown at 120° clockwise or counterclockwise from the target. Observers worked through four blocks of 120 trials that were separated by a short break. Before data collection, participants received at least 30 practice trials.

Results

The data set of 1 participant was removed because 32% of the trials contained missing samples. After inspection of the distribution of MTs, a limit of 700 ms was chosen to remove MT outliers, which amounted to 2% of the trials. Error rates are presented in Table 1.

Table 1 Mean error rates and percentage of retained trials (error types are not exclusive)

Each trajectory was resampled to yield 200 samples that were normalized with respect to depth (i.e., the axis from the participant to the screen). After spatial normalization, the time of each sample was recovered by interpolation. We then averaged trajectories for the six target positions without distractor and for the six target positions with distractor, separately for each of the two distractor positions (counterclockwise and clockwise). Thus, there were 6 unique conditions without distractor and 12 unique conditions with distractor.
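One way to implement the resampling step is linear interpolation at equally spaced depth values. The sketch below is an assumption about how such a normalization could be done, not the authors' procedure: the function name is hypothetical, and the code assumes that depth increases strictly from sample to sample, which real recordings may violate.

```python
def resample_by_depth(samples, n=200):
    """Resample a trajectory to n samples equally spaced in depth.

    samples: list of (t_ms, x, y, depth) tuples with strictly increasing
    depth (an assumption of this sketch). Time, x, and y are recovered by
    linear interpolation at each new depth value.
    """
    depths = [s[3] for s in samples]
    d_start, d_end = depths[0], depths[-1]
    resampled = []
    j = 0  # index of the recorded sample just before the current depth
    for k in range(n):
        d = d_start + (d_end - d_start) * k / (n - 1)
        while j < len(samples) - 2 and depths[j + 1] < d:
            j += 1
        lo, hi = samples[j], samples[j + 1]
        w = (d - lo[3]) / (hi[3] - lo[3])  # interpolation weight within the segment
        t, x, y = (lo[i] + w * (hi[i] - lo[i]) for i in range(3))
        resampled.append((t, x, y, d))
    return resampled


# Made-up straight-line movement: 11 recorded samples, 10 ms apart.
raw = [(10.0 * i, 2.0 * i, -1.0 * i, 3.0 * i) for i in range(11)]
trajectory = resample_by_depth(raw)
print(len(trajectory), trajectory[0], trajectory[-1])
```

After this step, sample i of every trajectory lies at about the same depth, so trajectories from different conditions can be compared sample by sample while the interpolated time of each sample is retained.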

The analysis of reach trajectories is illustrated in Fig. 2. For each of the six conditions without distractor, we calculated the distance between the trajectory to a target at position P1 and the trajectory to a target at position P2 that was rotated 120° clockwise or counterclockwise. The distance was calculated in 3-D for each of the 200 samples. Because the trajectories were normalized with respect to depth, the samples were approximately equidistant in depth. Therefore, the distance in depth between samples can be graphically represented by the grid lines in Fig. 2.

For each of the 12 conditions with distractor, we calculated how much the trajectory toward the target at P1 deviated toward the distractor at position P2. To this end, we referenced the trajectory with target at position P1 and distractor at position P2 to trajectories with targets at P1 and P2. The resulting distance indicated not simply the difference to the trajectory with a target at P1 in the absence of distractors, but also the deviation from the trajectory to target at P1 in the direction of the trajectory to a target at P2.

More precisely, the following calculations were carried out (see Fig. 2). A sample i on the trajectory to the target T1 is referred to as T1_i, and the corresponding point at about the same depth when moving to a target at T2 as T2_i. The corresponding point on the trajectory to a target at T1 with a distractor at D2 is referred to as T1D2_i. We first determined a straight line between T1_i and T2_i, which we refer to as g. Next, we dropped a perpendicular from T1D2_i onto g and calculated the distance of the foot of this perpendicular to T1_i. If the foot was between T1_i and T2_i, the sign of the distance was positive. If the foot was on the other side of T1_i, away from T2_i, the sign was negative. If the distractor had no effect, the distance between T1_i and T1D2_i would be zero. If the participant went to the distractor at P2 instead of the target at P1, there would be a distance between T1_i and T1D2_i that was equal to the distance between T1_i and T2_i.
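The signed distance described above amounts to a scalar projection of the vector from T1_i to T1D2_i onto the direction from T1_i to T2_i. A minimal sketch of this calculation follows; the function name is hypothetical, and the handling of coinciding points is an added assumption.

```python
import math


def signed_deviation(t1, t2, t1d2):
    """Signed distance from t1 to the foot of the perpendicular from t1d2 onto g.

    t1, t2, t1d2: 3-D points (x, y, depth) for the same sample index i.
    g is the line through t1 and t2. The result is positive when the foot
    lies on the t2 side of t1 and negative when it lies on the other side.
    """
    g = [b - a for a, b in zip(t1, t2)]    # direction from T1_i to T2_i
    v = [c - a for a, c in zip(t1, t1d2)]  # offset of T1D2_i from T1_i
    g_length = math.sqrt(sum(c * c for c in g))
    if g_length == 0.0:
        return 0.0  # trajectories coincide; the direction is undefined (assumption)
    return sum(gc * vc for gc, vc in zip(g, v)) / g_length


# Made-up points: the foot of the perpendicular lies 3 units from T1_i toward T2_i.
print(signed_deviation((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # 3.0
```

With this formulation, a trajectory that coincides with the target-only trajectory yields zero, and a trajectory that coincides with the trajectory to P2 yields the full distance between T1_i and T2_i, matching the two limiting cases described in the text.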

We averaged the distances between T1 and T2 and between T1 and T1D2 across all target positions. The distances for the target at position P1 with distractor at position P2 are shown in Fig. 3. As a reference, we also plot the distance between trajectories to targets at P1 and P2. Observers follow a default trajectory for the first ~270 ms, which results in very small differences between the trajectories during the first phase of the movement. At ~270 ms, the finger starts to move toward a specific stimulus, which increases the distance between trajectories to targets at P1 and P2. At the same time, there is an increase in the deviation toward the distractor.

To quantify the deviation toward the distractor, we compared the mean deviation in the peak interval with the mean deviation in a baseline interval at the beginning of the movement. The baseline interval was defined from the 20th to the 60th sample, which corresponds to the initial 10%–30% of the trajectory. Note that there were 200 equally spaced samples from the starting point of the hand on the keyboard to the endpoint of the movement on the screen. To determine the peak interval, we analyzed the deviation toward the distractor after averaging across participants. First, we determined the peak deviation and calculated the mean deviation in the baseline interval. Next, we used the value at 50% between the baseline deviation and the peak deviation to delimit the peak interval at the rising and falling flank. Then the limits of the peak interval were applied to individual data. Thus, the range of samples (out of 200) in the baseline and peak intervals was the same across participants, but the time of the two intervals varied between participants because individual velocities were different. The average time intervals and average deviations in the baseline and peak intervals are shown in Table 2.
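The delimitation of the peak interval can be sketched as a half-height criterion applied to the group-averaged deviation curve. The code below is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the 20th to 60th samples are treated as 0-based indices 20 to 60 inclusive.

```python
def interval_mean(values, start, end):
    """Mean of values[start..end], both indices inclusive."""
    return sum(values[start:end + 1]) / (end - start + 1)


def peak_interval(deviation, baseline=(20, 60)):
    """Return (start, end) sample indices of the peak interval.

    deviation: group-averaged deviation toward the distractor, one value per
    resampled sample. The threshold lies halfway between the baseline mean
    and the peak; the interval extends from the peak along the rising and
    falling flanks for as long as the curve stays at or above the threshold.
    """
    base = interval_mean(deviation, baseline[0], baseline[1])
    peak_index = max(range(len(deviation)), key=deviation.__getitem__)
    threshold = base + 0.5 * (deviation[peak_index] - base)
    start = peak_index
    while start > 0 and deviation[start - 1] >= threshold:
        start -= 1
    end = peak_index
    while end < len(deviation) - 1 and deviation[end + 1] >= threshold:
        end += 1
    return start, end


# Made-up triangular curve that peaks at sample 120 with a height of 10.
curve = [max(0.0, 10.0 - 0.5 * abs(i - 120)) for i in range(200)]
limits = peak_interval(curve)
print(limits)  # (110, 130)
```

The limits obtained from the group average would then be applied to each participant, for example with `interval_mean(individual_curve, *limits)`, so that the same range of samples is compared for everyone while the corresponding times differ.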

Table 2 Mean interval times (measured from movement onset) and mean deviation toward the distractor for the baseline and peak intervals

Average trajectories are shown in Fig. 3. The baseline interval was between 72 and 170 ms after movement onset, and the peak interval was from 273 to 445 ms, comprising 67 samples (cf. Table 2). The mean deviation was larger during the peak interval than during the baseline interval (11.5 vs. 1.6 mm), t(14) = 10.9, p < .001.

RTs did not differ between trials with and without distractor (247 ms in both conditions), p = .595, but MTs were slightly longer with than without distractor (568 vs. 557 ms), t(14) = 7.1, p < .001. In this and the following experiments, choice errors were rare (1%), suggesting that attentional capture did not affect the landing position of the finger on the screen.

Discussion

We observed that goal-directed reaching movements deviated toward the position of a salient distractor in a visual search task. About 270 ms after movement onset, participants left the default trajectory and moved toward the orientation singleton. At the same time, there was a deviation toward the color singleton. The deviation toward the distractor was corrected after another 160 ms, when the hand returned to the trajectory observed without a distractor. Because the target remained the same throughout the experiment, participants were able to search for a particular feature and did not have to rely on singleton detection. In contrast to previous studies using the N2pc as a spatial marker of visual selection, we observed evidence for attentional capture by task-irrelevant but salient visual elements in a feature search.

Experiment 2

While we wish to interpret the spatial deviation as evidence for attentional selection of the irrelevant object, an alternative explanation is possible. Because target and distractor locations were on opposite sides of fixation, simply continuing on the default trajectory would also result in a deviation toward the distractor. Figure 4a, b shows that without measuring the default trajectory toward the center of the screen, it is not possible to attribute the deviation unambiguously to attraction by the irrelevant distractor. It may just as well reflect a delay in the selection of the target, which could be attributed to nonspatial filtering.

To decide between attraction and delay, we changed the spatial layout of target and distractor positions. Now the targets were presented on the cardinal axes and the distractors in the corners of a virtual square (cf. Fig. 1c). Because moving to the target required staying on the default trajectory, only attraction toward the distractor would result in deviations. For instance, let us consider the position above fixation as P1 and the position in the upper left as P2 (cf. Fig. 4c). To move to P1, participants continue on the default horizontal trajectory to the center of the screen. If a distractor at P2 attracted the reach, this would result in a horizontal deviation that would fall between target-only reaches toward P1 and P2. If selection of the target at P1 was delayed because of a distractor at P2, the trajectories would remain horizontally aligned with a reach to P1.