Introduction

Working memory (WM) is our capacity to hold information in mind for a short duration and to manipulate, transform, or act on that information. When we search for an object, for example, WM both maintains an active representation of that object and allows us to compare it with perceptual input as we conduct the search. Models of visual search suggest that representations held in WM can bias perceptual processing and guide visual attention during search (Desimone & Duncan, 1995; Eimer, 2014). This process is thought to occur through top-down activation of feature-sensitive visual processing areas by WM representations, thereby enhancing the processing of features that match those held in WM. Consistent with these models, previous studies have shown that objects that match items held in WM are recognized faster than objects that do not (Downing, 2000; Olivers et al., 2006; Soto & Humphreys, 2009). Furthermore, Gayet et al. (2017) found greater activation in brain areas involved in object recognition (including the lateral occipital cortex) in response to a stimulus that matched information held in visual WM than to a non-matching stimulus. A concomitant effect of this enhanced processing is that the spatial location of the matching features becomes the focus of visual attention. This process has been termed WM-driven attentional guidance.

While the representation held in WM that guides our search is usually the target item, this need not be the case: any active WM representation can influence attention and perceptual processing. Indeed, many studies have demonstrated that attention can be guided by information held in WM even when it is irrelevant to the current task. These studies often use a dual-task paradigm in which participants hold an item in WM for a memory task while completing a separate visual search task. If the memory item then appears in the search array as a distractor, its influence on the search task can be measured. Most of these studies have used stimuli with simple features and have shown that when an item with a conjunction of features, such as a colored shape, is held in WM, attention is drawn to distractors that match these features (Olivers et al., 2006; Soto et al., 2005). The resulting interference is attributed to enhanced processing of the memory-matching distractor through top-down activation from WM, which biases attention away from the target.

Research has also shown that abstract stimuli (that is, stimuli that do not possess the visual features of the item they represent) can influence visual search. Using several variations of the typical attentional guidance dual-task paradigm, Soto and Humphreys (2007) presented memory items either as objects (e.g., a red square) or as verbal descriptions of those objects (e.g., the words ‘red square’). They demonstrated that memory items presented in either format slowed search when a matching distractor was present and suggested that rapid extraction of the semantic content of the search array items was responsible for the attentional capture by the WM item. Using a different paradigm, Malcolm et al. (2016) also demonstrated the effect of irrelevant semantic information on search performance. Participants searched for a target superimposed on background images of everyday objects; one image was centrally located while the other two were presented to the left and right of it. One of the peripheral backgrounds was semantically related to the central one while the other was unrelated. Participants were faster to detect the target when it was located on the semantically related background image than on the unrelated one, even though the semantic (and visual) properties of the background images were irrelevant to the search task. These studies suggest that both visual and semantic properties of visual stimuli can guide attention during visual search, even when they are unrelated to the target item.

Whether or not the semantic properties of real-world objects guide attention may depend on the demands of the task. De Groot et al. (2017) tested the effect of visual and semantic similarity between target and distractor items on visual fixation following search array presentation. In this study, participants first memorized an object name and then performed a search task. In the ‘template’ condition, the memorized object name served as the target cue; that is, participants searched for the corresponding image in the search array. In the ‘accessory’ condition, the memorized object name was irrelevant to the search task (participants searched for a different item). Target-absent search arrays contained both a distractor that was visually related to the memorized name and one that was semantically related. The results showed that both visually and semantically related distractors captured attention during the search task in the target-absent ‘template’ condition (i.e., when the memorized information was task-relevant). However, they did not find any attentional guidance effects in the target-absent accessory condition; when the memorized object name was not relevant to the search task, neither visually nor semantically related distractors attracted attention. These experiments indicate that the conditions under which semantic information influences visual search (and the mechanism by which it operates) are still not fully understood.

To date, attentional capture by visually matching distractors in studies using standalone images of real-world objects has been found only on target-absent trials (Balani et al., 2010; Houtkamp & Roelfsema, 2006). One possible reason is that the target item (presented as either an object name or an image) changed on every trial and may therefore have been held in a highly active or accessible state because it was continually updated. Houtkamp and Roelfsema (2006) asked participants to memorize two objects at the start of every trial; these items were the targets for the two consecutive visual searches that followed. On the critical trials, the second search target appeared in the first search array as a distractor. The memory-matching distractors had no effect on search times when the target was present in the display; however, they slowed search when the target was absent. These results are consistent with an account proposing that WM maintains the target template in an active state, ‘shielding’ it from interference from non-target items (Olivers et al., 2011). To explore this mechanism further, we tested whether search-irrelevant distractors could bias attention when the target item was held constant throughout the experiment. If the target item is not updated in WM, it may not be held in as active a state or be as well protected against interference from memory-matching distractor items.

The goals of the present study were to examine whether distractors identical to the WM item, and distractors semantically associated with the WM item, interfere with search when the target remains the same. We use the term WM rather than visual WM because, although the stimuli were presented visually (i.e., as images), participants likely also encoded semantic and linguistic features of these images. In the experiments reported here, we used a memory-and-search dual task in which participants first viewed and memorized an everyday object. They then determined whether the target item was present or absent in a two-object array. Lastly, they were given a recognition test for the object they had previously memorized. There were three types of search arrays based on the match between the memorized item and the distractor: on exact-match trials, the distractor was identical to the memory item; on semantic-match trials, it was semantically related to the memory item; and on non-match trials, it was unrelated to the memory item. If active search-irrelevant WM information (i.e., the memory item) can bias attention away from the target, then we would expect slower search in the exact-match condition than in the non-match condition. If items semantically related to the memory item can also bias attention away from the target, then we would expect slower search in the semantic-match condition than in the non-match condition.

Experiment 1

Method

Participants

Participants were recruited from Amazon’s Mechanical Turk using the CloudResearch platform (Litman et al., 2017). Participants were required to have completed over 1,000 HITs (Human Intelligence Tasks) with an approval rating of over 95% and to be located in the USA. Forty-eight participants completed the visual search task. Eight were removed from the analysis for having an accuracy rate below 90% on either the visual search or the memory task, leaving 40 participants (19 females and 21 males) in the final analysis. Participants’ mean age was 29.7 years (SD = 3.74, range 20–35). Five participants were Asian, six were African-American, five were Latino, two were mixed race, and 22 were Caucasian. The study took approximately 45 min and participants were paid US$5. The study was approved by the Monash University Research Ethics Committee.

Stimuli

The stimuli were images of everyday items taken from the POPORO stimulus set (Pool of Pairs of Related Objects; Kovalenko et al., 2012). This set contains triplets of images, two of which are semantically related and one that is unrelated (e.g., a hamster, a hamster’s wheel and a power cable). The semantic relatedness ratings were calculated from pairwise comparisons performed by 132 participants, and related and unrelated images were matched on luminance (and luminance distribution), their radially averaged power spectra, and the shape of the outline of the image (for further details, see Kovalenko et al., 2012). From the 400 available sets, we selected 180 on the basis of high relatedness between the related images and cultural appropriateness; the mean relatedness rating between the two related images was 84.07% (SD = 5.1, range: 76.0–96.2%).

The experiment was conducted online (see Procedure); therefore, because participants completed the study using their own computers and monitors, the absolute size of the stimuli could not be controlled. However, the size of all stimuli was set to be 25% of the height of the screen (for a 24-in. 1,920 × 1,080 monitor, each stimulus was 270 pixels, or approximately 7.15 cm, square). Search array eccentricity was set to 15% of the height of the screen (for a 24-in. 1,920 × 1,080 monitor, the center of each image was 162 pixels, approximately 4.3 cm, from the center of the screen with a gap of 54 pixels, approximately 1.4 cm, between the stimuli).
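To illustrate, height-relative sizing of this kind can be specified directly in PsychoPy (the software used here; see Procedure). The following is a minimal sketch under assumed window settings and hypothetical image paths, not the actual experiment code:

```python
from psychopy import visual

# Open a full-screen window on the participant's own display.
win = visual.Window(fullscr=True, color='white')

# With units='height', size and position are fractions of the screen height,
# so stimuli scale proportionally on any monitor (0.25 = 270 px at 1080p).
left_item = visual.ImageStim(
    win, image='stimuli/item_left.png',   # hypothetical file path
    size=0.25, pos=(-0.15, 0), units='height')
right_item = visual.ImageStim(
    win, image='stimuli/item_right.png',  # hypothetical file path
    size=0.25, pos=(0.15, 0), units='height')
```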

Procedure

Participants first completed a short demographic survey on Qualtrics (https://www.qualtrics.com; Qualtrics, Provo, UT, USA) and then completed the experimental task developed using PsychoPy (Peirce et al., 2019) and administered online using the Pavlovia platform (https://pavlovia.org/).

The experiment involved both a memory task and a visual search task. Before starting, participants were given written instructions and shown the target item that they would look for during the visual search task. The target item was a cup of coffee, and the same image was used for all trials and all participants. On each trial (see Fig. 1), the participant was first presented with the memory item. After a delay, the participant was presented with a simple search array consisting of two items on the left and right of the screen. These items remained on the screen until the participant indicated with a key press whether the target item was present (using the index and middle fingers of their left hand to press the ‘1’ or ‘2’ keys). Following the search task, participants were presented with a probe item and indicated whether it was the same as the memory item (using the index and middle fingers of their right hand to press the ‘8’ or ‘9’ keys). The key mappings were randomized across participants. After each item image, a random dot mask was presented to prevent afterimages. There were six blocks of 30 trials. Prior to the experimental trials, participants were given 20 practice trials (the first ten of which had text instructions).

Fig. 1

A schematic timeline of each trial in the experiment. First, a memory item is shown (e.g., a saucepan), followed by a two-item search array. In this example, the target item (the cup of coffee) is displayed with a semantically related distractor (a stove). The participant indicates whether the target item is present or not. Finally, the probe item is shown (in this example, it is the same as the initial memory item) and the participant indicates whether it is the same as the original memory item. Following each image, a random dot mask was presented to limit image after-effects

On each trial, the memory item was one of the two related images in a stimulus triplet; in the following search array, one of the items (the distractor) could therefore be the memory item itself, the related image, or the unrelated image from that triplet. The other item in the search array was either the target (a target-present trial) or an item randomly selected from the semantically unrelated items of the other image triplets (a target-absent trial). The probe item was either the same as the memory item or the semantically related item from a different, randomly chosen triplet. The visual search conditions (target presence: present, absent; distractor type: exact, semantic, non-match), the memory condition (probe item: same, different), and the side on which each search array item was presented were all chosen at random.
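To make the trial construction concrete, the sketch below shows one way this randomization could be implemented in Python. The function, variable names, triplet representation, and target filename are our illustrative assumptions rather than the authors’ code:

```python
import random

DISTRACTOR_TYPES = ['exact', 'semantic', 'non-match']
TARGET_IMAGE = 'coffee_cup.png'  # hypothetical filename; the target never changes

def make_trial(triplet, other_triplets):
    """triplet = (related_a, related_b, unrelated); other_triplets is a list
    of the remaining triplets. The memory item is one of the two related images."""
    memory_item, related_item = random.sample(triplet[:2], 2)
    distractor_type = random.choice(DISTRACTOR_TYPES)
    distractor = {'exact': memory_item,
                  'semantic': related_item,
                  'non-match': triplet[2]}[distractor_type]
    target_present = random.choice([True, False])
    if target_present:
        other_item = TARGET_IMAGE
    else:
        # an unrelated item from another, randomly chosen triplet
        other_item = random.choice(other_triplets)[2]
    search_array = [distractor, other_item]
    random.shuffle(search_array)  # random left/right placement
    if random.choice([True, False]):
        probe = memory_item                       # 'same' probe trial
    else:
        probe = random.choice(other_triplets)[0]  # related item, other triplet
    return dict(memory=memory_item, search=search_array, probe=probe,
                target_present=target_present, distractor_type=distractor_type)
```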

Results and discussion

Participants performed both the search task and the memory task with a high degree of accuracy (overall mean accuracy of 98.25% for the search task and 97.62% for the memory task). Only trials on which responses for both tasks were correct were included in the reaction time (RT) analysis. Additionally, for each participant, any trials with outlier RTs on either the search or the memory task were removed. Outliers were defined as RTs of less than 250 ms or greater than the participant’s median RT plus five times the scaled median absolute deviation. After these exclusions, the mean number of trials per condition across participants ranged from 25.88 to 28.15 (out of an expected 30).
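For concreteness, the stated exclusion rule can be expressed in a few lines of Python. This is our sketch of the criterion (assuming RTs in milliseconds and SciPy’s normal-consistent MAD scaling), not the authors’ analysis code:

```python
import numpy as np
from scipy.stats import median_abs_deviation

def rt_outlier_mask(rts_ms):
    """Flag RTs below 250 ms or above the participant's median RT
    plus five times the scaled median absolute deviation."""
    rts_ms = np.asarray(rts_ms, dtype=float)
    upper = np.median(rts_ms) + 5 * median_abs_deviation(rts_ms, scale='normal')
    return (rts_ms < 250) | (rts_ms > upper)

# Usage (per participant): clean_rts = rts[~rt_outlier_mask(rts)]
```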

Figure 2 shows the mean RTs in the visual search task across all the conditions. A 2 (target presence: present, absent) × 3 (distractor type: exact, semantic, non-match) repeated-measures ANOVA showed significant main effects of target presence (F(1, 39) = 83.22, p < .001, ηp2 = .68) and distractor type (F(2, 78) = 22.36, p < .001, ηp2 = .36), as well as a significant interaction (F(2, 78) = 4.01, p = .030, ηp2 = .09; here and below, p-values are corrected for violations of sphericity using the Greenhouse-Geisser adjustment where appropriate). Participants responded more quickly when the target was present (771.13 ms) than when it was absent (915.98 ms). Follow-up pairwise comparisons showed that search times on exact-match trials (884.44 ms) were slower than on semantic-match trials (830.94 ms), which were in turn slower than on non-match trials (815.28 ms; p < .02 for all comparisons). To investigate the interaction, pairwise comparisons were run between the distractor conditions separately for the target-present and target-absent conditions. When the target was present, RTs in all the distractor conditions differed from each other (exact-match mean = 797.25 ms, semantic-match mean = 766.93 ms, non-match mean = 749.21 ms; p < .03 for all comparisons). When the target was absent, there was no significant difference between the semantic-match (894.96 ms) and non-match conditions (881.36 ms; p = .21); however, RTs in the exact-match condition (971.62 ms) were significantly slower than in both (ps ≤ .001).
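An analysis of this form can be reproduced with standard Python tooling; the sketch below uses the pingouin package on a hypothetical long-format trial log (the columns 'pid', 'target', 'distractor', and 'rt' are our naming assumptions), and exact corrected values will depend on the package version:

```python
import pingouin as pg

# df: a pandas DataFrame with one row per retained trial, with columns
# 'pid', 'target' (present/absent), 'distractor' (exact/semantic/non-match), 'rt'.
cell_means = (df.groupby(['pid', 'target', 'distractor'], as_index=False)['rt']
                .mean())  # one mean RT per participant per design cell

# 2 x 3 repeated-measures ANOVA with sphericity correction.
aov = pg.rm_anova(data=cell_means, dv='rt', within=['target', 'distractor'],
                  subject='pid', correction=True)

# Follow-up pairwise comparisons (e.g., Bonferroni-adjusted).
posthoc = pg.pairwise_tests(data=cell_means, dv='rt',
                            within=['target', 'distractor'],
                            subject='pid', padjust='bonf')
```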

Fig. 2

Mean reaction times (RTs) for the search task across the target (present, absent) and distractor (exact match, semantic match, non-match) conditions of Experiment 1. Error bars represent the SEM. * p < .05, *** p ≤ .001

The results of Experiment 1 showed that a distractor matching the information held in WM interfered with search. Previous studies established this memory-driven attentional capture using simple features; the present results extend the phenomenon to complex stimuli such as real-world objects. Furthermore, distractors that were semantically related to the information in WM also slowed search. This finding suggests that visual attention is influenced not only by the visual features of WM content but also by its semantic features.

One potential alternative explanation for the slower RTs in the semantic-match condition might be that, in the stimulus set we used, the related images were more similar in color than the unrelated ones, and that this perceptual similarity, rather than the semantic relationship, was driving the attentional guidance. Previous studies of attentional guidance that used color as a feature of the WM item have found that it has a strong influence on visual search (Olivers et al., 2006; Soto et al., 2005). Memorized colors have also been found to influence early saccades in target detection tasks; for example, saccades were more accurate when targets matched the memorized color than when they did not (Hollingworth et al., 2013). Therefore, it is possible that in the semantic-match condition of Experiment 1, attention was biased toward distractors with the same color as the memorized object. If so, the effect could be attributed to perceptual similarity, rather than the semantic relationship, between the memory item and the distractor. To address this, we replicated the experiment using only grayscale images.

Experiment 2

The aim of Experiment 2 was to rule out any influence of color on attentional guidance in the semantic-match condition. In this experiment, we used grayscale versions of the same images used in Experiment 1.

Method

Participants

Forty-five new participants were recruited from Amazon’s Mechanical Turk using the same criteria as Experiment 1. Twelve participants were excluded for having less than 90% accuracy on either the visual search task or the memory task, leaving 33 participants in the final analysis (ten males, 22 females, and one who did not identify their gender). Participants’ mean age was 30.24 years (SD = 3.60, range 25–36). The study was approved by the University of Mindanao Ethics Review Committee.

Stimuli and procedures

The stimuli and procedures of Experiment 2 were identical to those of Experiment 1 with the exception that all images were grayscale.

Results and discussion

As in Experiment 1, overall mean accuracy was high in both the search and memory tasks (98.06% and 97.23%, respectively). Only trials with correct responses in both tasks were included in the RT analysis. Outliers were defined as in Experiment 1 and removed from further analysis. The average number of trials per condition across participants ranged from 26.55 to 28.48.

Figure 3 shows the mean RTs in the visual search task across all conditions of Experiment 2. Overall, RTs were slowest on exact-match trials, followed by semantic-match and then non-match trials. As expected, RTs were faster in the target-present condition than in the target-absent condition. A 2 (target presence: present, absent) × 3 (distractor type: exact, semantic, non-match) repeated-measures ANOVA confirmed significant main effects of target presence (F(1, 32) = 54.27, p < .001, ηp2 = .63) and distractor type (F(2, 64) = 26.42, p < .001, ηp2 = .45), as well as a significant interaction (F(2, 64) = 4.09, p = .03, ηp2 = .11; p-values are corrected for violations of sphericity using the Greenhouse-Geisser adjustment where appropriate). Participants responded more quickly when the target was present (833.56 ms) than when it was absent (1,019.44 ms). Pairwise comparisons confirmed that search times on exact-match trials (961.83 ms) were slower than on non-match (896.90 ms; p < .001) and semantic-match trials (920.76 ms; p < .001). Search times on semantic-match trials were also slower than on non-match trials (p < .04).

Fig. 3

Mean reaction times (RTs) for the search task across the target (present, absent) and distractor (exact match, semantic match, non-match) conditions of Experiment 2. Error bars represent the SEM. * p < .05, ** p < .01, *** p ≤ .001

A test of simple main effects showed that when the target was present, exact-match and semantic-match RTs were both slower than non-match RTs (p = .002 and p = .006, respectively). Exact-match and semantic-match RTs did not differ significantly (p = 1). When the target was absent, exact-match RTs were slower than both non-match and semantic-match RTs (ps < .001); however, there was no difference between semantic-match and non-match RTs (p = 1).

The results of Experiment 2 are similar to those of Experiment 1. The presence of a distractor identical to the item held in WM interfered with search for the target object. Importantly, after removing any influence of color, we found that distractors semantically related to the memory item also slowed search. The results of Experiment 2 therefore show that a semantic relationship between a distractor and the memory item is sufficient to bias attention during search. It is also noteworthy that the exact-match and semantic-match guidance effects were found on trials when the target was present, which suggests that the search-irrelevant WM representation was able to compete with the target representation for top-down guidance of search.

General discussion

The present study investigated two primary questions: whether search for images of real-world objects can be influenced by task-irrelevant information held in WM, and whether distractors semantically related to WM-held information can slow visual search. The findings clearly demonstrate that search was slowed when the distractor (irrelevant to the search task) was an exact match with the memory item. Furthermore, distractors that were semantically associated with, but visually dissimilar to, the memorized object (semantic-match distractors) also slowed search when the target was present in the display. Consistent with previous studies, our findings also show faster search times on target-present trials than on target-absent trials.

The exact-match distractor slowed search both when the search target was present and when it was absent. This result differs from the findings of Houtkamp and Roelfsema (2006, Experiment 1B), who used drawings of real-world objects and found that memory-matching distractors slowed search only on target-absent trials. A potential explanation for this inconsistency might be the different memory requirements for the memory and search items. In that study, the search targets had to be updated on every trial, which may have resulted in a stronger target representation in WM than in our study, in which the target remained constant throughout the experiment. According to one model of visual WM (Olivers et al., 2011), when the target template is kept in a strongly active state in WM, it prevents other WM items from interacting with incoming perceptual information. In our study, the target’s representational strength was likely lower, either because there was no need to overcome or compete with a previous target representation or because the nature of the representation was different (e.g., the involvement of long-term memory systems). This reduced target activation might allow other representations in WM (such as the memory item) to interact with incoming sensory information and slow the search process. Results from a study that examined event-related potential components in a similar visual search paradigm are consistent with this interpretation: Gunseli et al. (2014) showed that contralateral delay activity and late positive complex amplitude (thought to reflect maintenance of WM representations) were lower when the same search target was used throughout the experiment than when it changed on every trial. This suggests that once the features of the target are learned, the WM representational strength of the target item is reduced. In contrast to the search target, the memory item in our experiment changed on every trial, which may have resulted in a more strongly active WM representation. These task characteristics (an invariant target and unique memory items) and the resulting activation strengths may have allowed the distractors to compete effectively with the target representation even on target-present trials.

This is not the first study to find WM-driven attentional guidance using real-world objects or meaningful stimuli. For example, Jung et al. (2018) also found that search was slowed when a distractor matching a memorized object was present in the search display; in their experiment, however, the search display consisted of real-world indoor scenes. The results presented here add to our understanding of the factors that lead to involuntary WM-driven attentional guidance: when the search target is constant, attention is automatically drawn to distractors that visually match WM contents, even in search displays with standalone objects.

The results also demonstrate that semantic information can influence visual search: in the target-present condition of both experiments, a semantically related distractor significantly slowed search. While this result is consistent with previous findings showing that search can be guided by semantic information (Nako et al., 2014) and that distractors associated with the target slow search (Belke et al., 2008; Moores et al., 2003), the semantic information that influenced attention in those studies was related to the target. Here, we found evidence for the influence of search-irrelevant information on attention during search, with semantic features of the memory item slowing search for the target. The exact mechanism of attentional guidance by semantically related distractors is not fully understood, but it presumably involves activation spreading between the visual and semantic features of the stimulus and to related representations.

Experiments on semantic priming by subliminal images show that mere exposure to related images can affect object recognition (Dell'Acqua & Grainger, 1999), raising the possibility that simple passive viewing could affect visual search as well. However, several studies have shown that this is not the case. Soto and Humphreys (2007, Experiment 3) showed that when observers were not asked to remember the colored shape presented prior to the search task, there was no attentional guidance by a matching distractor. Olivers et al. (2006, Experiment 5) similarly demonstrated that when the memory task was completed before the search task, a memory-matching distractor had no effect. Malcolm et al. (2016) likewise found that an object had to be attended in order to guide visual attention. This evidence suggests that a representation must be active in WM to interact with the search process; items that are minimally encoded or have been dropped from WM have no effect. This difference from object recognition tasks may arise because successful visual search requires more complete representations of the target and search items. Different task requirements likely result in different strengths (and time courses) of representational activation, which could be investigated in future research.

One aspect of our results that is difficult to explain is the absence of a difference in search times between semantic-match and non-match distractors when the target was absent; this pattern was the same in both experiments. One possibility is methodological: in the target-absent condition, one of the search items was the non-matching item of the stimulus triplet, and the other was chosen at random from all the other non-matching items, meaning that some non-matching items were presented more than once. This item repetition may have slowed search. Another methodological consideration is the use of the same target for all participants; it may be that something idiosyncratic about this target item drove the results. While we think this is unlikely, we welcome conceptual replication and extension of this work to further explore the mechanisms behind attentional guidance.

In summary, our results clearly show that both exact and semantically related distractors can slow visual search, even in the presence of the search target. One possible explanation is the relatively lower strength of the WM representation of the target item compared with the memory item in our paradigm. The effect of the semantic distractors might be explained by activation spreading from the visual to the semantic features of both the WM and perceptual representations, with the interaction of these features slowing the search process. Future research varying the representational nature and strength of the WM item, the target template, the distractor, and the perceptual representations would further clarify the mechanisms behind visual search and how it is affected by WM content.