Humans are not passive observers but dynamic agents who interact with their environment in goal-directed ways. Performing a goal-directed action presupposes some kind of expectation of action consequences—that is, anticipation of what outcome a particular action is likely to have and whether this outcome fits with one's current intention. This is why ideomotor theorists (e.g., Greenwald, 1970; Hommel, Müsseler, Aschersleben, & Prinz, 2001; James, 1890; Lotze, 1852; Prinz, 1987, 1997) postulate that voluntary actions are integrated with and controlled by representations of desired action goals in the form of the anticipated sensory consequences of the planned actions. Ideomotor theory proper focuses on the role of perceptual action effects in (feedforward) action selection: Motor patterns are assumed to become associated with codes of the perceptual consequences they evoke, so that the individual can later select a motor pattern by “thinking of” (i.e., endogenously activating the representations of) its consequences (Elsner & Hommel, 2001). However, recent findings suggest that information about perceptual action consequences can serve at least two more functions in action control. For one, codes of anticipated action effects can be used to monitor action execution by matching anticipated action effects against actually produced effects (Band, van Steenbergen, Ridderinkhof, Falkenstein, & Hommel, 2009; Wolpert & Ghahramani, 2000). And, for another, information about relevant action effects can apparently bias perceptual processing toward feature dimensions that provide relevant information for online (i.e., feedback-driven) action control. It is this latter function that we investigated in the present study.

Numerous investigations have provided evidence for close links between action and perception in general, and for the impact of action on perception in particular (e.g., Bekkering & Neggers, 2002; Craighero, Fadiga, Rizzolatti, & Umiltà, 1999; Deubel & Schneider, 1996; Hommel, 1993; Humphreys & Riddoch, 2001; Müsseler & Hommel, 1997; Schubö, Prinz, & Aschersleben, 2004; Tucker & Ellis, 2001; for reviews, see Hommel, 2010; Hommel et al., 2001; Prinz, 1997). More specifically, several studies have shown that perceptual processes are affected by anticipated effects of action plans (e.g., Jordan & Hershberger, 1994; Jordan & Hunsinger, 2008), and that perception can be biased by particular action plans (e.g., Craighero et al., 1999; Fagioli, Hommel, & Schubotz, 2007; Memelink & Hommel, 2005, 2006; Müsseler & Hommel, 1997).

For example, Kerzel, Jordan, and Müsseler (2001) demonstrated that observers tended to overshoot the position at which they perceived a moving dot to vanish, but only if they were actively tracking the dot with their eyes. Moreover, Jordan and colleagues (Jordan & Hunsinger, 2008; Jordan & Knoblich, 2004) found that endogenous control over the dot’s motion influenced the perception of its vanishing point. Evidence for action-related biases of perception was also reported by Fagioli et al. (2007). In their study, preparing a grasping movement facilitated the detection of size oddballs, whereas preparing a pointing movement facilitated the detection of location oddballs. Wykowska, Schubö, and Hommel (2009) tested whether action-related biases operate at even earlier processing stages than those implied by the observations of Fagioli et al. To this end, the authors introduced a standard visual search task for pop-out targets combined with logically unrelated movements (grasping or pointing). Similar to Fagioli et al., they found that search performance was better when a size-defined target was combined with grasping, or a luminance-defined target with pointing, than with the other combinations. They suggested that this might be because size is a relevant perceptual dimension for grasping control (in parameterizing grip aperture and related aspects of a grasping movement), whereas luminance is particularly relevant for pointing control (for parameterizing pointing direction; see Anderson & Yamagishi, 2000; Graves, 1996). To facilitate the processing of action-relevant perceptual parameters, preparing a grasping or pointing movement may bias the perceptual system toward the perceptual dimensions that provide these parameters, such as size and luminance, respectively. In the study of Wykowska et al. (2009), this might have set the stage for a kind of action–stimulus (dimension) congruency effect from which some targets benefited more than others.

Aim of the present study

The aim of the present study was to investigate how this congruency effect is related to already known phenomena of perception–action interactions. On the one hand, almost all available models of perceptuomotor interactions (with the exception of Hommel et al.'s, 2001, Theory of Event Coding) focus on the impact of perceptual processes on action control without considering effects going in the opposite direction (e.g., Kornblum, Hasbroucq, & Osman, 1990; Kornblum, Stevens, Whipple, & Requin, 1999; Rosenbloom & Newell, 1987; Spironelli, Tagliabue, & Umiltà, 2009). Obviously, this makes it difficult to relate these kinds of models to observations of the sort reported by Wykowska et al. (2009). On the other hand, however, it is possible to apply the general logic underlying the available models to these observations. According to the taxonomy suggested by Kornblum et al. (1990), the relationship between stimuli and responses can be characterized on at least two levels.

One level (the set level) refers to stimulus and response sets, which can vary in their degree of feature overlap—that is, with respect to the number of dimensions they share (see Fitts & Seeger, 1953). In general, the more dimensions are shared, the more the stimulus and response codes or processes are assumed to interact. For instance, responding to left and right stimuli by saying out loud two unrelated nonsense syllables (e.g., “dal” and “bof”) implies entirely unrelated stimulus and response sets. If the responses consisted of pressing a left versus a right key instead, the sets would overlap strongly, because the same dimensions would be employed in forming the stimulus and response sets.

Another level refers to what Kornblum et al. (1990) called the “element level.” Element-level compatibility refers to the relationship between individual stimulus and response elements and describes the mapping of a stimulus onto a specific response. It matters only for overlapping stimulus–response (S–R) sets—that is, for dimensionally related stimulus and response features. Without set-level compatibility, it should not matter how stimulus and response features are mapped onto each other. For instance, performance should not depend on whether you respond to left stimuli by saying “dal” and to right stimuli by saying “bof,” or vice versa. With set-level compatibility, however, the mapping does matter. For instance, it is well known that responding to left and right stimuli by pressing left and right keys, respectively, produces much better performance than does the opposite mapping (e.g., Morin & Grant, 1955).

Given that some available models (e.g., Kornblum, Hasbroucq, et al., 1990; Kornblum, Stevens, et al., 1999) have been created to account for both element-level and set-level compatibility effects, we were interested to see whether the observations of Wykowska et al. (2009) could be interpreted along these lines. In the present study, the experimental paradigm consisted of a visual search task and a movement task. Participants were asked to prepare a given action (grasping or pointing) according to a pictorial cue. Subsequently, a search display with several circular items was presented, and participants were asked to detect a target differing from the other items in size or luminance. On completion of the search task, participants were asked to perform the grasping or pointing action on items of a specially designed device positioned underneath the computer screen. Consider the relationship between the search displays and the action device used in that study (see Fig. 1a). The visual search displays consisted of gray circles arranged on a circular array (see Fig. 1a, top), whereas the to-be-grasped/pointed-to objects were round items mounted on a movement execution device (MED; see Fig. 1a, bottom). As both the search items and the action objects were placed on circular arrays, this arrangement showed some feature overlap at the set level, which may have contributed to the observed congruency effects. Furthermore, although there was no systematic relation between search targets and action objects, and the location of the action object was signaled only after the search task was completed, one may wonder whether element-level similarity (i.e., both the search items and the action objects being circular) may have had an impact on the action-induced congruency effect as well.

Fig. 1

Differences in experimental design across experiments. a Original design of Wykowska, Schubö, and Hommel (2009), in which two types of search targets were used: a luminance target (left) and a smaller size target (right). The to-be-grasped/pointed-to objects of the MED (bottom) were circular and arranged on a circular array. b Design of Experiment 1, in which only a larger size target was used (top). The circular items of the MED were replaced with cups of different diameters. c Design of Experiment 2, in which the circular array of the MED was replaced with three cups aligned horizontally

The straightforward logic underlying the following two experiments was to systematically reduce the similarity between the stimulus set in the search task and the response set in the manual-action task and to see whether this would eliminate the congruency effect. Experiment 1 was designed to investigate the congruency effect under conditions in which the stimulus and response sets would still overlap at the set level, but the similarity between individual search items and the to-be-grasped or pointed-to objects would be strongly reduced. We tried to achieve this by replacing the original circular movement objects with paper cups arranged in a circle (see Fig. 1b). The assumption that the design of Experiment 1 reduced similarity relative to the original paradigm of Wykowska et al. (2009) was tested in a control experiment in which element-level similarity was manipulated.

The aim of Experiment 2 was to eliminate the similarity at the set level as well. To achieve this, the to-be-grasped (or pointed-to) cups were arranged in a horizontal line below the computer screen (see Fig. 1c).

Experiments 1 and 2

Method

Participants

Sixteen paid volunteers (12 women) aged 21 to 30 years (mean age 24 years) took part in Experiment 1, and another group of 15 participants (9 women) aged 18 to 32 years (mean age 23.3 years) took part in Experiment 2. All but two were right-handed, and all reported normal or corrected-to-normal vision. The experiments were conducted with the understanding and consent of each participant.

Stimuli and apparatus

Stimuli were presented on a 17-in. CRT screen (100 Hz refresh rate) placed at a distance of 110 cm from the observer (Experiment 1) and on a 17-in. TFT monitor (75 Hz refresh rate) placed 50 cm from the observer (Experiment 2). Stimulus presentation was controlled by E-Prime presentation software (Psychology Software Tools, Pittsburgh, PA) in both experiments.

Cues specifying which type of movement to prepare (i.e., grasping or pointing) consisted of centrally presented black-and-white photographs covering 8.5° × 11.3° (18.4° × 23.7°) of visual angle (here and in what follows, values in parentheses or brackets refer to Experiment 2), showing a left hand performing a pointing or grasping movement on a white paper cup. The search display always contained 28 items (gray circles, 1.1° [2.4°] in diameter; luminance of 40 cd/m²) positioned on three imaginary circles with diameters of 3.4° (10.4°), 7.4° (14.1°), and 11.3° (17.7°); see Figs. 1b and c. The target was defined by size—a larger circle, 1.4° (3.3°) in diameter—and could appear at one of six lateralized positions (three left, three right) on the middle circle.
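For readers who want to translate the reported visual angles into on-screen sizes (or vice versa), the conversion is a standard trigonometric one. The following Python sketch is our own addition rather than part of the original method; the example values merely illustrate the two viewing distances reported above.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an object of a given size at a given viewing distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def physical_size_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse conversion: physical size that subtends a given visual angle at a given distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg / 2))

# Example: a 1.1-deg search circle at the 110-cm viewing distance of Experiment 1
# corresponds to about 2.1 cm on screen ...
print(round(physical_size_cm(1.1, 110), 1))  # ~2.1
# ... which is also roughly the on-screen size of a 2.4-deg circle at the 50-cm
# viewing distance of Experiment 2.
print(round(physical_size_cm(2.4, 50), 1))   # ~2.1
```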

In Experiment 1, the MED was positioned below the computer screen, at a distance of 80 cm from the participants’ seat (see Fig. 1b). The midpoint of the device was situated 50 cm below and 30 cm in front of the computer screen. The MED consisted of a 43 × 54 × 13 cm box containing eight LEDs positioned on an imaginary circle 22.2° in diameter. Slightly beneath each of the LEDs, rectangular cardboard pads were attached. White paper cups were positioned on these pads (see Fig. 1b, bottom) and covered the LEDs. All of the cups had the same height (4.5°), weight (2 g), and luminance (3 cd/m²); they varied only in diameter, with four cups being larger (5.7°) and four smaller (4°). The LEDs behind the cups illuminated them (the luminance of a lit-up cup was 32 cd/m²). In Experiment 2, three linearly aligned cups were positioned on a wooden board installed 20 cm below the computer screen (see Fig. 1c). These cups were identical in height (5.7°), weight (2 g), and luminance (3 cd/m²), but differed in average diameter (small, 5.7°; medium, 7.4°; large, 8.5°). Instead of an LED lighting up behind one of the cups (Experiment 1), in Experiment 2 a yellow asterisk (1.4°; RGB values 255, 211, 32) presented on the computer screen for 400 ms signaled which cup should be grasped or pointed to. The asterisk could appear at one of three positions on the screen (10.0° below the horizontal midline of the screen and −11.6°, 0°, or 11.6° relative to its vertical midline).

Procedure

All of the participants took part in two sessions: a practice session and a subsequent experimental session, with no less than 2 hours and no more than 2 days in between. In the practice session, participants performed two blocks of one movement type only (pointing or grasping; 48 trials per block) and one block with both types of movements randomly intermixed (64 trials).

The experimental session proper consisted of two blocks of 192 trials each. At the beginning of the experimental session, participants performed a short warm-up block (32 trials) in which they practiced the movements only, and then a practice block (80 trials) in which they practiced the visual search task together with the movement task. The movement task was randomized within blocks, and on each trial participants were presented with a picture cue informing them of the movement type they were to execute (see Fig. 2). Participants were instructed to prepare the movement but not to execute it until a go-signal appeared. Subsequent to the cue presentation, the search display was presented for 100 ms. Participants were asked to respond to the visual search task immediately by pressing the right/left mouse keys with the index and middle fingers, respectively, of their right hands. Both speed and accuracy were stressed. Upon the response in the search task, a go-signal occurred (Exp. 1: one of the LEDs on the MED lit up; Exp. 2: a yellow asterisk was presented), which indicated that observers should execute the prepared movement—that is, either point at or grasp (from the side) the indicated item with the left hand. Only accuracy was stressed in the movement task. The correctness of movement execution was registered by the experimenter, who monitored the participants through a camera.

Fig. 2

Trial sequence for both experiments. By and large, the trial sequence was the same across experiments; where procedures differed, values are given for Experiment 1, with those for Experiment 2 in brackets. Each trial began with a fixation cross displayed for 500 ms (Exp. 2: 600 ms). Subsequently, the movement cue appeared for 800 ms and was followed by another fixation display for 500 ms (Exp. 2: 600 ms). The search display was presented for 100 ms. A blank screen followed the search display and remained on the computer screen until participants responded to the search task. Upon the search response, one of the LEDs on the MED lit up (Exp. 2: a yellow asterisk was presented) for 300 ms (Exp. 2: 400 ms) as a go-signal for the prepared movement

Control experiment

As was mentioned previously, we conducted a control experiment in order to test whether our manipulation indeed reduced element-level similarity in the present design relative to the original study of Wykowska et al. (2009). In this experiment, set-level similarity was kept constant while element-level similarity was varied: Participants performed a search task for size targets (larger size, 2° of visual angle in diameter; 40 cd/m² luminance) in search displays containing either circles (1.4° of visual angle in diameter; 40 cd/m² luminance) or cup-like items of the same area as the circles (bases of 2° and 1.4° and a height of 1.8° for the larger size target; bases of 1.4° and 1.1° and a height of 1.2° for the remaining objects; 40 cd/m² luminance); see Fig. 3. They also performed a movement task on two types of MEDs: the original MED setup used by Wykowska et al. (see Fig. 1a) and the new MED with cups (see Fig. 1b). The search display types were randomized, whereas the MED types were blocked. Participants were asked to respond to the detected search target by grasping the spatially corresponding element on the MED as fast as possible. The visual search display contained eight elements arranged on a circular array (the middle circle of the original displays, 7.4° of visual angle in diameter), and the MED also consisted of eight items. The items on the MED could be small (2.8°), medium (4°), or large (6.2°) in diameter. Key release times (initiation of the grasps) were measured as reaction times (RTs) to the search target. Element-level similarity was manipulated through a compatibility factor: In the compatible conditions, the MED with circles was combined with visual search among circles, or the MED with cups with visual search among cup-like objects (see Fig. 3, left side of the right panel); in the incompatible conditions, the MED with cups was combined with visual search among circles, or the MED with circles with visual search among cup-like objects (see Fig. 3, rightmost panel).

Data of 13 participants, whose overall error rates were lower than 10%, were subjected to analysis. The analyses focused on trials in which participants were to grasp those items of the MED that shared the target-defining feature with the visual search target (i.e., the larger MED items). A one-way repeated measures ANOVA with the within-subjects factor element-level compatibility (same type of search objects and MED items vs. different type) was performed on the RT data. The results showed that performance was better when the action context was similar to the perceptual context (see Fig. 3, middle panel, white bar) than when the two differed (see Fig. 3, middle panel, gray bar), F(1, 12) = 8.7, p < .05, ηp² = .42. These data confirm our assumption that the cups used on the MED in the present design reduce the element-level similarity to the search items, which was quite high in the original design of Wykowska et al.

Fig. 3

Schematic representation of the factorial design of the control experiment, which tested element-level similarity between the action and perceptual contexts in the experimental designs used by Wykowska et al. (2009) and in the present study. The left column and the left panel of the right column represent the element-level compatibility conditions (circles in the visual search display and circles on the MED, or cup-like search objects and cups on the MED), and the rightmost panel depicts the incompatibility conditions (circles in the visual search display and cups on the MED, or cup-like search objects and circles on the MED). The middle graphs depict mean reaction times (RTs) as a function of element-level compatibility (white and gray bars). Error bars represent within-subjects confidence intervals with a 95% probability criterion, calculated according to the procedure described in Cousineau (2005)

Results

Incorrect movement trials, as well as outliers in the search task (±3 SDs from the mean RT of each participant and block), were excluded from further analyses. From the remaining data, RTs in the detection task were submitted to ANOVAs with movement type (point vs. grasp) and trial type (target-absent vs. target-present) as within-subjects factors. For the analyses of the error rates, only incorrect trials in the movement task were excluded, and individual mean error rates were submitted to analogous ANOVAs. Three participants were excluded from the analyses of Experiment 1 because of poor overall performance (>10% errors).
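The exclusion and analysis steps just described can be summarized in a short script. The sketch below is a minimal Python illustration, assuming a hypothetical long-format trial file (search_rts.csv) with one row per trial; it is not the software actually used (stimulus presentation ran in E-Prime, and the original analyses were not necessarily scripted this way), and all column names are illustrative.

```python
# Minimal sketch of the exclusion and ANOVA steps; file and column names are hypothetical.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("search_rts.csv")  # columns: participant, block, movement, trial_type, rt, movement_correct

# 1. Exclude trials with incorrect movement execution.
df = df[df["movement_correct"] == 1]

# 2. Exclude RT outliers beyond +/- 3 SDs of the mean, per participant and block.
def drop_outliers(g: pd.DataFrame) -> pd.DataFrame:
    m, sd = g["rt"].mean(), g["rt"].std()
    return g[(g["rt"] >= m - 3 * sd) & (g["rt"] <= m + 3 * sd)]

df = df.groupby(["participant", "block"], group_keys=False).apply(drop_outliers)

# 3. Repeated-measures ANOVA on RTs with movement type (point vs. grasp) and
#    trial type (target present vs. absent) as within-subjects factors.
anova = AnovaRM(df, depvar="rt", subject="participant",
                within=["movement", "trial_type"], aggregate_func="mean").fit()
print(anova)
```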

In Experiment 1, search RTs were faster when a grasping movement was prepared (M = 509 ms) than when a pointing movement was prepared (M = 520 ms), F(1, 12) = 6.7, p < .05, ηp² = .36 (see Fig. 4a). This effect was comparable for target trials (∆M = 12 ms) and blank trials (∆M = 11 ms); the interaction between trial type and movement type was far from significance, p > .7. Error rates showed a pattern similar to the RT data (fewer errors in the grasping condition, M = 3.8%, than in the pointing condition, M = 4.1%), although this difference was not significant, p > .5. The results of Experiment 2 showed the same pattern: RTs were faster when a grasping movement (M = 535 ms) rather than a pointing movement was prepared (M = 549 ms), F(1, 14) = 4.7, p < .05, ηp² = .25 (see Fig. 4b). This effect was present for both target trials (∆M = 21 ms) and blank trials (∆M = 7 ms), and the congruency effect did not interact with trial type, p > .2. Error effects were again far from significance, ps > .5, but showed the expected pattern: slightly fewer errors in the grasping condition (M = 5.7%) than in the pointing condition (M = 5.9%).
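As a consistency check on the reported effect sizes (our own calculation, not part of the original report), partial eta-squared for a single-degree-of-freedom within-subjects effect can be recovered from the F statistic and the error degrees of freedom:

$$\eta_p^2 \;=\; \frac{SS_\text{effect}}{SS_\text{effect} + SS_\text{error}} \;=\; \frac{F \cdot df_\text{effect}}{F \cdot df_\text{effect} + df_\text{error}},$$

so that F(1, 12) = 6.7 yields 6.7/(6.7 + 12) ≈ .36 (Experiment 1) and F(1, 14) = 4.7 yields 4.7/(4.7 + 14) ≈ .25 (Experiment 2), in line with the values given above.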

Fig. 4

a Results of Experiment 1. b Results of Experiment 2. Reaction times (RTs) in the visual search task as a function of prepared movement (pointing, white bar; grasping, gray bar). Error bars represent within-subjects confidence intervals with a 95% probability criterion, calculated according to the procedure described in Cousineau (2005)
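For completeness, the within-subjects error bars referred to in the figure captions can be computed as follows. This is a generic Python sketch of the Cousineau (2005) normalization, not the original analysis code, and the variable names are illustrative.

```python
# Sketch of Cousineau (2005) within-subjects confidence intervals; names are illustrative.
import numpy as np
import pandas as pd
from scipy import stats

def cousineau_ci(cell_means: pd.DataFrame, alpha: float = 0.05) -> pd.Series:
    """cell_means: one row per participant, one column per condition (mean RT per cell).
    Returns the half-width of the within-subjects confidence interval per condition."""
    grand_mean = cell_means.values.mean()
    # Remove between-subjects variability: subtract each participant's mean, add the grand mean.
    normalized = cell_means.sub(cell_means.mean(axis=1), axis=0) + grand_mean
    n = len(cell_means)
    sem = normalized.std(axis=0, ddof=1) / np.sqrt(n)
    return sem * stats.t.ppf(1 - alpha / 2, df=n - 1)

# Usage (after aggregating trial data to one mean RT per participant and condition):
# cell_means = df.pivot_table(index="participant", columns="movement", values="rt")
# print(cousineau_ci(cell_means))
```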

Discussion

Our findings replicate the previous observation of Wykowska et al. (2009) that size-defined visual targets can be detected more easily while preparing a grasping action than while preparing a pointing action. Such findings further support the idea that action and perception are tightly coupled (in line with the results of, e.g., Bekkering & Neggers, 2002; Craighero et al., 1999; Fagioli et al., 2007; Hommel, 1998; Müsseler & Hommel, 1997) and that actions are planned through representations of their anticipated sensory consequences, which in turn bias perception (in line with the findings of, e.g., Elsner & Hommel, 2001; Jordan & Hershberger, 1994; Jordan & Hunsinger, 2008; Jordan & Knoblich, 2004; Kerzel et al., 2001). Importantly for the aims of the present study, the action–target congruency effect was obtained even when the to-be-grasped (or pointed-to) white elongated cups shared no object similarity with the round gray items in the search display. Moreover, in Experiment 2, the spatial arrangement of the items in the perceptual task also differed from the spatial arrangement of the items in the action task, since the latter was performed on one of three paper cups horizontally aligned below the computer screen. In other words, neither a reduction in element-level (Experiment 1) nor in set-level (Experiment 2) compatibility made the effect disappear. This rules out the possibility that existing stimulus–response compatibility models can account for the action–target congruency demonstrated by Wykowska et al. (2009), even if one neglects the fact that these models do not consider any impact of action on perception in the first place. In other words, the present congruency effect clearly goes beyond previous demonstrations of set-level and element-level compatibility.

The present results show that action–perception congruency is not restricted to situations with feature similarity between the stimulus and response sets. Thus, congruency effects need not arise at the level of basic sensory features of stimulus and response sets, but may result from higher-level representations of perception and action events. One can only speculate about the nature of these higher-level representations at this point. There is empirical evidence that they could be representations in terms of action goals, which constitute the link between the sensory consequences of an action and the selection of the appropriate motor control parameters (see, e.g., Prinz, Aschersleben, & Koch, 2009). According to this notion, a specific perceptual dimension would be selected according to its relevance to the current action plan—that is, according to the use the stimulus features have for specifying open parameters in action control—such as location for pointing and shape for grasping.

How are action–perception links established? In line with ideomotor views, we postulate that voluntary actions are controlled by representations of the corresponding action goals. Through life-long experience with particular actions, humans learn to select and integrate those perceptual characteristics that are relevant for achieving the respective action goals. It is not unreasonable to assume that acting agents initially consider all sorts of perceptual information for the online control of their actions. With experience, some selections are rewarded, in that some actions turn out to be more successful than others. In object grasping, for example, humans learn that successful grasping requires adjusting the grip aperture to the object’s size. Preparing a grasping movement may thus activate the representation of size as a grasp-relevant dimension. As a result, the perceptual system will prioritize the processing of size information in general, and not, for example, of color, which is irrelevant to grasping. Such prioritizing might take place via a weighting or biasing mechanism (e.g., Bundesen, 1990; Desimone & Duncan, 1995; Hommel, 2009; Hommel et al., 2001; Müller, Reimann, & Krummenacher, 2003; Wolfe, Butcher, Lee, & Hyle, 2003) that operates at perceptual stages at which certain characteristics of the environment are weighted and selected for subsequent, more elaborate processing (see, e.g., Wykowska et al., 2009).
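To make the proposed weighting mechanism more concrete, the toy sketch below illustrates one way such a bias could be expressed computationally: dimension-specific feature-contrast signals are scaled by action-induced weights before being summed into an overall priority signal. This is our own illustrative simplification with made-up numbers, not a model proposed in the original study or by the cited authors.

```python
# Toy illustration of action-induced dimensional weighting; all numbers are made up.
ACTION_WEIGHTS = {
    "grasp": {"size": 1.5, "luminance": 1.0, "color": 1.0},   # grasping up-weights the size dimension
    "point": {"size": 1.0, "luminance": 1.5, "color": 1.0},   # pointing up-weights the luminance dimension
}

def priority(feature_contrast: dict, prepared_action: str) -> float:
    """Weighted sum of dimension-specific feature-contrast signals for one display item."""
    weights = ACTION_WEIGHTS[prepared_action]
    return sum(weights[dim] * contrast for dim, contrast in feature_contrast.items())

# A size-defined target produces contrast mainly on the size dimension ...
size_target = {"size": 1.0, "luminance": 0.1, "color": 0.0}
# ... and therefore gains more priority when a grasp rather than a point is prepared.
print(priority(size_target, "grasp"))  # 1.6
print(priority(size_target, "point"))  # 1.15
```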

To summarize, the results of the present study support the idea that when humans act voluntarily, their actions are controlled by representations of action goals in the form of characteristics that have been learned to be relevant for the planned action, independently of element-level and set-level compatibility between the stimulus and the response. Once an action-relevant characteristic is encountered in the environment, its processing is prioritized, because it is already part of the preactivated perceptual aspects of the action representation. As such, action planning has a significant impact on early stages of perceptual processing and on selection mechanisms.