Introduction

Expert performance has been examined across numerous domains, including aviation (e.g., Palmisano & Gillam, 2005), military combat (e.g., Williams, Ericsson, Ward, & Eccles, 2008), medicine (Kushniruk & Patel, 1998), and sport (Williams, Ford, Eccles, & Ward, 2011). Many of these researchers have used simulation in its various guises, including virtual, computer, and film-based approaches, to recreate the performance environment under controlled and reproducible conditions in the laboratory (for a review, see Ward, Williams, & Hancock, 2006). In sport, the design of these representative tasks is a significant challenge for scientists attempting to faithfully capture and reproduce the performance environment, particularly the perceptual and cognitive demands of the task. The challenge has been to reproduce the highly dynamic and rapidly changing nature of the competition setting in a controlled, repeatable manner for experimentation.

Scientists have made attempts to use representative tasks with high fidelity, which is the degree to which a model or simulation reproduces the state and behavior of a “real-world” feature or condition (Hays & Singer, 1989). However, the overriding tendency has been to design simplistic and sometimes manufactured or contrived tasks, with a stronger emphasis on internal rather than external validity (Dhami, Hertwig, & Hoffrage, 2004). A concern is that such designs may introduce potential floor or ceiling effects on performance, denying experts access to information they would normally use and/or causing them to employ different processes to solve a particular task (Abernethy, Thomas, & Thomas, 1993).

Another important concern is the degree to which stimuli and their associated responses are related to each other, which is known as stimulus–response compatibility. Scientists exploring stimulus–response compatibility effects have shown that responses are faster and more accurate when a spatial stimulus array matches a spatial response array (compatible mapping) than when it does not match (incompatible mapping) (see Fitts & Deininger, 1954). Furthermore, when this compatibility is low, the simple relationship between the potential amount of information to be processed and the capability to process it effectively can be affected (Proctor & Reeve, 1990). Thus, the stimulus–response compatibility effect appears to take place when there is a physical or conceptual similarity between the stimulus and response sets (Kornblum, Hasbroucq, & Osman, 1990).

In recent years, there have been concerns raised and much debate voiced about the representativeness and fidelity of research on expert performance (e.g., Dhami et al., 2004; Dicks, Davids, & Button, 2009; Ericsson & Williams, 2007). Researchers have become aware of the need to develop more representative experimental tasks for testing and training the processes and component skills underpinning expert performance. For example, in sport the fidelity of the experimental task design has been shown to influence the perceptual behaviors employed (Dicks, Button, & Davids, 2010). Dicks and colleagues (2010) showed that different and more pertinent visual search patterns were employed by experienced soccer goalkeepers under more representative task constraints, as compared with the less representative conditions. Therefore, there is a need to design and employ task conditions that provide realistic but reproducible domain-specific situations so that performance can be objectively evaluated over repeated tests.

In the present article, we examine whether the fidelity of the response mode influences the underlying processing strategies governing anticipation and decision-making performance during a representative film-based simulation of 11 versus 11 open-play soccer situations. We use a novel approach to this issue by collecting retrospective verbal reports of thinking from skilled participants under two different response conditions that were either stationary or movement based, in conjunction with standard measures of response accuracy. It was expected that the difference in response fidelity between the stationary and movement conditions would lead to differences in the verbal reports articulated by participants across the two conditions (e.g., Proctor & Reeve, 1990). A greater proportion of higher-order verbal report statements of cognitive processes, such as predictions and action planning, was expected for the movement group than for the stationary group, due to the higher fidelity of the representative task design and increased stimulus–response compatibility. Moreover, we hypothesized that participants in the movement group would demonstrate more accurate performance in comparison with their counterparts in the stationary group (see, e.g., Fitts & Deininger, 1954).

Materials and method

Participants

In total, 20 male semiprofessional soccer players participated in the experiment. Participants were randomly allocated into two different experimental groups: movement-based (n = 10) and stationary (n = 10). The movement group took part in a more realistic representative task that included an action component, while in the stationary group, a less realistic verbal response task was employed with participants remaining in a seated position. Participants in the movement group (M age = 21.5 years, SD = 2.0) had played soccer regularly since the mean age of 5.6 years (SD = 1.2), during which they had trained/played for a mean of 9.2 h (SD = 1.7) per week and participated in an average total of 615 (SD = 131) competitive matches. The stationary group (M age = 21.1 years, SD = 2.0) had regularly participated in soccer since the mean age of 5.8 years (SD = 1.6), during which they had trained/played for a mean of 9.0 h (SD = 1.8) per week, including participation in an average total of 632 (SD = 145) competitive matches. Informed consent was provided prior to participation, and ethical approval was gained through the lead institution’s Ethics Board.

Stimuli and apparatus

Participants were presented with life-size video sequences involving dynamic, 11 versus 11 soccer situations filmed and viewed from the perspective of a central defender and with the opposition team in possession of the ball (for further details on the production of the video-based test stimuli, see Roca, Ford, McRobert, & Williams, 2011). The video stimuli consisted of 20 test and 4 practice trials. All video clips were approximately 5 s in duration, with each one being occluded at a key moment in the action, such as when the opposition player in possession of the ball was about to make a pass, shoot at goal, or maintain possession of the ball by dribbling forward.

The test film stimuli were back projected onto a large projection screen (Draper Cinefold, Spiceland, IN; height = 3 m, width = 4 m) that was placed at a distance of 3 m directly in front of the participant. Participants in the movement group were free to move and interact with the action sequence as they would normally do when playing in a real soccer match, which includes moving forward, backward, and sideways (see Fig. 1). The movements of participants were monitored using a digital video camera (Canon LEGRIA FS200, Tokyo, Japan) positioned 3 m behind the participant and linked to a TV monitor screen (Philips 15PF5120, Eindhoven, Netherlands) placed on the experimenter’s desk. In contrast, in the stationary group, participants were seated during the experiment at the same start position as the movement group.

Fig. 1 The experimental layout employed in the movement-based response condition

Verbal responses and reports of thinking were collected using a lapel wireless microphone system (Sennheiser EW-100G2, Wedemark, Germany), including a telemetry radio transmitter fixed to the participant and a telemetry radio receiver connected to the digital video camera.

Procedure

Prior to the experimental task, participants were given instruction and training on how to think aloud and provide retrospective verbal reports. The instructions comprised Ericsson and Kirk’s (2001) adaptation of Ericsson and Simon’s (1993, pp. 375–379) original protocol combined with a series of domain-specific warm-up tasks. Training continued until participants were comfortable with the procedure, and feedback was given to ensure that participants’ verbal reports were consistent with the instructions. The verbal report training protocol lasted approximately 30 min. Once training had been completed, participants were presented with a total of 4 warm-up trials to ensure familiarization with the experimental setting and the protocol procedure. Retrospective verbal reports were collected directly after every trial. After providing verbal reports, the participants were required to verbally confirm “what the player in possession was going to do” and “what decision the participant themselves made or were about to make at the moment of video occlusion.” Participants completed 20 test trials in a quiet room, and each individual completed the training and test session in about 60 min.

Two outcome measures of performance were obtained. Anticipation accuracy was defined as whether or not the participant correctly selected the next action of the player in possession of the ball at the moment of video occlusion, such as a pass to a teammate, a shot at goal, or dribbling the ball forward. A panel of three Union of European Football Associations (UEFA)-qualified soccer coaches independently selected the most appropriate decision for a participant to execute in response to the on-screen situation at the time of video occlusion on each trial. The interobserver agreement between coach selections was 91.7 %. Decision-making accuracy was defined as whether or not the participant decided on the action selected by the coaches as most appropriate for that trial. The correspondence between the movement group’s action selection (as determined through verbal confirmation by participants of their decision at the end of each trial) and action execution (as determined through video observation of participants on each trial) was 100 %. Anticipation and decision-making accuracy were calculated as the percentage of trials on which the participants selected the correct response.
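The scoring rule described above can be illustrated with a minimal Python sketch. The function name `accuracy_percent`, the action labels, and the five-trial data set are hypothetical and serve only as an illustration; the study itself used 20 test trials per participant, and the source does not describe any scoring software.

```python
def accuracy_percent(responses, correct):
    """Return the percentage of trials on which the response matches the key."""
    if len(responses) != len(correct):
        raise ValueError("responses and correct must have the same length")
    if not responses:
        raise ValueError("at least one trial is required")
    hits = sum(r == c for r, c in zip(responses, correct))
    return 100.0 * hits / len(responses)


# Hypothetical data for one participant over five trials.
# played_out: what the player in possession actually did after occlusion.
# anticipated: what the participant said the player was going to do.
played_out = ["pass", "shot", "dribble", "pass", "pass"]
anticipated = ["pass", "pass", "dribble", "pass", "shot"]

anticipation_accuracy = accuracy_percent(anticipated, played_out)
print(f"Anticipation accuracy: {anticipation_accuracy:.1f} %")
```

Decision-making accuracy could be scored with the same function by comparing each participant's stated decision with the action the coaching panel selected for that trial.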

The verbal report data were analyzed using the three most discriminating trials between groups, which were chosen on the basis of the mean scores from the anticipation and decision-making measures (cf. McRobert, Williams, Ward, & Eccles, 2009). Participants’ retrospective verbal reports were transcribed verbatim and segmented using natural speech and other syntactical markers. Verbal reports were classified according to a structure adapted from Ericsson and Simon (1993) and further developed by Ward, Williams, and Ericsson (2003). Four major types of cognitive thought statement categories were coded: (1) Monitoring statements were those recalling current actions or descriptions of current events; (2) evaluations were statements making some form of comparison, assessment, or appraisal of events that were situation, task, or context relevant; (3) predictions referred to statements anticipating or highlighting future or potential future events; and (4) planning statements were those referring to a decision(s) on a course of action in order to anticipate an outcome or potential outcome. The reliability of the data was established using the intra- (94.5 %) and interobserver (93.3 %) agreement formulas. These figures were created from a reanalysis of 20.0 % of the data, using procedures recommended by Thomas, Nelson, and Silverman (2005).
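As a rough illustration of the reliability check, the sketch below computes simple percentage agreement between two codings of the same statements. It assumes the common formula agreements / (agreements + disagreements) × 100; the source does not spell out the exact formula used, and the function name and the coded statements are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Simple percentage agreement between two codings of the same statements."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both codings must cover the same statements")
    if not codes_a:
        raise ValueError("at least one coded statement is required")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    disagreements = len(codes_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical recoding of ten verbal statements by a second observer.
first_coding = ["monitoring", "monitoring", "evaluation", "prediction", "planning",
                "monitoring", "prediction", "monitoring", "evaluation", "planning"]
second_coding = ["monitoring", "monitoring", "evaluation", "prediction", "planning",
                 "monitoring", "planning", "monitoring", "evaluation", "planning"]

interobserver = percent_agreement(first_coding, second_coding)
print(f"Interobserver agreement: {interobserver:.1f} %")
```

Intraobserver agreement would use the same calculation, with the two codings produced by the same observer on separate occasions.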

Results and discussion

Independent t-tests showed no significant difference in anticipation response accuracy between the movement-based (M = 60.0 %, SD = 12.9) and stationary (M = 59.5 %, SD = 7.3) groups, t(14.02) = 0.11, p = .92, d = 0.05. However, there was a trend toward significance for the decision-making response accuracy scores, t(18) = 1.79, p = .090, d = 0.80. The mean percentage of correct decision-making responses for the movement group (M = 79.5 %, SD = 8.0) was slightly higher than that for the stationary group (M = 73.0 %, SD = 8.2).
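The reported test statistics and effect sizes can be approximately recovered from the summary statistics alone. The sketch below is a check under stated assumptions, not the authors' analysis script: it assumes Welch's unequal-variance test for anticipation (suggested by the fractional degrees of freedom), a pooled-variance test for decision making, and Cohen's d based on the pooled standard deviation. All function names are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d using the pooled standard deviation (an assumption here)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic and its degrees of freedom."""
    se = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se, n1 + n2 - 2


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Anticipation accuracy: movement vs. stationary (values from the text).
t_ant, df_ant = welch_t(60.0, 12.9, 10, 59.5, 7.3, 10)
d_ant = cohens_d(60.0, 12.9, 10, 59.5, 7.3, 10)

# Decision-making accuracy: movement vs. stationary (values from the text).
t_dec, df_dec = student_t(79.5, 8.0, 10, 73.0, 8.2, 10)
d_dec = cohens_d(79.5, 8.0, 10, 73.0, 8.2, 10)

print(f"Anticipation: t({df_ant:.2f}) = {t_ant:.2f}, d = {d_ant:.2f}")
print(f"Decision making: t({df_dec}) = {t_dec:.2f}, d = {d_dec:.2f}")
```

Because the inputs are rounded means and standard deviations, small discrepancies from the published values are expected; in particular, the Welch degrees of freedom come out at about 14.2 rather than the reported 14.02.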

A 2 group (stationary, movement) × 4 statement type (monitoring, evaluation, prediction, planning) ANOVA revealed a main effect for group, F(1, 18) = 280.33, p < .001, ηp² = .94. The movement group (M = 7.37 statements, SD = 2.05) generated significantly more verbal statements of cognitive processes, when compared with the stationary group (M = 4.37 statements, SD = 0.85). There was also a significant effect for type of verbal statement, F(3, 54) = 61.40, p < .001, ηp² = .77. Bonferroni-corrected pairwise comparisons demonstrated that participants verbalized significantly more monitoring statements (M = 3.27 statements, SD = 1.12), as compared with all other statement types. A higher number of predictive statements (M = 1.15 statements, SD = 0.44) were verbalized, as compared with evaluation statements (M = 0.60 statements, SD = 0.71). No differences were found between planning statements (M = 0.85 statements, SD = 0.81) and evaluation or predictive statements. These data are presented in Table 1.
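Partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity (F × df_effect) / (F × df_effect + df_error), which offers a quick consistency check on the effect sizes reported above. The function name below is hypothetical, and the sketch is a reader's check rather than part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of group: F(1, 18) = 280.33.
eta_group = partial_eta_squared(280.33, 1, 18)

# Main effect of statement type: F(3, 54) = 61.40.
eta_type = partial_eta_squared(61.40, 3, 54)

print(f"Group: {eta_group:.2f}, statement type: {eta_type:.2f}")
```

Both values round to the effect sizes reported for the two main effects.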

Table 1 Mean (SD) number of type of verbal statements generated per trial between stationary and movement response groups

The group × type of verbal statement interaction was not significant, F(3, 54) = 0.58, p = .63, ηp² = .03. However, because the movement group made more statements than the stationary response group, the frequency scores for each category were subsequently normalized into percentage data. Participants in the stationary group made a greater proportion of monitoring statements than of all other statement types combined (M = 67.4 %, SD = 13.0 vs. M = 32.6 %, SD = 13.0). In contrast, participants in the movement group made a lower proportion of monitoring statements than of all other statement types combined (M = 48.6 %, SD = 10.1 vs. M = 51.4 %, SD = 10.1), indicating that they engaged in a greater amount of higher-order cognitive thought processing involving evaluations, predictions, and planning.
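The normalization step can be sketched as follows, assuming that each participant's statement counts were converted to percentages of that participant's own total before group means were taken (the source does not state the exact procedure). The counts and names below are hypothetical.

```python
CATEGORIES = ("monitoring", "evaluation", "prediction", "planning")


def to_percentages(counts):
    """Convert raw statement counts per category into percentages of the total."""
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        raise ValueError("participant produced no codable statements")
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


# Hypothetical statement counts for one participant across the analyzed trials.
example_counts = {"monitoring": 10, "evaluation": 2, "prediction": 5, "planning": 3}

shares = to_percentages(example_counts)
# Share of evaluation, prediction, and planning statements taken together.
higher_order_share = 100.0 - shares["monitoring"]

print(f"Monitoring: {shares['monitoring']:.1f} %, "
      f"higher-order: {higher_order_share:.1f} %")
```

Expressing each category as a share of the participant's total removes the overall difference in verbosity between groups, so the monitoring share and the combined higher-order share always sum to 100 %.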

Our findings support the hypothesis that differences in the fidelity of the two response methods (Hays & Singer, 1989) would lead to differences in the verbal reports articulated by participants. Moreover, they support previous research that showed that the capability to process information effectively is reduced when there is a lower “natural” connection between the stimulus and the required/associated response (e.g., Proctor & Reeve, 1990). The cognitive thought processes observed for the movement-based group may be a result of the improved compatibility between stimulus characteristics and response selection/execution under the more realistic settings. That is, the need to move in response to the continuous action presented on the life-size screen appears more compatible with the skilled player’s customary response in the game situation. A further interpretation could be that participants in the movement-based response group may have invested more mental effort in the task and felt more engaged due to the increased fidelity (e.g., Bianchi-Berthouze, 2013). In future, a rating scale for mental effort (e.g., RSME; Zijlstra, 1993) could be used to measure the participants’ perceived engagement and investment in the different task conditions.

Findings highlight the importance of designing representative tasks that offer participants a more realistic context for continuous decision making, perception, and action as per the environmental characteristics of the actual performance domain. Such tasks are preferable to those that isolate each/any of these elements of performance (Dicks et al., 2010; Ericsson & Williams, 2007; Ward et al., 2006). The suggestion is that greater attention needs to be paid to the fidelity and ecological representativeness of task designs so that inferences and conclusions can be made about the specific and often complex processes that underpin and mediate expert performance.

In conclusion, we examined whether the cognitive strategies employed during a film-based simulation differed depending on whether participants remained stationary or moved/interacted with it as they normally would. Participants in the movement group verbalized a larger number of thought statements, with a higher proportion related to the assessment and prediction of future options and the planning and selection of an appropriate action response, when compared with the stationary group. The higher fidelity and greater stimulus–response compatibility evident in the movement group led to different thought processes being engaged, when compared with the stationary group, although these changes did not have a marked impact on the accuracy of the judgments made. Our findings suggest the need to design experimental tasks that (more closely) recreate the constraints that exist in the actual performance setting in order to better identify the mechanisms and processes mediating superior performance.