Introduction

There can be little doubt that necessity is the mother of invention, and the driving force for considering a methodological approach in a new light. Reverse correlation methods have a long and productive history across a diverse range of topics in the psychological and biological sciences (Ahumada & Lovell, 1971; Marmarelis & Marmarelis, 1978). Relatively recently they have been applied to the specific topic of face perception (e.g., Haig, 1985; Gosselin & Schyns, 2001; Sekuler, Gaspar, Gold, & Bennett, 2004) and have provided important insights into this vital ability. However, while approaches such as these have gleaned a wealth of information from healthy adult participants (typically the classic undergraduate student sample; Henrich, Heine, & Norenzayan, 2010), their technical requirements have largely precluded a more general applicability that encompasses children and most atypical groups. To address this, we developed a participant-friendly version of one such technique (Bubbles; Gosselin & Schyns, 2001) and for the first time were able to use this approach successfully to better understand the development of face processing in typical children (6–12 years: Ewing, Karmiloff-Smith, Farran, & Smith, 2017a, b).

In a standard adult Bubbles experiment, participants are expected to complete a large number of trials to guarantee comprehensive sampling of the stimulus space. Typically, this means many hundreds of trials (at least 500 per condition, often more; e.g., Gosselin & Schyns, 2001; Smith et al., 2005), completed over multiple, extensive testing sessions. In adapting the paradigm to a non-standard audience we faced two important challenges: first, ensuring sufficient information sampling to perform the Bubbles analysis and, second, ensuring that our participants remained fully engaged and motivated for as long as possible. To address the former point, rather than test a small number of individuals over many trials (as is typical), we tested large numbers of individuals over a relatively small number of trials. To address the latter point we introduced a number of modifications to the testing sessions, including shorter blocks with an onscreen countdown bar showing block progress, an interactive and encouraging experimenter sitting alongside the participant and engaging with them during all breaks, and finally the introduction of the puzzle-bubble game during breaks. This game involved the participants guessing the name of famous films/locations/TV shows from as little visual information as possible; cheeky bubbles “hid” the key details, but could be removed by the experimenter to provide further clues. Anecdotally, these changes, and the puzzle-bubble game in particular, appeared surprisingly popular with children and adults alike!

Although mental fatigue is well known to negatively impact cognitive performance (e.g., Boksem, Meijman, & Lorist, 2005; Hopstaken, van der Linden, Bakker, & Kompier, 2015a), with underlying changes in brain activation patterns (e.g., Lorist, Boksem, & Ridderinkhof, 2005; Boksem, Meijman, & Lorist, 2006; Borghini et al., 2012; Tanaka, Ishii, & Watanabe, 2014), there tends to be only minimal consideration of the participant experience during the administration of the repetitive tasks so often asked of participants in psychology experiments. Mental fatigue occurs as a result of sustained periods of demanding task performance and is typically characterized by changes in mood and motivation (e.g., Boksem & Tops, 2008), and in particular a reduction in task engagement (Hopstaken, van der Linden, Bakker, & Kompier, 2015a). Due to its importance in driving workplace errors and accidents, the study of mental fatigue has often focused on the practical implications for occupational settings. However, as mental fatigue is directly linked to brain processes critical for performance in psychophysical tasks (e.g., attention, Boksem, Meijman, & Lorist, 2005; global/local processing bias, van der Linden & Eling, 2006; executive control, van der Linden, Frese, & Meijman, 2003), it follows that by overlooking its impact, researchers of human behavior may be adding unwanted noise to their studies.

Research suggests that one way to counter the effects of mental fatigue is to boost the rewards associated with participation (Boksem & Tops, 2008; Hopstaken, van der Linden, Bakker, & Kompier, 2015a, 2015b), re-engaging fatigued participants in a given task. Given this, we were interested to determine whether our participant-friendly task modifications, which were specifically designed to engage young/cognitively impaired individuals in our demanding, repetitive, and relatively boring tasks, could also have a measurable impact on task performance and data quality in a standard adult sample.

We set out to validate the impact and effectiveness of our task engagement strategy and the modifications made to the operation of the task by running three versions of the same base paradigm with the same adult participants in a single testing session. In one version, adults performed the task with no experimenter interaction at any point; there was no puzzle-bubble game and only generic self-paced “take-a-break” screens between blocks. In a second version, adults again performed the experiment with no experimenter interaction, but with the puzzle-bubble game (played independently) separating blocks (even-numbered blocks only). Finally, in the third version, the experimenter interacted with the participant as they played the puzzle-bubble game (matching the participant-friendly implementation). All other aspects of the methodology remained constant across the three versions of the task. Furthermore, we employed a modified short form of the Intrinsic Motivation Inventory (IMI; Ryan, 1982) to directly assess each participant’s subjective experience of each experimental condition, to determine whether our manipulations significantly altered the participants’ experience of completing the task.

We directly compared performance across the different versions of the task, with the expectation that the introduction of both the puzzle-bubble game (to enforce spaced breaks between trial blocks and to alleviate the tedium of completing many similar trials) and interaction with the experimenter during breaks would lead to better performance on the task and cleaner statistical results. Comparing versions 1 and 2 permits us to evaluate the effectiveness of the puzzle-bubble game on its own in boosting task engagement, while the comparison of versions 2 and 3 establishes the extent to which any improved performance is driven by interaction with the experimenter. Direct comparison between self-report measures of task engagement (from the IMI) and objective performance metrics (from the Bubbles task) allows us to explicitly establish whether greater task engagement is significantly tied to experimental outcomes on a psychophysics task such as this.

To the best of our knowledge, this is the first time that the impact of the participant experience has been explored in the context of a repetitive visual psychophysics task conducted under typical experimental testing conditions (not those designed to specifically induce mental fatigue by having participants perform the same task repeatedly for a number of hours with no breaks). Should the subjective participant experience and task engagement directly impact cognitive performance and resulting data quality, then there are clear implications across a wide range of research areas in the psychological sciences.

Methods

Participants

Thirty adults (ten male, mean age = 26.2 years, SD = 10.1) completed a single testing session lasting approximately 45 min. All participants had normal or corrected-to-normal vision, no history of psychological problems, and provided signed informed consent. The study was approved by the ethics board of the Department of Psychological Sciences, Birkbeck College, University of London.

Procedure

Using a repeated-measures design, participants completed three versions of the Bubbles task in a single testing session. Each version took 10–15 min to complete, and the versions were identical except for the introduction of the puzzle-bubble game during breaks between blocks (versions 2 and 3) and standardized interaction with the experimenter during the puzzle-bubble game (version 3). Participants each completed a single puzzle-bubble challenge per break (for a total of four challenges across the 512 trials of task versions 2 and 3), with each challenge lasting approximately 3 min. The order in which participants completed the versions of the task was randomized via a Latin square procedure, with ten participants completing each order of the different versions.

In the Bubbles task, participants were asked to categorize sub-sampled versions of expressive faces by the expression shown. The approach works by presenting only some parts of a stimulus (typically visual) to the participant on each trial and relating categorization decisions to the information that was presented. On each trial, most of the stimulus is hidden from view, and only the information located behind a number of randomly positioned, circularly symmetric Gaussian apertures is made available to the participant to inform their categorization decisions. The location of the apertures varies randomly across trials so that, over sufficient trials, the visual space is exhaustively sampled. Reverse correlating the location of the apertures with categorization responses permits the experimenter to establish which visual regions are significantly correlated with categorization performance and can therefore be concluded to be essential for the task at hand.
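
To make the sampling procedure concrete, the sketch below builds a single-scale Bubbles mask and applies it to an image. It is a minimal illustration in Python/NumPy (the experiment itself ran in MATLAB); the image size, number of bubbles, and aperture width are illustrative assumptions rather than the study's actual settings.

```python
import numpy as np

def bubbles_mask(height, width, n_bubbles, sigma, rng):
    """Sum n_bubbles circularly symmetric Gaussian apertures placed at
    random locations; clip to [0, 1] so overlapping apertures do not
    over-expose the underlying image."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width))
    for _ in range(n_bubbles):
        cy = rng.integers(0, height)
        cx = rng.integers(0, width)
        mask += np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(mask, 0.0, 1.0)

rng = np.random.default_rng(0)
face = rng.random((256, 256))  # stand-in for a greyscale face image in [0, 1]
mask = bubbles_mask(256, 256, n_bubbles=40, sigma=10.0, rng=rng)
# reveal the image behind the apertures; hide everything else in mid-grey
stimulus = face * mask + 0.5 * (1.0 - mask)
```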

Stimuli were fearful, happy, angry, and sad expressions taken from the California Facial Expressions database (CAFE; Dailey, Cottrell, & Reilly, 2001), as used in previous Bubbles expression categorization studies (e.g., Smith et al., 2005; Smith & Merlusca, 2014; Schyns, Petro, & Smith, 2007). As per existing Bubbles studies of facial expression categorization, stimuli were decomposed into six non-overlapping spatial frequency (SF) bands of one octave each, of which five were sampled (120–60, 60–30, 30–15, 15–7.5, and 7.5–3.8 cycles per image). To create a single experimental stimulus, each SF band was sampled independently with randomly positioned Gaussian apertures (the Bubbles), whose size was adjusted at each scale to reveal three cycles per aperture and whose number (per scale) was adjusted to ensure equivalent sampling of each SF scale (i.e., more small high-SF bubbles than the larger low-SF bubbles). The sampled information from each scale was then recombined into a single stimulus image comprising visual information across the SF bands (see Gosselin & Schyns, 2001, and Smith et al., 2005, for fuller details of the stimulus generation process). The total number of apertures (Bubbles) over all SF scales was adjusted on a trial-by-trial basis via a staircase algorithm to target a performance criterion of 75% correct. To this end, poor performance resulted in more information on a subsequent trial (i.e., more bubbles), while higher-than-target performance resulted in a reduction in the amount of information presented on subsequent trials (i.e., fewer bubbles).
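
The manuscript does not specify the exact staircase rule; one standard choice that converges on 75% correct is a weighted up-down rule (Kaernbach, 1991), sketched below with illustrative step sizes and bounds.

```python
def update_n_bubbles(n_bubbles, correct, step=4, n_min=10, n_max=300):
    """Weighted up-down staircase targeting 75% correct (Kaernbach, 1991).

    After a correct response the total number of bubbles drops by `step`;
    after an error it rises by 3 * step. The 3:1 up/down ratio equals
    target / (1 - target) for a 75% target, so the number of bubbles
    converges on the level at which the participant is 75% correct."""
    if correct:
        n_bubbles -= step       # above target: reveal less next trial
    else:
        n_bubbles += 3 * step   # below target: reveal more next trial
    return min(max(n_bubbles, n_min), n_max)
```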

In each version of the task participants completed 512 emotion categorization trials (128 per emotion), categorizing each stimulus by emotion (labelled keyboard keys denoted fearful, happy, angry, sad, and don’t know), for a total of 1,536 trials over the course of the full experiment comprising the three task versions. A short practice phase prior to testing confirmed that participants could correctly categorize the non-Bubbled (i.e., intact) face stimuli by expression and introduced the participants to the response keys. Participants sat 70 cm from the display; the experiment ran in MATLAB using the Psychophysics Toolbox (Pelli, 1997), and stimuli subtended 5.36 × 3.7° of visual angle.

Unlike standard implementations of the Bubbles technique, in the modified child-friendly version we added a “don’t know” response to prevent lucky guesses from adding unnecessary noise to the data (“don’t know” responses were coded as incorrect). Furthermore, we introduced an onscreen countdown bar that permitted participants to gauge their position in a block, and reduced the length of individual blocks to a few minutes (64 trials) rather than the more standard 5 min or so.

To gauge interest/motivation, participants completed a short form of the IMI (Ryan, 1982) at the end of each experimental condition. In this questionnaire we asked participants to rate (on a scale of 1–7) how they felt about the task they had just completed in terms of their interest and enjoyment (two separate questions), their perceived competence, the effort they put into their performance, the importance to them of doing well, the degree of pressure they felt, how related they felt to the experimenter, and finally how important they felt the task was.

Results

Bubbles results

We considered two performance metrics as dependent measures: the amount of information (i.e., number of bubbles) required to achieve the target performance of 75% correct for each emotion, and the actual percentage correct achieved (NB: with a small number of trials it is not possible to perfectly stabilize performance at the target 75% correct); see Fig. 1A. Alongside this, we examined the quality of the Bubbles solution, i.e., the visual information that is significantly associated with categorization of each emotional expression. A one-way repeated-measures ANOVA with task version (1, 2, 3) as the within-subjects factor indicated a significant main effect on the amount of information required to achieve good performance levels (F(2,58) = 3.8, p = 0.029, η² = 0.12). Planned comparisons revealed that participants required significantly less information in task version 3 (M = 85 bubbles) than in task version 2 (M = 97 bubbles; F(1,29) = 5.6, p = 0.025, η² = 0.16), but there was no such drop in the number of bubbles for task version 2 compared to task version 1 (M = 93 bubbles; F(1,29) = 0.9, p = 0.35, η² = 0.03). An equivalent ANOVA on percentage correct scores indicated a trend toward a main effect of condition here too (F(1.3, 37.6) = 3.4, p = 0.06, η² = 0.11), with planned comparisons again showing that participants performed slightly better in task version 3 (74.4%) than in task version 2 (72%; F(1,29) = 4.2, p = 0.049, η² = 0.13), but with no improvement for task version 2 compared to task version 1 (73.4%; F(1,29) = 2.3, p = 0.14, η² = 0.07).
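
For readers wishing to reproduce this style of analysis, the sketch below runs the one-way repeated-measures ANOVA in Python with statsmodels (the original analysis software is not reported, and the data values are random placeholders seeded with the condition means above; note that AnovaRM reports sphericity-assumed tests, so any Greenhouse–Geisser correction must be applied separately).

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
n_subjects = 30

# long-format data: one row per participant x task version, with the mean
# number of bubbles required as the dependent measure (random placeholder
# values centered on the reported condition means, not the study's data)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), 3),
    "version": np.tile([1, 2, 3], n_subjects),
    "n_bubbles": rng.normal(loc=[93, 97, 85] * n_subjects, scale=15),
})

# one-way repeated-measures ANOVA with task version as the within-subjects factor
res = AnovaRM(df, depvar="n_bubbles", subject="subject",
              within=["version"]).fit()
print(res.anova_table)  # F value, num/den df, and p for the version effect
```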

Fig. 1 (A) Behavioral metrics of performance accuracy and the amount of information required, shown in red for task version/condition 1, green for condition 2, and blue for condition 3. (B) Regions significantly associated with correct categorization performance for fear and happiness categorizations (p < 0.05, corrected) for condition 1 (red), condition 2 (green), and condition 3 (blue). Note that when the same region is significant for multiple conditions it is colored as per the RGB color-space combination (e.g., purple = red + blue = conditions 1 and 3; white = red + green + blue = all three conditions). (C) The un-thresholded information association maps between correct categorization performance and information location (measured as z-scores; higher values represent a greater association between presentation of information at that location and a correct categorization response).

To evaluate the effectiveness of the task version manipulations on the quality of the Bubbles solution we considered the information processing results for the two most well-researched emotional expression categorizations: fear and happiness (Footnote 1). The critical visual information for both fear and happiness categorizations has been confirmed across a number of studies in typical adult participants. For fearful categorizations the crucial visual information has been repeatedly shown to comprise the wide-open eyes across the higher spatial frequency scales (scales 1–3), alongside the open mouth (scales 3 and 4; e.g., Adolphs et al., 2005; Smith & Merlusca, 2014; Smith et al., 2005; F. Smith & Schyns, 2009). For happiness categorizations it is the wide-open mouth, from fine detail in the higher spatial frequencies through to the broad low spatial frequency mouth shape information (Adolphs et al., 2005; Smith & Merlusca, 2014; Smith et al., 2005; F. Smith & Schyns, 2009).

For both fear and happiness, and under all three task versions, the Bubbles solution replicates most (Footnote 2) of the key features of these established processing profiles. Figure 1B shows only those regions that pass the corrected statistical tests (p < 0.05; Chauvin et al., 2005), highlighted on a sample face. Significant regions observed under task version 1 are coded in red, those from task version 2 in green, and those of task version 3 in blue. Note that where the same region was significant in multiple task versions it is color-coded in the combined RGB color-space color (e.g., a region significant for task versions 1 (red) and 3 (blue) would be coded in purple; a region significant for all task versions would be coded in white). Figure 1C presents the information association maps (z-scores) for all positive associations between information sampling and performance, for each condition in turn and across the five sampled spatial frequency bands, prior to applying the statistical threshold.
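
To illustrate the computation behind these maps, the sketch below reverse correlates per-trial sampling masks with response accuracy to produce a z-scored classification image for one spatial frequency band. It is a simplified stand-in (random data, single scale, naive z-scoring and threshold); the analyses reported here used the corrected random-field tests of Chauvin et al. (2005).

```python
import numpy as np

def classification_image(masks, correct):
    """Reverse correlation: weight each trial's sampling mask by its
    (mean-centered) accuracy and sum, then z-score across pixels.
    Large positive values mark locations whose exposure is associated
    with correct categorizations.

    masks:   (n_trials, height, width) array, 1 where info was revealed
    correct: (n_trials,) boolean array of trial accuracy
    """
    masks = np.asarray(masks, dtype=float)
    acc = correct.astype(float)
    acc -= acc.mean()                      # mean-center accuracy
    ci = np.tensordot(acc, masks, axes=1)  # accuracy-weighted sum of masks
    return (ci - ci.mean()) / ci.std()

# illustrative use with random stand-in data for a single SF band
rng = np.random.default_rng(2)
masks = rng.random((512, 64, 64)) < 0.1   # sparse random sampling masks
correct = rng.random(512) < 0.75          # ~75% correct responses
z = classification_image(masks, correct)
significant = z > 3.0  # naive threshold; the paper uses Chauvin et al. (2005)
```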

Importantly, not all task versions produced equally clear profiles of information use. Close inspection of the results reveals that for fear categorizations, it is only in task version 3 – where social interaction and participant engagement are maximized – that both eyes reach significance in the highest spatial frequency band. Similarly, for happiness categorizations it is only in task version 3 that the entire higher spatial frequency mouth reaches significance. Furthermore, when considering the absolute strength of the association between the important pixel locations and performance (the un-thresholded z-scores presented in Fig. 1C), the largest values are generally observed for task version 3; see Table 1 (Footnote 3).

Table 1 Maximal strength of the association between information location and performance (measured in z-scores), indicating a stronger association for task version 3 at scales 1–3 for both fear and happy categorizations, and again at scale 5 for fear categorizations

Motivation Questionnaire results

One participant failed to understand the instructions regarding the questionnaire (choosing to answer only one of the eight questions at each administration), and the data for one participant in one condition were lost due to experimenter error, leaving 28 participants. A one-way repeated-measures ANOVA, with task version as a three-level factor, was conducted for each question in turn (Greenhouse–Geisser correction reported for violations of sphericity). Significant effects were further explored with post hoc follow-up t-tests (Bonferroni corrected for multiple comparisons); see Fig. 2 for average responses per condition. We observed a significant effect of condition on participants’ self-reported enjoyment (F(2,54) = 4.7, p = 0.013, η² = 0.15), interest (F(2,54) = 3.86, p = 0.027, η² = 0.13), desire to do well (F(2,54) = 4.4, p = 0.016, η² = 0.14), pressure felt (F(1.57,42.27) = 6.6, p = 0.006, η² = 0.2), and connectedness to the experimenter (F(1.26,34.06) = 13.2, p < 0.001, η² = 0.33), with a clear trend toward an effect on the effort they expended (F(2,54) = 3.04, p = 0.056, η² = 0.1). There was no effect of experimental condition on their perceived competence at the task (F(2,54) = 1.8, p = 0.17, η² = 0.06) or how important they felt the task was (F(2,54) = 0.74, p = 0.48, η² = 0.03).

Fig. 2 Subjective ratings from participants after completing each experimental task version/condition (condition 1 in red, condition 2 in green, and condition 3 in blue).

Planned comparisons confirmed that participants enjoyed participating in condition 3 more than in condition 2 (F(1,27) = 6.4, p = 0.018, η² = 0.19), with no such benefit for condition 2 over condition 1 (F(1,27) = 0.6, p = 0.45, η² = 0.02). Similarly, participants expended more effort in condition 3 than in condition 2 (F(1,27) = 3.98, p = 0.056, η² = 0.13), with no difference between conditions 1 and 2 (F(1,27) = 0.3, p = 0.59, η² = 0.11). They also reported a greater desire to do well in condition 3 than in condition 2 (F(1,27) = 4.7, p = 0.039, η² = 0.15), with no difference between conditions 1 and 2 (F(1,27) = 1.35, p = 0.26, η² = 0.05). As expected, participants felt more connected to the experimenter in condition 3 than in condition 2 (F(1,27) = 15.9, p < 0.001, η² = 0.37), but this came at the cost of feeling more pressure (F(1,27) = 6.7, p = 0.013, η² = 0.2). Again there was no difference in either connectedness or pressure felt between conditions 1 and 2 (F(1,27) = 1.3 and 0.36, p = 0.26 and 0.55, η² = 0.05 and 0.013, respectively). Finally, participants’ interest in the experiment did not increase significantly between conditions 2 and 3 (F(1,27) = 0.58, p = 0.45, η² = 0.02); rather, there was a trend for interest to be greater in condition 2 than in condition 1 (F(1,27) = 4.0, p = 0.056, η² = 0.13).

In an exploratory analysis we then asked whether subjective feelings representing engagement with the task might be directly correlated with markers of task performance (percentage correct, mean number of bubbles) within each task version. We considered the self-report measure of effort expended to be the best proxy for task engagement, and found clear relationships between increased engagement and improvements in the behavioral performance metrics for all task versions, but most strongly for task version 3 (V1: accuracy, r(28) = 0.40, p = 0.03; information required, r(28) = −0.33, p = 0.09; V2: accuracy, r(28) = 0.52, p = 0.005*; information required, r(28) = −0.46, p = 0.013; V3: accuracy, r(28) = 0.53, p = 0.004*; information required, r(28) = −0.49, p = 0.009*; *denotes Bonferroni-corrected significant effects). Note that engagement with the task, as approximated by effort expended, was not directly correlated with “pressure felt” under any task version (r(28) = 0.19, 0.03, and 0.22; p = 0.35, 0.86, and 0.26, respectively), and in particular the increased pressure felt under task version 3 did not appear to be a significant driving force of improved performance (accuracy, r(28) = 0.089, p = 0.65; information required, r(28) = −0.16, p = 0.43). Similarly, increased feelings of connectedness to the researcher did not correlate significantly with performance under any task version (V3: r(28) < 0.21, p > 0.28; V1, V2: r(28) < 0.23, p > 0.23).

Discussion

Here we tested a modified implementation of the Bubbles reverse correlation paradigm that is more appropriate for a developing sample (children) and potentially others for whom the traditional method would make participation very challenging (e.g., individuals with low cognitive ability). Participants completed three versions of the same Bubbles emotion categorization experiment in a single session, with the order of the different versions counterbalanced. With the exception of the reduced number of trials, the first version mirrored a standard experiment in most respects (generic screens providing self-paced short breaks every few minutes, although the use of a countdown bar and the presence of an experimenter in the testing room are novel). The second version introduced the puzzle-bubble game as a self-controlled diversion from the monotony of the main task. Finally, in version 3, the experimenter actively “played” the puzzle-bubble game with the participant, acting as quiz master to interact and provide encouragement. Our results indicated better performance for version 3 across the board. Participants achieved higher performance levels and required less information to do so when performing an otherwise identical psychophysics task. In addition, participants were also subjectively more motivated: they reported higher levels of enjoyment, interest, and effort, and a greater desire to do well. Unsurprisingly, participants also felt a greater connection to the experimenter, but also more pressure.

A relatively large number of participants for this type of study (30) each completed a relatively small number of experimental trials (128 per emotion category) in each of the three different experimental arrangements. Despite a smaller overall number of trials (3,840 here, per emotion, per experiment version, vs. 5,000 in Smith & Merlusca, 2014, or 16,800 in Smith, Gosselin, Cottrell, & Schyns, 2005), our Bubbles information use results align clearly with established findings for the well-studied happy and fearful expression categorizations. The significant features driving fear categorizations (wide-open eyes across the high and mid spatial frequencies, mouth at lower spatial frequencies) and the features found to be significant for happy categorizations (the broad smiling mouth across spatial frequency bands) mirror past findings. We observe most of these significant visual regions for all three task versions, but note that for fear categorizations the use of the eyes in the highest spatial frequency band only reached significance in version 3. Similarly, it is only under version 3 that the full high spatial frequency mouth reaches significance for happy expression categorization. Furthermore, for the majority of the key visual features, task version 3 produced the statistically cleanest result, as indicated by the highest association between visual information and behavioral performance.

Our Bubbles paradigm results and the motivation questionnaire findings together highlight the importance of social interaction in boosting subjective motivation and task engagement, alongside generating significant improvements in objective task performance and the quality of the Bubbles solution. Little benefit is observed for the use of the game diversion during breaks on its own, with the only reported difference being an increase in subjective interest in the task. Past research has shown general cognitive benefits of social interaction, including boosting measures of executive functioning (Ybarra et al., 2010), working memory, and speed of processing for simple dot patterns (Ybarra et al., 2008), and acting as a potential intervention to slow cognitive decline (Dodge et al., 2015), but to the best of our knowledge this is the first study to find clear benefits of ongoing interaction in a perceptual task such as this. Social interaction is known to constitute a reward in and of itself (Insel, 2003; Walter, Abler, Ciaramidaro, & Erk, 2005), and social rewards (typically simply photographs of attractive smiling faces) activate similar neural reward structures to monetary rewards (Aharon et al., 2001; Izuma, Saito, & Sadato, 2008; Lin, Adolphs, & Rangel, 2012; Spreckelmeyer et al., 2009), with some researchers finding that social rewards can be even more motivating than financial rewards in occupational contexts (Graham & Unruh, 1990). The social interaction taking place in version 3 of the task here could function in a similar, and likely enhanced, manner to activate these same reward structures and boost goal-directed behavior in the task.

As such, we conclude that any similar diversionary activity that engages the participant with the experimenter during breaks is likely to lead to a similar boost in performance and participant experience. Further studies could explore the extent and nature of the diversion and interaction required in more detail to further optimize testing efficiency. Extant evidence suggests that the interaction should be neutral or cooperative (as opposed to competitive) to drive improved performance (Ybarra et al., 2010). Other factors, including explicit feedback, either as vocal praise (a staple of education/training), numerical assessments of ability, or more traditional rewards (e.g., desired foods, monetary rewards, gifts – e.g., small toys/stickers for children), could also be interesting avenues to explore in the context of boosting task engagement and the associated performance.

It is important to note that we did not set out here to establish the necessary (or sufficient) number of trials required to achieve a stable Bubbles solution, and it would be incorrect to conclude a lower bound from the current findings. Determining the number of trials required to accurately characterize information use for a particular categorization is an important question for future research, but it is outside the scope of the current manuscript. It is a complex problem that will vary depending on a considerable number of factors. For example, obtaining a stable solution will require more power (i.e., more trials) when the categorization to be made is more challenging, e.g., in the case of sadness and anger here. Trial numbers might also vary if individual differences across participants result in consistently high levels of noise – see Wang, Friel, Gosselin, and Schyns (2011) for an estimate in a small set of individuals in a standard Bubbles expression categorization task, and note that they observed considerable individual differences in the number of trials required. If it proves possible to establish a target number of trials for a particular categorization, one could then explore whether improvements to participant engagement significantly alter this. Finally, it is also important to note that the participant-friendly approach presented here is intended to pull out similarities in information use within a wide sample of participants. In situations where one expects the sample to vary widely in the strategies employed, e.g., in developmental prosopagnosics, who report a wide range of strategies to counter their face-processing deficits (Yardley, McDermott, Pisarski, Duchaine, & Nakayama, 2008), an approach such as this is unlikely to work.

Conclusions

Working productively with young children and other groups varying in cognitive ability often requires careful consideration of the participant experience that can be foreign to those working with complex psychophysical paradigms. The results presented here signal that child-friendly design modifications are possible and need not undermine the interpretability of results. In fact, our findings show the opposite pattern: here, these modifications paid clear research dividends with typical adult participants. By boosting task engagement via an interactive game, we were able to improve objective task performance and the statistical power of our results in a basic investigation of face processing taking place in a short testing session (only 15 min per task). These results will hopefully encourage researchers to see that creating a friendly and engaging participant experience should not be limited to situations with children or atypical populations. We have confirmed empirically that there are significant benefits associated with expending a little more time and effort during data collection.