Organisms face challenges to survival when environments change. Behavioral and evolutionary scientists have examined what is broadly called behavioral flexibility in the face of environmental changes under a wide range of species and conditions (see Lea et al., 2020). One general approach to examining behavioral flexibility assesses how behavior changes when responses that paid off at one time or in one setting do not pay off under other circumstances. In other words, how does operant behavior change when reinforcement contingencies change?

Reversal learning

Reversal learning is one approach to examining changes in operant behavior under changing reinforcement conditions (see Izquierdo et al., 2017, for a review). Tasks using a midsession reversal of reinforcement contingencies typically include two response alternatives and arrange a contingency reversal after a fixed number of trials (see Rayburn-Reeves & Cook, 2016; Zentall, 2020, for reviews). Responding to one location or stimulus (S1) is reinforced during trials comprising the first half of a session, and responding to a different location or stimulus (S2) is reinforced during trials comprising the second half of a session. With repeated exposure to the contingency change between S1 and S2 across sessions, responding reliably tracks those reinforcement contingencies across many species, including rats (e.g., Rayburn-Reeves et al., 2013), pigeons (e.g., R. G. Cook & Rosen, 2010), dogs (Laude et al., 2016), nonhuman primates (e.g., Rayburn-Reeves et al., 2017), and humans (e.g., McMillan & Spetch, 2019).

There are some apparent differences among species in how closely behavior tracks those contingencies before and after the midsession-contingency reversal. For example, counting the number of trials to reversal appears to be unique to humans (McMillan & Spetch, 2019; Rayburn-Reeves et al., 2011). However, humans, rats, parrots, and monkeys all have been reported to engage in a win-stay/lose-shift pattern of behavior—experiencing nonreinforcement of responding to S1 following the contingency reversal during the second half of sessions rapidly produces a shift to responding to S2 (Laschober et al., 2021; Rayburn-Reeves et al., 2011; Rayburn-Reeves et al., 2013; Rayburn-Reeves et al., 2017). These findings suggest individual response-outcome contingencies from trial to trial largely underly discriminations driving reversal performance between S1 and S2 in these species. In contrast, pigeons and dogs have been reported to make large numbers of anticipatory errors before contingency reversals and perseverative errors following contingency reversals (R. G. Cook & Rosen, 2010; Davison & Cowie, 2019; Laude et al., 2016). These findings suggest temporal cues from the start of sessions govern discrimination of contingency reversals in pigeons and dogs to a greater extent than rats or primates (see Zentall, 2020, for a review).

Nevertheless, several environmental conditions can influence the prevalence and pattern of anticipatory and perseverative errors during reversals across species. As examples, both rats and pigeons tend to make fewer errors during spatially defined discriminations than nonspatially defined visual discriminations (e.g., McMillan et al., 2014; McMillan & Roberts, 2012; Rayburn-Reeves et al., 2013), and error patterns between S1 and S2 change with differences in both reinforcer probability (e.g., Santos et al., 2019; Zentall, Andrews, et al., 2019a) and stimulus conditions (e.g., Rayburn-Reeves et al., 2017; Zentall, Peng, et al., 2019b). Such findings indicate that differences in error patterns are dependent on the prevailing stimulus and reinforcement conditions (see Zentall, 2020, for a review). Therefore, with some exceptions, such as counting the number of trials to reversal, differences in reversal learning among species and behavioral flexibility more generally might largely reflect quantitative rather than qualitative differences (see McMillan & Spetch, 2019). One goal of the present research was to examine the extent to which conditions of uncertainty influenced patterns of human decision-making during contingency reversals—in particular, patterns of counting trials versus a win-stay/lose shift strategy.

Humans’ use of a win-stay/lose-shift strategy has been documented across a wide variety of repeated decision scenarios that introduce conditions of uncertainty, including the Iowa gambling task (e.g., Cassotti et al., 2011; Worthy et al., 2013), the Prisoner’s Dilemma (e.g., Wedekind & Milinski, 1996), causal learning tasks (e.g., Bonawitz et al., 2014), rock paper scissors (e.g., Zhang et al., 2021), stock market predictions (Gutiérrez-Roig et al., 2016), and formation changes in football (Tamura & Masuda, 2015). Moreover, prior research on human performance during repeated decision tasks has shown that quantitative models assuming a win-stay/lose-shift strategy outperform traditional reinforcement learning models (e.g., Worthy et al., 2012; Worthy & Maddox, 2012). These findings suggest that win-stay/lose-shift is a pervasive strategy under conditions of uncertainty.

Resurgence

A related set of findings demonstrating behavioral flexibility during changes in reinforcement contingencies is an effect referred to as resurgence. Resurgence has been studied primarily as a preclinical model of relapse in which only a single contingency reversal from S1 to S2 is later followed by extinguishing the most recently reinforced response to S2 (see Lattal et al., 2017; Shahan & Craig, 2017; Wathen & Podlesnik, 2018, for reviews). The increase in S1 responding upon extinguishing S2 responding demonstrates behavioral flexibility in which a previously reinforced behavior returns despite no longer being reinforced. One goal of the present research was to examine the extent to which conditions of uncertainty influence patterns of decision-making when introducing extinction of S1 and S2 responses—in particular, patterns of counting trials versus a win-stay/lose shift strategy.

Resurgence is relevant to understanding behavioral flexibility because the return of previously successful responses is relevant to problem-solving, creativity, foraging strategies, and behavioral variation (see Shahan & Chase, 2002). For example, previously learned approaches to solving complex math problems might resurge when more recently learned approaches are ineffective (e.g., C. L. Williams & St. Peter, 2020). As with midsession reversal learning, resurgence has been demonstrated under laboratory conditions in a range of species, including zebrafish (Kuroda et al., 2017a, 2017b), Siamese fighting fish (da Silva et al., 2014), pigeons (e.g., Liddon et al., 2017), mice (Craig et al., 2020), rats (e.g., Podlesnik et al., 2019), monkeys (Mulick et al., 1976), and humans with and without intellectual disabilities (e.g., Ho et al., 2018; Podlesnik et al., 2020; Shvarts et al., 2020). Resurgence of problem behaviors also has been observed to be prevalent clinically (e.g., Briggs et al., 2018; Muething et al., 2021). In contrast to midsession reversal learning, a comparative approach to examining resurgence across species has not been employed systematically. Nevertheless, the contingency changes with both reversal learning and resurgence procedures offer an opportunity to examine counting and win-stay/lose-shift response patterns during conditions of reinforcement uncertainty. Thus, the present experiments combined reversal learning and resurgence procedures for the first time.

Present experiments

The present experiments examined S1–S2 reversal learning with humans across 10 consecutive blocks of trials -- hereafter referred to as "sessions" -- arranging either a fixed (Experiment 1) or variable (Experiment 2) reversal point. We expected responding to be more precisely controlled by the contingencies with greater exposure, with few anticipatory or perseverative errors toward the end of those sessions (McMillan & Spetch, 2019; Rayburn-Reeves et al., 2011; see also R. G. Cook & Rosen, 2010). During Sessions 11–15, we extended the number of trials per session following S1–S2 contingency reversals to assess how extinguishing responses to both S1 and S2 affected response patterns (cf. Bai et al., 2017). We expected that experiencing extinction of responding to S2 during those additional trials would initially yield an increase in responding to S1, analogous to a resurgence effect and consistent with a win-stay/lose-shift pattern of behavior. Therefore, the present experiments assessed patterns of reversal learning and resurgence under lower (Experiment 1) and higher (Experiment 2) levels of (un)certainty regarding the changing of reinforcement contingencies. This research was designed to answer the following research questions:

  1. 1.

    To what degree did performance change across trials with repeated exposure to contingency reversals?

  2. 2.

    To what degree did performance change when removing reinforcement for both responses during extinction testing?

  3. 3.

    To what degree do different response patterns reflect differences in decision-making? Specifically, how do contingency reversals and extinction induce counting versus win-stay/lose-shift strategies under conditions of lower and higher levels of (un)certainty?

Experiment 1

Experiment 1 systematically replicated Experiment 4 of Rayburn-Reeves et al. (2011) with a touchscreen computer task. During the first 10 sessions, we arranged 24 trials per session in which the reinforcement and extinction contingencies reversed predictably after Trial 12. We arranged two buttons differing in appearance, consistent with previous research on reversal learning of visual discriminations (McMillan & Spetch, 2019) and resurgence (Podlesnik et al., 2020). Therefore, the present experiment examined how contingency changes and extinction affected response patterns and decision-making strategies under relatively certain conditions.

Methods

Participants and apparatus

We recruited a total of 20 undergraduate students from the departmental research pool (SONA) at Auburn University to receive course credit and the opportunity to earn a $25 Amazon gift card. Data from all but two participants were included in the subsequent analyses (exclusion criteria described below). Participants were 18 to 22 years old (M = 19.4, SD = 1.3), and 13 were female (65%). Participants completed the experiment on a 17-in. Angel POS touchscreen monitor (1,280 × 1,024 resolution) connected to a desktop computer running Windows 10. The task was programmed using Visual Basic 2015.

Procedure

For the duration of the experiment, participants sat alone in a 6.1 square-meter room without a phone or watch. Written instructions were presented on the touchscreen monitor. Instructions read:

“After pressing the PROCEED button, you will play a game on a computer. When ready, press the PLAY button. A new page will appear, and you will see buttons. Touching buttons could earn you points. A picture will appear if you earn a point. Then, the session will be resumed shortly. Earn as many points as you can. More points increase the likelihood of receiving a $25 Amazon gift card. To play this game, we ask you not to count your responses or points.”

Instructions to avoid counting have been reported in previous research to prevent biases in responding produced by counting (e.g., Grondin et al., 2004; Rayburn-Reeves et al., 2011).

Pressing “PROCEED” introduced a “PLAY” button, which initiated the first session. Figure 1 shows objects presented during the task. Participants completed 15 sessions. The first 10 sessions consisted of 24 trials and the last five sessions consisted of 36 trials. During each trial, a workspace consisting of a beach background with a height of 700 pixels and a width of 900 pixels appeared at the center of the monitor. The rest of the space on the monitor remained black throughout the session. Though invisible, the workspace was divided into two halves (left and right) and a button appeared at the center of each half at the onset of each trial. One of the buttons was a blue square and the other was an orange square (100 × 100 pixels each). A single symbol from playing cards was superimposed on both the blue square and orange circle buttons (e.g., red heart, black club), with symbol/button combinations counterbalanced across participants. We included these symbols to increase stimulus disparity, as greater disparity between response options tends to produce greater control by consequences (e.g., Gallagher & Alsop, 2001; Godfrey & Davison, 1998). Each button was presented equally often on the left and right sides of the screen, but neither appeared on the same side for more than three trials in a row. Within each half of the workspace, buttons randomly moved 25 pixels at 1.5-s intervals within a restricted area (200 pixels × 200 pixels).

Fig. 1
figure 1

Task interface. Note. Objects presented on the screen included a single workspace (700-px × 900-px squares) with a beach image in the background. Two buttons (90-px × 90-px squares) moved on the right or left side of the screen. Buttons were a blue square and an orange square with superimposed heart and club symbols. (Color figure online)

For the first 12 trials of each session (hereafter Pre-Reversal phase), pressing the S1 button displayed a yellow star with “+1” for 1 s according to a fixed ratio (FR) 1 reinforcement schedule. Pressing the S2 button resulted in a brief blackout (the entire screen turning black for 1 s). For the next 12 trials (Trials 13–24; hereafter Post-Reversal phase) of each session, the contingencies were reversed between presses to the S1 and S2 buttons. We will refer to this portion of the experiment (i.e., Trials 1–24 across all 15 sessions) as the Contingency Reversal. Sessions 11–15 included an additional 12 trials (Trials 25–36) in which all responses resulted in a 1-s blackout and no reinforcer deliveries occurred. We will refer to these additional trials during Sessions 11–15 as the Extinction Test.

After completion of all but the last session, the buttons disappeared, and a rotating hourglass appeared on a white background for 5 s. Following this, a “PLAY” button reappeared and pressing it initiated the next session. After the 15th session, participants completed a postexperiment questionnaire which asked about demographic details (age, sex; see Supplemental Materials) and how they made decisions during the task.

Data analysis

Data screening

Data sets were excluded from analyses if (1) at least 50% of responses were allocated to the S2 button for five consecutive sessions during Trials 1 through 12 or (2) at least 50% of responses were allocated to the S1 button for five consecutive sessions during Trials 13 through 24. Similar exclusion criteria were implemented by McMillan and Spetch (2019) to exclude participants who failed to discriminate the contingencies for lack of attending to the task. Excluded data sets are presented with the Supplemental Materials.

Analytical strategy

We used a mixed-effects modeling approach (DeHart & Kaplan, 2019) to evaluate how various factors (i.e., Group, Session, Phase, Trial) influenced the likelihood of selecting S1. Specifically, a multilevel logistic model was used to evaluate trial-by-trial choice for S1 (1) and S2 (0). This approach was suited to this task and this question because the multilevel approach both accounts for intercorrelations within (i.e., Subject) and between (i.e., Group) the data. Further, this approach preserves individual-level variability and supports further review of how reversal learning and resurgence occur for individuals.

All statistical procedures were performed using the R Statistical Program (R Core Team, 2021). Multilevel modeling was performed using the lme method included in the lme4 package (Bates et al., 2015). Although the full data could be evaluated together, individual choices were evaluated separately for the Contingency Reversal (Sessions 1–15, Trials 1–24) and Extinction Test portions of each experiment (Sessions 11–15, Trials 25–36). This was the more pragmatic choice because the quantities of data differed and because research questions were specific to respective portions of the data. Bonferroni corrections (.05/2 = .025) were applied to address issues with repeated comparisons. Random effect structures were evaluated using the second-order Akaike information criterion (AICc) provided in the MuMIn R package (Bartoń, 2020), and the associated fixed effects were subsequently evaluated using likelihood-ratio tests.

Results

Descriptive analysis

The left panel of Fig. 2 shows the percentage of S1 choices as means of sessions 1-5, 6-10, and 11-15 (hereafter session blocks). A lower percentage of participants chose S1 during the first trial (M = 80.4%, SEM = 3.1) compared with subsequent Pre-Reversal trials across all three session blocks (M = 95.9%, SEM = 0.4). There were slight decreases in percentages suggesting anticipatory errors in the last trial before the midsession contingency reversals in some sessions across all three session blocks (see the Supplemental Materials). Following the reversal, the percentage of S1 choices on Trial 13 decreased across session blocks, with fewer errors observed on this trial in later session blocks (Trials 1–5: M = 92.2%, SEM = 2.9; Trials 11–15: M = 64.4%, SEM = 9.5). Percentage of S1 choices decreased rapidly across the first three Post-Reversal trials and reached low levels across session blocks (Trial 15: M = 4.4%, SEM = 1.3). The right panel of Fig. 2 shows the percentage of S1 choices across individual Sessions 11-15 during the last four trials before the Extinction Test and the first four trials during the Extinction Test. The right panel shows that there were no indications of anticipatory errors before the Extinction Test.

Fig. 2
figure 2

Percentage of S1 choices across trials and sessions in Experiment 1. Note. This figure shows the percentage of participants choosing S1 (1) as a mean across 5-session blocks (left panel) and (2) in individual Sessions 11–15 in the trials immediately preceding and following the onset of extinction (right panel). Solid lines represent the onset of contingency changes. + = FR1 schedule of reinforcement; − = extinction. Error bars represent standard errors of the mean. (Color figure online)

Responding during the Extinction Test

The percentage of S1 choices in Session 11 increased from the first to the second trial of the Extinction Test before decreasing on the subsequent trial and oscillating near 50% thereafter (M = 51.4%, SEM = 1.7). Figure 3 shows the pattern of responding across the first three trials of the Extinction-Test sessions. The percentage of participants choosing S1 was low, high, and low across Trials 25, 26, and 27, with this general pattern becoming less pronounced across sessions.

Fig. 3
figure 3

Percentage of S1 choices in the first three extinction-test trials of Experiment 1. Note. Black, gray, and white data points show the percentage of participants choosing S1 in the first three trials (25, 26, and 27) of the Extinction Test, respectively, across Sessions 11–15.

Figure 4 shows the mean number of clicks on areas of the screen other than the response buttons (hereafter workspace responses) per 12 trials across Sessions 11-15. The figure shows that workspace responses occurred more often during the Extinction-Test trials (2536; M = 1.1, SEM = 0.2) relative to trials preceding the Extinction Test (13–24; M = 0.4, SEM = 0.1). Workspace responses remained low across these sessions during Trials 13–24.

Fig. 4
figure 4

Mean number of workspace responses before and during the extinction test in Experiment 1. Note. This figure shows the mean number of clicks on the background image across all participants in Sessions 11–15 both prior to (open circles) and during (filled circles) the Extinction Test. Clicks on the background image included clicks anywhere on the screen other than the two response buttons.

Response patterns

Figure 5 shows the percentage of participants adopting a win-stay/lose-shift, counting, or some other undefined response pattern during the Contingency Reversal and Extinction Test across sessions. The figure also shows the reported percentage of win-stay/lose-shift (dotted line) and counting patterns in the post-experiment survey (dashed line). We coded a win-stay/lose-shift response pattern when two S1 responses occurred immediately before and after the onset of the contingency reversal or extinction and the subsequent response was an S2 response. We coded a counting pattern when there were two S1 responses immediately before the onset of a contingency reversal or extinction and an S2 response after the change. We coded other patterns as those not meeting the definition of a win-stay/lose-shift or counting pattern. Figure 5 shows the prevalence of win-stay/lose-shift response patterns increased from 28% to 83%, while the prevalence of other response patterns decreased from 72% to 17% across Sessions 1–5. Counting remained at zero levels across Sessions 1-5. All response patterns remained stable across Sessions 6–10 until the prevalence of win-stay/lose-shift response patterns decreased from 72% to 44% and the prevalence of counting increased from 11% to 33% in Session 10. Those response patterns largely held steady throughout Sessions 11–15 during the Contingency Reversal (win-stay/lose-shift: M = 51.1%; counting: M = 36.7%). Initiating the Extinction Test produced an (1) increase and subsequent decrease in win-stay/lose-shift response patterns, (2) an immediate decrease in counting, and (3) an increase in other response patterns.

Fig. 5
figure 5

Percentage of win-stay/lose-shift and counting response patterns in Experiment 1. Note. This figure shows the percentage of response types coded as Win-Stay/Lose-Shift (W-S/L-S), Counting (Count), or Other across individual sessions in Experiment 1. Ext = Extinction Test. W-S/L-S: Two S1 responses occurred immediately before and after the onset of a contingency reversal or extinction and the subsequent response was S2. Count: Two S1 responses occurred immediately before the onset of a contingency reversal or extinction and the S2 response occurred immediately after the change. Other: Patterns did not meet either definition. The dashed line represents the percentage of participants reporting counting in the post-experiment survey; the dotted line represents the percentage of participants reporting a win-stay/lose-shift strategy

Error patterns

Figure 6 shows the mean number of errors associated with different response patterns (counting, win-stay/lose-shift, other). The figure shows that counting resulted in the lowest number of errors (M = 0.2), while response patterns other than counting or win-stay/lose-shift resulted in the highest number of errors in each session block (M = 3.2).

Fig. 6
figure 6

Mean number of errors associated with each response pattern in Experiment 1. Note. Percentage of response types coded as Win-Stay/Lose-Shift (W-S/L-S), Counting (Count), or Other across session blocks in Experiment 1. No counting response patterns were observed during the first session block.

Response latencies

The left panel of Fig. 7 shows differences in mean latencies to choices of S1 or S2 in the first three trials following the contingency reversal (Trials 13-15) relative to the trial immediately preceding the reversal (Trial 12) across Sessions 1-5 for participants engaging in counting, win-stay/lose-shift, or other response patterns. For participants engaging in win-stay/lose-shift response patterns, the figure shows little increase in latencies to choices of S1 or S2 from the last Pre-Reversal trial (12) to the first three Post-Reversal trials (13–15) in Sessions 1–5 (M = +8.9 ms). However, participants engaging in other response patterns demonstrated increases in latencies to each choice in Trials 13–14 relative to Trial 12 (M = +427.9 ms). The right panel of Fig. 7 shows differences in mean latencies to choices of S1 or S2 from the last Post-Reversal trial (Trial 24) to the first three Extinction-Test trials (Trials 25-27) across Sessions 11-15. The right panel shows that counting response patterns resulted in the greatest increases in response latency from Trial 24 to Trial 25 (M = 617.7 ms), while win-stay/lose-shift patterns resulted in more gradual increases in response latency across the first two Extinction-Test trials (25–26; Trial 25: M = +28.5 ms, Trial 26: M = 777.3 ms).

Fig. 7
figure 7

Differences in response latency from the last pre-change trial in Experiment 1. Note. ms = milliseconds. The left panel shows mean difference in reaction time from the last Pre-Reversal trial (Trial 12) across Sessions 1–5. The right panel shows mean difference in reaction time from the last Post-Reversal trial (Trial 24) across Sessions 11–15

Statistical analysis

Contingency reversal

Table 1 shows the results from the Contingency Reversal portion of the experiment. Results indicated a significant association between Trial number and the response (β = −0.30, p < .001), whereby increasing trials was associated with decreased odds of S1 versus S2 (factor of 0.74). Similarly, there was a significant association between the type of contingency and the response (β [Pre-Reversal] = 2.87, p < .001), whereby the odds of selecting S1 over S2 increased by a factor of 17.65. These findings suggest that responding tracked reinforcement contingencies.

Table 1 Results of generalized linear mixed-effects regression for Experiment 1

Extinction test

Table 1 also shows the results from the Extinction Test portion of the experiment. The best-fitting model indicated a significant association between Trial and the response (β = 0.07, p < .001), whereby the odds of selecting S1 over S2 increased by a factor of 1.08 across trials within sessions.

Discussion

Overall, choices between S1 and S2 were sensitive to both the midsession-reversal and extinction contingencies. We observed anticipatory errors by some participants before contingency reversals, consistent with McMillan and Spetch (2019) and Rayburn-Reeves et al. (2011). We also observed decreases in S1 choices during the first trial following contingency reversals, especially during later session blocks. In other words, some participants switched from S1 to S2 before experiencing the blackout for an incorrect response within sessions. This finding is consistent with Rayburn-Reeves et al. and indicative of counting. Therefore, the instruction not to count trials (Grondin et al., 2004) was not entirely effective in the present experiment or Rayburn-Reeves et al. (2011). Performance reflected experience with contingency changes and, for some participants, the additional influence of counting trials.

We also examined responding when extinguishing responses to both S1 and S2. Consistent with previous studies of resurgence, experiencing the extinction contingency for S2 produced an increase in S1 responding (e.g., Kuroda et al., 2017a; Podlesnik et al., 2020; Robinson & Kelley, 2020). That is, choices in extinction initially reflected previous experience with contingency reversals. The odds of choosing S1 increased across trials (although this was a small effect; see Table 2). In line with these findings, target responding during resurgence tests sometimes increases immediately and then decreases (e.g., Sweeney & Shahan, 2013). In other cases, resurgence initially occurs at a low rate before increasing (e.g., Doughty et al., 2007; Podlesnik & Shahan, 2009, 2010). Nevertheless, the present findings demonstrated that the conditions producing the resurgence effect in a majority of participants were consistent with the win-stay/lose-shift response pattern observed during contingency reversals. In other words, both types of contingency changes similarly induced abrupt changes in responding to the other option.

Table 2 Results of generalized linear mixed-effects regression for Experiment 2

Experiment 2

Despite providing instructions not to count (see also Rayburn-Reeves et al., 2011), counting the number of trials to a contingency reversal became relatively common among participants during later sessions in Experiment 1. One approach that appears to minimize control by trial number with human participants is arranging contingency reversals after varying and unpredictable numbers of trials across successive sessions (see also Rayburn-Reeves et al., 2011). In Experiment 2, we replicated the procedures from Experiment 1, but varied reversals across sessions. Eliminating counting from performance during contingency reversals would facilitate the examination of whether a common win-stay/lose-shift response pattern could account for performance both during contingency reversals and subsequent extinction tests. Therefore, the present experiment examined how contingency changes and extinction affected response patterns and decision-making strategies under relatively uncertain conditions.

Methods

Participants and apparatus

Eighteen undergraduate students recruited from a psychology research pool (SONA) at Auburn University participated to receive course credit and the opportunity to earn a $25 Amazon gift card. No participants previously completed Experiment 1. Participants were 18 to 22 years old (M = 19.7, SD = 1.4), and 13 were female (72%). One participant reported red–green colorblindness, but we included this participant because the button stimuli were differentiated by both symbols and colors.

Procedure

The experimental apparatus and procedure were identical to Experiment 1, with the exception that reinforcement and extinction contingency reversals between S1 and S2 occurred variably across trials (on Trials 5, 9, 13, 17, or 21), with each reversal point occurring equally often across the three 5-session blocks. We arranged an equal number of reinforced trials for S1 and S2 within each session block.

Data analysis

The analytical approach for Experiment 2 was identical to that of Experiment 1, with the exception that we included only the Phase and Reversal Point factors in the model (i.e., not Trial or Time). This is because Time (i.e., # of sessions in Phase) was not constant across levels of Reversal Point. We used emmeans (Lenth et al., 2021) with the best-fitting model to perform pairwise comparisons across different levels of Reversal Point. Exclusion criteria were identical to Experiment 1.

Results

Descriptive analysis

Figure 8 shows the percentage of S1 choices as means of trial blocks in the trials immediately preceding and following contingency reversals (left panel) or in individual sessions across trials immediately preceding and following the onset of extinction (right panel). The majority of participants chose S1 during the four trials preceding contingency reversals across all session blocks (M = 94.6%, SEM = 0.7). As expected, given the variable reversal point, there was little indication of anticipatory errors in the last trial before contingency reversals across all three session blocks (see Supplemental Materials; cf. Experiment 1). Following the reversal, the percentage of S1 choices on Trial 13 (labeled 1 in the left panel of Fig. 8) remained high across all blocks (M = 97.4%, SEM = 0.9) before decreasing on the subsequent trial (M = 20%, SEM = 2.9), indicating a win-stay/lose-shift strategy. The right panel of Fig. 8 shows that S1 choices increased in the last trials of the Post-Reversal phase during Session 15 from 6% to 11%, indicating few but greater than zero anticipatory errors before the Extinction Test.

Fig. 8
figure 8

Percentage of S1 choices before and after contingency changes in Experiment 2. Note. The left panel shows the percentage of participants choosing S1 as a mean across 5-session blocks in trials immediately preceding and following the onset of contingency reversals. The right panel shows the percentage of participants choosing S1 in individual sessions (11–15) immediately preceding and following the onset of extinction. Solid lines represent the onset of extinction. + = FR1 schedule of reinforcement; − = extinction. Error bars represent standard errors of the mean

Responding during the extinction test

As in Experiment 1, the right panel of Fig. 8 shows that the percentage of S1 choices in Session 11 increased from the first to the second trial of the Extinction Test before decreasing on the subsequent trial and oscillating near 50% thereafter (M = 50.3%, SEM = 1.7). Responding continued to oscillate near 50% for the remainder of Extinction-Test trials (see Supplemental Materials). Also consistent with Experiment 1, Fig. 9 shows that the percentage of participants choosing S1 was low, high, and low across Trials 25, 26, and 27, with this general pattern becoming less pronounced across sessions.

Fig. 9
figure 9

Percentage of S1 choices in the first three extinction-test trials of Experiment 2. Note. Black, gray, and white data points show the percentage of participants choosing S1 in the first three trials (25, 26, and 27) of the Extinction Test, respectively, across Sessions 11–15

As in Experiment 1, Fig. 10 shows that workspace responses occurred more often during the Extinction-Test trials (2536; M = 1.4, SEM = 0.2) relative to preceding trials (13–24; M = 0.5, SEM = 0.1). Workspace responses remained relatively low across sessions in Trials 13–24.

Fig. 10
figure 10

Mean number of workspace responses before and during the extinction test in Experiment 2. Note. This figure shows the mean number of clicks on the background image across all participants in Sessions 11–15 both prior to (open circles) and during (filled circles) the Extinction Test. Clicks on the background image included clicks anywhere on the screen other than the two response buttons

Response patterns

Figure 11 shows that the prevalence of win-stay/lose-shift response patterns increased from 39% to 94%, while the prevalence of other response patterns decreased from 61% to 6% across the first five sessions, as in Experiment 1. Unlike Experiment 1, (1) response patterns meeting the definition of counting remained at near-zero levels across all sessions, and (2) the percentage of win-stay/lose-shift response patterns remained relatively high across Sessions 6–15 of the Contingency Reversal (M = 84.4%). Consistent with Experiment 1, initiating the Extinction Test produced an immediate increase and then decrease in win-stay/lose-shift responses across sessions and an increase in other response patterns.

Fig. 11
figure 11

Percentage of win-stay/lose-shift and counting response patterns in Experiment 2. Note. This figure shows the percentage of response types coded as Win-Stay/Lose-Shift (W-S/L-S), Counting (Count), or Other across individual sessions in Experiment 2. Ext = Extinction Test. W-S/L-S: Two S1 responses occurred immediately before and after the onset of a contingency reversal or extinction and the subsequent response was S2. Count: Two S1 responses occurred immediately before the onset of a contingency reversal or extinction and the S2 response occurred immediately after the change. Other: Patterns did not meet either definition. The dashed line represents the percentage of participants reporting counting in the post-experiment survey; the dotted line represents the percentage of participants reporting a win-stay/lose-shift strategy

Error patterns

Figure 12 shows that win-stay/lose-shift response patterns resulted in fewer errors (M = 1.7) compared with other response patterns across all session blocks (M = 3.2), as in Experiment 1.

Fig. 12
figure 12

Mean number of errors associated with each response pattern in Experiment 2. Note. This figure shows the percentage of response types coded as Win-Stay/Lose-Shift (W-S/L-S) or Other across session blocks in Experiment 2

Response latencies

For participants engaging in win-stay/lose-shift response patterns, the left panel of Fig. 13 shows no increase in latencies to choices of S1 or S2 from the last Pre-Reversal trial (12) to the first two Post-Reversal trials (13–14) in Sessions 1–5 (M = −78.0 ms). However, latencies to each choice increased slightly in Trial 15 relative to Trial 12 for those participants (M = +201.9 ms). For participants engaging in other response patterns, latencies decreased slightly in Trials 13–15 relative to Trial 12 (M = −151.25 ms).

Fig. 13
figure 13

Differences in response latency from the last pre-change trial in Experiment 2. Note. ms = milliseconds. The left panels show mean difference in reaction time from the last Pre-Reversal Trial (Trial 12) across Sessions 1–5. The right panels show mean difference in reaction time from the last Post-Reversal Trial (Trial 24) across Sessions 11–15

The right panel shows increases in mean latencies to choices of S1 or S2 from the last Post-Reversal trial (Trial 24) to the first three Extinction-Test trials (Trials 25–27) in Sessions 11–15. Win-stay/lose-shift and other response patterns both resulted in a gradual increase in response latency from the first Extinction-Test trial (25; win-stay/lose-shift: M = +5.4 ms; other: M = −121.1 ms) to the second Extinction-Test trial (26; win-stay/lose-shift: M = +929.7 ms; other: M = +1.1 s) relative to Trial 24.

Statistical analysis

Contingency reversal

Table 2 shows the results from the best-fitting model for the Contingency Reversal portion of the experiment. Results indicated a significant association between Phase and the response (β = 4.42, p < .001), whereby the odds of selecting S1 over S2 was higher in the Pre-Reversal phase compared with the Post-Reversal phase (factor of 82.86). Similarly, there was a significant association between Reversal Point (Trial 13, 17, 21) and the response (βs > 0.45, ps < .001), whereby odds of selecting S1 over S2 were greater as the reversal was arranged on later trials. Pairwise comparisons across Reversal Points indicated that the odds of selecting S1 over S2 generally were greater on Trial 24 when the reversal occurred on later versus earlier trials (see Table 3). Overall, these findings were consistent with Experiment 1 and suggest that responding tracked the reinforcement contingencies.

Table 3 Pairwise comparisons on Trial 24 Across reversal points

Extinction test

Table 2 also shows the results from the Extinction Test portion of the experiment. The best-fitting model indicated a significant association between Trial and the response (β = 0.06, p < .002), whereby the odds of selecting S1 over S2 increased by a factor of 1.06 across trials within sessions. Thus, consistent with the results of Experiment 1, the odds of selecting S1 in Experiment 2 increased across trials during testing.

Discussion

As in Experiment 1, choices between S1 and S2 were sensitive to both the reversal and extinction contingencies. Our findings in the present experiment were also consistent with Rayburn-Reeves et al. (2011) in that introducing contingency reversals after a variable number of trials across sessions was effective in reducing control by trial number and thereby eliminated counting. Instead, performance during the reversals in the current experiment primarily reflected a win-stay/lose-shift strategy compared with Experiment 1, arranging midsession reversals.

Also consistent with Experiment 1, experiencing the extinction contingency for S2 produced an immediate increase in S1 responding in most participants, reflective of previous experiences with contingency reversals across prior sessions and consistent with a win-stay/lose-shift response pattern. Across subsequent extinction trials within each session, the probability of selecting S1 increased in both experiments. These findings are consistent with initial increases observed in other studies arranging more traditional procedures to examine resurgence (e.g., Doughty et al., 2007; Podlesnik & Kelley, 2014). Therefore, performance in studies of repeated midsession reversal learning and resurgence demonstrates a type of behavioral flexibility stemming from extinction inducing previously effective response patterns.

General discussion

The present series of experiments examined reversal learning in humans with either a fixed- or variable-reversal point. In the last five sessions of each experiment, we also examined resurgence by arranging extinction of both responses when extending sessions with additional nonreinforced trials, a novel extension of the standard reversal-learning paradigm. When contingency reversals occurred on a predictable trial midway through each session (i.e., low uncertainty regarding contingency changes), a majority of participants engaged in a win-stay/lose-shift strategy during early sessions, but we observed an increase in the prevalence of (1) counting and (2) anticipatory errors before reversals (Experiment 1). In contrast, arranging the reversal after a variable number of trials across sessions (i.e., high uncertainty regarding contingency changes) resulted in almost no counting and a high prevalence of win-stay/lose-shift response patterns (Experiment 2). We also found that extinction of S2 generally produced an immediate return to S1 across all three experiments, indicative of a resurgence effect via a win-stay/lose-shift response pattern. These response patterns suggest increased prevalence of control by local contingencies (win-stay/lose-shift) over more global contingencies (counting) under conditions of uncertainty introduced through (1) contingency reversals after a variable number of trials and (2) extinction.

Humans have been shown to use a win-stay/lose-shift strategy across a wide variety of repeated decision scenarios introducing conditions of uncertainty (e.g., Cassotti et al., 2011; Zhang et al., 2021). A win-stay/lose-shift strategy is highly effective under some conditions of uncertainty (e.g., Nowak & Sigmund, 1993), at least within tasks including frequent feedback/reinforcement (e.g., B. A. Williams, 1972; Experiment 2). Moreover, win-stay/lose-shift response patterns likely reflect discriminative properties of consequences (Cowie, 2020; Simon et al., 2020). That is, the presence or absence of reinforcement for one response serves as a signal for the subsequent availability of reinforcement across response options.

That different strategies (i.e., counting in Experiment 1 and win-stay/lose-shift in Experiment 2) resulted in the fewest errors during contingency reversals (see Figs. 6 and 12) suggests that effectively adapting to environmental changes requires flexibly shifting between decision-making tactics. For a related example with birds, Kloskowski (2021) found that under some conditions, exclusive use of a win-stay/lose-shift strategy was suboptimal for selecting habitats with greater food availability (see also Otto et al., 2011). In line with this idea, the present findings indicate that win-stay/lose-shift and counting strategies were more or less optimal depending upon the level of uncertainty regarding contingency changes. That is, counting was the optimal strategy under conditions of low uncertainty (midsession reversal, Experiment 1), whereas win-stay/lose-shift was the optimal strategy under conditions of high uncertainty (variable reversal, Experiment 2).

Given differences in the number of errors resulting from counting and win-stay/lose shift strategies, we also evaluated whether these strategies produced different effects on latencies to choosing S1 or S2 (see Figs. 7 and 13). Response latencies could reflect processes underlying choice that cannot be understood by observing selections of S1 or S2 alone (e.g., Konovalov & Krajbich, 2019). For example, prior research has shown that humans’ reaction times increase as a function of uncertainty in psychophysical tasks (see Bonnet et al., 2008, for a discussion). In the present study, we found that latencies remained relatively constant before and after contingency reversals across all sessions and strategies (see also Rayburn-Reeves & Cook, 2016; Smith et al., 2018, for similar findings with pigeons). During Sessions 11–15, in contrast, we showed for the first time that mean latencies to choices of S1 or S2 increased from the last Post-Reversal trial (Trial 24) to the first two Extinction-Test trials (Trials 25–26; see Mellgren & Elsmore, 1991, for related findings with rats). Thus, counting and win-stay/lose-shift produced different effects on response latencies during extinction, but not during contingency reversals.

Research in behavioral economics suggests that latencies can reflect differences in decision strategies (Kahneman, 2013). In line with this idea, counting in the present study resulted in the greatest increases in response latency from Trial 24 to Trial 25 in Experiment 1 as a result of encountering the added trials. In contrast, win-stay/lose-shift resulted in a more gradual increase in response latency across the first two Extinction-Test trials (25, 26) upon experiencing the extinction contingencies for S1 and S2. Thus, latencies peaked when experiencing the extinction contingency, although this occurred at different time points among participants engaging in different response patterns. Specifically, participants using a counting strategy typically switched to the opposite response (and experienced the extinction contingency) one trial earlier than participants using a win-stay/lose-shift strategy. Our findings, therefore, suggest that latencies peak with uncertainty regarding reinforcement contingencies (see Konovalov & Krajbich, 2019, for related findings).

As with response latencies, we also observed differences in response variability during contingency reversals versus extinction. For example, participants repeatedly switched between responses only during the Extinction Test, and workspace responses increased from Post-Reversal to Extinction-Test trials (see Figs. 4 and 10). These findings are similar to research in other experimental paradigms showing that extinction produces increases in variability in novel dimensions of previously reinforced behavior. For example, Antonitis (1951) reinforced rats’ nose-poke responses at a single location but observed during extinction that responses were allocated at increasing distances from the previously reinforced location (see also Eckerman & Lanson, 1969; Stokes, 1995; see Neuringer & Jensen, 2013, for a review). A related point comes from consumer-distribution models in the behavioral ecology literature. These models predict that animals experiencing a worsening of conditions (e.g., lower food quality or availability) will shift to exploring the surrounding environment as opposed to exploiting the impoverished food source (i.e., incentive contrast; see Flaherty, 1999, for a review). In contrast, animals experiencing only the less favorable condition without a shift will consume the lower quality of food throughout (Bernstein et al., 1988). Regarding the present findings, the removal of all reinforcers could have similarly produced an increase in exploratory behavior. The exploratory behavior appeared as resurgence of S1 but also could induce increases in other responses that never resulted in reinforcement under the current experimental conditions (i.e., workspace responses; see Bolívar et al., 2017; Cox et al., 2019; Ho et al., 2018, for similar findings). To further evaluate whether changes in responding during extinction are best characterized as extinction-induced variability or resurgence of a previously reinforced response specifically, future studies could include a control response with no reinforcement history across all phases of the experiment (e.g., Podlesnik et al., 2006; see Lattal & Oliver, 2020, for a relevant discussion).

In summary, we examined humans’ response patterns under changing reinforcement conditions by arranging a simple reversal learning task with additional nonreinforced trials during some sessions to evaluate resurgence. Consistent with previous research, participants adopted a win-stay/lose-shift response pattern under conditions of greater uncertainty, such as when contingency reversals occurred on unpredictable trials and during extinction. We extended prior research by demonstrating (1) that the use of a win-stay/lose-shift strategy under conditions of greater uncertainty led to more optimal performance (i.e., greater accuracy during contingency reversals) and (2) that counting and win-stay/lose-shift strategies produced different effects on response latencies during extinction but not during contingency reversals. The latter finding is in line with behavioral economic research and suggests that latency to choosing S1 or S2 increases during ambiguous reinforcement conditions. These findings also suggest that differences in environmental (un)certainty appear to induce qualitatively different decision-making strategies (i.e., counting versus win-stay/lose-shift; cf. McMillan & Spetch, 2019). Further study of response patterns during repeated decision tasks could contribute to our understanding of behavioral flexibility and in humans’ and nonhumans’ organization and selection of behavior under conditions of uncertainty.