Assessing human performance during contingency changes and extinction tests in reversal-learning tasks

Ritchey, Carolyn M.; Gilroy, Shawn P.; Kuroda, Toshikazu; Podlesnik, Christopher A.

doi:10.3758/s13420-022-00513-9

Published: 02 February 2022

Volume 50, pages 494–508, (2022)
Learning & Behavior

Abstract

Serial reversal-learning procedures are simple preparations that allow for a better understanding of how animals learn about environmental changes, including flexibly shifting responding to adapt to changing reinforcement contingencies. The present study examined serial reversal learning with humans by arranging both midsession and variable contingency reversals across two experiments. We also examined the effects of extinction by adding nonreinforced trials at the end of later sessions and provided the first evaluation of the effects of win-stay/lose-shift versus counting strategies on the accuracy and response latency of humans’ reversal-learning performance. In each experiment, responding tracked contingency reversals, primarily with participants using either win-stay/lose-shift or counting strategies. Introducing variable reversal points in the second experiment resulted in near-exclusive win-stay/lose-shift responding among participants and eliminated counting of trials. Each experiment also revealed an immediate shift from S2 to S1 after experiencing extinction during the initial test trial, indicating resurgence of the initial response through a win-stay/lose-shift response pattern. Therefore, the present study replicates and extends prior findings of a win-stay/lose-shift response pattern in situations of greater uncertainty. These findings suggest that differences in environmental certainty induce qualitatively different decision-making strategies.

Organisms face challenges to survival when environments change. Behavioral and evolutionary scientists have examined what is broadly called behavioral flexibility in the face of environmental changes across a wide range of species and conditions (see Lea et al., 2020). One general approach to examining behavioral flexibility assesses how behavior changes when responses that paid off at one time or in one setting do not pay off under other circumstances. In other words, how does operant behavior change when reinforcement contingencies change?

Reversal learning

Reversal learning is one approach to examining changes in operant behavior under changing reinforcement conditions (see Izquierdo et al., 2017, for a review). Tasks using a midsession reversal of reinforcement contingencies typically include two response alternatives and arrange a contingency reversal after a fixed number of trials (see Rayburn-Reeves & Cook, 2016; Zentall, 2020, for reviews). Responding to one location or stimulus (S1) is reinforced during trials comprising the first half of a session, and responding to a different location or stimulus (S2) is reinforced during trials comprising the second half of a session. With repeated exposure to the contingency change between S1 and S2 across sessions, responding reliably tracks those reinforcement contingencies across many species, including rats (e.g., Rayburn-Reeves et al., 2013), pigeons (e.g., R. G. Cook & Rosen, 2010), dogs (Laude et al., 2016), nonhuman primates (e.g., Rayburn-Reeves et al., 2017), and humans (e.g., McMillan & Spetch, 2019).

There are some apparent differences among species in how closely behavior tracks those contingencies before and after the midsession contingency reversal. For example, counting the number of trials to reversal appears to be unique to humans (McMillan & Spetch, 2019; Rayburn-Reeves et al., 2011). However, humans, rats, parrots, and monkeys all have been reported to engage in a win-stay/lose-shift pattern of behavior: experiencing nonreinforcement of responding to S1 following the contingency reversal during the second half of sessions rapidly produces a shift to responding to S2 (Laschober et al., 2021; Rayburn-Reeves et al., 2011; Rayburn-Reeves et al., 2013; Rayburn-Reeves et al., 2017). These findings suggest that trial-to-trial response-outcome contingencies largely underlie the discriminations driving reversal performance between S1 and S2 in these species. In contrast, pigeons and dogs have been reported to make large numbers of anticipatory errors before contingency reversals and perseverative errors following contingency reversals (R. G. Cook & Rosen, 2010; Davison & Cowie, 2019; Laude et al., 2016). These findings suggest that temporal cues from the start of sessions govern discrimination of contingency reversals in pigeons and dogs to a greater extent than in rats or primates (see Zentall, 2020, for a review).

Nevertheless, several environmental conditions can influence the prevalence and pattern of anticipatory and perseverative errors during reversals across species. For example, both rats and pigeons tend to make fewer errors during spatially defined discriminations than during nonspatially defined visual discriminations (e.g., McMillan et al., 2014; McMillan & Roberts, 2012; Rayburn-Reeves et al., 2013), and error patterns between S1 and S2 change with differences in both reinforcer probability (e.g., Santos et al., 2019; Zentall, Andrews, et al., 2019a) and stimulus conditions (e.g., Rayburn-Reeves et al., 2017; Zentall, Peng, et al., 2019b). Such findings indicate that error patterns depend on the prevailing stimulus and reinforcement conditions (see Zentall, 2020, for a review). Therefore, with some exceptions, such as counting the number of trials to reversal, differences in reversal learning among species, and in behavioral flexibility more generally, might largely reflect quantitative rather than qualitative differences (see McMillan & Spetch, 2019). One goal of the present research was to examine the extent to which conditions of uncertainty influenced patterns of human decision-making during contingency reversals, in particular the use of counting trials versus a win-stay/lose-shift strategy.

Humans’ use of a win-stay/lose-shift strategy has been documented across a wide variety of repeated decision scenarios that introduce conditions of uncertainty, including the Iowa gambling task (e.g., Cassotti et al., 2011; Worthy et al., 2013), the Prisoner’s Dilemma (e.g., Wedekind & Milinski, 1996), causal learning tasks (e.g., Bonawitz et al., 2014), rock-paper-scissors (e.g., Zhang et al., 2021), stock market predictions (Gutiérrez-Roig et al., 2016), and formation changes in football (Tamura & Masuda, 2015). Moreover, prior research on human performance during repeated decision tasks has shown that quantitative models assuming a win-stay/lose-shift strategy outperform traditional reinforcement learning models (e.g., Worthy et al., 2012; Worthy & Maddox, 2012). These findings suggest that win-stay/lose-shift is a pervasive strategy under conditions of uncertainty.

Resurgence

A related set of findings demonstrating behavioral flexibility during changes in reinforcement contingencies is an effect referred to as resurgence. Resurgence has been studied primarily as a preclinical model of relapse in which a single contingency reversal from S1 to S2 is followed by extinction of the more recently reinforced S2 response (see Lattal et al., 2017; Shahan & Craig, 2017; Wathen & Podlesnik, 2018, for reviews). The increase in S1 responding upon extinction of S2 responding demonstrates behavioral flexibility in which a previously reinforced behavior returns despite no longer being reinforced. A second goal of the present research was to examine the extent to which conditions of uncertainty influence patterns of decision-making when responses to both S1 and S2 are extinguished, in particular the use of counting trials versus a win-stay/lose-shift strategy.

Resurgence matters for understanding behavioral flexibility because the return of previously successful responses bears on problem-solving, creativity, foraging strategies, and behavioral variation (see Shahan & Chase, 2002). For example, previously learned approaches to solving complex math problems might resurge when more recently learned approaches are ineffective (e.g., C. L. Williams & St. Peter, 2020). As with midsession reversal learning, resurgence has been demonstrated under laboratory conditions in a range of species, including zebrafish (Kuroda et al., 2017a, 2017b), Siamese fighting fish (da Silva et al., 2014), pigeons (e.g., Liddon et al., 2017), mice (Craig et al., 2020), rats (e.g., Podlesnik et al., 2019), monkeys (Mulick et al., 1976), and humans with and without intellectual disabilities (e.g., Ho et al., 2018; Podlesnik et al., 2020; Shvarts et al., 2020). Resurgence of problem behavior also has been observed to be prevalent clinically (e.g., Briggs et al., 2018; Muething et al., 2021). In contrast to midsession reversal learning, resurgence has not been examined systematically with a comparative approach across species. Nevertheless, the contingency changes in both reversal-learning and resurgence procedures offer an opportunity to examine counting and win-stay/lose-shift response patterns under conditions of reinforcement uncertainty. Thus, the present experiments combined reversal-learning and resurgence procedures for the first time.

Present experiments

The present experiments examined S1–S2 reversal learning with humans across 10 consecutive blocks of trials (hereafter, sessions), arranging either a fixed (Experiment 1) or variable (Experiment 2) reversal point. We expected responding to come under more precise control by the contingencies with greater exposure, with few anticipatory or perseverative errors by the end of those 10 sessions (McMillan & Spetch, 2019; Rayburn-Reeves et al., 2011; see also R. G. Cook & Rosen, 2010). During Sessions 11–15, we added trials to each session after the S1–S2 contingency reversal to assess how extinguishing responses to both S1 and S2 affected response patterns (cf. Bai et al., 2017). We expected that experiencing extinction of responding to S2 during those additional trials would initially increase responding to S1, analogous to a resurgence effect and consistent with a win-stay/lose-shift pattern of behavior. Therefore, the present experiments assessed patterns of reversal learning and resurgence under lower (Experiment 1) and higher (Experiment 2) levels of uncertainty about changes in reinforcement contingencies. This research was designed to answer the following research questions:

1. To what degree did performance change across trials with repeated exposure to contingency reversals?
2. To what degree did performance change when removing reinforcement for both responses during extinction testing?
3. To what degree do different response patterns reflect differences in decision-making? Specifically, how do contingency reversals and extinction induce counting versus win-stay/lose-shift strategies under conditions of lower and higher levels of (un)certainty?

Experiment 1

Experiment 1 systematically replicated Experiment 4 of Rayburn-Reeves et al. (2011) with a touchscreen computer task. During the first 10 sessions, we arranged 24 trials per session in which the reinforcement and extinction contingencies reversed predictably after Trial 12. We arranged two buttons differing in appearance, consistent with previous research on reversal learning of visual discriminations (McMillan & Spetch, 2019) and resurgence (Podlesnik et al., 2020). Therefore, the present experiment examined how contingency changes and extinction affected response patterns and decision-making strategies under relatively certain conditions.

Methods

Participants and apparatus

We recruited a total of 20 undergraduate students from the departmental research pool (SONA) at Auburn University to receive course credit and the opportunity to earn a $25 Amazon gift card. Data from all but two participants were included in the subsequent analyses (exclusion criteria described below). Participants were 18 to 22 years old (M = 19.4, SD = 1.3), and 13 were female (65%). Participants completed the experiment on a 17-in. Angel POS touchscreen monitor (1,280 × 1,024 resolution) connected to a desktop computer running Windows 10. The task was programmed using Visual Basic 2015.

Procedure

For the duration of the experiment, participants sat alone in a 6.1-m² room without a phone or watch. Written instructions were presented on the touchscreen monitor. Instructions read:

“After pressing the PROCEED button, you will play a game on a computer. When ready, press the PLAY button. A new page will appear, and you will see buttons. Touching buttons could earn you points. A picture will appear if you earn a point. Then, the session will be resumed shortly. Earn as many points as you can. More points increase the likelihood of receiving a $25 Amazon gift card. To play this game, we ask you not to count your responses or points.”

Instructions to avoid counting have been reported in previous research to prevent biases in responding produced by counting (e.g., Grondin et al., 2004; Rayburn-Reeves et al., 2011).

Pressing “PROCEED” introduced a “PLAY” button, which initiated the first session. Figure 1 shows objects presented during the task. Participants completed 15 sessions. The first 10 sessions consisted of 24 trials, and the last five sessions consisted of 36 trials. During each trial, a workspace consisting of a beach background 900 pixels wide and 700 pixels high appeared at the center of the monitor. The rest of the monitor remained black throughout the session. The workspace was divided into invisible left and right halves, and a button appeared at the center of each half at the onset of each trial. One of the buttons was a blue square and the other was an orange square (100 × 100 pixels each). A single playing-card symbol (e.g., red heart, black club) was superimposed on each button, with symbol/button combinations counterbalanced across participants. We included these symbols to increase stimulus disparity, as greater disparity between response options tends to produce greater control by consequences (e.g., Gallagher & Alsop, 2001; Godfrey & Davison, 1998). Each button was presented equally often on the left and right sides of the screen, but neither appeared on the same side for more than three trials in a row. Within each half of the workspace, the buttons moved randomly by 25 pixels at 1.5-s intervals within a restricted area (200 × 200 pixels).
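The side-assignment rule above (equal left/right presentation, never more than three consecutive trials on one side) can be satisfied in several ways, and the article does not say how the program did it. A minimal sketch, assuming the balance holds within each session and using simple rejection sampling, might look like this; the function name `side_sequence` and its parameters are hypothetical.

```python
import random


def side_sequence(n_trials, max_run=3, rng=None):
    """Return a left/right ('L'/'R') position for the S1 button on each trial.

    Hypothetical sketch: S2 is assumed to occupy the opposite side. Each side is
    used equally often within the sequence, and no side repeats on more than
    `max_run` consecutive trials. Balanced sequences are reshuffled until one
    satisfies the run-length limit (rejection sampling).
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even so both sides appear equally often")
    rng = rng or random.Random()
    sides = ["L"] * (n_trials // 2) + ["R"] * (n_trials // 2)
    while True:
        rng.shuffle(sides)
        longest = run = 1
        for prev, cur in zip(sides, sides[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        if longest <= max_run:
            return list(sides)


# Example: positions for one 24-trial session, seeded for reproducibility.
print("".join(side_sequence(24, rng=random.Random(2022))))
```

Because S2 always takes the other side in this sketch, limiting runs of S1 positions also limits runs of S2 positions.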

For the first 12 trials of each session (hereafter Pre-Reversal phase), pressing the S1 button displayed a yellow star with “+1” for 1 s according to a fixed-ratio (FR) 1 reinforcement schedule, and pressing the S2 button produced a brief blackout (the entire screen turning black for 1 s). For the next 12 trials of each session (Trials 13–24; hereafter Post-Reversal phase), the contingencies for the S1 and S2 buttons were reversed. We will refer to this portion of the experiment (i.e., Trials 1–24 across all 15 sessions) as the Contingency Reversal. Sessions 11–15 included an additional 12 trials (Trials 25–36) in which all responses resulted in a 1-s blackout and no reinforcers were delivered. We will refer to these additional trials during Sessions 11–15 as the Extinction Test.
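The trial contingencies can be summarized as a lookup from trial number to outcome. The sketch below is hypothetical (the function names are ours, not part of the authors' Visual Basic program) and assumes 1-indexed trials with the reversal after Trial 12.

```python
def reinforced_option(trial, reversal_after=12, last_reversal_trial=24):
    """Which option earns a point on a 1-indexed trial (None = no reinforcement)."""
    if trial <= reversal_after:
        return "S1"   # Pre-Reversal phase (Trials 1-12)
    if trial <= last_reversal_trial:
        return "S2"   # Post-Reversal phase (Trials 13-24)
    return None       # Extinction Test (Trials 25-36, Sessions 11-15 only)


def outcome(choice, trial):
    """FR 1: a correct choice shows the '+1' star for 1 s; any other choice
    produces a 1-s blackout."""
    return "point" if choice == reinforced_option(trial) else "blackout"


def trials_in_session(session):
    """Sessions 1-10 have 24 trials; Sessions 11-15 add 12 Extinction-Test trials."""
    return 24 if session <= 10 else 36


# Example: the S1 button pays off on Trial 12 but not on Trial 13.
print(outcome("S1", 12), outcome("S1", 13))
```

Returning `None` for the Extinction Test means no choice can ever match, so every response on Trials 25–36 ends in a blackout, as in the procedure.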

After completion of all but the last session, the buttons disappeared, and a rotating hourglass appeared on a white background for 5 s. Following this, a “PLAY” button reappeared and pressing it initiated the next session. After the 15th session, participants completed a postexperiment questionnaire which asked about demographic details (age, sex; see Supplemental Materials) and how they made decisions during the task.

Data analysis

Data screening

Data sets were excluded from analyses if (1) at least 50% of responses during Trials 1 through 12 were allocated to the S2 button for five consecutive sessions or (2) at least 50% of responses during Trials 13 through 24 were allocated to the S1 button for five consecutive sessions. McMillan and Spetch (2019) used similar criteria to exclude participants who failed to discriminate the contingencies because they were not attending to the task. Excluded data sets are presented with the Supplemental Materials.
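The two exclusion rules can be checked mechanically. In this hypothetical sketch, each participant's data are a list of sessions in order, each session a list of 'S1'/'S2' choices starting at Trial 1; the function name and data layout are assumptions, not the authors' code.

```python
def should_exclude(sessions, window=5, threshold=0.5):
    """Apply the two exclusion rules to one participant.

    sessions: list of sessions in order; each is a list of 'S1'/'S2' choices
    beginning with Trial 1 (at least 24 trials).
    """
    def share(choices, option):
        return sum(c == option for c in choices) / len(choices)

    def has_run(flags):
        # True if `window` consecutive sessions meet the criterion.
        run = 0
        for flag in flags:
            run = run + 1 if flag else 0
            if run >= window:
                return True
        return False

    # Rule 1: at least 50% S2 choices on Trials 1-12 for five consecutive sessions.
    pre_fail = [share(s[:12], "S2") >= threshold for s in sessions]
    # Rule 2: at least 50% S1 choices on Trials 13-24 for five consecutive sessions.
    post_fail = [share(s[12:24], "S1") >= threshold for s in sessions]
    return has_run(pre_fail) or has_run(post_fail)


tracking = ["S1"] * 12 + ["S2"] * 12
print(should_exclude([tracking] * 15))  # False: perfect tracking is retained
```

The run counter resets after any session that does not meet a criterion, so isolated poor sessions do not trigger exclusion.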

Analytical strategy

We used a mixed-effects modeling approach (DeHart & Kaplan, 2019) to evaluate how various factors (i.e., Group, Session, Phase, Trial) influenced the likelihood of selecting S1. Specifically, a multilevel logistic model was used to evaluate trial-by-trial choices, coded as 1 for S1 and 0 for S2. The multilevel approach suited this task and research question because it accounts for correlations among observations both within participants (Subject) and between groups (Group). Further, this approach preserves individual-level variability and supports closer examination of how reversal learning and resurgence occur for individual participants.

All statistical procedures were performed using R (R Core Team, 2021). Multilevel modeling was performed using the glmer function in the lme4 package (Bates et al., 2015). Although the full data set could have been analyzed together, individual choices were evaluated separately for the Contingency Reversal (Sessions 1–15, Trials 1–24) and Extinction Test (Sessions 11–15, Trials 25–36) portions of each experiment. This was the more pragmatic choice because the quantities of data differed and because the research questions were specific to each portion of the data. A Bonferroni correction (.05/2 = .025) was applied to account for these two separate analyses. Random-effect structures were compared using the second-order Akaike information criterion (AICc) provided in the MuMIn R package (Bartoń, 2020), and the associated fixed effects were subsequently evaluated using likelihood-ratio tests.
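The models themselves were fit in R, but the model-selection arithmetic is simple enough to show directly. This Python sketch only illustrates the standard small-sample AICc formula and the Bonferroni-adjusted alpha. The helper names and the candidate log-likelihoods in any usage are hypothetical. For mixed models, what counts as n (here, number of observations) is itself a modeling choice.

```python
def aicc(log_lik, k, n):
    """Second-order (small-sample) AIC: AIC + 2k(k + 1) / (n - k - 1).

    log_lik: maximized log-likelihood; k: number of estimated parameters;
    n: number of observations.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_lik + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


def lowest_aicc(candidates, n):
    """candidates maps a model label to (log_lik, k); returns the label with the lowest AICc."""
    return min(candidates, key=lambda label: aicc(*candidates[label], n))


# Bonferroni-adjusted alpha for the two separately analyzed portions of the data.
ALPHA = 0.05 / 2
```

The correction term 2k(k + 1)/(n − k − 1) penalizes extra parameters more heavily when n is small relative to k, and it vanishes as n grows, so AICc converges to ordinary AIC.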

Results

Descriptive analysis

The left panel of Fig. 2 shows the percentage of S1 choices as means of Sessions 1–5, 6–10, and 11–15 (hereafter session blocks). A lower percentage of participants chose S1 on the first trial (M = 80.4%, SEM = 3.1) than on subsequent Pre-Reversal trials across all three session blocks (M = 95.9%, SEM = 0.4). In some sessions across all three session blocks, slight decreases in percentages on the last trial before the midsession contingency reversal suggested anticipatory errors (see the Supplemental Materials). Following the reversal, the percentage of S1 choices on Trial 13 decreased across session blocks; that is, participants made fewer errors on this trial in later session blocks (Sessions 1–5: M = 92.2%, SEM = 2.9; Sessions 11–15: M = 64.4%, SEM = 9.5). The percentage of S1 choices decreased rapidly across the first three Post-Reversal trials and reached low levels in all session blocks (Trial 15: M = 4.4%, SEM = 1.3). The right panel of Fig. 2 shows the percentage of S1 choices in each of Sessions 11–15 across the last four trials before the Extinction Test and the first four trials of the Extinction Test. It shows no indication of anticipatory errors before the Extinction Test.
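The session-block summaries amount to averaging binary choices by trial within blocks of five sessions. A hypothetical sketch follows. It assumes choices are stored as `choices[participant][session][trial]` and pools participants and sessions within a block. When every participant completes every session, pooling gives the same result as averaging per-session percentages.

```python
def percent_s1_by_block(choices, block_size=5, n_trials=24):
    """Percentage of S1 choices on each trial, per block of sessions.

    choices[p][s][t] is 'S1' or 'S2' for participant p, session s, trial t
    (all 0-indexed). Returns {block_index: [pct on Trial 1, ..., Trial n_trials]}.
    """
    n_sessions = len(choices[0])
    result = {}
    for start in range(0, n_sessions, block_size):
        block = range(start, min(start + block_size, n_sessions))
        pct = []
        for t in range(n_trials):
            picks = [choices[p][s][t] for p in range(len(choices)) for s in block]
            pct.append(100.0 * sum(c == "S1" for c in picks) / len(picks))
        result[start // block_size] = pct
    return result
```

Limiting the summary to the first 24 trials keeps Sessions 11–15 comparable with Sessions 1–10, because the Extinction-Test trials (25–36) exist only in the later sessions.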

Responding during the Extinction Test

The percentage of S1 choices in Session 11 increased from the first to the second trial of the Extinction Test before decreasing on the subsequent trial and oscillating near 50% thereafter (M = 51.4%, SEM = 1.7). Figure 3 shows the pattern of responding across the first three trials of the Extinction-Test sessions. The percentage of participants choosing S1 was low, high, and low across Trials 25, 26, and 27, with this general pattern becoming less pronounced across sessions.

Figure 4 shows the mean number of clicks on areas of the screen other than the response buttons (hereafter workspace responses) per 12 trials across Sessions 11-15. The figure shows that workspace responses occurred more often during the Extinction-Test trials (25–36; M = 1.1, SEM = 0.2) relative to trials preceding the Extinction Test (13–24; M = 0.4, SEM = 0.1). Workspace responses remained low across these sessions during Trials 13–24.

Response patterns

Figure 5 shows the percentage of participants adopting a win-stay/lose-shift, counting, or some other undefined response pattern during the Contingency Reversal and Extinction Test across sessions. The figure also shows the percentages of participants who reported using win-stay/lose-shift (dotted line) and counting (dashed line) strategies in the postexperiment questionnaire. We coded a win-stay/lose-shift pattern when S1 was chosen on the trials immediately before and immediately after the onset of the contingency reversal or extinction and S2 was chosen on the following trial. We coded a counting pattern when S1 was chosen on the two trials immediately before the onset of a contingency reversal or extinction and S2 was chosen on the first trial after the change. We coded as other any pattern that met neither definition. Figure 5 shows that across Sessions 1–5, the prevalence of win-stay/lose-shift patterns increased from 28% to 83%, while the prevalence of other patterns decreased from 72% to 17%. Counting remained at zero across Sessions 1–5. All response patterns remained stable across Sessions 6–9; in Session 10, the prevalence of win-stay/lose-shift patterns decreased from 72% to 44% and the prevalence of counting increased from 11% to 33%. Those response patterns largely held steady throughout Sessions 11–15 during the Contingency Reversal (win-stay/lose-shift: M = 51.1%; counting: M = 36.7%). Initiating the Extinction Test produced (1) an increase and subsequent decrease in win-stay/lose-shift patterns, (2) an immediate decrease in counting, and (3) an increase in other patterns.
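The coding rules can be expressed as a small classifier. This is a sketch under our reading of the definitions, in which the win-stay/lose-shift sequence is "previously reinforced option before the change, same option on the first trial after it, then the other option". The function name and the `old`/`new` parameters are hypothetical. Applying it to the Extinction Test with S2 as the previously reinforced option is an assumption, based on S2 being the reinforced option on Trials 13–24.

```python
def code_pattern(choices, change_trial, old="S1", new="S2"):
    """Classify one session's choices around a contingency change.

    choices: 'S1'/'S2' choices for one session, Trial 1 first.
    change_trial: 1-indexed first trial after the change (13 at the midsession
    reversal, 25 at the Extinction Test). `old` is the option reinforced just
    before the change and `new` the alternative.
    """
    i = change_trial - 1                      # 0-indexed first post-change trial
    before2, before1 = choices[i - 2], choices[i - 1]
    after1, after2 = choices[i], choices[i + 1]
    if before1 == old and after1 == old and after2 == new:
        return "win-stay/lose-shift"          # stayed once, went unreinforced, then shifted
    if before2 == old and before1 == old and after1 == new:
        return "counting"                     # shifted before any nonreinforcement
    return "other"
```

The two definitions are mutually exclusive because they require different choices on the first trial after the change.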

Error patterns

Figure 6 shows the mean number of errors associated with different response patterns (counting, win-stay/lose-shift, other). The figure shows that counting resulted in the lowest number of errors (M = 0.2), while response patterns other than counting or win-stay/lose-shift resulted in the highest number of errors in each session block (M = 3.2).

Response latencies

The left panel of Fig. 7 shows differences in mean latencies to choices of S1 or S2 in the first three trials following the contingency reversal (Trials 13–15) relative to the trial immediately preceding the reversal (Trial 12) across Sessions 1–5 for participants engaging in counting, win-stay/lose-shift, or other response patterns. For participants engaging in win-stay/lose-shift patterns, latencies increased little from the last Pre-Reversal trial (12) to the first three Post-Reversal trials (13–15) in Sessions 1–5 (M = +8.9 ms). However, participants engaging in other response patterns showed increased latencies in Trials 13–14 relative to Trial 12 (M = +427.9 ms). The right panel of Fig. 7 shows differences in mean latencies to choices of S1 or S2 from the last Post-Reversal trial (Trial 24) to the first three Extinction-Test trials (Trials 25–27) across Sessions 11–15. Counting patterns produced the greatest increase in latency from Trial 24 to Trial 25 (M = +617.7 ms), whereas win-stay/lose-shift patterns produced more gradual increases across the first two Extinction-Test trials (Trial 25: M = +28.5 ms; Trial 26: M = +777.3 ms).
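The latency measures are differences from a baseline trial, averaged over sessions. A hypothetical sketch follows. It assumes one list of choice latencies (in ms) per session, indexed from Trial 1. The function names and the example latencies are illustrative only, not data from the study.

```python
from statistics import mean


def latency_change(latencies_ms, baseline_trial, trials):
    """Latency on each listed trial minus latency on the baseline trial (1-indexed)."""
    base = latencies_ms[baseline_trial - 1]
    return [latencies_ms[t - 1] - base for t in trials]


def mean_latency_change(sessions, baseline_trial, trials):
    """Average the per-trial differences across sessions (e.g., Sessions 1-5)."""
    per_session = [latency_change(s, baseline_trial, trials) for s in sessions]
    return [mean(values) for values in zip(*per_session)]


# Illustrative values: change from Trial 12 to Trials 13-15 over two sessions.
s1 = [900] * 24
s1[12] = 1300            # Trial 13 was 400 ms slower than Trial 12
s2 = [900] * 24
s2[12] = 1100            # Trial 13 was 200 ms slower than Trial 12
print(mean_latency_change([s1, s2], 12, [13, 14, 15]))
```

For the Extinction-Test comparison, the same functions would use Trial 24 as the baseline and Trials 25–27 as the comparison trials.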

Statistical analysis

Contingency reversal

Table 1 shows the results from the Contingency Reversal portion of the experiment. Results indicated a significant association between Trial number and the response (β = −0.30, p < .001), whereby each additional trial multiplied the odds of choosing S1 over S2 by a factor of 0.74. Similarly, there was a significant association between the type of contingency and the response (β [Pre-Reversal] = 2.87, p < .001), whereby the odds of choosing S1 over S2 were 17.65 times higher in the Pre-Reversal phase than in the Post-Reversal phase. These findings suggest that responding tracked the reinforcement contingencies.
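The odds factors follow from exponentiating the logistic-regression coefficients. This sketch uses the rounded β values given in the text, so exp(2.87) comes out near 17.64 rather than the reported 17.65, which was presumably computed from the unrounded estimate. The helper name is hypothetical.

```python
import math


def odds_multiplier(beta):
    """exp(beta): the factor by which a one-unit change in a predictor multiplies
    the odds of choosing S1 over S2 in a logistic model."""
    return math.exp(beta)


# Coefficients as rounded in the text (Experiment 1, Contingency Reversal).
for label, beta in [("Trial", -0.30), ("Pre-Reversal", 2.87)]:
    print(f"{label}: beta = {beta:+.2f}, odds multiplier = {odds_multiplier(beta):.2f}")
```

A multiplier below 1 (Trial) means the odds of choosing S1 shrink as the predictor increases. A multiplier above 1 (Pre-Reversal) means the odds are higher at that level than at the reference level.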

Table 1 Results of generalized linear mixed-effects regression for Experiment 1

Extinction test

Table 1 also shows the results from the Extinction Test portion of the experiment. The best-fitting model indicated a significant association between Trial and the response (β = 0.07, p < .001), whereby the odds of selecting S1 over S2 increased by a factor of 1.08 across trials within sessions.

Discussion

Overall, choices between S1 and S2 were sensitive to both the midsession-reversal and extinction contingencies. We observed anticipatory errors by some participants before contingency reversals, consistent with McMillan and Spetch (2019) and Rayburn-Reeves et al. (2011). We also observed decreases in S1 choices on the first trial following contingency reversals, especially during later session blocks. In other words, some participants switched from S1 to S2 within sessions before experiencing the blackout for an incorrect response. This finding is consistent with Rayburn-Reeves et al. and indicative of counting. Therefore, the instruction not to count trials (Grondin et al., 2004) was not entirely effective in the present experiment or in Rayburn-Reeves et al. (2011). Performance reflected experience with contingency changes and, for some participants, the additional influence of counting trials.

We also examined responding when extinguishing responses to both S1 and S2. Consistent with previous studies of resurgence, experiencing the extinction contingency for S2 produced an increase in S1 responding (e.g., Kuroda et al., 2017a; Podlesnik et al., 2020; Robinson & Kelley, 2020). That is, choices in extinction initially reflected previous experience with contingency reversals. The odds of choosing S1 increased across trials, although this was a small effect (see Table 1). In line with these findings, target responding during resurgence tests sometimes increases immediately and then decreases (e.g., Sweeney & Shahan, 2013). In other cases, resurgence initially occurs at a low rate before increasing (e.g., Doughty et al., 2007; Podlesnik & Shahan, 2009, 2010). Nevertheless, the present findings demonstrated that the resurgence effect shown by a majority of participants was consistent with the win-stay/lose-shift response pattern observed during contingency reversals. In other words, both types of contingency changes similarly induced abrupt shifts in responding to the other option.

Table 2 Results of generalized linear mixed-effects regression for Experiment 2

Experiment 2

Despite instructions not to count (see also Rayburn-Reeves et al., 2011), counting the number of trials to a contingency reversal became relatively common among participants during later sessions in Experiment 1. One approach that appears to minimize control by trial number in human participants is arranging contingency reversals after varying and unpredictable numbers of trials across successive sessions (Rayburn-Reeves et al., 2011). In Experiment 2, we replicated the procedures from Experiment 1 but varied the reversal point across sessions. Eliminating counting during contingency reversals would make it possible to examine whether a common win-stay/lose-shift response pattern could account for performance both during contingency reversals and during subsequent extinction tests. Therefore, the present experiment examined how contingency changes and extinction affected response patterns and decision-making strategies under relatively uncertain conditions.