Using ε-greedy reinforcement learning methods to further understand ventromedial prefrontal patients’ deficits on the Iowa Gambling Task
Introduction
In the decision-making literature of the past decade, a popular paradigm has been the Iowa Gambling Task (IGT) (Bechara, Tranel, & Damasio, 2000). The IGT was originally designed to elucidate some of the particular deficits found in patients with bilateral ventromedial prefrontal cortex (VMF) lesions. The IGT is a reinforcement learning problem, in that participants must learn from rewards and punishments to evaluate the most appropriate action. Our aim is to find valuation functions that describe the average behaviour of VMF patients and normative human groups on the IGT, using models based on ε-greedy methods (Sutton & Barto, 1998). The ε-greedy framework was used because of the simplicity and flexibility it offers in testing a variety of theories. Our work has a related motivation to other modelling work on the IGT (Busemeyer and Stout, 2002; Yechiam et al., 2005). However, it differs in two important respects: (1) it attempts to clarify and extend current theories; and (2) it tests the models against data from four versions of the IGT across a number of studies, rather than the frequently used ABCD version alone. Additionally, other modelling work has not simulated the time-course of selections across the task (Wagar & Thagard, 2004), or has modelled a reduced-choice variant of the IGT (Frank & Claus, 2006) that has not been tested on VMF patients.
The IGT attempts to mimic real-world decision making, where the outcomes of choices and strategies have an element of immediate and, particularly, long-term uncertain consequence. All four versions of the task we consider contain four decks of cards, two of which are advantageous (decks C and D in the original ABCD version) and two of which are disadvantageous (decks A and B in the original ABCD version). Through selection, players need to learn which decks are best. Initially, the bad decks seem the best, as they offer higher immediate reward. However, they also offer higher uncertain losses, which only become evident after a number of selections. Importantly, as the task progresses, normal healthy humans learn that the best decks are those that offer smaller immediate rewards but also lower uncertain punishments, whereas VMF patients seem unable to fully use this distinction. Overall, the IGT tests a number of aspects of decision making, including within-task learning, management of reversals in contingencies, and evaluation of regular rewards and punishments over uncertain ones. It should be noted that in all versions of the IGT, the disadvantageous decks provide the best regular returns (outcomes that occur on every selection of a particular deck).
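The tension between regular rewards and uncertain losses can be made concrete with a small calculation. The sketch below uses illustrative per-deck figures that are assumptions and are not taken from this text: a fixed win on every card and a total loss spread over each block of ten cards, in the style usually reported for the ABCD version. The names `DECKS`, `net_per_10_cards` and `summarise` are hypothetical.

```python
# Illustrative payoff summary per block of 10 cards (assumed figures,
# not taken from the text): fixed win per card and total loss per block.
DECKS = {
    "A": {"win_per_card": 100, "loss_per_10": 1250},
    "B": {"win_per_card": 100, "loss_per_10": 1250},
    "C": {"win_per_card": 50, "loss_per_10": 250},
    "D": {"win_per_card": 50, "loss_per_10": 250},
}


def net_per_10_cards(deck):
    """Net outcome over 10 selections: regular wins minus uncertain losses."""
    return 10 * deck["win_per_card"] - deck["loss_per_10"]


def summarise(decks):
    """Return {deck name: (regular win per card, net per 10 cards)}."""
    return {
        name: (deck["win_per_card"], net_per_10_cards(deck))
        for name, deck in decks.items()
    }


if __name__ == "__main__":
    for name, (win, net) in summarise(DECKS).items():
        print(f"deck {name}: win per card = {win}, net per 10 cards = {net}")
```

Under these assumed figures, the decks with the larger regular win are the ones with a negative net outcome per block, which is the reversal that players have to learn.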
The IGT paradigm has been used as a method for distinguishing decision-making deficits in bilateral VMF patients compared to normal healthy controls (Bechara et al., 1994; Bechara et al., 1999; Bechara et al., 1997; Bechara et al., 2000; Fellows and Farah, 2005), and with various other frontal lesion patient groups (Bechara et al., 1998; Clark et al., 2003; Fellows and Farah, 2005; Manes et al., 2002), including patients with unilateral VMF lesions (Tranel, Bechara, & Denburg, 2002). For a review of a number of other studies with a variety of subject groups, see Dunn, Dalgleish, and Lawrence (2006).
This paper continues by summarizing four different versions of the IGT, and goes on to consider five theories in the literature, each of which attempts to define the underlying cause of the deficits found in VMF patients’ performance on the IGT. With the aid of simulations of normal healthy controls’ (NHCs’) and VMF patients’ IGT profiles, these theories are considered in greater depth, and conclusions are drawn about the most suitable theory. This has allowed the authors to suggest that VMF patients are less strategic (more explorative), which could be due to a working memory deficit, and are more reactive (more influenced by recent results) than healthy controls.
Versions of the Iowa Gambling Task (IGT)
We consider four versions of the IGT: ABCD, A′B′C′D′, EFGH and E′F′G′H′ (for further details of the task, see Bechara et al. (2000)). It is important to note that the bad decks, A(′) and B(′), in the A(′)B(′)C(′)D(′) versions have the largest variance in potential wins and losses per card, making them more ‘risky’, whereas in the EFGH and E′F′G′H′ versions the good decks, E(′) and G(′), have the highest variance.
In the A(′)B(′)C(′)D(′) versions, the good decks, C(′) and D(′), provide regular
Competing theories of VMF patient deficits
The current work considers five theories from the literature, each of which offers a possible underlying cause for the decision-making deficits found in bilateral VMF lesion patients tested on the Iowa Gambling Task (IGT). We review these five hypotheses here.
ε-greedy action-value method
In this paper, we have tried to simulate the data of human normal healthy controls (NHCs) and human VMF patients from Bechara et al. (1999), Bechara et al. (2000), Bechara and Damasio (2002), Bechara, Dolan, and Damasio (2002) and Clark et al. (2003), using basic reinforcement learning algorithms based on ε-greedy action-value methods (see Sutton and Barto (1998), p. 27). The ε in the ε-greedy method signifies the probability of exploration on each trial, where ε can take values from 0 to 1. If
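As a rough illustration of the general ε-greedy action-value idea referred to in this section, the following minimal sketch selects a random deck with probability `epsilon` and otherwise picks the deck with the highest current value estimate. It is not the authors' model: the constant `step_size` update rule, the zero initial values, the tie-breaking rule and the `toy_payoff` deck environment are all assumptions introduced here for illustration.

```python
import random


def epsilon_greedy_choice(q_values, epsilon, rng):
    """With probability epsilon explore (uniform random action); otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties between equally valued actions at random (an assumption).
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def update_value(q_values, action, reward, step_size):
    """Move the chosen action's estimate a fraction step_size toward the reward."""
    q_values[action] += step_size * (reward - q_values[action])


def toy_payoff(action, rng):
    """Hypothetical decks: 0 and 1 pay more per card but lose more on average."""
    win = 100 if action < 2 else 50
    loss = 250 if action < 2 else 50
    return win - (loss if rng.random() < 0.5 else 0)


def run_task(payoff_fn, n_actions=4, n_trials=100, epsilon=0.1, step_size=0.1, seed=0):
    """Simulate one agent and return its sequence of choices and final estimates."""
    rng = random.Random(seed)
    q_values = [0.0] * n_actions
    choices = []
    for _ in range(n_trials):
        action = epsilon_greedy_choice(q_values, epsilon, rng)
        reward = payoff_fn(action, rng)
        update_value(q_values, action, reward, step_size)
        choices.append(action)
    return choices, q_values


if __name__ == "__main__":
    choices, q_values = run_task(toy_payoff)
    counts = [choices.count(a) for a in range(4)]
    print("selections per deck:", counts)
    print("final value estimates:", [round(q, 1) for q in q_values])
```

In a sketch of this kind, a larger `epsilon` makes the simulated agent more explorative, and a larger `step_size` makes its estimates depend more heavily on the most recent outcomes.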
Results and analysis
In terms of RMSD, the error-driven model with and provides the best match to human VMF patients, see Table 2. This is further supported by three other more complex models (reversal learning, error-variance and error-valence) collapsing to this nested version. For example, for the error-valence model, if the valence weight is then it becomes the error-driven model as losses and gains are equally weighted. This is almost the case (). Therefore, since the
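The RMSD (root-mean-square deviation) criterion mentioned above can be sketched as follows. The function is the standard definition of RMSD; how the paper aggregates selections (for example, per block of trials or per deck) is not stated in this excerpt, so the two example profiles are hypothetical placeholder values rather than data from any study.

```python
import math


def rmsd(simulated, observed):
    """Root-mean-square deviation between two equally long numeric sequences."""
    if len(simulated) != len(observed):
        raise ValueError("sequences must have the same length")
    if not simulated:
        raise ValueError("sequences must not be empty")
    squared_errors = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical proportions of good-deck selections in five blocks of trials.
    model_profile = [0.45, 0.50, 0.55, 0.60, 0.62]
    human_profile = [0.48, 0.52, 0.50, 0.58, 0.65]
    print(f"RMSD = {rmsd(model_profile, human_profile):.4f}")
```

A lower value indicates a closer match between the simulated and observed selection profiles, so candidate models can be ranked by this quantity.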
Discussion
In the following five sections of the discussion we consider each of the five theories in light of the results from the simulations and the current literature.
Conclusions
The models and simulations presented in this paper suggest that, among the theories considered, a ‘myopia’ for future consequences provides the best description of the deficits found in Bechara et al.’s VMF patients on the IGT. However, it might be more appropriate and complete to suggest that VMF patients are less strategic (higher ε), possibly due to working-memory deficits, and more reactive (higher ) than NHCs. Both these aspects are evident in VMF patients’ real life behaviour (Damasio, 1994
Acknowledgements
This research has been supported by the Computing Laboratory at the University of Kent, UK. In addition, we appreciated the insightful comments and suggestions provided by the reviewers and Brad Wyble.
References (59)
- et al. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition (1994)
- et al. Decision-making and addiction (part I): Impaired activation of somatic states in substance dependent individuals when pondering decisions with negative consequences. Neuropsychologia (2002)
- et al. The Iowa Gambling Task and the somatic marker hypothesis: Some questions and answers. Trends in Cognitive Sciences (2005)
- et al. Decision-making and addiction (part II): Myopia for future or hypersensitivity to reward? Neuropsychologia (2002)
- et al. The neuropsychology of ventral prefrontal cortex: Decision-making and reversal learning. Brain and Cognition (2004)
- et al. The contributions of lesion laterality and lesion volume to decision-making impairment following frontal lobe damage. Neuropsychologia (2003)
- et al. T-maze discrimination and reversal learning after unilateral temporal or frontal lobe lesions in man. Cortex (1991)
- Metalearning and neuromodulation. Neural Networks (2002)
- et al. The somatic marker hypothesis: A critical evaluation. Neuroscience and Biobehavioural Reviews (2006)
- et al. Discrimination, reversal, and shift learning in Huntington’s disease: Mechanisms of impaired response selection. Neuropsychologia (1999)