Elsevier

Neural Networks

Volume 20, Issue 6, August 2007, Pages 676-689

Using ϵ-greedy reinforcement learning methods to further understand ventromedial prefrontal patients’ deficits on the Iowa Gambling Task

https://doi.org/10.1016/j.neunet.2007.04.026

Abstract

An important component of decision making is evaluating the expected result of a choice, using past experience. The way past experience is used to predict future rewards and punishments can have profound effects on decision making. The aim of this study is to further understand the possible role played by the ventromedial prefrontal cortex in decision making, using results from the Iowa Gambling Task (IGT). A number of theories in the literature offer potential explanations for the underlying cause of the deficit(s) found in bilateral ventromedial prefrontal lesion (VMF) patients on the IGT. An error-driven ϵ-greedy reinforcement learning method was found to produce a good match to both human normative and VMF patient groups from a number of studies. The model supports the theory that the VMF patients are less strategic (more explorative), which could be due to a working memory deficit, and are more reactive than healthy controls. This last aspect seems consistent with a ‘myopia’ for future consequences.

Introduction

In the decision making literature of the past decade, a popular paradigm has been the Iowa Gambling Task (IGT) (Bechara, Tranel, & Damasio, 2000). The IGT was originally designed to elucidate some of the particular deficits found in patients with bilateral ventromedial prefrontal cortex lesions (VMF). The IGT is a reinforcement learning problem, in that participants must learn from rewards and punishments to evaluate the most appropriate action. Our aim is to find valuation functions that describe the average behaviour found in VMF patients and normative human groups on the IGT, using models based on ϵ-greedy methods (Sutton & Barto, 1998). The ϵ-greedy framework was used because of the simplicity and flexibility it offered in testing a variety of theories. Our work has a related motivation to other modelling work on the IGT (Busemeyer & Stout, 2002; Yechiam et al., 2005). However, our work differs in two important respects: (1) it attempts to clarify and extend current theories; and (2) it tests the models against data from four versions of the IGT across a number of studies, rather than the frequently used ABCD version alone. Additionally, other modelling work either has not simulated the time-course of selections across the task (Wagar & Thagard, 2004) or has modelled a reduced-choice variant of the IGT (Frank & Claus, 2006) that has not been tested on VMF patients.

The IGT attempts to mimic real world decision making, where the outcomes of choices and strategies have an element of immediate and, particularly, long-term uncertain consequence. All four versions of the task we consider contain four decks of cards, two of which are advantageous (decks C and D in the original ABCD version) and two of which are disadvantageous (decks A and B in the original ABCD version). Through selection, players need to learn which decks are best. Initially, the bad decks seem the best, as they offer higher immediate reward. However, they also offer higher uncertain losses, which only become evident after a number of selections. Importantly though, as the task progresses, normal healthy humans learn that the best decks are those that offer smaller immediate rewards but also lower uncertain punishments, whereas VMF patients seem unable to fully use this distinction. Overall, the IGT tests a number of aspects of decision making, including within-task learning, management of reversals in contingencies, and evaluation of regular rewards and punishments over uncertain ones. It should be noted that in all versions of the IGT, the disadvantageous decks provide the best regular returns (outcomes that occur on every selection of a particular deck).

The IGT paradigm has been used as a method for distinguishing decision making deficits in bilateral VMF patients compared to normal healthy controls (Bechara et al., 1994; Bechara et al., 1999; Bechara et al., 1997; Bechara et al., 2000; Fellows & Farah, 2005), and with various other frontal lesion patient groups (Bechara et al., 1998; Clark et al., 2003; Fellows & Farah, 2005; Manes et al., 2002), including patients with unilateral VMF lesions (Tranel, Bechara, & Denburg, 2002). For a review of a number of other studies with a variety of subject groups, see Dunn, Dalgleish, and Lawrence (2006).

This paper continues by summarizing four different versions of the IGT, and goes on to consider five theories in the literature, each of which attempts to define the underlying cause of the deficits found in VMF patients’ performance on the IGT. With the aid of simulations of the IGT profiles of human normal healthy controls (NHCs) and VMF patients, these theories are considered in greater depth, and conclusions are drawn about the most suitable theory. This has allowed the authors to suggest that VMF patients are less strategic (more explorative), which could be due to a working memory deficit, and are more reactive (more influenced by recent results) than healthy controls.


Versions of the Iowa Gambling Task (IGT)

We consider four versions of the IGT: ABCD, A′B′C′D′, EFGH and E′F′G′H′. (For further details of the task see Bechara et al. (2000).) It is important to note that the bad decks, A(′) and B(′), in the A(′)B(′)C(′)D(′) versions have the largest variance in potential wins and losses per card, making them more ‘risky’. In the EFGH and E′F′G′H′ versions, by contrast, the good decks, E(′) and G(′), have the highest variance.

In the A(′)B(′)C(′)D(′) versions, the good decks, C(′) and D(′), provide regular

Competing theories of VMF patient deficits

The current work considers five theories from the literature that each offer a possible underlying cause for the decision making deficits found in bilateral VMF lesion patients tested on the Iowa Gambling Task (IGT). We review these five hypotheses here.

ϵ-greedy action-value method

In this paper, we have tried to simulate the data for human NHCs (normal healthy controls) and human VMF patients from Bechara et al. (1999), Bechara et al. (2000), Bechara and Damasio (2002), Bechara, Dolan, and Damasio (2002) and Clark et al. (2003), using basic reinforcement learning algorithms based on ϵ-greedy action-value methods (see Sutton & Barto, 1998, p. 27). The ϵ in the ϵ-greedy method signifies the probability of exploration on each trial, where ϵ can take values from 0 to 1. If ϵ=0
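As a rough illustration of the ϵ-greedy idea described above, the following Python sketch explores with probability ϵ and otherwise picks the deck with the highest current value estimate. It is not the authors' implementation. The function names, the uniform choice during exploration, the random tie-breaking, and the update rule are all assumptions made for illustration. In particular, treating γ as the fraction by which an estimate moves toward the latest outcome is only one reading of the paper's description of higher γ as "more reactive".

```python
import random


def epsilon_greedy_choice(values, epsilon, rng):
    """Return a deck index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: pick any deck uniformly at random (assumed exploration rule).
        return rng.randrange(len(values))
    # Exploit: pick the deck with the highest estimate, breaking ties at random.
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


def error_driven_update(value, outcome, gamma):
    """Move an estimate toward the latest outcome by a fraction gamma.

    Assumed form of an error-driven rule: the prediction error
    (outcome - value) is scaled by gamma, so a larger gamma weights
    recent outcomes more heavily.
    """
    return value + gamma * (outcome - value)
```

Under these assumptions, ϵ=0 always selects the currently best-valued deck and ϵ=1 always selects at random, while γ=1 replaces the estimate with the most recent outcome.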

Results and analysis

In terms of RMSD, the error-driven model with ϵ=0.70 and γ=0.90 (RMSD=10.8) provides the best match to human VMF patients (see Table 2). This is further supported by three other, more complex models (reversal learning, error-variance and error-valence) collapsing to this nested version. For example, for the error-valence model, if the valence weight is w=0.5 then it becomes the error-driven model, as losses and gains are equally weighted. This is almost the case (w=0.51). Therefore, since the
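The fit measure used above, RMSD (root-mean-square deviation), can be sketched in Python as follows. This is a generic implementation, not the authors' code. The function name and the assumption that the simulated and observed profiles are equal-length sequences of numbers (for example, selections per deck per block) are illustrative only.

```python
import math


def rmsd(simulated, observed):
    """Root-mean-square deviation between two equal-length profiles."""
    if len(simulated) != len(observed) or len(simulated) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    # Average the squared point-by-point differences, then take the root.
    squared_errors = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value indicates a closer match between a model's simulated profile and the human data, and identical profiles give an RMSD of zero.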

Discussion

In the following five sections of the discussion we consider each of the five theories in light of the results from the simulations and the current literature.

Conclusions

The models and simulations presented in this paper suggest that, among the theories considered, a ‘myopia’ for future consequences provides the best description of the deficits found in Bechara et al.’s VMF patients on the IGT. However, it might be more appropriate and complete to suggest that VMF patients are less strategic (higher ϵ), possibly due to working-memory deficits, and more reactive (higher γ) than NHCs. Both these aspects are evident in VMF patients’ real life behaviour (Damasio, 1994

Acknowledgements

This research has been supported by the Computing Laboratory at the University of Kent, UK. In addition, we appreciated the insightful comments and suggestions provided by the reviewers and Brad Wyble.

References (59)

  • R.D. Rogers et al.

    Dissociable deficits in the decision-making cognition of chronic amphetamine abusers, opiate abusers, patients with focal damage to prefrontal cortex, and tryptophan-depleted normal volunteers: Evidence for monoaminergic mechanisms

    Neuropsychopharmacology

    (1999)
  • E.T. Rolls

    The function of the orbitofrontal cortex

    Brain and Cognition

    (2004)
  • A.G. Sanfey et al.

    Phineas gauged: Decision-making and the frontal lobes

    Neuropsychologia

    (2003)
  • B. Shurman et al.

    Schizophrenia patients demonstrate a distinctive pattern of decision-making impairment in the Iowa Gambling Task

    Schizophrenia Research

    (2005)
  • D. Tranel et al.

    Asymmetric functional roles of right and left ventromedial prefrontal cortices in social conduct, decision-making, and emotional processing

    Cortex

    (2002)
  • A.F.T. Arnsten et al.

    Neurochemical modulation of prefrontal cortical function in humans and animals

  • A. Bechara et al.

    Different contributions of the human amygdala and ventromedial prefrontal cortex to decision-making

    The Journal of Neuroscience

    (1999)
  • A. Bechara et al.

    Dissociation of working memory from decision making within the human prefrontal cortex

    The Journal of Neuroscience

    (1998)
  • A. Bechara et al.

    Deciding advantageously before knowing the advantageous strategy

    Science

    (1997)
  • A. Bechara et al.

    Characterization of the decision-making deficit of patients with ventromedial prefrontal cortex lesions

    Brain

    (2000)
  • J.R. Busemeyer et al.

    A contribution of cognitive decision models to clinical assessment: Decomposing performance on the Bechara gambling task

    Psychological Assessment

    (2002)
  • A.R. Damasio

    Descartes’ error: Emotion, reason and the human brain

    (1994)
  • G. Deco et al.

    Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex

    Cerebral Cortex

    (2005)
  • R. Dias et al.

    Dissociable forms of inhibitory control within prefrontal cortex with an analog of the Wisconsin Card Sort Test: Restriction to novel situations and independence from on-line processing

    The Journal of Neuroscience

    (1997)
  • L.K. Fellows et al.

    Ventromedial frontal cortex mediates affective shifting in humans: Evidence from a reversal learning paradigm

    Brain

    (2003)
  • L.K. Fellows et al.

    Different underlying impairments in decision-making following ventromedial and dorsolateral frontal lobe damage in humans

    Cerebral Cortex

    (2005)
  • M.J. Frank et al.

    Anatomy of a decision: Striato-orbitofrontal interactions in reinforcement learning, decision making and reversal

    Psychological Review

    (2006)
  • P.S. Goldman-Rakic et al.

    Functional architecture of the dorsolateral prefrontal cortex in monkeys and humans

  • J.M. Hinson et al.

    Somatic markers, working memory, and decision making

    Cognitive, Affective, and Behavioral Neuroscience

    (2002)