Deep and beautiful. The reward prediction error hypothesis of dopamine
Introduction
According to the reward-prediction error hypothesis of dopamine (RPEH), the phasic activity of dopaminergic neurons in specific regions of the midbrain signals a discrepancy between the predicted and currently experienced reward of a particular event. The RPEH is widely regarded as one of the greatest successes of computational neuroscience. Terrence Sejnowski, a pioneer in computational neuroscience and prominent cognitive scientist, pointed to the RPEH when, in 2012, he was invited by the online magazine Edge.org to answer the question “What is your favorite deep, elegant, or beautiful explanation?” Many researchers in cognitive and brain sciences would agree that this hypothesis “has become the standard model [for explaining dopaminergic activity and reward-based learning] within neuroscience” (Caplin & Dean, 2008, p. 663). Even among critics, the “stunning elegance” and the “beautiful rigor” of the RPEH are recognized (Berridge, 2007, pp. 399, 403).
However, the type of information coded by dopaminergic transmission, along with its functional role in cognition and behaviour, very likely goes beyond reward-prediction error. The RPEH is not the only available hypothesis about what type of information is encoded by dopaminergic activity in the midbrain (cf. Berridge, 2007, Friston et al., 2012, Graybiel, 2008, Wise, 2004). Current evidence does not speak univocally in favour of this hypothesis, and disagreement remains about the extent to which the RPEH is supported by the available evidence (Dayan and Niv, 2008, O’Doherty, 2012, Redgrave and Gurney, 2006). On the one hand, it has been claimed that “to date no alternative has mustered as convincing and multidirectional experimental support as the prediction-error theory of dopamine” (Niv & Montague, 2009, p. 342; see also Glimcher, 2011, Niv, 2009); on the other hand, counter-claims have been put forward that the RPEH is an “elegant illusion” and that “[s]o far, incentive salience predictions [that is, predictions of an alternative hypothesis about dopamine] appear to best fit the data from situations that explicitly pit the dopamine hypotheses against each other” (Berridge, 2007, p. 424).
How, then, has the RPEH become so successful? What exactly does it explain? And, granted that it is at least intuitively uncontroversial that the RPEH is beautiful and elegant, in which sense can it be justifiably deemed deeper than its alternatives? The present paper addresses these questions by first reconstructing the main historical events that led to the formulation and subsequent success of the RPEH (Section 2).
Against this historical background, the paper elucidates what the RPEH explains and how it does so, contrasting it with the incentive salience hypothesis, arguably its most prominent current alternative. It is clarified that both hypotheses are concerned only with what type of information is encoded by dopaminergic activity. Specifically, the RPEH has the dual role of accurately describing the dynamic profile of phasic dopaminergic activity in the midbrain during reward-based learning and decision-making, and of explaining this profile by citing the representational role of dopaminergic phasic activity. If the RPEH is true, then a mechanism composed of midbrain dopaminergic neurons and their phasic activity carries out the task of learning what to do in the face of expected rewards, generating decisions accordingly (Section 3).
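The learning dynamics that the RPEH describes can be illustrated with a minimal temporal-difference (TD) sketch, the formalism behind the hypothesis since Montague et al. (1996). The trial structure (a cue followed two steps later by a reward), the parameters, and the function name below are illustrative assumptions, not a model taken from the paper:

```python
# Minimal TD(0) sketch of the reward-prediction error ("delta") that the
# RPEH takes phasic dopaminergic activity to encode. All parameters and
# the cue-delay-reward trial structure are illustrative assumptions.

def run_trials(n_trials, alpha=0.3, gamma=1.0):
    T = 3                       # steps within a trial: cue, delay, reward
    V = [0.0] * (T + 1)         # learned value of each step; V[T] is terminal
    history = []
    for _ in range(n_trials):
        trial = []
        # Error at cue onset: the cue itself is unpredicted, so the
        # pre-cue baseline value is fixed at zero.
        trial.append(gamma * V[0] - 0.0)
        for t in range(T):
            r = 1.0 if t == T - 1 else 0.0       # reward at the last step
            delta = r + gamma * V[t + 1] - V[t]  # reward-prediction error
            V[t] += alpha * delta                # TD(0) value update
            trial.append(delta)
        history.append(trial)
    return history

history = run_trials(200)
# Early in training the prediction error occurs at reward delivery; after
# learning it migrates to cue onset, mirroring the shift in phasic
# dopaminergic firing reported by Schultz et al. (1997).
print("first trial:", [round(d, 2) for d in history[0]])
print("last trial: ", [round(d, 2) for d in history[-1]])
```

On this sketch, a fully predicted reward elicits no error, while an unpredicted cue does, which is the dynamic profile the RPEH takes phasic dopaminergic activity to track.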
The paper finally explicates under which conditions some explanation of learning, motivation or decision-making phenomena based on the RPEH can be justifiably deemed deeper than some alternative explanation based on the incentive salience hypothesis. Two accounts of explanatory depth are considered. According to one account, deeper explanatory generalizations have wider scope (e.g., Hempel, 1959); according to the other, deeper explanatory generalizations show more degrees of invariance (e.g., Woodward & Hitchcock, 2003). It is argued that, although it is premature to maintain that explanations based on the RPEH are actually deeper—in either of these two senses of explanatory depth—than alternative explanations based on the incentive salience hypothesis, relevant available evidence indicates that they may well be (Section 4). The contribution of the paper to existing literature is summarised in the conclusion.
Section snippets
Reward-prediction error meets dopamine
Dopamine is a neurotransmitter in the brain.
Reward-prediction error and incentive salience: what do they explain?
In light of Montague et al., 1996, Schultz et al., 1997, the RPEH can now be more precisely characterised. The hypothesis states that the phasic firing of dopaminergic neurons in the ventral tegmental area and substantia nigra “in part” encodes reward-prediction errors. Montague and colleagues did not claim that all types of activity in all dopaminergic neurons encode only (or in all circumstances) reward-prediction errors. Their hypothesis is about “a particular relationship between the causes
Explanatory depth, reward-prediction error and incentive salience
A number of accounts of explanatory depth have recently been proposed in philosophy of science (e.g., Woodward and Hitchcock, 2003, Strevens, 2009, Weslake, 2010). While significantly different, these accounts agree that explanatory depth is a feature of generalizations that express the relationship between an explanans and an explanandum.
According to Woodward and Hitchcock (2003), in order to be genuinely explanatory, a generalization should exhibit patterns of counterfactual dependence
Conclusion
This paper has made two types of contributions to existing literature, which should be of interest to both historians and philosophers of cognitive science. First, the paper has provided a comprehensive historical overview of the main steps that have led to the formulation of the RPEH. Second, in light of this historical overview, it has made explicit what precisely the RPEH and the ISH explain, and under which circumstances neurocomputational explanations of learning and decision-making
Acknowledgements
I am sincerely grateful to Aistis Stankevicius, Charles Rathkopf, Peter Dayan, and especially to Gregory Radick, editor of this journal, and to two anonymous referees, for their encouragement, constructive criticisms and helpful suggestions. The work on this project was supported by the Deutsche Forschungsgemeinschaft (DFG) as part of the priority program “New Frameworks of Rationality” ([SPP 1516]). The usual disclaimers about any remaining error or misconception in the paper apply.
References (110)
- Theoretical neuroscience rising. Neuron (2008).
- et al. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron (2005).
- et al. What is the role of dopamine in reward: Hedonic impact, reward learning, or incentive salience? Brain Research Reviews (1998).
- et al. Dopamine neuron systems in the brain: An update. Trends in Neurosciences (2007).
- A map of the rat mesencephalon for electrical self-stimulation. Brain Research (1972).
- Computational modelling. Current Opinion in Neurobiology (1994).
- Twenty-five lessons from computational neuromodulation. Neuron (2012).
- et al. Reinforcement learning: The good, the bad and the ugly. Current Opinion in Neurobiology (2008).
- Value-dependent selection in the brain: Simulation in a synthetic neural model. Neuroscience (1994).
- Phasic versus tonic dopamine release and the modulation of dopamine system responsivity: A hypothesis for the etiology of schizophrenia. Neuroscience (1991).