Model-based estimation of subjective values using choice tasks with probabilistic feedback
Introduction
Subjective values (or utilities) that living systems assign to positive or negative events generally differ from the events' objective values (e.g., an amount of money). Rewards of larger amounts and shorter delays are generally preferred, but subjective values are not linearly related to objective, measurable quantities such as amount and delay (e.g., Kahneman & Tversky, 1979). Investigations into the valuation systems of living systems have attracted significant attention in fields such as psychology, neuroscience, and psychiatry (e.g., O’Doherty, 2014; Rangel et al., 2008). For example, some psychiatric disorders (e.g., depression) can be characterized by altered subjective values (for a review, see Chen, Takahashi, Nakagawa, Inoue, & Kusumi, 2015). Thus, the validity of an animal model of a psychiatric disorder may be evaluated on the basis of the subjects' subjective values.
Traditional econometric methods for estimating subjective value cannot be applied to animals because they rely on verbal instruction (e.g., Kable & Glimcher, 2007; Kahneman & Tversky, 1979). Several methods have instead been used to estimate subjective values or preferences in animal studies. A typical procedure is to have the subjects learn the relationship between a specific response (e.g., pressing a lever or remaining in a specific location) and the resulting outcome, from which the subjective value is measured (e.g., Green & Estle, 2003). This approach requires enough training that the animals learn the relationships among all of the items and their choice behavior reaches a steady state. Another common method relies on the lawful way animals distribute their responses according to reinforcement, i.e., the matching law (Miller, 1976). Both approaches rest on pairwise comparisons of preferences between two items; to measure the subjective values of several items, the researcher must therefore examine preferences for multiple combinations of items, which demands considerable time and careful experimental design.
In the present study, we propose a novel method for estimating subjective values, particularly from animal behavior, using novel behavioral tasks and reinforcement learning (RL) model-based analysis. RL is usually formulated as an algorithm that attempts to maximize the total reward a decision-maker can obtain. Recent studies, however, have begun to use the RL framework to model human behavior that does not necessarily lead to reward maximization (Neiman & Loewenstein, 2011; Shteingart & Loewenstein, 2014). For example, basketball players tend to attempt a 3-point shot immediately after a successful one, even though this dependence decreases the success rate; this choice behavior has been modeled with an RL model (Neiman & Loewenstein, 2011, 2014). Additionally, RL models have become important data-analysis tools for experiments involving value-based decision-making tasks (Corrado & Doya, 2007; Daw, 2011; O’Doherty et al., 2007).
Standard RL theory assumes that the probability of choosing an option increases after that option has been reinforced, and that the magnitude of this dependence decays exponentially over trials (Katahira, 2015). The main idea of the proposed method is to exploit this property. RL theory also implies that the larger the subjective value of an outcome, the more frequently the decision-maker repeats the same choice in the immediate future. By fitting the parameters of RL models to trial-by-trial data, one can therefore estimate the subjective values of various decision outcomes. The proposed method exploits the transient, trial-level dynamics of behavior, whereas conventional methods examine only steady-state behavior. Because it uses the transient effect of an outcome on subsequent choices, the method can estimate the values of multiple outcome types in a single experiment consisting of only two options.
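The history dependence described above can be sketched as a standard Q-learning agent in which the subjective value of each outcome type is a free parameter. The following Python sketch is our own minimal illustration (not the authors' code; the function name and the random-outcome task structure are assumptions): a two-option task where one of several outcome types is drawn at random each trial and drives a delta-rule update of the chosen option's value.

```python
import numpy as np

def simulate_choices(n_trials, alpha, beta, outcome_values, rng):
    """Simulate a two-option task: the agent chooses via a softmax over
    Q-values, receives one of several outcome types drawn at random, and
    updates the chosen option's Q-value by a delta rule."""
    q = np.zeros(2)                       # action values for the two options
    choices = np.empty(n_trials, dtype=int)
    outcomes = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # softmax (logistic for two options) with inverse temperature beta
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        choices[t] = rng.random() < p1
        outcomes[t] = rng.integers(len(outcome_values))
        # the subjective value of the delivered outcome drives the update
        q[choices[t]] += alpha * (outcome_values[outcomes[t]] - q[choices[t]])
    return choices, outcomes
```

Under this scheme, outcomes with larger subjective values pull the chosen option's Q-value higher, producing a stronger but exponentially decaying tendency to repeat that choice, which is exactly the signature the method exploits.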
The remainder of this paper is organized as follows. First, we describe the proposed method, which consists of the novel experimental design and RL model-based analysis. Next, we examine the validity and several properties of the proposed method based on synthetic data. We then apply the proposed method to actual behavioral data from rats. In the demonstration, we examined the rats’ subjective values regarding amounts of rewards and delays of the reward (and no-reward) signal. Finally, we discuss the advantages and limitations of the proposed method.
Section snippets
Proposed method
The proposed method consists of novel experimental tasks and RL model-based trial-by-trial analysis of behavioral data. In the following, we describe the basic task structure, the RL models, and the statistical analysis procedure.
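The model-fitting step can be sketched as maximum-likelihood estimation: compute the likelihood of the observed choice sequence under a Q-learning model whose outcome values are free parameters, and optimize over those parameters. The code below is a hedged sketch of this idea (ours, not the paper's exact estimation procedure; the function name is hypothetical). Note one practical detail: the inverse temperature and the overall scale of the subjective values trade off, so in practice one quantity (e.g., one outcome's value) is typically fixed for identifiability.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, outcomes, n_outcome_types):
    """Negative log-likelihood of a choice sequence under a Q-learning
    model; params = [alpha, beta, v_1, ..., v_K] with one subjective
    value per outcome type."""
    alpha, beta = params[0], params[1]
    values = params[2:2 + n_outcome_types]
    q = np.zeros(2)
    nll = 0.0
    for c, o in zip(choices, outcomes):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))  # softmax, 2 options
        nll -= np.log(max(p1 if c == 1 else 1.0 - p1, 1e-12))
        q[c] += alpha * (values[o] - q[c])                # delta-rule update
    return nll

# usage sketch:
# res = minimize(neg_log_likelihood, x0=[0.5, 1.0, 0.5, 0.5],
#                args=(choices, outcomes, 2), method="Nelder-Mead")
```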
Simulations
To examine the validity of the proposed method, we perform computer simulations on synthetic data sets. One advantage of simulations with synthetic data over real behavioral data is that the ground truth about the underlying subjective reward values is known, so we can evaluate in a straightforward manner how well the method works. Specifically, we address the following seven points.
1. The validity of the estimates of subjective values (addressed in Case
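A validity check of this kind can be illustrated by a self-contained parameter-recovery sketch (our own illustration, not the paper's simulation code; the fixed inverse temperature and the random-outcome task are assumptions): generate choices from an agent with known subjective values, refit the model by maximum likelihood, and check that the estimates reflect the ground truth.

```python
import numpy as np
from scipy.optimize import minimize

BETA = 3.0  # inverse temperature, fixed during fitting for identifiability


def simulate(n, alpha, values, rng):
    """Generate choices from a Q-learning agent; one of the outcome
    types in `values` is delivered at random on every trial."""
    q = np.zeros(2)
    ch = np.empty(n, dtype=int)
    oc = np.empty(n, dtype=int)
    for t in range(n):
        p1 = 1.0 / (1.0 + np.exp(-BETA * (q[1] - q[0])))
        ch[t] = rng.random() < p1
        oc[t] = rng.integers(len(values))
        q[ch[t]] += alpha * (values[oc[t]] - q[ch[t]])
    return ch, oc


def nll(params, ch, oc):
    """Negative log-likelihood; params = [alpha, v_0, v_1]."""
    alpha, values = params[0], params[1:]
    q = np.zeros(2)
    total = 0.0
    for c, o in zip(ch, oc):
        p1 = 1.0 / (1.0 + np.exp(-BETA * (q[1] - q[0])))
        total -= np.log(max(p1 if c == 1 else 1.0 - p1, 1e-12))
        q[c] += alpha * (values[o] - q[c])
    return total


rng = np.random.default_rng(0)
# ground truth: outcome type 0 is worth more than type 1
ch, oc = simulate(2000, 0.3, np.array([1.0, 0.2]), rng)
res = minimize(nll, x0=[0.5, 0.5, 0.5], args=(ch, oc), method="Nelder-Mead")
# res.x holds the estimated [alpha, v_0, v_1]; the ordering v_0 > v_1
# should be recovered from the choice sequence alone
```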
Application to rat experiments
We next demonstrate how the proposed method works for an actual animal experiment using rats. The goal here is not to draw general conclusions about rat behavior; rather, we intend to confirm that the proposed method can extract valid and interpretable estimates from individual rats. Two rats each performed two tasks. In Task 1, we randomly manipulated the reward amount. It is reasonable to expect that the reward value should be a non-decreasing function of the reward amount because it is
Discussion
In the present paper, we have proposed a novel framework for estimating subjective values. The framework consists of novel behavioral tasks and model-based analysis of the behavioral data. Our methods utilize the history dependence of choice behavior (i.e., the larger the subjective value of the outcome, the more likely the action is to be repeated; this influence decays as subjects experience additional trials). This tendency is represented by the RL framework, from which we can estimate the
Acknowledgment
This work was partially supported by Grants-in-Aid for Scientific Research (KAKENHI) Nos. 24700238, 26118506, 15K12140, and 23118003.
References (37)
- Reinforcement learning in depression: A review of computational research. Neuroscience and Biobehavioral Reviews (2015)
- The relation between reinforcement learning parameters and the influence of reinforcement history on choice behavior. Journal of Mathematical Psychology (2015)
- Individual differences in heart rate variability are associated with the avoidance of negative emotional events. Biological Psychology (2014)
- Computational psychiatry. Trends in Cognitive Sciences (2012)
- The problem with value. Neuroscience and Biobehavioral Reviews (2014)
- Reinforcement learning and human behavior. Current Opinion in Neurobiology (2014)
- Comparison of decision learning models using the generalization criterion method. Cognitive Science (2008)
- A new look at the statistical model identification. IEEE Transactions on Automatic Control (1974)
- Learning the value of information in an uncertain world. Nature Neuroscience (2007)
- Understanding neural coding through the model-based analysis of decision making. Journal of Neuroscience (2007)
- Trial-by-trial data analysis using computational models
- Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review
- Preference reversals with food and water reinforcers in rats. Journal of the Experimental Analysis of Behavior
- Discounting of delayed food rewards in pigeons and rats: Is there a magnitude effect? Journal of the Experimental Analysis of Behavior
- Positive mood effects on delay discounting. Emotion
- Mapping anhedonia onto reinforcement learning: A behavioural meta-analysis. Biology of Mood & Anxiety Disorders
- Validation of decision-making models and analysis of decision variables in the rat basal ganglia. Journal of Neuroscience
- The neural correlates of subjective value during intertemporal choice. Nature Neuroscience