Model-based estimation of subjective values using choice tasks with probabilistic feedback
Introduction
Subjective values (or utilities) that living systems assign to positive or negative events generally differ from the events' objective values (e.g., an amount of money). Rewards of larger amounts and shorter delays are generally preferred, but subjective values are not linearly related to objective, measurable quantities such as amount and delay (e.g., Kahneman & Tversky, 1979). Investigations into the valuation systems of living systems have attracted significant attention in fields such as psychology, neuroscience, and psychiatry (e.g., O’Doherty, 2014; Rangel et al., 2008). For example, some psychiatric disorders (e.g., depression) can be characterized by altered subjective values (for a review, see Chen, Takahashi, Nakagawa, Inoue, & Kusumi, 2015). Thus, the validity of an animal model of a psychiatric disorder may be evaluated on the basis of the subjects' subjective values.
Traditional econometric methods for estimating subjective value cannot be applied to animals because they rely on verbal instruction (e.g., Kable & Glimcher, 2007; Kahneman & Tversky, 1979). Several methods have instead been used to estimate subjective values or preferences in animal studies. A typical procedure is to have the subjects learn the relationship between a specific response (e.g., pressing a lever or remaining in a specific location) and the resulting outcome, from which the subjective value is measured (e.g., Green & Estle, 2003). This approach requires enough training that the animals learn the relationships among all of the items and their choice behavior reaches a steady state. Another common method relies on the lawful way animals distribute their responses according to reinforcement, i.e., the matching law (Miller, 1976). Both approaches rest on pairwise comparisons of preferences between two items; to measure the subjective values of several items, the researcher must therefore examine preferences for multiple combinations of items, which demands considerable time and careful experimental design.
In the present study, we propose a novel method for estimating subjective values, particularly from animal behavior, using novel behavioral tasks and reinforcement learning (RL) model-based analysis. RL is usually formulated as an algorithm that attempts to maximize the total reward a decision-maker can obtain. Recent studies, however, have begun to use the RL framework to model human behavior that does not necessarily lead to reward maximization (Neiman & Loewenstein, 2011; Shteingart & Loewenstein, 2014). For example, basketball players tend to attempt a 3-point shot immediately after a successful one, even though this dependence decreases the success rate; this choice behavior has been modeled with an RL model (Neiman & Loewenstein, 2011, 2014). Additionally, RL models have become important data-analysis tools for experiments involving value-based decision-making tasks (Corrado & Doya, 2007; Daw, 2011; O’Doherty et al., 2007).
Standard RL theory assumes that the probability of choosing an option increases after that option has been reinforced, and that the magnitude of this dependence decays exponentially over trials (Katahira, 2015). The main idea of the proposed method is to exploit this property. RL theory also implies that the larger the subjective value of an outcome, the more frequently the decision-maker repeats the same choice in the immediate future. By fitting the parameters of RL models to trial-by-trial data, one can therefore estimate the subjective values of various decision outcomes. The proposed method exploits the transient, trial-level dynamics of behavior, whereas conventional methods examine only steady-state behavior. Because it uses the transient effect of an outcome on subsequent choices, the method can estimate the values of multiple outcome types in a single experiment consisting of only two options.
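The history dependence described above can be sketched as a standard Q-learning agent in which the subjective value of each outcome type is a free parameter. The following Python sketch is our own minimal illustration (not the authors' code; the function name and the random-outcome task structure are assumptions): a two-option task where one of several outcome types is drawn at random each trial and drives a delta-rule update of the chosen option's value.

```python
import numpy as np

def simulate_choices(n_trials, alpha, beta, outcome_values, rng):
    """Simulate a two-option task: the agent chooses via a softmax over
    Q-values, receives one of several outcome types drawn at random, and
    updates the chosen option's Q-value by a delta rule."""
    q = np.zeros(2)                       # action values for the two options
    choices = np.empty(n_trials, dtype=int)
    outcomes = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # softmax (logistic for two options) with inverse temperature beta
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))
        choices[t] = rng.random() < p1
        outcomes[t] = rng.integers(len(outcome_values))
        # the subjective value of the delivered outcome drives the update
        q[choices[t]] += alpha * (outcome_values[outcomes[t]] - q[choices[t]])
    return choices, outcomes
```

Under this scheme, outcomes with larger subjective values pull the chosen option's Q-value higher, producing a stronger but exponentially decaying tendency to repeat that choice, which is exactly the signature the method exploits.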
The remainder of this paper is organized as follows. First, we describe the proposed method, which consists of the novel experimental design and RL model-based analysis. Next, we examine the validity and several properties of the proposed method based on synthetic data. We then apply the proposed method to actual behavioral data from rats. In the demonstration, we examined the rats’ subjective values regarding amounts of rewards and delays of the reward (and no-reward) signal. Finally, we discuss the advantages and limitations of the proposed method.
Section snippets
Proposed method
The proposed method consists of novel experimental tasks and RL model-based trial-by-trial analysis of behavioral data. In the following, we describe the basic task structure, the RL models, and the statistical analysis procedure.
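The model-fitting step can be sketched as maximum-likelihood estimation: compute the likelihood of the observed choice sequence under a Q-learning model whose outcome values are free parameters, and optimize over those parameters. The code below is a hedged sketch of this idea (ours, not the paper's exact estimation procedure; the function name is hypothetical). Note one practical detail: the inverse temperature and the overall scale of the subjective values trade off, so in practice one quantity (e.g., one outcome's value) is typically fixed for identifiability.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, outcomes, n_outcome_types):
    """Negative log-likelihood of a choice sequence under a Q-learning
    model; params = [alpha, beta, v_1, ..., v_K] with one subjective
    value per outcome type."""
    alpha, beta = params[0], params[1]
    values = params[2:2 + n_outcome_types]
    q = np.zeros(2)
    nll = 0.0
    for c, o in zip(choices, outcomes):
        p1 = 1.0 / (1.0 + np.exp(-beta * (q[1] - q[0])))  # softmax, 2 options
        nll -= np.log(max(p1 if c == 1 else 1.0 - p1, 1e-12))
        q[c] += alpha * (values[o] - q[c])                # delta-rule update
    return nll

# usage sketch:
# res = minimize(neg_log_likelihood, x0=[0.5, 1.0, 0.5, 0.5],
#                args=(choices, outcomes, 2), method="Nelder-Mead")
```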
Simulations
To examine the validity of the proposed method, we perform computer simulations on synthetic data sets. One advantage of simulations with synthetic data over real behavioral data is that the ground truth about the underlying subjective reward values is known, so we can evaluate in a straightforward manner how well the method works. Specifically, we address the following seven points.
1. The validity of the estimates of subjective values (addressed in Case
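A validity check of this kind can be illustrated by a self-contained parameter-recovery sketch (our own illustration, not the paper's simulation code; the fixed inverse temperature and the random-outcome task are assumptions): generate choices from an agent with known subjective values, refit the model by maximum likelihood, and check that the estimates reflect the ground truth.

```python
import numpy as np
from scipy.optimize import minimize

BETA = 3.0  # inverse temperature, fixed during fitting for identifiability


def simulate(n, alpha, values, rng):
    """Generate choices from a Q-learning agent; one of the outcome
    types in `values` is delivered at random on every trial."""
    q = np.zeros(2)
    ch = np.empty(n, dtype=int)
    oc = np.empty(n, dtype=int)
    for t in range(n):
        p1 = 1.0 / (1.0 + np.exp(-BETA * (q[1] - q[0])))
        ch[t] = rng.random() < p1
        oc[t] = rng.integers(len(values))
        q[ch[t]] += alpha * (values[oc[t]] - q[ch[t]])
    return ch, oc


def nll(params, ch, oc):
    """Negative log-likelihood; params = [alpha, v_0, v_1]."""
    alpha, values = params[0], params[1:]
    q = np.zeros(2)
    total = 0.0
    for c, o in zip(ch, oc):
        p1 = 1.0 / (1.0 + np.exp(-BETA * (q[1] - q[0])))
        total -= np.log(max(p1 if c == 1 else 1.0 - p1, 1e-12))
        q[c] += alpha * (values[o] - q[c])
    return total


rng = np.random.default_rng(0)
# ground truth: outcome type 0 is worth more than type 1
ch, oc = simulate(2000, 0.3, np.array([1.0, 0.2]), rng)
res = minimize(nll, x0=[0.5, 0.5, 0.5], args=(ch, oc), method="Nelder-Mead")
# res.x holds the estimated [alpha, v_0, v_1]; the ordering v_0 > v_1
# should be recovered from the choice sequence alone
```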
Application to rat experiments
We next demonstrate how the proposed method works for an actual animal experiment using rats. The goal here is not to draw general conclusions about rat behavior; rather, we intend to confirm that the proposed method can extract valid and interpretable estimates from individual rats. Two rats each performed two tasks. In Task 1, we randomly manipulated the reward amount. It is reasonable to expect that the reward value should be a non-decreasing function of the reward amount because it is
Discussion
In the present paper, we have proposed a novel framework for estimating subjective values. The framework consists of novel behavioral tasks and model-based analysis of the behavioral data. Our methods utilize the history dependence of choice behavior (i.e., the larger the subjective value of the outcome, the more likely the action is to be repeated; this influence decays as subjects experience additional trials). This tendency is represented by the RL framework, from which we can estimate the
Acknowledgment
This work was partially supported by Grants-in-Aid for Scientific Research (KAKENHI) Nos. 24700238, 26118506, 15K12140, and 23118003.
References (37)
- Reinforcement learning in depression: A review of computational research. Neuroscience and Biobehavioral Reviews (2015)
- The relation between reinforcement learning parameters and the influence of reinforcement history on choice behavior. Journal of Mathematical Psychology (2015)
- Individual differences in heart rate variability are associated with the avoidance of negative emotional events. Biological Psychology (2014)
- Computational psychiatry. Trends in Cognitive Sciences (2012)
- The problem with value. Neuroscience and Biobehavioral Reviews (2014)
- Reinforcement learning and human behavior. Current Opinion in Neurobiology (2014)
- Comparison of decision learning models using the generalization criterion method. Cognitive Science (2008)
- A new look at the statistical model identification. IEEE Transactions on Automatic Control (1974)
- Learning the value of information in an uncertain world. Nature Neuroscience (2007)
- Understanding neural coding through the model-based analysis of decision making. Journal of Neuroscience (2007)
- Trial-by-trial data analysis using computational models
- Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review
- Preference reversals with food and water reinforcers in rats. Journal of the Experimental Analysis of Behavior
- Discounting of delayed food rewards in pigeons and rats: Is there a magnitude effect? Journal of the Experimental Analysis of Behavior
- Positive mood effects on delay discounting. Emotion
- Mapping anhedonia onto reinforcement learning: A behavioural meta-analysis. Biology of Mood & Anxiety Disorders
- Validation of decision-making models and analysis of decision variables in the rat basal ganglia. Journal of Neuroscience
- The neural correlates of subjective value during intertemporal choice. Nature Neuroscience