Model-based estimation of subjective values using choice tasks with probabilistic feedback

https://doi.org/10.1016/j.jmp.2017.05.005

Highlights

  • A novel method for estimating subjective value from choice behavior is proposed.

  • The proposed method employs a new choice task using probabilistic feedback.

  • The proposed method performs model-based estimation grounded in reinforcement learning theory.

  • The validity and limitations of the proposed method are investigated.

  • The proposed method is demonstrated using actual choice data from rats.

Abstract

Evaluating the subjective value of events is a crucial task in the investigation of how the brain implements the value-based computations by which living systems make decisions. This task is often not straightforward, especially for animal subjects. In the present paper, we propose a novel model-based method for estimating subjective value from choice behavior. The proposed method is based on reinforcement learning (RL) theory. It draws upon the premise that a subject tends to choose the option that leads to an outcome with a high subjective value. The proposed method consists of two components: (1) a novel behavioral task in which the choice outcome is presented randomly within the same valence category and (2) fitting the parameters of RL models to the behavioral data. We investigated the validity and limitations of the proposed method by conducting several computer simulations. We also applied the proposed method to actual behavioral data from two rats that performed two tasks: one manipulating the reward amount and another manipulating the delay of reward signals. The results demonstrate that reasonable estimates can be obtained using the proposed method.

Introduction

Subjective values (or utilities) that living systems assign to positive or negative events generally differ from their objective values (e.g., amount of money). Rewards of larger amount and shorter delay are generally preferable, but subjective values are not linearly related to objective, measurable quantities such as amount and delay (e.g., Kahneman & Tversky, 1979). Investigations into the valuation systems of living systems have gained significant attention in fields such as psychology, neuroscience, and psychiatry (e.g., O’Doherty, 2014; Rangel et al., 2008). For example, some psychiatric disorders (e.g., depression) can be characterized by altered subjective values (for a review, see Chen, Takahashi, Nakagawa, Inoue, & Kusumi, 2015). Thus, the validity of an animal model of a psychiatric disorder may be evaluated based on the subjective values of the subjects.

Traditional econometric methods of estimating subjective value cannot be applied to animals because they rely on verbal instruction (e.g., Kable & Glimcher, 2007; Kahneman & Tversky, 1979). Several methods have been used to estimate subjective values or preferences in animal studies. A typical procedure is to have the subjects learn the relationship between a specific response (e.g., pressing a lever or remaining in a specific location) and the resulting outcome, from which the subjective value is measured (e.g., Green & Estle, 2003). This approach requires sufficient training so that the animals learn the relationships among all of the items and the choice behavior reaches a steady state. Another common method relies on the lawful way in which animals distribute their responses according to reinforcement, i.e., the matching law (Miller, 1976). Both approaches rest on pairwise comparisons of preferences between two items. Thus, to measure the subjective values of several items, the researcher must examine the preferences for multiple combinations of items, which requires considerable time and careful experimental design.

In the present study, we propose a novel method for estimating subjective values, especially from animal behavior, using novel behavioral tasks and reinforcement learning (RL) model-based analysis. RL is usually formulated as an algorithm that attempts to maximize the total reward that a decision-maker can obtain. Recent studies, however, have begun to use the RL framework to model human behavior that does not necessarily lead to reward maximization (Neiman & Loewenstein, 2011; Shteingart & Loewenstein, 2014). For example, basketball players tend to attempt a 3-point shot immediately after a successful one, even though this tendency decreases their success rate; this choice behavior can be modeled with an RL model (Neiman & Loewenstein, 2011, 2014). Additionally, RL models have become important data-analysis tools for experiments involving value-based decision-making tasks (Corrado & Doya, 2007; Daw, 2011; O’Doherty et al., 2007).

Standard RL theory assumes an increased probability of choosing an option that has been reinforced in the immediate past, with the magnitude of this dependence decaying exponentially over trials (Katahira, 2015). The main idea of the proposed method is to exploit this property. RL theory also implies that the larger the subjective value of an outcome, the more frequently the decision-maker repeats the same choice in the immediate future. By fitting the parameters of RL models to trial-by-trial data, one can therefore estimate the subjective values of various decision outcomes. The proposed method exploits the transient, trial-level dynamics of behavior, whereas conventional methods examine only steady-state behavior. By using the transient effect of each outcome on subsequent choices, it can estimate the values of multiple types of outcomes in a single experiment with only two options.
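To spell this out, in a standard Q-learning formulation consistent with this idea (the exact models used below may differ in detail), the value of the chosen option $a_t$ is updated toward the subjective value of the outcome $o_t$ it produced,

$$Q_{t+1}(a_t) = Q_t(a_t) + \alpha \bigl( v_{o_t} - Q_t(a_t) \bigr),$$

and choices follow a softmax rule,

$$P(a_t = a) = \frac{\exp\bigl(\beta\, Q_t(a)\bigr)}{\sum_{a'} \exp\bigl(\beta\, Q_t(a')\bigr)},$$

where $\alpha$ is the learning rate, $\beta$ is the inverse temperature, and $v_{o_t}$ is a free parameter representing the subjective value of outcome $o_t$. An outcome with a larger $v$ thus raises the probability of repeating the same choice, and this influence decays over subsequent trials at a rate governed by $\alpha$; fitting $\alpha$, $\beta$, and the $v$ parameters to trial-by-trial data yields estimates of the subjective values.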

The remainder of this paper is organized as follows. First, we describe the proposed method, which consists of the novel experimental design and RL model-based analysis. Next, we examine the validity and several properties of the proposed method based on synthetic data. We then apply the proposed method to actual behavioral data from rats. In the demonstration, we examined the rats’ subjective values regarding amounts of rewards and delays of the reward (and no-reward) signal. Finally, we discuss the advantages and limitations of the proposed method.

Section snippets

Proposed method

The proposed method consists of novel experimental tasks and RL model-based trial-by-trial analysis of behavioral data. In the following, we describe the basic task structure, the RL models, and the statistical analysis procedure.
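As an illustration of what such a model fit involves, the following is a minimal sketch (not the authors' implementation) of maximum-likelihood estimation for a two-option task in which each outcome type has a free subjective-value parameter, following the Q-learning formulation sketched in the Introduction. Function names, starting values, and parameter bounds are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): maximum-likelihood fit
# of a Q-learning model whose reward term is a free subjective-value parameter
# for each outcome type.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, outcomes):
    """params = [alpha, beta, v_0, ..., v_{K-1}]; choices[t] in {0, 1};
    outcomes[t] is the index of the outcome type delivered on trial t."""
    alpha, beta = params[0], params[1]
    values = np.asarray(params[2:])            # subjective value of each outcome type
    q = np.zeros(2)                            # action values of the two options
    nll = 0.0
    for c, o in zip(choices, outcomes):
        p = np.exp(beta * q) / np.sum(np.exp(beta * q))  # softmax choice probabilities
        nll -= np.log(p[c] + 1e-12)
        q[c] += alpha * (values[o] - q[c])     # update only the chosen option
    return nll

def fit(choices, outcomes, n_outcome_types):
    # In practice one value is often fixed (e.g., v_0 = 1) to resolve the
    # scaling trade-off between beta and the value parameters; omitted here.
    x0 = np.concatenate([[0.3, 2.0], np.zeros(n_outcome_types)])
    bounds = [(0.01, 1.0), (0.01, 20.0)] + [(-5.0, 5.0)] * n_outcome_types
    return minimize(neg_log_likelihood, x0, args=(choices, outcomes),
                    bounds=bounds, method="L-BFGS-B")
```

Multiple random starting points or hierarchical (random-effects) estimation can be layered on top of this basic likelihood without changing its form.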

Simulations

To examine the validity of the proposed method, we perform computer simulations based on synthetic data sets. One advantage of simulations with synthetic data, compared with real behavioral data, is that the ground truth of the underlying subjective reward values is known, so we can evaluate how the method works in a straightforward manner. Specifically, we address the following seven points; a minimal simulation sketch is given after the list below.

  1. The validity of the estimates of subjective values (addressed in Case
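As one concrete way to set up such a check (an illustrative sketch under assumed task parameters, not the simulation code used in the paper), synthetic choices can be generated from a Q-learning agent with known subjective values and then fit with a likelihood like the one sketched above, to test whether those values are recovered.

```python
# Sketch (illustrative assumptions): generate synthetic choice data from a
# Q-learning agent with known ground-truth subjective values.
import numpy as np

def simulate_agent(n_trials, alpha, beta, true_values, outcome_prob, rng):
    """Simulate a two-option task in which the outcome type on each trial is
    drawn probabilistically given the chosen option."""
    q = np.zeros(2)
    choices = np.empty(n_trials, dtype=int)
    outcomes = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        p = np.exp(beta * q) / np.sum(np.exp(beta * q))    # softmax
        c = rng.choice(2, p=p)
        o = rng.choice(len(true_values), p=outcome_prob[c])
        q[c] += alpha * (true_values[o] - q[c])             # value update
        choices[t], outcomes[t] = c, o
    return choices, outcomes

rng = np.random.default_rng(0)
true_values = np.array([0.0, 0.5, 1.0])        # ground-truth subjective values
outcome_prob = np.array([[0.50, 0.25, 0.25],   # option 0: outcome-type probabilities
                         [0.25, 0.25, 0.50]])  # option 1
choices, outcomes = simulate_agent(500, 0.3, 3.0, true_values, outcome_prob, rng)
# Fitting the model to (choices, outcomes) and comparing the estimates with
# true_values provides a basic parameter-recovery check.
```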

Application to rat experiments

We next demonstrate how the proposed method works for an actual animal experiment using rats. The goal here is not to draw general conclusions about rat behavior; rather, we intend to confirm that the proposed method can extract valid and interpretable estimates from individual rats. Both rats performed two tasks. In Task 1, we randomly manipulated the reward amount. It is reasonable to expect that the reward value should be a non-decreasing function of the reward amount because it is

Discussion

In the present paper, we have proposed a novel framework for estimating subjective values. The framework consists of novel behavioral tasks and model-based analysis of the behavioral data. Our methods utilize the history dependence of choice behavior (i.e., the larger the subjective value of the outcome, the more likely the action is to be repeated; this influence decays as subjects experience additional trials). This tendency is represented by the RL framework, from which we can estimate the

Acknowledgment

This work was partially supported by Grants-in-Aid for Scientific Research (KAKENHI) Nos. 24700238, 26118506, 15K12140, and 23118003.

References (37)

  • Daw, N. D. (2011). Trial-by-trial data analysis using computational models.

  • Erev, I., et al. (1998). Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. American Economic Review.

  • Green, L., et al. (2003). Preference reversals with food and water reinforcers in rats. Journal of the Experimental Analysis of Behavior.

  • Green, L., et al. (2004). Discounting of delayed food rewards in pigeons and rats: Is there a magnitude effect? Journal of the Experimental Analysis of Behavior.

  • Hirsh, J. B., et al. (2010). Positive mood effects on delay discounting. Emotion.

  • Huys, Q. J., et al. (2013). Mapping anhedonia onto reinforcement learning: A behavioural meta-analysis. Biology of Mood & Anxiety Disorders.

  • Ito, M., et al. (2009). Validation of decision-making models and analysis of decision variables in the rat basal ganglia. Journal of Neuroscience.

  • Kable, J. W., et al. (2007). The neural correlates of subjective value during intertemporal choice. Nature Neuroscience.