Neural Networks
Volume 20, Issue 6, August 2007, Pages 668-675
 
doi:10.1016/j.neunet.2007.04.028
Copyright © 2007 Elsevier Ltd All rights reserved.

Multiple model-based reinforcement learning explains dopamine neuronal activity

Mathieu Bertin a, b (corresponding author), Nicolas Schweighofer c and Kenji Doya a, d

a ATR Computational Neuroscience Labs, 2-2-2 Hikaridai, “Keihanna Science City”, Kyoto 619-0288, Japan
b Laboratoire d’Informatique de Paris 6, Universite Paris 6 Pierre et Marie Curie, 4 place Jussieu 75005, Paris, France
c Department of Biokinesiology and Physical Therapy, University of Southern California, 1540 E. Alcazar St. CHP 155, Los Angeles 90089-9006, USA
d Neural Computation Unit, Initial Research Project Laboratory, Okinawa Institute of Science and Technology, 12-22 Suzaki, Gushikawa, Okinawa, 904-2234, Japan

Received 18 February 2005; accepted 11 April 2007. Available online 6 June 2007.


Abstract

A number of computational models have explained the behavior of dopamine neurons in terms of temporal difference learning. However, earlier models cannot account for recent results of conditioning experiments; specifically, they do not satisfactorily explain how dopamine neurons behave when the interval between a cue stimulus and a reward varies. We address this problem with a modular architecture in which each module consists of a reward predictor and a value estimator. A “responsibility signal”, computed from the accuracy of the reward predictors' predictions, weights both the contributions and the learning of the value estimators. This multiple-model architecture accurately accounts for the behavior of dopamine neurons in two specific experiments: when the reward is delivered earlier than expected, and when the stimulus–reward interval varies uniformly over a fixed range.

Keywords: Dopamine; Reinforcement learning; Multiple model; Timing prediction; Classical conditioning
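
The abstract describes the responsibility-weighting idea only in words, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that responsibilities are a softmax over negative squared reward-prediction errors, and that each module's value estimate is updated by a temporal difference error scaled by its responsibility. The function names and the parameters `sigma`, `alpha` and `gamma` are hypothetical and do not come from the article.

```python
import math


def responsibilities(predicted_rewards, actual_reward, sigma=0.5):
    """Return one weight per module, summing to 1.

    Assumed form: softmax over negative squared prediction errors,
    so modules whose reward predictor was more accurate get more weight.
    """
    scores = [-(actual_reward - p) ** 2 / (2 * sigma ** 2)
              for p in predicted_rewards]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def combined_value(module_values, resp):
    """Responsibility-weighted sum of the modules' value estimates."""
    return sum(r * v for r, v in zip(resp, module_values))


def td_update(values_now, values_next, reward, resp, alpha=0.1, gamma=0.9):
    """One temporal difference step shared across modules.

    The TD error is computed from the combined value, and each module
    learns in proportion to its responsibility (an assumption here).
    """
    delta = (reward
             + gamma * combined_value(values_next, resp)
             - combined_value(values_now, resp))
    updated = [v + alpha * r * delta for v, r in zip(values_now, resp)]
    return delta, updated


# Two hypothetical modules: one predicted a reward of 1.0, the other 0.0.
resp = responsibilities([1.0, 0.0], actual_reward=1.0)
delta, new_values = td_update([0.0, 0.0], [0.0, 0.0], reward=1.0, resp=resp)
print(resp, delta, new_values)
```

In this toy run the module that predicted the reward correctly receives the larger responsibility, so its value estimate moves further in response to the same TD error. The TD error in such models is the quantity usually compared with dopamine neuron activity.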

Article Outline

1. Introduction
2. Experiments on temporal variability
2.1. Earlier reward delivery
2.2. Uniformly varying stimulus–reward interval
3. Multiple Model-based Reinforcement Learning
3.1. Reward predictor
3.2. Value estimator
4. Results
5. Discussion
6. Conclusion
7. Simulation procedures
7.1. Simple conditioning (Fig. 4)
7.2. Earlier reward delivery (Fig. 5, right)
7.3. Uniformly varying stimulus–reward interval (Fig. 6, right)
Acknowledgements
References