Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays

  • Conference paper
  • In: Biomimetic and Biohybrid Systems (Living Machines 2018)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10928)

Abstract

During sleep and wakeful rest, the hippocampus replays sequences of place cells that were activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna family of reinforcement learning algorithms uses off-line replays to improve learning. Under a limited replay budget, prioritized sweeping, which requires a model of the transitions to predecessor states, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple-expert algorithm able to cope with multiple predecessors. The resulting architecture improves the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning of the transition and reward models should occur during rest periods, and that the corresponding replays should be shuffled.
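
The architecture proposed in the paper is a neural-network Dyna-Q with a growing multiple-expert model used to retrieve multiple predecessors. As a point of reference only, the sketch below shows the classic tabular prioritized-sweeping Dyna-Q loop (in the spirit of Moore and Atkeson) that such an architecture extends: a transition and predecessor model is learned online, and a limited replay budget is spent on the state-action pairs with the largest TD errors. All names, hyperparameter values, and the deterministic grid-world assumptions are illustrative and not taken from the paper.

```python
# Minimal tabular sketch of prioritized-sweeping Dyna-Q (illustrative only;
# the paper's contribution is a neural version with a multiple-expert
# predecessor model, not this tabular form).
import heapq
from collections import defaultdict

ACTIONS = ["N", "S", "E", "W"]  # illustrative action set for a grid-world maze

def make_q():
    # Q-values of a newly visited state start at zero for every action.
    return {a: 0.0 for a in ACTIONS}

def prioritized_sweeping_step(Q, model, predecessors, queue,
                              s, a, r, s_next,
                              alpha=0.1, gamma=0.95, theta=1e-3, budget=10):
    """Process one real transition (s, a, r, s_next), then spend a limited
    replay budget on the state-action pairs with the largest TD errors."""
    # Learn a (here deterministic) world model and the predecessor relation.
    model[(s, a)] = (r, s_next)
    predecessors[s_next].add((s, a))

    # Priority of the real transition = magnitude of its TD error.
    priority = abs(r + gamma * max(Q[s_next].values()) - Q[s][a])
    if priority > theta:
        heapq.heappush(queue, (-priority, (s, a)))  # max-heap via negated priority

    # Replay phase: limited budget, largest TD errors first
    # (duplicate queue entries are tolerated for simplicity).
    for _ in range(budget):
        if not queue:
            break
        _, (ps, pa) = heapq.heappop(queue)
        pr, pnext = model[(ps, pa)]
        Q[ps][pa] += alpha * (pr + gamma * max(Q[pnext].values()) - Q[ps][pa])
        # Backward sweep: every known predecessor of ps may now be stale.
        for (qs, qa) in predecessors[ps]:
            qr, _ = model[(qs, qa)]
            p = abs(qr + gamma * max(Q[ps].values()) - Q[qs][qa])
            if p > theta:
                heapq.heappush(queue, (-p, (qs, qa)))

# Typical setup (states can be, e.g., (x, y) grid cells):
# Q = defaultdict(make_q); model = {}; predecessors = defaultdict(set); queue = []
```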



Acknowledgements

The authors thank O. Sigaud for fruitful discussions, and F. Cinotti for proofreading. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 640891 (DREAM Project). This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.

Author information

Corresponding author

Correspondence to Benoît Girard.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Aubin, L., Khamassi, M., Girard, B. (2018). Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays. In: Vouloutsi, V., et al. Biomimetic and Biohybrid Systems. Living Machines 2018. Lecture Notes in Computer Science, vol 10928. Springer, Cham. https://doi.org/10.1007/978-3-319-95972-6_4

  • DOI: https://doi.org/10.1007/978-3-319-95972-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95971-9

  • Online ISBN: 978-3-319-95972-6

  • eBook Packages: Computer Science, Computer Science (R0)
