Abstract
This paper presents the selective use of eye-gaze information for learning human actions in Atari games. Extensive evidence suggests that our eye movements convey a wealth of information about the direction of our attention and our mental states, and encode the information necessary to complete a task. Based on this evidence, we hypothesize that the selective use of eye-gaze, as a cue to the direction of attention, will enhance learning from demonstration. For this purpose, we propose a selective eye-gaze augmentation (SEA) network that learns when to use the eye-gaze information. The proposed architecture consists of three sub-networks: a gaze prediction network, a gating network, and an action prediction network. The gaze prediction network predicts a gaze map from the prior four game frames, and this map is used to augment the input frame. The gating network determines whether the predicted gaze map should be used, and its output is fed to the action prediction network, which predicts the action at the current frame. To validate this approach, we use the publicly available Atari Human Eye-Tracking And Demonstration (Atari-HEAD) dataset, which consists of 20 Atari games with 28 million human demonstrations and 328 million eye-gaze samples (over game frames) collected from four subjects. We demonstrate the efficacy of selective eye-gaze augmentation against the state-of-the-art Attention Guided Imitation Learning (AGIL) and Behavior Cloning (BC). The results indicate that the selective augmentation approach (the SEA network) performs significantly better than both AGIL and BC. Moreover, to demonstrate the significance of selecting gaze through the gating network, we compare our approach with random selection of the gaze. Even in this case, the SEA network performs significantly better, validating the advantage of selectively using gaze in demonstration learning.
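The selective augmentation described above can be illustrated with a minimal sketch. The helper functions below are hypothetical stand-ins for the paper's trained sub-networks (the actual gaze prediction, gating, and action prediction networks are learned convolutional models); only the data flow — predict a gaze map from the prior four frames, let a gate decide whether to modulate the current frame with it — follows the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_gaze(frame_stack):
    # Hypothetical stand-in for the gaze prediction network:
    # collapse the 4-frame stack into a single map and normalize it
    # with a softmax so it behaves like a probability (saliency) map.
    g = frame_stack.mean(axis=0)
    e = np.exp(g - g.max())
    return e / e.sum()

def gate(frame_stack):
    # Hypothetical stand-in for the gating network: a scalar in [0, 1]
    # indicating whether the predicted gaze map should be applied.
    return 1.0 / (1.0 + np.exp(-frame_stack.std()))

def selectively_augment(current_frame, frame_stack, threshold=0.5):
    gaze_map = predict_gaze(frame_stack)
    if gate(frame_stack) >= threshold:
        # Gate fired: modulate the frame element-wise by the gaze map.
        return current_frame * gaze_map
    # Gate closed: pass the frame through unchanged.
    return current_frame

# Prior 4 game frames at 84x84, the frame size common in Atari pipelines.
frames = rng.random((4, 84, 84))
augmented = selectively_augment(frames[-1], frames)
print(augmented.shape)  # (84, 84)
```

In the actual SEA network the augmented frame is then passed to the action prediction network; the gate is what distinguishes this approach from always-on gaze augmentation as in AGIL.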
References
Judah K, Fern A, Tadepalli P, Goetschalckx R (2014) Imitation learning with demonstrations and shaping rewards. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 1890–1896
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484
Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354
Zhang R, Walshe C, Liu Z, Guan L, Muller KS, Whritner JA, Zhang L, Hayhoe MM, Ballard DH (2020) Atari-HEAD: Atari human eye-tracking and demonstration dataset. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)
Nikulin D, Ianina A, Aliev V, Nikolenko S (2019) Free-lunch saliency via attention in atari agents. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, pp 4240–4249
Saran A, Zhang R, Short ES, Niekum S (2020) Efficiently guiding imitation learning algorithms with human gaze. arXiv preprint arXiv:2002.12500
Zhang R, Liu Z, Zhang L, Whritner JA, Muller KS, Hayhoe MM, Ballard DH (2018) AGIL: Learning attention from human for visuomotor tasks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 663–679
Li Y, Liu M, Rehg JM (2018) In the eye of beholder: Joint learning of gaze and actions in first person video. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 619–635
Neumann O (2016) Beyond capacity: a functional view of attention. Perspectives on perception and action. Routledge, pp 375–408
Houghton G, Tipper SP (2013) A model of selective attention as a mechanism of cognitive control. Localist connectionist approaches to human cognition. Psychology Press, pp 49–84
Castiello U (2005) The neuroscience of grasping. Nat Rev Neurosci 6(9):726–736
Cisek P (2007) Cortical mechanisms of action selection: the affordance competition hypothesis. Philos Trans R Soc B Biol Sci 362(1485):1585–1599
Petrosino G, Parisi D, Nolfi S (2013) Selective attention enables action selection: evidence from evolutionary robotics experiments. Adapt Behav 21(5):356–370
Zhao M, Gersch TM, Schnitzer BS, Dosher BA, Kowler E (2012) Eye movements and attention: the role of pre-saccadic shifts of attention in perception, memory and the control of saccades. Vis Res 74:40–60
Gibson JJ (2014) The ecological approach to visual perception, classic edition. Psychology Press
Miller J, Hackley SA (1992) Electrophysiological evidence for temporal overlap among contingent mental processes. J Exp Psychol Gen 121(2):195
Land M, Mennie N, Rusted J (1999) The roles of vision and eye movements in the control of activities of daily living. Perception 28(11):1311–1328
Ahlstrom C, Victor T, Wege C, Steinmetz E (2011) Processing of eye/head-tracking data in large-scale naturalistic driving data sets. IEEE Trans Intell Transp Syst 13(2):553–564
Gredebäck G, Falck-Ytter T (2015) Eye movements during action observation. Perspect Psychol Sci 10(5):591–598
Flanagan JR, Johansson RS (2003) Action plans used in action observation. Nature 424(6950):769–771
Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: An evaluation platform for general agents. J Artif Intell Res 47:253–279. https://doi.org/10.1613/jair.3912
Barrett M, Bingel J, Hollenstein N, Rei M, Søgaard A (2018) Sequence classification with human attention. In: Proceedings of the 22nd Conference on Computational Natural Language Learning, pp 302–312
Penkov S, Bordallo A, Ramamoorthy S (2017) Physical symbol grounding and instance learning through demonstration and eye tracking. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp 5921–5928
Palazzi A, Abati D, Solera F, Cucchiara R et al (2018) Predicting the driver's focus of attention: the DR(eye)VE project. IEEE Trans Pattern Anal Mach Intell 41(7):1720–1733
Chen Y, Liu C, Tai L, Liu M, Shi BE (2019) Gaze training by modulated dropout improves imitation learning. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp 7756–7761
Chen Y, Liu C, Shi BE, Liu M (2020) Robot navigation in crowds by graph convolutional networks with attention learned from human gaze. IEEE Robot Autom Lett 5(2):2754–2761
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
Le Meur O, Baccino T (2012) Methods for comparing scan paths and saliency maps: strengths and weaknesses. Behav Res Methods 45(1):251–266. https://doi.org/10.3758/s13428-012-0226-9
Adams RA, Bauer M, Pinotsis D, Friston KJ (2016) Dynamic causal modelling of eye movements during pursuit: confirming precision-encoding in V1 using MEG. Neuroimage 132:175–189
Gerstenberg T, Peterson MF, Goodman ND, Lagnado DA, Tenenbaum JB (2017) Eye-tracking causality. Psychol Sci 28(12):1731–1744
Acknowledgements
We gratefully acknowledge NVIDIA Corporation’s support with the donation of the Titan Xp GPU used for this research.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Thammineni, C., Manjunatha, H. & Esfahani, E.T. Selective eye-gaze augmentation to enhance imitation learning in Atari games. Neural Comput & Applic 35, 23401–23410 (2023). https://doi.org/10.1007/s00521-021-06367-y