Abstract
This paper presents the selective use of eye-gaze information for learning human actions in Atari games. Extensive evidence suggests that our eye movements convey a wealth of information about the direction of our attention and our mental states, and encode the information necessary to complete a task. Based on this evidence, we hypothesize that the selective use of eye-gaze, as a cue to the direction of attention, will enhance learning from demonstration. For this purpose, we propose a selective eye-gaze augmentation (SEA) network that learns when to use the eye-gaze information. The proposed architecture consists of three sub-networks: a gaze prediction network, a gating network, and an action prediction network. The gaze prediction network predicts a gaze map from the prior four game frames, and this map is used to augment the input frame. The gating network determines whether the predicted gaze map should be used, and its output is fed to the action prediction network, which predicts the action at the current frame. To validate this approach, we use the publicly available Atari Human Eye-Tracking And Demonstration (Atari-HEAD) dataset, which consists of 20 Atari games with 28 million human demonstrations and 328 million eye-gaze samples (over game frames) collected from four subjects. We demonstrate the efficacy of selective eye-gaze augmentation against the state-of-the-art Attention Guided Imitation Learning (AGIL) and Behavior Cloning (BC). The results indicate that the selective augmentation approach (the SEA network) performs significantly better than both AGIL and BC. Moreover, to demonstrate the significance of selecting gaze through the gating network, we compare our approach with random selection of the gaze. Even in this case, the SEA network performs significantly better, validating the advantage of selectively using gaze in demonstration learning.
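The selective augmentation described above can be illustrated with a minimal sketch. The helper functions below are hypothetical stand-ins for the paper's trained sub-networks (the actual gaze prediction, gating, and action prediction networks are learned convolutional models); only the data flow — predict a gaze map from the prior four frames, let a gate decide whether to modulate the current frame with it — follows the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_gaze(frame_stack):
    # Hypothetical stand-in for the gaze prediction network:
    # collapse the 4-frame stack into a single map and normalize it
    # with a softmax so it behaves like a probability (saliency) map.
    g = frame_stack.mean(axis=0)
    e = np.exp(g - g.max())
    return e / e.sum()

def gate(frame_stack):
    # Hypothetical stand-in for the gating network: a scalar in [0, 1]
    # indicating whether the predicted gaze map should be applied.
    return 1.0 / (1.0 + np.exp(-frame_stack.std()))

def selectively_augment(current_frame, frame_stack, threshold=0.5):
    gaze_map = predict_gaze(frame_stack)
    if gate(frame_stack) >= threshold:
        # Gate fired: modulate the frame element-wise by the gaze map.
        return current_frame * gaze_map
    # Gate closed: pass the frame through unchanged.
    return current_frame

# Prior 4 game frames at 84x84, the frame size common in Atari pipelines.
frames = rng.random((4, 84, 84))
augmented = selectively_augment(frames[-1], frames)
print(augmented.shape)  # (84, 84)
```

In the actual SEA network the augmented frame is then passed to the action prediction network; the gate is what distinguishes this approach from always-on gaze augmentation as in AGIL.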
References
Judah K, Fern A, Tadepalli P, Goetschalckx R (2014) Imitation learning with demonstrations and shaping rewards. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 1890–1896
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484
Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T, Georgiev P et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782):350–354
Zhang R, Walshe C, Liu Z, Guan L, Muller KS, Whritner JA, Zhang L, Hayhoe MM, Ballard DH (2020) Atari-HEAD: Atari human eye-tracking and demonstration dataset. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)
Nikulin D, Ianina A, Aliev V, Nikolenko S (2019) Free-lunch saliency via attention in atari agents. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, pp 4240–4249
Saran A, Zhang R, Short ES, Niekum S (2020) Efficiently guiding imitation learning algorithms with human gaze. arXiv preprint arXiv:2002.12500
Zhang R, Liu Z, Zhang L, Whritner JA, Muller KS, Hayhoe MM, Ballard DH (2018) AGIL: Learning attention from human for visuomotor tasks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 663–679
Li Y, Liu M, Rehg JM (2018) In the eye of beholder: Joint learning of gaze and actions in first person video. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 619–635
Neumann O (2016) Beyond capacity: a functional view of attention. Perspectives on perception and action. Routledge, pp 375–408
Houghton G, Tipper SP (2013) A model of selective attention as a mechanism of cognitive control. Localist connectionist approaches to human cognition. Psychology Press, pp 49–84
Castiello U (2005) The neuroscience of grasping. Nat Rev Neurosci 6(9):726–736
Cisek P (2007) Cortical mechanisms of action selection: the affordance competition hypothesis. Philos Trans R Soc B Biol Sci 362(1485):1585–1599
Petrosino G, Parisi D, Nolfi S (2013) Selective attention enables action selection: evidence from evolutionary robotics experiments. Adapt Behav 21(5):356–370
Zhao M, Gersch TM, Schnitzer BS, Dosher BA, Kowler E (2012) Eye movements and attention: the role of pre-saccadic shifts of attention in perception, memory and the control of saccades. Vis Res 74:40–60
Gibson JJ (2014) The ecological approach to visual perception, classic edition. Psychology Press
Miller J, Hackley SA (1992) Electrophysiological evidence for temporal overlap among contingent mental processes. J Exp Psychol Gen 121(2):195
Land M, Mennie N, Rusted J (1999) The roles of vision and eye movements in the control of activities of daily living. Perception 28(11):1311–1328
Ahlstrom C, Victor T, Wege C, Steinmetz E (2011) Processing of eye/head-tracking data in large-scale naturalistic driving data sets. IEEE Trans Intell Transp Syst 13(2):553–564
Gredebäck G, Falck-Ytter T (2015) Eye movements during action observation. Perspect Psychol Sci 10(5):591–598
Flanagan JR, Johansson RS (2003) Action plans used in action observation. Nature 424(6950):769–771
Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: An evaluation platform for general agents. J Artif Intell Res 47:253–279. https://doi.org/10.1613/jair.3912
Barrett M, Bingel J, Hollenstein N, Rei M, Søgaard A (2018) Sequence classification with human attention. In: Proceedings of the 22nd Conference on Computational Natural Language Learning, pp 302–312
Penkov S, Bordallo A, Ramamoorthy S (2017) Physical symbol grounding and instance learning through demonstration and eye tracking. In: 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp 5921–5928
Palazzi A, Abati D, Solera F, Cucchiara R et al (2018) Predicting the driver's focus of attention: the DR(eye)VE project. IEEE Trans Pattern Anal Mach Intell 41(7):1720–1733
Chen Y, Liu C, Tai L, Liu M, Shi BE (2019) Gaze training by modulated dropout improves imitation learning. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp 7756–7761
Chen Y, Liu C, Shi BE, Liu M (2020) Robot navigation in crowds by graph convolutional networks with attention learned from human gaze. IEEE Robot Autom Lett 5(2):2754–2761
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
Le Meur O, Baccino T (2012) Methods for comparing scan paths and saliency maps: strengths and weaknesses. Behav Res Methods 45(1):251–266. https://doi.org/10.3758/s13428-012-0226-9
Adams RA, Bauer M, Pinotsis D, Friston KJ (2016) Dynamic causal modelling of eye movements during pursuit: confirming precision-encoding in V1 using MEG. Neuroimage 132:175–189
Gerstenberg T, Peterson MF, Goodman ND, Lagnado DA, Tenenbaum JB (2017) Eye-tracking causality. Psychol Sci 28(12):1731–1744
Acknowledgements
We gratefully acknowledge NVIDIA Corporation’s support with the donation of the Titan Xp GPU used for this research.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Thammineni, C., Manjunatha, H. & Esfahani, E.T. Selective eye-gaze augmentation to enhance imitation learning in Atari games. Neural Comput & Applic 35, 23401–23410 (2023). https://doi.org/10.1007/s00521-021-06367-y