Selective eye-gaze augmentation to enhance imitation learning in Atari games

  • Special Issue: Human-in-the-loop Machine Learning and its Applications
  • Published in: Neural Computing and Applications

Abstract

This paper presents the selective use of eye-gaze information in learning human actions in Atari games. Extensive evidence suggests that our eye movements convey a wealth of information about the direction of our attention and our mental states, and encode the information necessary to complete a task. Based on this evidence, we hypothesize that the selective use of eye-gaze, as a cue for attention direction, will enhance learning from demonstration. For this purpose, we propose a selective eye-gaze augmentation (SEA) network that learns when to use the eye-gaze information. The proposed architecture consists of three sub-networks: a gaze prediction network, a gating network, and an action prediction network. Using the prior four game frames, the gaze prediction network predicts a gaze map, which is used to augment the input frame. The gating network determines whether the predicted gaze map should be used, and its output is fed to the action prediction network, which predicts the action at the current frame. To validate this approach, we use the publicly available Atari Human Eye-Tracking and Demonstration (Atari-HEAD) dataset, which consists of 20 Atari games with 28 million human demonstrations and 328 million eye-gaze samples (over game frames) collected from four subjects. We demonstrate the efficacy of selective eye-gaze augmentation compared to the state-of-the-art Attention Guided Imitation Learning (AGIL) and Behavior Cloning (BC). The results indicate that the selective augmentation approach (the SEA network) performs significantly better than both AGIL and BC. Moreover, to demonstrate the significance of selectively using gaze through the gating network, we compare our approach with random selection of the gaze. Even in this case, the SEA network performs significantly better, validating the advantage of selectively using gaze in learning from demonstration.
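
For readers who want a concrete picture of the pipeline, the following is a minimal sketch of the three-sub-network design described above, assuming PyTorch, 84x84 grayscale Atari frames, and a four-frame input stack. The layer sizes, the scalar form of the gate, and the way the gaze map is blended into the current frame are illustrative assumptions, not the authors' implementation.

    # Minimal, hypothetical sketch of the SEA architecture (gaze prediction,
    # gating, action prediction), assuming PyTorch and 84x84 Atari frames.
    import torch
    import torch.nn as nn

    class GazePredictionNet(nn.Module):
        """Predicts a spatial gaze map from the prior 4 game frames."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(32, 1, kernel_size=8, stride=4),
            )

        def forward(self, frames):                 # frames: (B, 4, 84, 84)
            logits = self.decoder(self.encoder(frames))
            return torch.sigmoid(logits)           # gaze map in [0, 1]

    class SEANet(nn.Module):
        """Gating network decides whether the gaze map augments the frame."""
        def __init__(self, n_actions=18):
            super().__init__()
            self.gaze_net = GazePredictionNet()
            self.gate = nn.Sequential(             # scalar gate from the stack
                nn.Flatten(), nn.Linear(4 * 84 * 84, 64), nn.ReLU(),
                nn.Linear(64, 1), nn.Sigmoid(),
            )
            self.action_net = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(), nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
                nn.Linear(256, n_actions),
            )

        def forward(self, frames):
            current = frames[:, -1:, :, :]         # most recent frame
            gaze_map = self.gaze_net(frames)
            g = self.gate(frames).view(-1, 1, 1, 1)
            # Blend: use the gaze-masked frame only as far as the gate allows.
            augmented = g * (current * gaze_map) + (1 - g) * current
            return self.action_net(augmented)      # action logits for cloning

    # Example: a batch of 8 four-frame stacks yields 8 action-logit vectors.
    if __name__ == "__main__":
        model = SEANet(n_actions=18)
        logits = model(torch.rand(8, 4, 84, 84))
        print(logits.shape)                        # torch.Size([8, 18])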



Acknowledgements

We gratefully acknowledge NVIDIA Corporation’s support with the donation of the Titan Xp GPU used for this research.

Author information

Corresponding author

Correspondence to Ehsan T. Esfahani.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Thammineni, C., Manjunatha, H. & Esfahani, E.T. Selective eye-gaze augmentation to enhance imitation learning in Atari games. Neural Comput & Applic 35, 23401–23410 (2023). https://doi.org/10.1007/s00521-021-06367-y

