DeepAttent: Saliency Prediction with Deep Multi-scale Residual Network

Dwivedi, Kshitij; Singh, Nitin; Shanmugham, Sabari R.; Kumar, Manoj

doi:10.1007/978-981-32-9291-8_6

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1024)

Abstract

Predicting where humans look in a given scene is a well-known problem with applications in consumer cameras, human–computer interaction, robotics, and gaming. With large-scale datasets of human fixations on images now available, it is possible to train deep neural networks to generate fixation maps. Human fixations depend on both local visual features and global context. We incorporate this in a deep neural network by using both global and local image features to predict human fixations. We sample multi-scale features from a deep residual network and introduce a new method for incorporating these multi-scale features into the end-to-end training of our network. Our model, DeepAttent, obtains competitive results on the SALICON and iSUN datasets and outperforms state-of-the-art methods on several metrics.

Author information

Authors and Affiliations

Singapore University of Technology and Design, Singapore, Singapore
Kshitij Dwivedi
Samsung Research Institute, Bengaluru, India
Nitin Singh & Manoj Kumar
DataRobot, Singapore, Singapore
Sabari R. Shanmugham

Corresponding author

Correspondence to Nitin Singh.

Editor information

Editors and Affiliations

Techno India University, Kolkata, India
Bidyut B. Chaudhuri
Division of Advanced Information Technology and Computer Science, Tokyo University of Agriculture and Technology, Koganei-shi, Tokyo, Japan
Masaki Nakagawa
Department of Computer Science, Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, Madhya Pradesh, India
Pritee Khanna
Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Sanjeev Kumar

About this paper

Cite this paper

Dwivedi, K., Singh, N., Shanmugham, S.R., Kumar, M. (2020). DeepAttent: Saliency Prediction with Deep Multi-scale Residual Network. In: Chaudhuri, B., Nakagawa, M., Khanna, P., Kumar, S. (eds) Proceedings of 3rd International Conference on Computer Vision and Image Processing. Advances in Intelligent Systems and Computing, vol 1024. Springer, Singapore. https://doi.org/10.1007/978-981-32-9291-8_6

DOI: https://doi.org/10.1007/978-981-32-9291-8_6
Published: 20 September 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9290-1
Online ISBN: 978-981-32-9291-8
eBook Packages: Engineering, Engineering (R0)
