Abstract
The geometric distortion of panoramic images invalidates saliency-detection methods based on traditional 2D convolution. "Mapped Convolution" addresses this problem effectively: it accepts a task- or domain-specific mapping function, in the form of an adjacency list, that dictates where the convolutional filters sample the input. However, when applied to panorama saliency detection, the method incurs additional computational overhead because the overlapping regions of adjacent convolution positions along the longitude are sampled repeatedly. To solve this problem, we improve the computation of "Mapped Convolution". Rather than accessing the adjacency list during the convolution, we sample the panorama according to the adjacency list only once, obtaining a sampled map. We call this sampling process the decoupled sampling of "Mapped Convolution". The sampled map is then convolved in the traditional 2D manner, so repeated sampling is avoided. We also propose an interpolation method based on the Softmax function and apply it to the interpolation step of decoupled sampling; compared with common methods such as linear interpolation, it makes our network more efficient during training. We additionally introduce a new adaptive equator-bias algorithm that allows different attention distributions at different longitudes, which is more consistent with viewers' visual behavior. Combining a U-Autoencoder network containing the decoupled sampling with the adaptive equator-bias algorithm, we construct a 360-degree visual saliency detection model. We map the original panorama onto a cube, remap it into a panorama with the cube isometric mapping method, and feed it into the network for training. The crude saliency map output by the decoder is then combined with the equator-bias map to obtain the final saliency map.
The results show that the proposed model is superior to recent state-of-the-art models in both computational speed and saliency-map prediction.
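The decoupled sampling and the Softmax-based interpolation described above can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' implementation: the function names (`decoupled_sample`, `equator_bias_map`), the coordinate layout of the adjacency list, and the `temperature` parameter are all assumptions. The adjacency list is represented here as an array of fractional (row, col) sampling coordinates; each sample is interpolated from its four integer neighbours with weights given by a Softmax over negative distances, and the resulting sampled map can then be fed to any standard 2D convolution.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_sample(panorama, sample_coords, temperature=1.0):
    """Sample the panorama once at fractional (row, col) coordinates.

    panorama:      (H, W) array.
    sample_coords: (..., 2) array of fractional coordinates derived from
                   the adjacency list (hypothetical layout).
    Returns an array of shape sample_coords.shape[:-1]. Afterwards an
    ordinary 2D convolution runs on this sampled map, so the adjacency
    list is never consulted again inside the convolution.
    """
    H, W = panorama.shape
    ys, xs = sample_coords[..., 0], sample_coords[..., 1]
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    values, dists = [], []
    for dy in (0, 1):
        for dx in (0, 1):
            # Clip to image bounds; a real implementation would wrap
            # the longitude coordinate instead of clipping it.
            ny = np.clip(y0 + dy, 0, H - 1)
            nx = np.clip(x0 + dx, 0, W - 1)
            values.append(panorama[ny, nx])
            dists.append(np.hypot(ys - (y0 + dy), xs - (x0 + dx)))
    vals = np.stack(values, axis=-1)  # (..., 4) neighbour values
    # Softmax-based interpolation: nearer neighbours get larger weights.
    w = softmax(-np.stack(dists, axis=-1) / temperature, axis=-1)
    return (vals * w).sum(axis=-1)

def equator_bias_map(H, W, sigma=0.25):
    """Fixed latitude Gaussian bias as a baseline. The paper's adaptive
    version lets the distribution vary with longitude; that is not
    reproduced here."""
    lat = np.linspace(-np.pi / 2, np.pi / 2, H)
    bias = np.exp(-lat**2 / (2 * (sigma * np.pi) ** 2))
    return np.repeat(bias[:, None], W, axis=1)
```

Under the same assumptions, the final saliency map would be obtained by weighting the decoder's crude output with the bias map, e.g. element-wise multiplication followed by renormalization.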
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant U19A2063, and in part by the Jilin Provincial Science & Technology Development Program of China under Grant 20190302113GX.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhang, R., Chen, C., Zhang, J. et al. 360-degree visual saliency detection based on fast-mapped convolution and adaptive equator-bias perception. Vis Comput 39, 1163–1180 (2023). https://doi.org/10.1007/s00371-021-02395-w