DOI: 10.1145/3394171.3413523
MM '20 Conference Proceedings · Research Article

MMNet: Multi-Stage and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Published: 12 October 2020

ABSTRACT

Most existing RGB-D salient object detection (SOD) methods directly extract and fuse raw features from the RGB and depth backbones. Such methods are easily restricted by low-quality depth maps and redundant cross-modal features. To effectively capture multi-scale cross-modal fusion features, this paper proposes a novel Multi-Stage and Multi-Scale Fusion Network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Inspired by the stage doctrine of color vision in the human visual system, the proposed CMFM explores useful and important feature representations in the feature response stage and effectively integrates them into available cross-modal fusion features in the adversarial combination stage. Moreover, the proposed BMD learns to combine cross-modal fusion features from multiple levels, capturing both local and global information of salient objects and further boosting the performance of the proposed method. Comprehensive experiments demonstrate that the proposed method achieves consistently superior performance over 14 state-of-the-art methods on six popular RGB-D datasets under eight different evaluation metrics.
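The abstract names two ideas, stage-wise cross-modal fusion and a bi-directional multi-scale decoder, without giving implementation details. The toy sketch below illustrates the general flavor of such a design; the helper names (`gated_fusion`, `bidirectional_decode`), the sigmoid gating choice, and the 2x nearest-neighbour up/down-sampling are all illustrative assumptions, not MMNet's actual architecture:

```python
import numpy as np

def gated_fusion(rgb_feat, depth_feat):
    # Gate depth features by a sigmoid of the RGB response before adding
    # them in: a common way to suppress activations from low-quality
    # depth maps (an assumed mechanism, not the paper's CMFM).
    gate = 1.0 / (1.0 + np.exp(-rgb_feat))
    return rgb_feat + gate * depth_feat

def bidirectional_decode(fused):
    # fused: list of per-stage feature maps, coarse (small) to fine (large).
    # Top-down pass: upsample each coarser map and add it to the finer one.
    td = [fused[0]]
    for f in fused[1:]:
        up = np.kron(td[-1], np.ones((2, 2)))  # nearest-neighbour 2x upsample
        td.append(f + up[: f.shape[0], : f.shape[1]])
    # Bottom-up pass: average-pool each finer map and add it back down,
    # so every scale sees both local (fine) and global (coarse) context.
    bu = [td[-1]]
    for f in reversed(td[:-1]):
        h, w = f.shape
        down = bu[0].reshape(h, 2, w, 2).mean(axis=(1, 3))  # 2x average pool
        bu.insert(0, f + down)
    return bu

rng = np.random.default_rng(0)
# Three stages of square feature maps: 4x4, 8x8, 16x16 (toy sizes).
stages = [rng.standard_normal((4 * 2**i, 4 * 2**i)) for i in range(3)]
fused = [gated_fusion(s, rng.standard_normal(s.shape)) for s in stages]
out = bidirectional_decode(fused)
print([o.shape for o in out])  # one refined map per scale
```

The two passes mirror the decoder's stated goal: the top-down pass propagates global (coarse) context to fine scales, while the bottom-up pass feeds local detail back to coarse scales.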


Supplemental Material

3394171.3413523.mp4 (mp4, 14.4 MB)


Published in

      MM '20: Proceedings of the 28th ACM International Conference on Multimedia
      October 2020
      4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171

      Copyright © 2020 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions (24%)

