ABSTRACT
Most existing RGB-D salient object detection (SOD) methods directly extract and fuse raw features from RGB and depth backbones. Such methods are easily limited by low-quality depth maps and redundant cross-modal features. To effectively capture multi-scale cross-modal fusion features, this paper proposes a novel Multi-stage and Multi-Scale Fusion Network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Inspired by the stage theory of color vision in the human visual system, the proposed CMFM explores useful and important feature representations in a feature response stage and effectively integrates them into cross-modal fusion features in an adversarial combination stage. Moreover, the proposed BMD learns to combine cross-modal fusion features across multiple levels, capturing both local and global information about salient objects and further boosting performance. Comprehensive experiments demonstrate that the proposed method achieves consistently superior performance over 14 state-of-the-art methods on six popular RGB-D datasets under eight evaluation metrics.
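The two-component design described above can be sketched in NumPy. This is only an illustrative toy, not the authors' implementation: `cmfm_fuse` stands in for the CMFM (an RGB-driven gate plays the role of the response stage before the depth features are combined), and `bmd_decode` stands in for the BMD (a bottom-up pass pools fine detail into coarser levels, then a top-down pass spreads global context back to the finest level). All function names, the gating choice, and the nearest-neighbour sampling are assumptions for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cmfm_fuse(rgb, depth):
    """Toy cross-modal fusion: gate the depth features with an RGB-driven
    response (response stage), then combine (combination stage). This is a
    crude stand-in for the paper's CMFM, which suppresses unreliable
    depth cues before fusion."""
    gate = sigmoid(rgb)
    return rgb + gate * depth

def upsample2(x):
    # Nearest-neighbour 2x upsampling.
    return np.kron(x, np.ones((2, 2)))

def downsample2(x):
    # 2x2 average pooling.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def bmd_decode(pyramid):
    """Toy bi-directional decoder over a fused feature pyramid listed
    fine -> coarse: a bottom-up pass aggregates local detail into coarser
    levels, then a top-down pass distributes global context back down."""
    # Bottom-up pass: pool fine detail into each coarser level.
    bu = [pyramid[0]]
    for feat in pyramid[1:]:
        bu.append(feat + downsample2(bu[-1]))
    # Top-down pass: upsample and accumulate back to the finest level.
    td = bu[-1]
    for feat in reversed(bu[:-1]):
        td = feat + upsample2(td)
    return sigmoid(td)  # saliency map at the finest resolution

# Demo with random 3-level feature pyramids (8x8, 4x4, 2x2).
rng = np.random.default_rng(0)
sizes = [8, 4, 2]
rgb_feats = [rng.standard_normal((s, s)) for s in sizes]
depth_feats = [rng.standard_normal((s, s)) for s in sizes]
fused = [cmfm_fuse(r, d) for r, d in zip(rgb_feats, depth_feats)]
saliency = bmd_decode(fused)
print(saliency.shape)  # (8, 8)
```

Combining both pathways is what lets the decoder mix local detail (bottom-up) with global context (top-down) in a single readout, which is the intuition the abstract attributes to the BMD.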