ABSTRACT
Most existing RGB-D salient object detection (SOD) methods directly extract and fuse raw features from RGB and depth backbones. Such methods are easily limited by low-quality depth maps and redundant cross-modal features. To effectively capture multi-scale cross-modal fusion features, this paper proposes a novel Multi-stage and Multi-Scale Fusion Network (MMNet), which consists of a cross-modal multi-stage fusion module (CMFM) and a bi-directional multi-scale decoder (BMD). Inspired by the stage theory of color vision in the human visual system, the proposed CMFM explores useful and important feature representations in a feature response stage and effectively integrates them into cross-modal fusion features in an adversarial combination stage. Moreover, the proposed BMD learns to combine cross-modal fusion features across multiple levels, capturing both local and global information about salient objects and further boosting performance. Comprehensive experiments demonstrate that the proposed method achieves consistently superior performance over 14 state-of-the-art methods on six popular RGB-D datasets under eight evaluation metrics.
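The two-component design described above can be sketched in NumPy. This is only an illustrative toy, not the authors' implementation: `cmfm_fuse` stands in for the CMFM (an RGB-driven gate plays the role of the response stage before the depth features are combined), and `bmd_decode` stands in for the BMD (a bottom-up pass pools fine detail into coarser levels, then a top-down pass spreads global context back to the finest level). All function names, the gating choice, and the nearest-neighbour sampling are assumptions for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cmfm_fuse(rgb, depth):
    """Toy cross-modal fusion: gate the depth features with an RGB-driven
    response (response stage), then combine (combination stage). This is a
    crude stand-in for the paper's CMFM, which suppresses unreliable
    depth cues before fusion."""
    gate = sigmoid(rgb)
    return rgb + gate * depth

def upsample2(x):
    # Nearest-neighbour 2x upsampling.
    return np.kron(x, np.ones((2, 2)))

def downsample2(x):
    # 2x2 average pooling.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def bmd_decode(pyramid):
    """Toy bi-directional decoder over a fused feature pyramid listed
    fine -> coarse: a bottom-up pass aggregates local detail into coarser
    levels, then a top-down pass distributes global context back down."""
    # Bottom-up pass: pool fine detail into each coarser level.
    bu = [pyramid[0]]
    for feat in pyramid[1:]:
        bu.append(feat + downsample2(bu[-1]))
    # Top-down pass: upsample and accumulate back to the finest level.
    td = bu[-1]
    for feat in reversed(bu[:-1]):
        td = feat + upsample2(td)
    return sigmoid(td)  # saliency map at the finest resolution

# Demo with random 3-level feature pyramids (8x8, 4x4, 2x2).
rng = np.random.default_rng(0)
sizes = [8, 4, 2]
rgb_feats = [rng.standard_normal((s, s)) for s in sizes]
depth_feats = [rng.standard_normal((s, s)) for s in sizes]
fused = [cmfm_fuse(r, d) for r, d in zip(rgb_feats, depth_feats)]
saliency = bmd_decode(fused)
print(saliency.shape)  # (8, 8)
```

Combining both pathways is what lets the decoder mix local detail (bottom-up) with global context (top-down) in a single readout, which is the intuition the abstract attributes to the BMD.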