ABSTRACT
RGB-D Salient Object Detection (RGB-D SOD) is a pixel-level dense prediction task that highlights the prominent objects in a scene by combining color information with depth constraints. Attention mechanisms have been widely employed in SOD for their ability to capture important cues. However, most existing attention schemes (e.g., spatial attention, channel attention, self-attention) mainly exploit pixel-level attention maps and ignore the region properties of salient objects. To remedy this issue, we propose a progressive saliency iteration network (PSINet) with region-wise saliency attention, which improves the regional integrity of salient objects in an iterative manner. Specifically, two-stream Swin Transformers are first employed to extract RGB and depth features. Second, a multi-modality alternate and inverse module (AIM) extracts complementary features from RGB-D images in an interleaved manner, which breaks down the inconsistency barriers in cross-modal data while sufficiently capturing their complementarity. Third, a triple progressive iteration decoder (TPID) optimizes the salient objects: a coarse saliency map, generated by integrating multi-scale features with a U-Net, serves as a region-wise attention map in a region-wise saliency attention module (RSAM), which emphasizes the prominent regions of the features. Finally, the regional integrity of salient objects is gradually refined from coarse to fine by iterating the above steps in TPID. Quantitative and qualitative experiments demonstrate that the proposed model performs favorably against 19 state-of-the-art (SOTA) saliency detectors on five benchmark RGB-D SOD datasets.
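The abstract does not give RSAM's exact formulation. A minimal NumPy sketch, under the assumption that the coarse saliency map acts as a multiplicative attention over the feature channels with a residual path (the function name and shapes are hypothetical, not the paper's implementation), might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def region_wise_saliency_attention(feat, coarse_sal):
    """Hypothetical RSAM sketch: re-weight a feature map with a
    region-level attention map derived from a coarse saliency prediction.

    feat:       (C, H, W) decoder features
    coarse_sal: (H, W) coarse saliency logits from the previous iteration
    """
    # Normalize the coarse prediction to (0, 1) to use it as attention.
    attn = sigmoid(coarse_sal)                      # (H, W)
    # Broadcast over channels; the residual term keeps background
    # features suppressed rather than erased, so later iterations
    # can still correct the region boundary.
    return feat * attn[None, :, :] + feat

# Toy example: a 4-channel feature map with a salient central region.
feat = np.ones((4, 8, 8))
coarse = np.full((8, 8), -4.0)                      # background logits
coarse[2:6, 2:6] = 4.0                              # salient-region logits
refined = region_wise_saliency_attention(feat, coarse)
```

Iterating this step — each refined feature map producing a sharper coarse map for the next pass — is one plausible reading of the coarse-to-fine loop the abstract describes for TPID.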
Index Terms
- PSINet: Progressive Saliency Iteration Network for RGB-D Salient Object Detection