Abstract
The success of the vision transformer demonstrates that the transformer architecture suits a wide range of vision tasks, from high-level classification to low-level dense prediction. Salient object detection (SOD) is a pixel-level dense prediction task that aims to segment the objects most likely to attract human visual attention in a scene. In recent years, depth images have been widely used for salient object detection; compared with RGB SOD, the key challenge of RGB-D SOD is the effective fusion of depth information. Because RGB-D SOD must extract depth features and fuse cross-modal information, it incurs additional computation, and most RGB-D SOD methods other than lightweight models trade more computational resources for better prediction maps. We propose a cross-modal dense cooperative fusion network that achieves state-of-the-art performance with fewer parameters and less computation. We exploit the transformer's ability to model long-range dependencies to extract saliency features from RGB images. Since a depth image carries less information than an RGB image, the depth stream does not need the same structure; to reduce parameters and computation, we adopt an asymmetric architecture in which depth features extracted by the lightweight MobileNetV2 suffice. Our decoder performs dense cooperative fusion of cross-modal information while decoding features, fusing the two modalities effectively while saving computation. Comprehensive experiments on multiple RGB-D SOD benchmark datasets show that our method outperforms state-of-the-art methods with less computation and fewer parameters.
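The asymmetric two-stream design described above can be sketched in a minimal, framework-free form. This is an illustrative assumption, not the paper's implementation: the `fuse` gating rule, the stage shapes, and both function names are hypothetical, standing in for the actual transformer/MobileNetV2 features and the dense cooperative fusion decoder.

```python
import numpy as np

def fuse(rgb_feat, depth_feat):
    # Hypothetical cross-modal fusion: a sigmoid gate computed from the
    # depth feature modulates the RGB feature, with residual addition so
    # neither modality is discarded.
    gate = 1.0 / (1.0 + np.exp(-depth_feat))
    return rgb_feat * gate + rgb_feat + depth_feat

def dense_cooperative_decode(rgb_feats, depth_feats):
    # rgb_feats / depth_feats: per-stage feature maps ordered from the
    # shallowest (largest) to the deepest (smallest) stage, each stage
    # half the spatial size of the previous one.
    # Decoding starts from the deepest stage and, at every stage, fuses
    # the two modalities and adds the result to the upsampled decoder
    # state -- fusion happens densely, at all stages, during decoding.
    out = fuse(rgb_feats[-1], depth_feats[-1])
    for r, d in zip(reversed(rgb_feats[:-1]), reversed(depth_feats[:-1])):
        # Nearest-neighbor 2x upsampling of the decoder state.
        out = np.repeat(np.repeat(out, 2, axis=0), 2, axis=1)
        out = out + fuse(r, d)
    return out
```

The point of the sketch is the control flow: the depth stream contributes only a cheap feature per stage, while the decoder carries the fusion work, which is where the parameter and computation savings of the asymmetric design come from.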
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by SDUST Young Teachers Teaching Talent Training Plan (BJRC20180501); National Natural Science Foundation of China (Grant No. 61976125); Natural Science Foundation of Shandong Province (ZR2022MF277).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jia, X., Zhao, W., Wang, Y. et al. CMDCF: an effective cross-modal dense cooperative fusion network for RGB-D SOD. Neural Comput & Applic 36, 14361–14378 (2024). https://doi.org/10.1007/s00521-024-09692-0