
CMDCF: an effective cross-modal dense cooperative fusion network for RGB-D SOD

  • Original Article
Neural Computing and Applications

Abstract

The success of the vision transformer demonstrates that transformer architectures are suitable for a wide range of vision tasks, from high-level classification to low-level dense prediction. Salient object detection (SOD) is a pixel-level dense prediction task that identifies the objects a human observer would find most salient in a scene. In recent years, depth images have been widely used for salient object detection. Compared with RGB SOD, the key challenge in RGB-D SOD is the effective fusion of depth information. Because RGB-D SOD requires extracting depth features and fusing cross-modal information, it involves additional computation; yet apart from lightweight models, most RGB-D SOD methods obtain better prediction maps only by consuming more computational resources. We propose a cross-modal dense cooperative fusion network (CMDCF) that achieves state-of-the-art performance with fewer parameters and less computation. We exploit the transformer's ability to model long-range dependencies to extract saliency features from RGB images. Since a depth image carries less information than an RGB image, the depth stream does not need the same structure: to reduce parameters and computation, we adopt an asymmetric architecture in which depth features extracted by the lightweight MobileNetV2 are sufficient for our needs. Our decoder performs dense cooperative fusion of cross-modal information while decoding features, so it both fuses the two modalities effectively and saves computation. Comprehensive experiments on multiple RGB-D SOD benchmark datasets show that our method outperforms state-of-the-art methods while using less computation and fewer parameters.
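To make the asymmetric two-stream idea concrete, below is a minimal PyTorch sketch: a ViT-style patch-transformer stream for the RGB image, torchvision's MobileNetV2 feature extractor for the depth map, and a decoder that fuses the two modalities while upsampling to a saliency map. The module names, feature sizes, and the specific gated-fusion rule are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Sketch of an asymmetric RGB-D SOD network: transformer RGB stream,
# lightweight MobileNetV2 depth stream, cross-modal fusion in the decoder.
# Assumes torchvision >= 0.13 for the `weights=` argument.
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2


class PatchTransformerEncoder(nn.Module):
    """Stand-in for a ViT-style RGB stream: 16x16 patch embedding followed
    by a small stack of standard transformer encoder layers."""

    def __init__(self, dim=256, depth=4, heads=8, patch=16, img=224):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, (img // patch) ** 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, rgb):
        x = self.proj(rgb)                     # B x dim x 14 x 14
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # B x 196 x dim
        tokens = self.encoder(tokens + self.pos)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class CooperativeFusionDecoder(nn.Module):
    """Fuses the modalities while decoding: depth features act as a spatial
    gate on the RGB features (an assumed fusion rule), then a small head
    predicts the saliency map and upsamples it to input resolution."""

    def __init__(self, rgb_dim=256, depth_dim=1280):
        super().__init__()
        self.squeeze = nn.Conv2d(depth_dim, rgb_dim, 1)
        self.gate = nn.Sequential(nn.Conv2d(rgb_dim, 1, 3, padding=1), nn.Sigmoid())
        self.head = nn.Sequential(
            nn.Conv2d(rgb_dim, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, rgb_feat, depth_feat):
        d = self.squeeze(depth_feat)
        d = nn.functional.interpolate(d, size=rgb_feat.shape[-2:],
                                      mode="bilinear", align_corners=False)
        fused = rgb_feat * self.gate(d) + rgb_feat  # residual cross-modal gating
        sal = self.head(fused)
        return nn.functional.interpolate(sal, scale_factor=16,
                                         mode="bilinear", align_corners=False)


class AsymmetricRGBDSOD(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = PatchTransformerEncoder()
        self.depth_stream = mobilenet_v2(weights=None).features  # lightweight
        self.decoder = CooperativeFusionDecoder()

    def forward(self, rgb, depth):
        rgb_feat = self.rgb_stream(rgb)
        depth_feat = self.depth_stream(depth.repeat(1, 3, 1, 1))  # 1-ch -> 3-ch
        return self.decoder(rgb_feat, depth_feat)


if __name__ == "__main__":
    model = AsymmetricRGBDSOD()
    out = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
    print(out.shape)  # torch.Size([2, 1, 224, 224])
```

The asymmetry keeps the expensive transformer on the information-rich RGB input and the cheap convolutional encoder on the depth map, which is where the parameter and computation savings described in the abstract come from.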




Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by SDUST Young Teachers Teaching Talent Training Plan (BJRC20180501); National Natural Science Foundation of China (Grant No. 61976125); Natural Science Foundation of Shandong Province (ZR2022MF277).

Author information


Corresponding author

Correspondence to ChangLei DongYe.

Ethics declarations

Conflict of interest

We declare that we have no commercial or associative interests that represent a conflict of interest in connection with the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jia, X., Zhao, W., Wang, Y. et al. CMDCF: an effective cross-modal dense cooperative fusion network for RGB-D SOD. Neural Comput & Applic 36, 14361–14378 (2024). https://doi.org/10.1007/s00521-024-09692-0

