research-article

Cross-modal hierarchical interaction network for RGB-D salient object detection

Published: 01 April 2023

Highlights

We propose a novel cross-modal hierarchical interaction network for accurate RGB-D salient object detection, which not only excavates multi-modal interactions but also progressively guides multi-level feature fusion.
A feedback mechanism is applied to the cross-modal information exchange module, which feeds the interactive information back to the respective feature extractors to boost discriminative feature learning.
A multi-level information progressively guided fusion module is introduced with a reverse guidance mechanism, which makes full use of the hierarchical features to produce accurate saliency detection.

Abstract

How to effectively exchange and aggregate the information of multiple modalities (e.g., RGB image and depth map) is a major challenge in the RGB-D salient object detection community. To address this problem, we propose a cross-modal Hierarchical Interaction Network (HINet), which boosts salient object detection by excavating cross-modal feature interaction and progressive multi-level feature fusion. To this end, we design two modules: a cross-modal information exchange (CIE) module and a multi-level information progressively guided fusion (PGF) module. Specifically, the CIE module exchanges the cross-modal features to learn shared representations, and provides beneficial feedback that facilitates the discriminative feature learning of each modality. The PGF module aggregates the hierarchical features progressively with a reverse guidance mechanism, which employs the high-level feature fusion to guide the low-level feature fusion and thus improves saliency detection performance. Extensive experiments show that our proposed model significantly outperforms nine existing state-of-the-art models on five challenging benchmark datasets. Code and results are available at: https://github.com/RanwanWu/HINet.
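The two mechanisms named in the abstract can be illustrated with a toy, dependency-free sketch. This is not the HINet implementation: the function names `cie_exchange` and `pgf_fuse`, the element-wise product used as the shared representation, the sigmoid gate used as feedback, and the flat vectors standing in for feature maps are all assumptions made here for illustration. The abstract does not specify any of these operations; see the linked repository for the authors' code.

```python
from math import exp
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function, used here as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def cie_exchange(rgb: Vector, depth: Vector) -> Tuple[Vector, Vector, Vector]:
    """Hypothetical cross-modal exchange with feedback (assumed formulas).

    Returns (shared, rgb_refined, depth_refined).
    """
    # Assumed shared representation: element-wise product, which is large
    # only where both modalities respond.
    shared = [r * d for r, d in zip(rgb, depth)]
    # Assumed feedback: a gate derived from the shared features is sent back
    # to each modality stream as a residual re-weighting.
    gate = [sigmoid(s) for s in shared]
    rgb_refined = [r + g * r for r, g in zip(rgb, gate)]
    depth_refined = [d + g * d for d, g in zip(depth, gate)]
    return shared, rgb_refined, depth_refined


def pgf_fuse(levels: List[Vector]) -> Vector:
    """Hypothetical progressive fusion with high-to-low guidance.

    `levels` is ordered from the lowest level to the highest level, and all
    levels share one length here to keep the sketch free of resizing.
    """
    fused = levels[-1]  # start from the highest-level features
    for feat in reversed(levels[:-1]):
        # The result fused so far (higher level) gates the next lower level,
        # then the two are summed.
        attention = [sigmoid(g) for g in fused]
        fused = [a * f + g for a, f, g in zip(attention, feat, fused)]
    return fused


# Usage: one exchange step per level, then top-down fusion of the shared features.
rgb_levels = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
depth_levels = [[1.0, 3.0, 0.0], [2.0, 1.0, 1.0]]
shared_levels = [cie_exchange(r, d)[0] for r, d in zip(rgb_levels, depth_levels)]
saliency_logits = pgf_fuse(shared_levels)
```

In this sketch the feedback only rescales each stream. In a trained network the refined features would be passed on to the next stage of each modality's feature extractor, which is the role the highlights attribute to the feedback mechanism.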

Published In

Pattern Recognition, Volume 136, Issue C
Apr 2023
858 pages

Publisher

Elsevier Science Inc., United States

Author Tags

1. Saliency detection
2. Salient object detection
3. RGB-D
4. Feature fusion
5. Cross-modal interaction

Qualifiers

• Research-article

Cited By

• (2024) CIA-Net: Cross-Modal Interaction and Depth Quality-Aware Network for RGB-D Salient Object Detection, Artificial Neural Networks and Machine Learning – ICANN 2024, pp. 79–92. DOI: 10.1007/978-3-031-72335-3_6. Online publication date: 17-Sep-2024
• (2024) Bi-directional Interaction and Dense Aggregation Network for RGB-D Salient Object Detection, MultiMedia Modeling, pp. 475–489. DOI: 10.1007/978-3-031-53305-1_36. Online publication date: 29-Jan-2024
• (2023) RGB-T salient object detection via excavating and enhancing CNN features, Applied Intelligence 53:21, pp. 25543–25561. DOI: 10.1007/s10489-023-04784-1. Online publication date: 1-Nov-2023
• (2023) MBDNet: Mitigating the “Under-Training Issue” in Dual-Encoder Model for RGB-d Salient Object Detection, Advanced Intelligent Computing Technology and Applications, pp. 99–111. DOI: 10.1007/978-981-99-4761-4_9. Online publication date: 10-Aug-2023
