research-article

Cross-modal hierarchical interaction network for RGB-D salient object detection

Authors:

Tian-Zhu Xiang

Volume 136, Issue C

https://doi.org/10.1016/j.patcog.2022.109194

Published: 01 April 2023

Highlights

• We propose a novel cross-modal hierarchical interaction network for accurate RGB-D salient object detection, which not only excavates multi-modal interaction but also progressively guides multi-level feature fusion.

• A feedback mechanism is applied in the cross-modal information exchange module, which feeds the interactive information back to the respective feature extractors to boost discriminative feature learning.

• A multi-level information progressively guided fusion module is introduced with a reverse guidance mechanism, which makes full use of the hierarchical features to produce accurate saliency detection.

Abstract

Effectively exchanging and aggregating information from multiple modalities (e.g., RGB image and depth map) is a major challenge in RGB-D salient object detection. To address this problem, we propose a cross-modal Hierarchical Interaction Network (HINet), which boosts salient object detection by excavating cross-modal feature interaction and progressive multi-level feature fusion. To this end, we design two modules: a cross-modal information exchange (CIE) module and a multi-level information progressively guided fusion (PGF) module. Specifically, the CIE module exchanges cross-modal features to learn shared representations and provides beneficial feedback that facilitates discriminative feature learning in each modality. The PGF module aggregates the hierarchical features progressively with a reverse guidance mechanism, which uses the high-level feature fusion to guide the low-level feature fusion and thus improves saliency detection performance. Extensive experiments show that the proposed model significantly outperforms nine existing state-of-the-art models on five challenging benchmark datasets. Code and results are available at: https://github.com/RanwanWu/HINet.
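To make the data flow described in the abstract concrete, the following is a minimal toy sketch, not the HINet implementation. It assumes that features are flat lists of floats, that cross-modal exchange is an element-wise product whose result is added back to each branch, and that "high-level guides low-level" means the upsampled coarse result modulates the finer level. The function names `exchange`, `upsample2`, and `progressive_fusion` and the weight `alpha` are hypothetical; the actual CIE and PGF modules are defined in the paper and the linked repository.

```python
# Illustrative sketch only: NOT the authors' HINet code.
# Flat lists of floats stand in for feature maps; all names are hypothetical.

def exchange(rgb, depth, alpha=0.5):
    """Assumed cross-modal exchange: build a shared feature and feed it back."""
    shared = [r * d for r, d in zip(rgb, depth)]                # element-wise interaction
    rgb_fb = [r + alpha * s for r, s in zip(rgb, shared)]       # feedback to the RGB branch
    depth_fb = [d + alpha * s for d, s in zip(depth, shared)]   # feedback to the depth branch
    return shared, rgb_fb, depth_fb


def upsample2(feat):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in feat for _ in range(2)]


def progressive_fusion(levels):
    """Fuse shared features from the coarsest level down to the finest.

    `levels` is ordered low-level (finest) first; each level is assumed to be
    half the length of the previous one.
    """
    guide = levels[-1]
    for feat in reversed(levels[:-1]):
        up = upsample2(guide)
        # Assumed guidance rule: the coarser fused result modulates the finer level.
        guide = [f + f * g for f, g in zip(feat, up)]
    return guide


# Two-level toy example.
rgb_levels = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0]]
depth_levels = [[0.5, 0.5, 1.0, 1.0], [1.0, 0.0]]
shared_levels = [exchange(r, d)[0] for r, d in zip(rgb_levels, depth_levels)]
saliency = progressive_fusion(shared_levels)
```

In this toy setting, positions where the coarse level responds are amplified at the fine level, while positions where it is zero pass through unchanged.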

Cited By

Kuang X, Zhu A, Yuan J, Xu Q (2024) CIA-Net: Cross-Modal Interaction and Depth Quality-Aware Network for RGB-D Salient Object Detection. Artificial Neural Networks and Machine Learning – ICANN 2024, pp. 79-92. DOI: 10.1007/978-3-031-72335-3_6. Online publication date: 17-Sep-2024
https://dl.acm.org/doi/10.1007/978-3-031-72335-3_6
Yi K, Tang H, Bai H, Wang Y, Xu J, Li P (2024) Bi-directional Interaction and Dense Aggregation Network for RGB-D Salient Object Detection. MultiMedia Modeling, pp. 475-489. DOI: 10.1007/978-3-031-53305-1_36. Online publication date: 29-Jan-2024
https://dl.acm.org/doi/10.1007/978-3-031-53305-1_36
Bi H, Zhang J, Wu R, Tong Y, Fu X, Shao K (2023) RGB-T salient object detection via excavating and enhancing CNN features. Applied Intelligence 53:21, pp. 25543-25561. DOI: 10.1007/s10489-023-04784-1. Online publication date: 1-Nov-2023
https://dl.acm.org/doi/10.1007/s10489-023-04784-1

Index Terms

Cross-modal hierarchical interaction network for RGB-D salient object detection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
      2. Computer vision tasks
        Scene understanding
  2. Machine learning

Index terms have been assigned to the content through auto-classification.

Published In

Pattern Recognition Volume 136, Issue C

Apr 2023

858 pages

ISSN:0031-3203

Elsevier Ltd.

Publisher

Elsevier Science Inc.

United States

Cited By

Wang S, Yang G, Zhang Y, Xu Q, Wang Y (2023) MBDNet: Mitigating the “Under-Training Issue” in Dual-Encoder Model for RGB-d Salient Object Detection. Advanced Intelligent Computing Technology and Applications, pp. 99-111. DOI: 10.1007/978-981-99-4761-4_9. Online publication date: 10-Aug-2023
https://dl.acm.org/doi/10.1007/978-981-99-4761-4_9
