Research Article

Transformer-based cross-modality interaction guidance network for RGB-T salient object detection

Published: 18 October 2024

Abstract

Exploring more effective multimodal fusion strategies remains challenging for RGB-T salient object detection (SOD). Most RGB-T SOD methods focus on acquiring modal complementary features from foreground information while ignoring the importance of background information for salient object localization. In addition, feature fusion without information filtering may introduce noise. To address these problems, this paper proposes a new cross-modality interaction guidance network (CIGNet) for RGB-T salient object detection. Specifically, we construct a transformer-based dual-stream encoder to extract multimodal features. In the decoder, we propose an attention-based modal information complementary module (MICM) that captures cross-modal complementary information for global comparison and salient object localization. Based on the MICM features, we design a multi-scale adaptive fusion module (MAFM) to identify the optimal salient regions during multi-scale fusion and reduce redundant features. To enhance the completeness of salient features after multi-scale fusion, we propose a saliency region mining module (SRMM), which corrects features in the boundary neighborhood by exploiting the differences between foreground and background pixels and the boundary. In comparisons with other state-of-the-art methods on three RGB-T datasets and five RGB-D datasets, the experimental results demonstrate the superiority and generality of the proposed CIGNet.
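The MICM is described above only at a high level. As a rough, assumption-laden illustration of attention-based cross-modal complementation (not the authors' implementation; the class name, the fusion-by-concatenation step, and the residual layout are all guesses), a PyTorch sketch could look like this:

```python
import torch
import torch.nn as nn

class CrossModalComplementaryBlock(nn.Module):
    """Hypothetical sketch of an attention-based complementary module:
    each modality attends to the other, and the enriched features are
    fused. Illustrative only; not the paper's MICM."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)
        # Cross-attention in both directions: RGB queries thermal, and vice versa.
        self.rgb_from_t = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.t_from_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # rgb, thermal: (B, N, C) token sequences from a dual-stream encoder.
        q_rgb, q_t = self.norm_rgb(rgb), self.norm_t(thermal)
        rgb_enh, _ = self.rgb_from_t(q_rgb, q_t, q_t)   # RGB enriched by thermal cues
        t_enh, _ = self.t_from_rgb(q_t, q_rgb, q_rgb)   # thermal enriched by RGB cues
        rgb_out = rgb + rgb_enh                          # residual connections
        t_out = thermal + t_enh
        return self.fuse(torch.cat([rgb_out, t_out], dim=-1))

# Usage: fuse 14x14 patch tokens (N = 196) with channel width 256.
block = CrossModalComplementaryBlock(dim=256)
fused = block(torch.randn(2, 196, 256), torch.randn(2, 196, 256))  # (2, 196, 256)
```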

Highlights

The significance of thermal images in salient object detection is investigated.
Attention mechanisms are used to capture multimodal complementary features.
Background pixels are important for correcting salient object boundary features (see the sketch below).
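The third highlight summarizes the SRMM's boundary correction. Under assumed details (max pooling standing in for morphological dilation/erosion to locate the boundary band, masked average pooling for foreground/background prototypes; the function boundary_correction and every threshold are hypothetical), a minimal sketch of the idea is:

```python
import torch
import torch.nn.functional as F

def boundary_correction(feat: torch.Tensor, coarse: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Hypothetical sketch: re-weight features in the boundary neighborhood
    by contrasting them with foreground/background prototypes.

    feat:   (B, C, H, W) decoder features
    coarse: (B, 1, H, W) coarse saliency logits
    """
    prob = torch.sigmoid(coarse)
    # Dilation/erosion via max pooling isolates an uncertain band around the boundary.
    dilated = F.max_pool2d(prob, k, stride=1, padding=k // 2)
    eroded = -F.max_pool2d(-prob, k, stride=1, padding=k // 2)
    boundary = dilated - eroded                      # ~1 near edges, ~0 elsewhere

    fg = (prob > 0.5).float()
    bg = 1.0 - fg
    # Masked global average pooling -> per-image prototypes of shape (B, C, 1, 1).
    fg_proto = (feat * fg).sum((2, 3), keepdim=True) / fg.sum((2, 3), keepdim=True).clamp(min=1)
    bg_proto = (feat * bg).sum((2, 3), keepdim=True) / bg.sum((2, 3), keepdim=True).clamp(min=1)

    # Boundary pixels closer to the foreground prototype are amplified,
    # those closer to the background prototype are suppressed.
    sim_fg = F.cosine_similarity(feat, fg_proto, dim=1).unsqueeze(1)
    sim_bg = F.cosine_similarity(feat, bg_proto, dim=1).unsqueeze(1)
    return feat * (1.0 + boundary * (sim_fg - sim_bg))
```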



Information

Published In

Neurocomputing, Volume 600, Issue C, October 2024, 706 pages

Publisher

Elsevier Science Publishers B.V., Netherlands

Author Tags

1. Salient object detection
2. RGB-thermal images
3. Transformer
4. Feature fusion