research-article

AirSOD: A Lightweight Network for RGB-D Salient Object Detection

Published: 14 July 2023

Abstract

Salient object detection (SOD) aims to identify the most prominent regions in images. However, the large model sizes, high computational costs, and slow inference speeds of existing RGB-D SOD models have hindered their deployment on real-world embedded devices. To address this issue, we propose AirSOD, a novel method dedicated to lightweight RGB-D SOD. Specifically, we first design a hybrid feature extraction network that combines the first three stages of MobileNetV2 with our Parallel Attention-Shift convolution (PAS) module. The PAS module captures both long-range dependencies and local information, enhancing representation learning while significantly reducing the number of parameters and the computational complexity. Second, we propose a Multi-level and Multi-modal feature Fusion (MMF) module to facilitate feature fusion, and a Multi-path enhancement for Feature Refinement (MFR) decoder for feature integration. Compared with the cutting-edge model MobileSal, the proposed method reduces the model size by 63%, decreases the computational complexity by 43%, and improves the inference speed by 43%. We evaluate AirSOD on six widely used RGB-D SOD datasets, and extensive experimental results demonstrate that our method achieves satisfactory performance. The source code will be made available.
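
The abstract names the ingredients of the PAS module (a spatial-shift convolution plus parallel attention for long-range context) without giving its internals. The following PyTorch sketch is therefore only a hypothetical illustration of how such a parallel attention-shift block could be assembled: the four-direction channel shift follows the S2-MLP style [21], the channel gate follows squeeze-and-excitation [56], and all names (spatial_shift, ParallelAttentionShift), channel splits, and the residual combination are assumptions, not the authors' implementation.

```python
# Hypothetical parallel attention + shift block (NOT the authors' PAS module).
import torch
import torch.nn as nn


def spatial_shift(x: torch.Tensor) -> torch.Tensor:
    """Shift four channel groups by one pixel in four directions (S2-MLP style)."""
    b, c, h, w = x.shape
    g = c // 4
    out = torch.zeros_like(x)
    out[:, 0 * g:1 * g, :, 1:] = x[:, 0 * g:1 * g, :, :-1]  # shift right
    out[:, 1 * g:2 * g, :, :-1] = x[:, 1 * g:2 * g, :, 1:]  # shift left
    out[:, 2 * g:3 * g, 1:, :] = x[:, 2 * g:3 * g, :-1, :]  # shift down
    out[:, 3 * g:, :-1, :] = x[:, 3 * g:, 1:, :]            # shift up
    return out


class ParallelAttentionShift(nn.Module):
    """Two parallel branches: shift + depthwise conv for local context,
    and an SE-style channel gate as a cheap stand-in for global context."""

    def __init__(self, channels: int):
        super().__init__()
        # local branch: depthwise 3x3 after the spatial shift, then pointwise mix
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)
        # global branch: squeeze-and-excitation channel attention [56]
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.pw(self.dw(spatial_shift(x)))  # local branch
        globl = x * self.attn(x)                    # attention branch
        return x + local + globl                    # residual fusion


if __name__ == "__main__":
    feats = torch.randn(2, 32, 56, 56)
    print(ParallelAttentionShift(32)(feats).shape)  # torch.Size([2, 32, 56, 56])
```

The appeal of this kind of design, and plausibly the reason the abstract pairs shifting with attention, is that a channel shift costs no parameters and no FLOPs beyond a copy, so the parameter budget is spent almost entirely on the 1x1 mixing and the attention gate.
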

References

[1]
D.-P. Fan, W. Wang, M.-M. Cheng, and J. Shen, “Shifting more attention to video salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 8546–8556.
[2]
P. Zhang, W. Liu, D. Wang, Y. Lei, H. Wang, and H. Lu, “Non-rigid object tracking via deep multi-scale spatial–temporal discriminative saliency maps,” Pattern Recognit., vol. 100, Apr. 2020, Art. no.
[3]
Y. Zhang, X. Qian, X. Tan, J. Han, and Y. Tang, “Sketch-based image retrieval by salient contour reinforcement,” IEEE Trans. Multimedia, vol. 18, no. 8, pp. 1604–1615, Aug. 2016.
[4]
H. Liu, X. Tan, and X. Zhou, “Parameter sharing exploration and hetero-center triplet loss for visible-thermal person re-identification,” IEEE Trans. Multimedia, vol. 23, pp. 4414–4425, 2021.
[5]
N. Yang, Q. Zhong, K. Li, R. Cong, Y. Zhao, and S. Kwong, “A reference-free underwater image quality assessment metric in frequency domain,” Signal Process., Image Commun., vol. 94, May 2021, Art. no.
[6]
D.-P. Fan, Y. Zhai, A. Borji, J. Yang, and L. Shao, “BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 275–292.
[7]
W. Zhang, Y. Jiang, K. Fu, and Q. Zhao, “BTS-Net: Bi-directional transfer-and-selection network for RGB-D salient object detection,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2021, pp. 1–6.
[8]
R. Cong et al., “CIR-Net: Cross-modality interaction and refinement for RGB-D salient object detection,” IEEE Trans. Image Process., vol. 31, pp. 6800–6815, 2022.
[9]
M.-M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr, and S.-M. Hu, “Global contrast based salient region detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 569–582, Mar. 2015.
[10]
H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, “Salient object detection: A discriminative regional feature integration approach,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 2083–2090.
[11]
Q. Ren, S. Lu, J. Zhang, and R. Hu, “Salient object detection by fusing local and global contexts,” IEEE Trans. Multimedia, vol. 23, pp. 1442–1453, 2021.
[12]
Z. Liu, Y. Wang, Z. Tu, Y. Xiao, and B. Tang, “TriTransNet: RGB-D salient object detection with a triplet transformer embedding network,” in Proc. 29th ACM Int. Conf. Multimedia, Oct. 2021, pp. 4481–4490.
[13]
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4510–4520.
[14]
X. Jin, K. Yi, and J. Xu, “MoADNet: Mobile asymmetric dual-stream networks for real-time and lightweight RGB-D salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 11, pp. 7632–7645, Nov. 2022.
[15]
G. Li, Z. Liu, X. Zhang, and W. Lin, “Lightweight salient object detection in optical remote-sensing images via semantic matching and edge alignment,” IEEE Trans. Geosci. Remote Sens., vol. 61, 2023, Art. no.
[16]
X. Zhao, L. Zhang, Y. Pang, H. Lu, and L. Zhang, “A single stream network for robust and real-time RGB-D salient object detection,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 646–662.
[17]
Y. Tang, K. Han, J. Guo, C. Xu, C. Xu, and Y. Wang, “GhostNetV2: Enhance cheap operation with long-range attention,” 2022, arXiv:2211.12905.
[18]
Y.-H. Wu, Y. Liu, J. Xu, J.-W. Bian, Y.-C. Gu, and M.-M. Cheng, “MobileSal: Extremely efficient RGB-D salient object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 12, pp. 10261–10269, Dec. 2022.
[19]
W. Ji, J. Li, M. Zhang, Y. Piao, and H. Lu, “Accurate RGB-D salient object detection via collaborative learning,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 52–69.
[20]
W. Zhang, G.-P. Ji, Z. Wang, K. Fu, and Q. Zhao, “Depth quality-inspired feature manipulation for efficient RGB-D salient object detection,” in Proc. 29th ACM Int. Conf. Multimedia, Oct. 2021, pp. 731–740.
[21]
T. Yu, X. Li, Y. Cai, M. Sun, and P. Li, “S2-MLP: Spatial-shift MLP architecture for vision,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2022, pp. 3615–3624.
[22]
H. Peng, B. Li, W. Xiong, W. Hu, and R. Ji, “RGBD salient object detection: A benchmark and algorithms,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2014, pp. 92–109.
[23]
Y. Piao, W. Ji, J. Li, M. Zhang, and H. Lu, “Depth-induced multi-scale recurrent attention network for saliency detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 7253–7262.
[24]
Q. Yan, L. Xu, J. Shi, and J. Jia, “Hierarchical saliency detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1155–1162.
[25]
F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, “Saliency filters: Contrast based filtering for salient region detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 733–740.
[26]
H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, “Salient object detection: A discriminative regional feature integration approach,” Int. J. Comput. Vis., vol. 123, no. 2, pp. 252–268, 2017.
[27]
R. Ju, Y. Liu, T. Ren, L. Ge, and G. Wu, “Depth-aware salient object detection using anisotropic center-surround difference,” Signal Process., Image Commun., vol. 38, pp. 115–126, Oct. 2015.
[28]
H. Song, Z. Liu, H. Du, G. Sun, O. Le Meur, and T. Ren, “Depth-aware salient object detection and segmentation via multiscale discriminative saliency fusion and bootstrap learning,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4204–4216, Sep. 2017.
[29]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[30]
G. Li, Z. Liu, and H. Ling, “ICNet: Information conversion network for RGB-D based salient object detection,” IEEE Trans. Image Process., vol. 29, pp. 4873–4884, 2020.
[31]
G. Li, Y. Wang, Z. Liu, X. Zhang, and D. Zeng, “RGB-T semantic segmentation with location, activation, and sharpening,” IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 3, pp. 1223–1235, Mar. 2023.
[32]
T. Wang et al., “Detect globally, refine locally: A novel approach to saliency detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3127–3135.
[33]
L. Zhang, J. Dai, H. Lu, Y. He, and G. Wang, “A bi-directional message passing model for salient object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1741–1750.
[34]
Y. Yang, Q. Qin, Y. Luo, Y. Liu, Q. Zhang, and J. Han, “Bi-directional progressive guidance network for RGB-D salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 8, pp. 5346–5360, Aug. 2022.
[35]
G. Li, Z. Liu, D. Zeng, W. Lin, and H. Ling, “Adjacent context coordination network for salient object detection in optical remote sensing images,” IEEE Trans. Cybern., vol. 53, no. 1, pp. 526–538, Jan. 2023.
[36]
W. Ji et al., “Calibrated RGB-D salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 9466–9476.
[37]
T. Zhou, H. Fu, G. Chen, Y. Zhou, D.-P. Fan, and L. Shao, “Specificity-preserving RGB-D saliency detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 4661–4671.
[38]
R. Cong, J. Lei, H. Fu, Q. Huang, X. Cao, and N. Ling, “HSCS: Hierarchical sparsity based co-saliency detection for RGBD images,” IEEE Trans. Multimedia, vol. 21, no. 7, pp. 1660–1671, Jul. 2019.
[39]
Y. Pang, L. Zhang, X. Zhao, and H. Lu, “Hierarchical dynamic filtering network for RGB-D salient object detection,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 235–252.
[40]
Z. Liu, Y. Tan, Q. He, and Y. Xiao, “SwinNet: Swin transformer drives edge-aware RGB-D and RGB-T salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 7, pp. 4486–4497, Jul. 2022.
[41]
G. Li, Z. Liu, M. Chen, Z. Bai, W. Lin, and H. Ling, “Hierarchical alternate interaction network for RGB-D salient object detection,” IEEE Trans. Image Process., vol. 30, pp. 3528–3542, 2021.
[42]
W.-D. Jin, J. Xu, Q. Han, Y. Zhang, and M.-M. Cheng, “CDNet: Complementary depth network for RGB-D salient object detection,” IEEE Trans. Image Process., vol. 30, pp. 3376–3390, 2021.
[43]
W. Gao, G. Liao, S. Ma, G. Li, Y. Liang, and W. Lin, “Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, pp. 2091–2106, Apr. 2022.
[44]
C. Zhang et al., “Cross-modality discrepant interaction network for RGB-D salient object detection,” in Proc. 29th ACM Int. Conf. Multimedia, Oct. 2021, pp. 2094–2102.
[45]
X. Zhang, Y. Xu, T. Wang, and T. Liao, “Multi-prior driven network for RGB-D salient object detection,” IEEE Trans. Circuits Syst. Video Technol., early access, Apr. 18, 2023.
[46]
K. Fu, D.-P. Fan, G.-P. Ji, and Q. Zhao, “JL-DCF: Joint learning and densely-cooperative fusion framework for RGB-D salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3049–3059.
[47]
Y. Piao, Z. Rong, M. Zhang, W. Ren, and H. Lu, “A2dele: Adaptive and attentive depth distiller for efficient RGB-D salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9057–9066.
[48]
M. Zhang, S. X. Fei, J. Liu, S. Xu, Y. Piao, and H. Lu, “Asymmetric two-stream architecture for accurate RGB-D saliency detection,” in Proc. Eur. Conf. Comput. Vis. Glasgow, U.K.: Springer, 2020, pp. 374–390.
[49]
Y. Wu, Y. Shi, H. Shen, Y. Tan, and Y. Wang, “Light-TBFNet: RGB-D salient detection based on a lightweight two-branch fusion strategy,” Multimedia Tools Appl., vol. 82, no. 17, pp. 26005–26035, 2023.
[50]
N. Huang, Q. Jiao, Q. Zhang, and J. Han, “Middle-level feature fusion for lightweight RGB-D salient object detection,” IEEE Trans. Image Process., vol. 31, pp. 6621–6634, 2022.
[51]
Z. Huang et al., “CCNet: Criss-cross attention for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 6, pp. 6896–6908, Jun. 2023.
[52]
I. Tolstikhin et al., “MLP-mixer: An all-MLP architecture for vision,” in Proc. 35th Conf. Neural Inf. Process. Syst., vol. 34, 2021, pp. 24261–24272.
[53]
J. M. J. Valanarasu and V. M. Patel, “UNeXt: MLP-based rapid medical image segmentation network,” 2022, arXiv:2203.04967.
[54]
C. Li, R. Cong, Y. Piao, Q. Xu, and C. C. Loy, “RGB-D salient object detection with cross-modality modulation and selection,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 225–241.
[55]
H. Wang, Y. Zhu, B. Green, H. Adam, A. Yuille, and L.-C. Chen, “Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 108–126.
[56]
J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2020.
[57]
H. Zhang et al., “ResNeSt: Split-attention networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2022, pp. 2736–2746.
[58]
L. Chen et al., “SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 5659–5667.
[59]
J.-X. Zhao, Y. Cao, D.-P. Fan, M.-M. Cheng, X.-Y. Li, and L. Zhang, “Contrast prior and fluid pyramid integration for RGBD salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3922–3931.
[60]
D.-P. Fan, Z. Lin, Z. Zhang, M. Zhu, and M.-M. Cheng, “Rethinking RGB-D salient object detection: Models, data sets, and large-scale benchmarks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 5, pp. 2075–2089, May 2021.
[61]
M. Zhang, W. Ren, Y. Piao, Z. Rong, and H. Lu, “Select, supplement and focus for RGB-D saliency detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3469–3478.
[62]
J. Zhang et al., “Uncertainty inspired RGB-D saliency detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5761–5779, Sep. 2022.
[63]
A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 1–12.
[64]
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[65]
Y. Cheng, H. Fu, X. Wei, J. Xiao, and X. Cao, “Depth enhanced saliency detection method,” in Proc. Int. Conf. Internet Multimedia Comput. Service, Jul. 2014, pp. 23–27.
[66]
Y. Niu, Y. Geng, X. Li, and F. Liu, “Leveraging stereopsis for saliency analysis,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 454–461.
[67]
R. Ju, L. Ge, W. Geng, T. Ren, and G. Wu, “Depth saliency based on anisotropic center-surround difference,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2014, pp. 1115–1119.
[68]
G. Li and C. Zhu, “A three-pathway psychobiological framework of salient object detection using stereoscopic technology,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2017, pp. 3008–3014.
[69]
R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1597–1604.
[70]
D.-P. Fan, C. Gong, Y. Cao, B. Ren, M.-M. Cheng, and A. Borji, “Enhanced-alignment measure for binary foreground map evaluation,” in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 698–704.
[71]
D.-P. Fan, M.-M. Cheng, Y. Liu, T. Li, and A. Borji, “Structure-measure: A new way to evaluate foreground maps,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 4548–4557.
[72]
D.-P. Fan, G.-P. Ji, X. Qin, and M.-M. Cheng, “Cognitive vision inspired object segmentation metric and loss function,” Scientia Sinica Informationis, vol. 51, no. 9, p. 1475, Sep. 2021.

Cited By

  • (2024) “Learning Local-Global Representation for Scribble-Based RGB-D Salient Object Detection via Transformer,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11, Part 2, pp. 11592–11604, DOI: 10.1109/TCSVT.2024.3424651. Online publication date: 8-Jul-2024.
  • (2024) “A Volumetric Saliency Guided Image Summarization for RGB-D Indoor Scene Classification,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11, Part 1, pp. 10917–10929, DOI: 10.1109/TCSVT.2024.3412949. Online publication date: 11-Jun-2024.


Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 34, Issue 3, March 2024, 659 pages

Publisher

IEEE Press

Qualifiers

• Research-article
