research-article

AirSOD: A Lightweight Network for RGB-D Salient Object Detection

Published: 14 July 2023

Abstract

Salient object detection (SOD) aims to identify the most prominent regions in images. However, the large model sizes, high computational costs, and slow inference speeds of existing RGB-D SOD models have hindered their deployment on real-world embedded devices. To address this issue, we propose AirSOD, a novel method dedicated to lightweight RGB-D SOD. Specifically, we first design a hybrid feature extraction network that combines the first three stages of MobileNetV2 with our Parallel Attention-Shift convolution (PAS) module. The PAS module captures both long-range dependencies and local information, enhancing representation learning while significantly reducing the number of parameters and the computational complexity. Second, we propose a Multi-level and Multi-modal feature Fusion (MMF) module to facilitate feature fusion, and a Multi-path enhancement for Feature Refinement (MFR) decoder for feature integration. Compared with the cutting-edge model MobileSal, the proposed method reduces the model size by 63%, decreases the computational complexity by 43%, and improves the inference speed by 43%. We evaluate AirSOD on six widely used RGB-D SOD datasets, and extensive experimental results demonstrate that our method achieves satisfactory performance. The source code will be made available.
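
The abstract names the ingredients of the PAS module (a spatial-shift convolution plus parallel attention for long-range context) without giving its internals. The following PyTorch sketch is therefore only a hypothetical illustration of how such a parallel attention-shift block could be assembled: the four-direction channel shift follows the S2-MLP style [21], the channel gate follows squeeze-and-excitation [56], and all names (spatial_shift, ParallelAttentionShift), channel splits, and the residual combination are assumptions, not the authors' implementation.

```python
# Hypothetical parallel attention + shift block (NOT the authors' PAS module).
import torch
import torch.nn as nn


def spatial_shift(x: torch.Tensor) -> torch.Tensor:
    """Shift four channel groups by one pixel in four directions (S2-MLP style)."""
    b, c, h, w = x.shape
    g = c // 4
    out = torch.zeros_like(x)
    out[:, 0 * g:1 * g, :, 1:] = x[:, 0 * g:1 * g, :, :-1]  # shift right
    out[:, 1 * g:2 * g, :, :-1] = x[:, 1 * g:2 * g, :, 1:]  # shift left
    out[:, 2 * g:3 * g, 1:, :] = x[:, 2 * g:3 * g, :-1, :]  # shift down
    out[:, 3 * g:, :-1, :] = x[:, 3 * g:, 1:, :]            # shift up
    return out


class ParallelAttentionShift(nn.Module):
    """Two parallel branches: shift + depthwise conv for local context,
    and an SE-style channel gate as a cheap stand-in for global context."""

    def __init__(self, channels: int):
        super().__init__()
        # local branch: depthwise 3x3 after the spatial shift, then pointwise mix
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)
        # global branch: squeeze-and-excitation channel attention [56]
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.pw(self.dw(spatial_shift(x)))  # local branch
        globl = x * self.attn(x)                    # attention branch
        return x + local + globl                    # residual fusion


if __name__ == "__main__":
    feats = torch.randn(2, 32, 56, 56)
    print(ParallelAttentionShift(32)(feats).shape)  # torch.Size([2, 32, 56, 56])
```

The appeal of this kind of design, and plausibly the reason the abstract pairs shifting with attention, is that a channel shift costs no parameters and no FLOPs beyond a copy, so the parameter budget is spent almost entirely on the 1x1 mixing and the attention gate.
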

References

[1]
D.-P. Fan, W. Wang, M.-M. Cheng, and J. Shen, “Shifting more attention to video salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 8546–8556.
[2]
P. Zhang, W. Liu, D. Wang, Y. Lei, H. Wang, and H. Lu, “Non-rigid object tracking via deep multi-scale spatial–temporal discriminative saliency maps,” Pattern Recognit., vol. 100, Apr. 2020, Art. no.
[3]
Y. Zhang, X. Qian, X. Tan, J. Han, and Y. Tang, “Sketch-based image retrieval by salient contour reinforcement,” IEEE Trans. Multimedia, vol. 18, no. 8, pp. 1604–1615, Aug. 2016.
[4]
H. Liu, X. Tan, and X. Zhou, “Parameter sharing exploration and hetero-center triplet loss for visible-thermal person re-identification,” IEEE Trans. Multimedia, vol. 23, pp. 4414–4425, 2021.
[5]
N. Yang, Q. Zhong, K. Li, R. Cong, Y. Zhao, and S. Kwong, “A reference-free underwater image quality assessment metric in frequency domain,” Signal Process., Image Commun., vol. 94, May 2021, Art. no.
[6]
D.-P. Fan, Y. Zhai, A. Borji, J. Yang, and L. Shao, “BBS-Net: RGB-D salient object detection with a bifurcated backbone strategy network,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 275–292.
[7]
W. Zhang, Y. Jiang, K. Fu, and Q. Zhao, “BTS-Net: Bi-directional transfer-and-selection network for RGB-D salient object detection,” in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2021, pp. 1–6.
[8]
R. Cong et al., “CIR-Net: Cross-modality interaction and refinement for RGB-D salient object detection,” IEEE Trans. Image Process., vol. 31, pp. 6800–6815, 2022.
[9]
M.-M. Cheng, N. J. Mitra, X. Huang, P. H. S. Torr, and S.-M. Hu, “Global contrast based salient region detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 569–582, Mar. 2015.
[10]
H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, “Salient object detection: A discriminative regional feature integration approach,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 2083–2090.
[11]
Q. Ren, S. Lu, J. Zhang, and R. Hu, “Salient object detection by fusing local and global contexts,” IEEE Trans. Multimedia, vol. 23, pp. 1442–1453, 2021.
[12]
Z. Liu, Y. Wang, Z. Tu, Y. Xiao, and B. Tang, “TriTransNet: RGB-D salient object detection with a triplet transformer embedding network,” in Proc. 29th ACM Int. Conf. Multimedia, Oct. 2021, pp. 4481–4490.
[13]
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4510–4520.
[14]
X. Jin, K. Yi, and J. Xu, “MoADNet: Mobile asymmetric dual-stream networks for real-time and lightweight RGB-D salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 11, pp. 7632–7645, Nov. 2022.
[15]
G. Li, Z. Liu, X. Zhang, and W. Lin, “Lightweight salient object detection in optical remote-sensing images via semantic matching and edge alignment,” IEEE Trans. Geosci. Remote Sens., vol. 61, 2023, Art. no.
[16]
X. Zhao, L. Zhang, Y. Pang, H. Lu, and L. Zhang, “A single stream network for robust and real-time RGB-D salient object detection,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 646–662.
[17]
Y. Tang, K. Han, J. Guo, C. Xu, C. Xu, and Y. Wang, “GhostNetV2: Enhance cheap operation with long-range attention,” 2022, arXiv:2211.12905.
[18]
Y.-H. Wu, Y. Liu, J. Xu, J.-W. Bian, Y.-C. Gu, and M.-M. Cheng, “MobileSal: Extremely efficient RGB-D salient object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 12, pp. 10261–10269, Dec. 2022.
[19]
W. Ji, J. Li, M. Zhang, Y. Piao, and H. Lu, “Accurate RGB-D salient object detection via collaborative learning,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 52–69.
[20]
W. Zhang, G.-P. Ji, Z. Wang, K. Fu, and Q. Zhao, “Depth quality-inspired feature manipulation for efficient RGB-D salient object detection,” in Proc. 29th ACM Int. Conf. Multimedia, Oct. 2021, pp. 731–740.
[21]
T. Yu, X. Li, Y. Cai, M. Sun, and P. Li, “S2-MLP: Spatial-shift MLP architecture for vision,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2022, pp. 3615–3624.
[22]
H. Peng, B. Li, W. Xiong, W. Hu, and R. Ji, “RGBD salient object detection: A benchmark and algorithms,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2014, pp. 92–109.
[23]
Y. Piao, W. Ji, J. Li, M. Zhang, and H. Lu, “Depth-induced multi-scale recurrent attention network for saliency detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 7253–7262.
[24]
Q. Yan, L. Xu, J. Shi, and J. Jia, “Hierarchical saliency detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1155–1162.
[25]
F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, “Saliency filters: Contrast based filtering for salient region detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 733–740.
[26]
H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, “Salient object detection: A discriminative regional feature integration approach,” Int. J. Comput. Vis., vol. 123, no. 2, pp. 252–268, 2017.
[27]
R. Ju, Y. Liu, T. Ren, L. Ge, and G. Wu, “Depth-aware salient object detection using anisotropic center-surround difference,” Signal Process., Image Commun., vol. 38, pp. 115–126, Oct. 2015.
[28]
H. Song, Z. Liu, H. Du, G. Sun, O. Le Meur, and T. Ren, “Depth-aware salient object detection and segmentation via multiscale discriminative saliency fusion and bootstrap learning,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4204–4216, Sep. 2017.
[29]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[30]
G. Li, Z. Liu, and H. Ling, “ICNet: Information conversion network for RGB-D based salient object detection,” IEEE Trans. Image Process., vol. 29, pp. 4873–4884, 2020.
[31]
G. Li, Y. Wang, Z. Liu, X. Zhang, and D. Zeng, “RGB-T semantic segmentation with location, activation, and sharpening,” IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 3, pp. 1223–1235, Mar. 2023.
[32]
T. Wang et al., “Detect globally, refine locally: A novel approach to saliency detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3127–3135.
[33]
L. Zhang, J. Dai, H. Lu, Y. He, and G. Wang, “A bi-directional message passing model for salient object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1741–1750.
[34]
Y. Yang, Q. Qin, Y. Luo, Y. Liu, Q. Zhang, and J. Han, “Bi-directional progressive guidance network for RGB-D salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 8, pp. 5346–5360, Aug. 2022.
[35]
G. Li, Z. Liu, D. Zeng, W. Lin, and H. Ling, “Adjacent context coordination network for salient object detection in optical remote sensing images,” IEEE Trans. Cybern., vol. 53, no. 1, pp. 526–538, Jan. 2023.
[36]
W. Ji et al., “Calibrated RGB-D salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 9466–9476.
[37]
T. Zhou, H. Fu, G. Chen, Y. Zhou, D.-P. Fan, and L. Shao, “Specificity-preserving RGB-D saliency detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 4661–4671.
[38]
R. Cong, J. Lei, H. Fu, Q. Huang, X. Cao, and N. Ling, “HSCS: Hierarchical sparsity based co-saliency detection for RGBD images,” IEEE Trans. Multimedia, vol. 21, no. 7, pp. 1660–1671, Jul. 2019.
[39]
Y. Pang, L. Zhang, X. Zhao, and H. Lu, “Hierarchical dynamic filtering network for RGB-D salient object detection,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 235–252.
[40]
Z. Liu, Y. Tan, Q. He, and Y. Xiao, “SwinNet: Swin transformer drives edge-aware RGB-D and RGB-T salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 7, pp. 4486–4497, Jul. 2022.
[41]
G. Li, Z. Liu, M. Chen, Z. Bai, W. Lin, and H. Ling, “Hierarchical alternate interaction network for RGB-D salient object detection,” IEEE Trans. Image Process., vol. 30, pp. 3528–3542, 2021.
[42]
W.-D. Jin, J. Xu, Q. Han, Y. Zhang, and M.-M. Cheng, “CDNet: Complementary depth network for RGB-D salient object detection,” IEEE Trans. Image Process., vol. 30, pp. 3376–3390, 2021.
[43]
W. Gao, G. Liao, S. Ma, G. Li, Y. Liang, and W. Lin, “Unified information fusion network for multi-modal RGB-D and RGB-T salient object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, pp. 2091–2106, Apr. 2022.
[44]
C. Zhang et al., “Cross-modality discrepant interaction network for RGB-D salient object detection,” in Proc. 29th ACM Int. Conf. Multimedia, Oct. 2021, pp. 2094–2102.
[45]
X. Zhang, Y. Xu, T. Wang, and T. Liao, “Multi-prior driven network for RGB-D salient object detection,” IEEE Trans. Circuits Syst. Video Technol., early access, Apr. 18, 2023.
[46]
K. Fu, D.-P. Fan, G.-P. Ji, and Q. Zhao, “JL-DCF: Joint learning and densely-cooperative fusion framework for RGB-D salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3049–3059.
[47]
Y. Piao, Z. Rong, M. Zhang, W. Ren, and H. Lu, “A2dele: Adaptive and attentive depth distiller for efficient RGB-D salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9057–9066.
[48]
M. Zhang, S. X. Fei, J. Liu, S. Xu, Y. Piao, and H. Lu, “Asymmetric two-stream architecture for accurate RGB-D saliency detection,” in Proc. Eur. Conf. Comput. Vis. Glasgow, U.K.: Springer, 2020, pp. 374–390.
[49]
Y. Wu, Y. Shi, H. Shen, Y. Tan, and Y. Wang, “Light-TBFNet: RGB-D salient detection based on a lightweight two-branch fusion strategy,” Multimedia Tools Appl., vol. 82, no. 17, pp. 26005–26035, 2023.
[50]
N. Huang, Q. Jiao, Q. Zhang, and J. Han, “Middle-level feature fusion for lightweight RGB-D salient object detection,” IEEE Trans. Image Process., vol. 31, pp. 6621–6634, 2022.
[51]
Z. Huang et al., “CCNet: Criss-cross attention for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 6, pp. 6896–6908, Jun. 2023.
[52]
I. Tolstikhin et al., “MLP-mixer: An all-MLP architecture for vision,” in Proc. 35th Conf. Neural Inf. Process. Syst., vol. 34, 2021, pp. 24261–24272.
[53]
J. M. J. Valanarasu and V. M. Patel, “UNeXt: MLP-based rapid medical image segmentation network,” 2022, arXiv:2203.04967.
[54]
C. Li, R. Cong, Y. Piao, Q. Xu, and C. C. Loy, “RGB-D salient object detection with cross-modality modulation and selection,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 225–241.
[55]
H. Wang, Y. Zhu, B. Green, H. Adam, A. Yuille, and L.-C. Chen, “Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation,” in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 108–126.
[56]
J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2020.
[57]
H. Zhang et al., “ResNeSt: Split-attention networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2022, pp. 2736–2746.
[58]
L. Chen et al., “SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 5659–5667.
[59]
J.-X. Zhao, Y. Cao, D.-P. Fan, M.-M. Cheng, X.-Y. Li, and L. Zhang, “Contrast prior and fluid pyramid integration for RGBD salient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3922–3931.
[60]
D.-P. Fan, Z. Lin, Z. Zhang, M. Zhu, and M.-M. Cheng, “Rethinking RGB-D salient object detection: Models, data sets, and large-scale benchmarks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 5, pp. 2075–2089, May 2021.
[61]
M. Zhang, W. Ren, Y. Piao, Z. Rong, and H. Lu, “Select, supplement and focus for RGB-D saliency detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3469–3478.
[62]
J. Zhang et al., “Uncertainty inspired RGB-D saliency detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5761–5779, Sep. 2022.
[63]
A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 1–12.
[64]
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014, arXiv:1412.6980.
[65]
Y. Cheng, H. Fu, X. Wei, J. Xiao, and X. Cao, “Depth enhanced saliency detection method,” in Proc. Int. Conf. Internet Multimedia Comput. Service, Jul. 2014, pp. 23–27.
[66]
Y. Niu, Y. Geng, X. Li, and F. Liu, “Leveraging stereopsis for saliency analysis,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 454–461.
[67]
R. Ju, L. Ge, W. Geng, T. Ren, and G. Wu, “Depth saliency based on anisotropic center-surround difference,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2014, pp. 1115–1119.
[68]
G. Li and C. Zhu, “A three-pathway psychobiological framework of salient object detection using stereoscopic technology,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2017, pp. 3008–3014.
[69]
R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1597–1604.
[70]
D.-P. Fan, C. Gong, Y. Cao, B. Ren, M.-M. Cheng, and A. Borji, “Enhanced-alignment measure for binary foreground map evaluation,” in Proc. 27th Int. Joint Conf. Artif. Intell., Jul. 2018, pp. 698–704.
[71]
D.-P. Fan, M.-M. Cheng, Y. Liu, T. Li, and A. Borji, “Structure-measure: A new way to evaluate foreground maps,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 4548–4557.
[72]
D.-P. Fan, G.-P. Ji, X. Qin, and M.-M. Cheng, “Cognitive vision inspired object segmentation metric and loss function,” Scientia Sinica Informationis, vol. 51, no. 9, p. 1475, Sep. 2021.

Cited By

  • (2024) “Learning Local-Global Representation for Scribble-Based RGB-D Salient Object Detection via Transformer,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11, Part 2, pp. 11592–11604, DOI: 10.1109/TCSVT.2024.3424651. Online publication date: 8-Jul-2024.
  • (2024) “A Volumetric Saliency Guided Image Summarization for RGB-D Indoor Scene Classification,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11, Part 1, pp. 10917–10929, DOI: 10.1109/TCSVT.2024.3412949. Online publication date: 11-Jun-2024.


Published In

IEEE Transactions on Circuits and Systems for Video Technology, Volume 34, Issue 3, March 2024, 659 pages

Publisher

IEEE Press

Qualifiers

• Research-article
