
TRCA-Net: stronger U structured network for human image segmentation

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Human image segmentation has been a practical and active research topic due to its wide range of potential applications. Previous studies have investigated manual, semi-automatic, and automatic methods for the semantic segmentation of human parts in real-world human analysis scenarios, but further research is still needed. This paper presents a novel semantic segmentation network, named TRCA-Net, for human image segmentation tasks. With TransUNet as the backbone, TRCA-Net incorporates Res2Net and Coordinate Attention to improve performance. The Res2Net blocks and the Transformer encode the input images to obtain better feature maps. Coordinate Attention in the decoder aggregates and upsamples the encoded feature maps and combines them with the high-resolution CNN feature maps to achieve accurate segmentation. TRCA-Net thus enhances fine details by recovering local spatial information. We compare TRCA-Net with state-of-the-art (SOTA) semantic segmentation networks: the original U-Net, DeepLabv3+, and TransUNet. The experimental results demonstrate that our proposed TRCA-Net outperforms these networks.
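
The decoder design described in the abstract can be illustrated with a short PyTorch sketch: a Coordinate Attention block (following Hou et al. [13]) reweights the upsampled decoder features along the height and width axes before they are fused with the high-resolution CNN skip features. The module names, the reduction ratio, and the exact fusion order are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordinateAttention(nn.Module):
    # Factorizes global pooling into two 1-D poolings (along H and along W),
    # then produces direction-aware attention maps that reweight the input.
    def __init__(self, channels, reduction=32):  # reduction ratio is an assumption
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = F.adaptive_avg_pool2d(x, (h, 1))                       # (n, c, h, 1)
        x_w = F.adaptive_avg_pool2d(x, (1, w)).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = F.relu(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                        # attention over rows
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))    # attention over columns
        return x * a_h * a_w

class DecoderStage(nn.Module):
    # One decoder step: upsample, apply Coordinate Attention, fuse the skip feature.
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.att = CoordinateAttention(in_ch)
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        x = self.att(x)
        return self.fuse(torch.cat([x, skip], dim=1))

In this sketch each decoder stage would be called as decoder(x, skip), where x comes from the Transformer/Res2Net encoder path and skip is the corresponding high-resolution CNN feature map.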


Data availability statement

All data generated or analyzed during this study are included in this published article [3] and its supplementary information files.

References

  1. Wang L, Ji X, Deng Q, Jia M (2015) Deformable part model based multiple pedestrian detection for video surveillance in crowded scenes. In: VISAPP, pp 599–604. https://doi.org/10.5220/0004739105990604

  2. Gan C, Lin M, Yang Y, De Melo G, Hauptmann AG (2016) Concepts not alone: exploring pairwise relationships for zero-shot video activity recognition. In: AAAI, pp 3487–3493

  3. Gong K, Liang X, Li Y, Chen Y, Yang M, Lin L (2018) Instance-level human parsing via part grouping network. In: ECCV, pp 770–785. https://doi.org/10.1007/978-3-030-01225-0_47

  4. Chen J, Lu Y, Yu Q, Luo X, Adeli E, Wang Y, et al (2021) TransUNet: transformers make strong encoders for medical image segmentation. arXiv preprint. https://arxiv.org/abs/2102.04306

  5. Hao L, Lie J, Guo G (2019) A multi-target corner pooling-based neural network for vehicle detection. Neural Comput Appl 32(18):14497–14506. https://doi.org/10.1007/s00521-019-04486-1

  6. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651. https://doi.org/10.1109/CVPR.2015.7298965

  7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  8. Gong K, Liang X, Zhang D, Shen X, Lin L (2017) Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: CVPR, pp 932–940. https://arxiv.org/abs/1703.05446

  9. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image segmentation. In: MICCAI. https://doi.org/10.1007/978-3-662-54345-0_3

  10. Gao S, Cheng M, Zhao K, Zhang X, Yang M et al (2019) Res2Net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 43(2):652–662. https://doi.org/10.1109/TPAMI.2019.2938758

  11. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al (2017). Attention is all you need. In: NIPS, pp 5998-6008

  12. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. https://arxiv.org/abs/1810.04805

  13. Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: CVPR, pp 13713–13722. https://arxiv.org/abs/2103.02907

  14. Fan DP, Zhou T, Ji GP, Zhou Y, Chen G, Fu HZ, Shen JB, Shao L (2020) Inf-Net: automatic COVID-19 lung infection segmentation from CT scans. IEEE Trans Med Imaging 39(8):2626–2637. https://doi.org/10.1109/TMI.2020.2996645

  15. Lan R, Sun L, Liu Z, Lu H, Luo X (2020) MADNet: a fast and lightweight network for single-image super-resolution. IEEE Trans Cybern 51(3):1443–1453. https://doi.org/10.1109/TCYB.2020.2970104

  16. Li Y, Ji B, Shi X, Zhang J, Kang B, Wang L (2020) TEA: temporal excitation and aggregation for action recognition. In: CVPR, pp 909–918. https://doi.org/10.1109/CVPR42600.2020.00099

  17. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Houlsby N, et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. In: ICLR. https://arxiv.org/abs/2010.11929

  18. Beltagy I, Peters ME, Cohan A (2020) Longformer: the long-document transformer. https://arxiv.org/abs/2004.05150v2

  19. Parmar N, Vaswani A, Uszkoreit J, et al (2018) Image transformer. In: ICML, pp 4055–4064. https://arxiv.org/abs/1802.05751

  20. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. In: ICLR. https://arxiv.org/abs/1409.0473

  21. Park J, Woo S, Lee JY, Kweon IS (2018) BAM: bottleneck attention module. In: BMVC. https://arxiv.org/abs/1807.06514

  22. Hu J, Shen L, Sun G (2018) Squeeze-and-Excitation networks. In: CVPR, pp 7132–7141. https://doi.org/10.1109/TPAMI.2019.2913372

  23. Woo S, Park J, Lee JY, et al (2018). CBAM: convolutional block attention module. In: ECCV, pp.3-19. https://doi.org/10.1007/978-3-030-01234-2_1

  24. Yu F, Wang D, Shelhamer E, Darrell T (2018) Deep layer aggregation. In: CVPR, pp 2403–2412. https://doi.org/10.48550/arXiv.1707.06484

  25. Cordonnier JB, Loukas A, Jaggi M (2020) On the relationship between self-attention and convolutional layers. In: ICLR. https://doi.org/10.48550/arXiv.1911.03584

  26. Tolstikhin IO, Houlsby N, Kolesnikov A, et al (2021) MLP-Mixer: an all-MLP architecture for vision. In: NeurIPS, pp 24261–24272. https://doi.org/10.48550/arXiv.2105.01601

  27. Paszke A, Gross S, Massa F, et al (2019) PyTorch: an imperative style, high-performance deep learning library. In: NeurIPS, pp 3–12. https://doi.org/10.48550/arXiv.1912.01703

  28. Chen LC, Zhu Y, Papandreou G, Schroff F, et al (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV, pp 8–14. https://doi.org/10.1007/978-3-030-01234-2_49


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 52171292, 51939001), Dalian Outstanding Young Talents Project (Grant No. 2022RJ05).

Author information

Corresponding author

Correspondence to Chao Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hao, LY., Yang, Z., Liu, YP. et al. TRCA-Net: stronger U structured network for human image segmentation. Neural Comput & Applic 35, 9627–9635 (2023). https://doi.org/10.1007/s00521-023-08199-4

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-023-08199-4

Keywords
