Abstract
Transformer-CNN architectures have achieved state-of-the-art results in 3D medical image segmentation thanks to their ability to capture both long-range dependencies and local information. However, directly using existing transformers as encoders can be inefficient, particularly for high-resolution 3D medical images, because self-attention computes pixel-to-pixel relationships and is therefore computationally expensive. Attempts to mitigate this cost with local-window attention or axial attention can lose interactions between certain local regions during the self-attention computation. Instead of sparsifying attention, we aim to retain the relationships between all pixels while substantially reducing the computational demand. Inspired by the low-rank property of attention, we hypothesize that pixel-to-pixel relationships can be approximated by a composition of plane-to-plane relationships. We propose the TriAxial Low-Rank Transformer Network (TALoRT-Net) for medical image segmentation. The core of the model is its attention module, which approximates the pixel-to-pixel attention matrix with a low-rank representation formed from the product of plane-to-plane matrices, significantly reducing the computational complexity inherent in 3D self-attention. Moreover, we replace the linear projection and vanilla Multi-Layer Perceptron (MLP) of the Vision Transformer with a convolutional stem and a depthwise convolution layer (DCL) to further reduce the number of model parameters. We evaluated the method on the public BTCV dataset, where it significantly reduces computational cost while maintaining uncompromised accuracy.
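To make the factorization concrete, the following is a minimal PyTorch sketch of the idea the abstract describes. The module names (`TriAxialLowRankAttention`, `DepthwiseConvLayer`), the mean-pooling used to form plane tokens, and the single-head layout are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of triaxial low-rank attention: three plane-to-plane attention
# matrices (one per spatial axis) approximate the full pixel-to-pixel
# attention of a 3D volume. Illustrative only; not the paper's code.
import torch
import torch.nn as nn


class TriAxialLowRankAttention(nn.Module):
    """Approximates 3D self-attention with plane-to-plane matrices
    A_d (DxD), A_h (HxH), A_w (WxW), applied as mode products along
    each axis instead of one (DHW)x(DHW) pixel-to-pixel matrix."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def _plane_attn(self, x, axis):
        # Collapse the two other spatial axes so that each plane along
        # `axis` becomes a single token; x is (B, D, H, W, C).
        dims = [d for d in (1, 2, 3) if d != axis]
        tokens = x.mean(dim=dims)                      # (B, N_axis, C)
        q, k = self.q(tokens), self.k(tokens)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, N_axis, N_axis)
        return attn.softmax(dim=-1)

    def forward(self, x):
        # x: (B, D, H, W, C) channel-last volume of feature vectors.
        a_d = self._plane_attn(x, axis=1)  # (B, D, D)
        a_h = self._plane_attn(x, axis=2)  # (B, H, H)
        a_w = self._plane_attn(x, axis=3)  # (B, W, W)
        v = self.v(x)
        # Apply the factored attention as three sequential mode products;
        # the composed operator plays the role of the full attention map.
        v = torch.einsum('bde,behwc->bdhwc', a_d, v)  # mix along depth
        v = torch.einsum('bhe,bdewc->bdhwc', a_h, v)  # mix along height
        v = torch.einsum('bwe,bdhec->bdhwc', a_w, v)  # mix along width
        return v


class DepthwiseConvLayer(nn.Module):
    """Sketch of a DCL standing in for the transformer MLP: a depthwise
    3D convolution plus pointwise mixing, using far fewer parameters
    than a dense two-layer MLP."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.dw = nn.Conv3d(dim, dim, kernel_size,
                            padding=kernel_size // 2, groups=dim)
        self.act = nn.GELU()
        self.pw = nn.Conv3d(dim, dim, 1)  # 1x1x1 channel mixing

    def forward(self, x):
        # x: (B, C, D, H, W) channel-first volume.
        return self.pw(self.act(self.dw(x)))
```

The savings come from the size of the attention matrices: the three plane-to-plane matrices are only D×D, H×H, and W×W, rather than the single (DHW)×(DHW) pixel-to-pixel matrix of vanilla 3D self-attention, while their composition still relates every pixel to every other pixel.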