Revisiting Self-attention in Medical Transformers via Dependency Sparsification

  • Conference paper
  • First Online:
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15011)

Abstract

Vision transformer (ViT), powered by token-to-token self-attention, has demonstrated superior performance across various vision tasks. The large and even global receptive field obtained via dense self-attention allows it to build stronger representations than CNNs. However, compared to natural images, medical images are both scarcer and lower in signal-to-noise ratio, which often leads to poor convergence of vanilla self-attention and introduces non-negligible noise from extensive unrelated tokens. Moreover, token-to-token self-attention requires heavy memory and computation, hindering deployment on various computing platforms. In this paper, we propose a dynamic self-attention sparsification method for medical transformers that merges similar feature tokens for dependency distillation under the guidance of feature prototypes. Specifically, we first generate feature prototypes with genetic relationships by simulating the process of cell division, where the number of prototypes is much smaller than the number of feature tokens. Then, in each self-attention layer, key and value tokens are grouped according to their distance from the feature prototypes. Tokens in the same group, together with the corresponding feature prototype, are merged into a new prototype according to both feature importance and grouping confidence. Finally, query tokens build pairwise dependencies with these newly updated prototypes, yielding fewer but global and more efficient interactions. Extensive experiments on three publicly available datasets demonstrate the effectiveness of our solution as a plug-and-play module for joint complexity reduction and performance improvement of various medical transformers. Code is available at https://github.com/xianlin7/DMA.
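To make the pipeline concrete, below is a minimal PyTorch sketch of the two key steps the abstract describes: cell-division-style prototype generation, and prototype-guided sparse attention in one layer. All names (divide_prototypes, prototype_sparse_attention) and design details (single attention head, perturbation-based splitting, L2-norm token importance) are illustrative assumptions, not the authors' implementation; the official code is in the repository linked above.

```python
import torch
import torch.nn.functional as F

def divide_prototypes(tokens, depth=3):
    """Cell-division-style initialization (an assumed reading of the idea):
    start from the global mean token and repeatedly split each prototype into
    two perturbed children, yielding 2**depth genetically related prototypes."""
    protos = tokens.mean(dim=1, keepdim=True)            # (B, 1, C)
    for _ in range(depth):
        noise = 0.02 * torch.randn_like(protos)
        protos = torch.cat([protos + noise, protos - noise], dim=1)
    return protos                                        # (B, 2**depth, C)

def prototype_sparse_attention(q, k, v, protos):
    """q: (B, Nq, C); k, v: (B, Nk, C); protos: (B, P, C) with P << Nk."""
    # Soft-group key/value tokens by distance to each prototype.
    conf = F.softmax(-torch.cdist(k, protos), dim=-1)    # (B, Nk, P) grouping confidence
    imp = k.norm(dim=-1, keepdim=True)                   # (B, Nk, 1) token importance (a stand-in)
    w = conf * imp                                       # combine confidence and importance
    w = w / (w.sum(dim=1, keepdim=True) + 1e-6)          # normalize weights within each group
    # Merge each group into an updated prototype (the paper also folds in the
    # previous prototype itself; omitted here for brevity).
    new_k = torch.einsum('bnp,bnc->bpc', w, k)           # (B, P, C)
    new_v = torch.einsum('bnp,bnc->bpc', w, v)           # (B, P, C)
    # Queries attend to P prototypes instead of all Nk tokens.
    attn = F.softmax(q @ new_k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ new_v                                  # (B, Nq, C)
```

Because the P prototypes summarize all Nk key/value tokens, every query still receives global context while the attention map shrinks from Nq x Nk to Nq x P, which is where both the complexity reduction and the suppression of noise from unrelated tokens come from.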


Notes

  1. https://www.synapse.org/#!Synapse:syn3193805/wiki/217789/

  2. https://instance.grand-challenge.org/

  3. https://www.creatis.insa-lyon.fr/Challenge/acdc/

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 62271220 and 62202179, in part by the Natural Science Foundation of Hubei Province of China under Grant 2022CFB585, and in part by the Fundamental Research Funds for the Central Universities, HUST: 2024JYCXJJ032. Computation was supported by the HPC Platform of HUST.

Author information

Corresponding author

Correspondence to Zengqiang Yan.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2082 KB)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lin, X., Wang, Z., Yan, Z., Yu, L. (2024). Revisiting Self-attention in Medical Transformers via Dependency Sparsification. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15011. Springer, Cham. https://doi.org/10.1007/978-3-031-72120-5_52

  • DOI: https://doi.org/10.1007/978-3-031-72120-5_52

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72119-9

  • Online ISBN: 978-3-031-72120-5

  • eBook Packages: Computer Science, Computer Science (R0)
