
HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-training

  • Conference paper in Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Abstract

The generative self-supervised learning strategy exhibits remarkable representation-learning capability. However, little attention has been paid to end-to-end pre-training methods based on a hybrid architecture of CNN and Transformer, which can learn strong local and global representations simultaneously. To address this issue, we propose a generative pre-training strategy called Hybrid Sparse masKing (HySparK), based on masked image modeling, and apply it to large-scale pre-training on medical images. First, we perform a bottom-up 3D hybrid masking strategy on the encoder to keep masking consistent. Then we utilize sparse convolution for the top CNNs and encode unmasked patches for the bottom vision Transformers. Second, we employ a simple hierarchical decoder with skip connections to achieve dense multi-scale feature reconstruction. Third, we implement our pre-training method on a collection of multiple large-scale 3D medical imaging datasets. Extensive experiments indicate that our proposed pre-training strategy demonstrates robust transferability in supervised downstream tasks and sheds light on HySparK's promising prospects. The code is available at https://github.com/FengheTan9/HySparK.
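
To make the recipe above concrete, the sketch below shows the generic masked-image-modeling loop the abstract describes: patchify a 3D volume, draw one random patch mask, encode with that mask applied, and reconstruct raw voxels with a loss on masked patches only. This is a minimal illustrative sketch in PyTorch, not the authors' implementation: every module and function name here is hypothetical, the encoder is a toy conv-stem-plus-Transformer stand-in for the paper's hybrid CNN/Transformer hierarchy, and where HySparK uses sparse convolution to skip masked sites entirely, this dense version substitutes a learned mask token.

    # Illustrative MIM sketch (hypothetical names, not the HySparK code).
    import torch
    import torch.nn as nn

    def random_patch_mask(batch, n_patches, mask_ratio, device):
        """Boolean mask of shape (batch, n_patches); True = hidden from the encoder."""
        n_mask = int(n_patches * mask_ratio)
        perm = torch.rand(batch, n_patches, device=device).argsort(dim=1)
        return perm < n_mask  # perm is a random permutation, so this picks n_mask random positions

    class ToyHybridMIM(nn.Module):
        """Toy stand-in for a hybrid encoder: a conv patch stem (the CNN part),
        one Transformer layer (the ViT part), and a voxel-reconstruction head."""
        def __init__(self, patch=8, dim=64):
            super().__init__()
            self.stem = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)
            self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.head = nn.Linear(dim, patch ** 3)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

        def forward(self, vol, mask):
            tokens = self.stem(vol).flatten(2).transpose(1, 2)        # (B, N, dim)
            # Dense approximation: masked positions get a learned token.
            # Sparse convolution (as in HySparK) would skip them instead.
            tokens = torch.where(mask.unsqueeze(-1),
                                 self.mask_token.expand_as(tokens), tokens)
            return self.head(self.block(tokens))                      # (B, N, patch^3)

    # Usage: the reconstruction loss is taken on masked patches only (MAE-style).
    B, P, S = 2, 8, 32
    vol = torch.randn(B, 1, S, S, S)
    n_patches = (S // P) ** 3
    mask = random_patch_mask(B, n_patches, mask_ratio=0.6, device=vol.device)
    target = (vol.unfold(2, P, P).unfold(3, P, P).unfold(4, P, P)     # patchify
                 .reshape(B, n_patches, P ** 3))
    pred = ToyHybridMIM(patch=P)(vol, mask)
    loss = ((pred - target) ** 2).mean(dim=-1)[mask].mean()

One design note: sharing a single patch-level mask across all encoder stages is what the abstract calls consistent masking; in a real hierarchy the mask would be resampled to each stage's resolution so the CNN stages and the Transformer hide the same spatial sites.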



Acknowledgments

Supported by the Natural Science Foundation of China under Grant 62271465, the Suzhou Basic Research Program under Grant SYG202338, and the Open Fund Project of the Guangdong Academy of Medical Sciences, China (No. YKY-KF202206).

Author information


Corresponding author

Correspondence to S. Kevin Zhou.


Ethics declarations

Disclosure of Interests

The authors declare that they have no competing interests relevant to the content of this article.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 8474 KB)


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tang, F. et al. (2024). HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-training. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15011. Springer, Cham. https://doi.org/10.1007/978-3-031-72120-5_31


  • DOI: https://doi.org/10.1007/978-3-031-72120-5_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72119-9

  • Online ISBN: 978-3-031-72120-5

  • eBook Packages: Computer Science, Computer Science (R0)
