Abstract
The generative self-supervised learning strategy exhibits remarkable representation learning capabilities. However, end-to-end pre-training methods built on hybrid CNN-Transformer architectures, which can learn strong local and global representations simultaneously, have received limited attention. To address this issue, we propose a generative pre-training strategy called Hybrid Sparse masKing (HySparK), based on masked image modeling, and apply it to large-scale pre-training on medical images. First, we perform a bottom-up 3D hybrid masking strategy on the encoder to maintain masking consistency, utilizing sparse convolution for the top CNNs and encoding only the unmasked patches for the bottom vision Transformers. Second, we employ a simple hierarchical decoder with skip connections to achieve dense multi-scale feature reconstruction. Third, we carry out our pre-training on a collection of multiple large-scale 3D medical imaging datasets. Extensive experiments show that the proposed pre-training strategy transfers robustly to supervised downstream tasks, underscoring HySparK's promising prospects. The code is available at https://github.com/FengheTan9/HySparK.
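The masking step summarized above can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' released implementation: it assumes a single random 3D patch mask shared by both encoder branches, with masked voxels zeroed for the CNN stages (a dense stand-in for true submanifold sparse convolution) and masked patch tokens dropped for the Transformer stages. Helper names such as make_patch_mask, mask_volume_for_cnn, and keep_visible_tokens, and parameters such as mask_ratio, are hypothetical.

# Minimal sketch (not the authors' code) of consistent hybrid masking:
# one random patch mask drives both the CNN and the Transformer branch.
import torch

def make_patch_mask(vol_shape, patch_size=16, mask_ratio=0.6):
    """Return a boolean mask over 3D patches; True = masked (hidden)."""
    d, h, w = (s // patch_size for s in vol_shape)
    n_patches = d * h * w
    n_masked = int(n_patches * mask_ratio)
    scores = torch.rand(n_patches)
    mask = torch.zeros(n_patches, dtype=torch.bool)
    mask[scores.argsort()[:n_masked]] = True  # mask the lowest-scoring patches
    return mask.view(d, h, w)

def mask_volume_for_cnn(volume, patch_mask, patch_size=16):
    """Zero out masked patches of a (B, C, D, H, W) volume.

    A faithful implementation would feed only the surviving voxels to
    submanifold sparse convolutions rather than zeroing a dense volume.
    """
    dense_mask = patch_mask.repeat_interleave(patch_size, 0) \
                           .repeat_interleave(patch_size, 1) \
                           .repeat_interleave(patch_size, 2)
    return volume * (~dense_mask).to(volume.dtype)

def keep_visible_tokens(tokens, patch_mask):
    """Keep only unmasked patch tokens (B, N, C) for the Transformer branch."""
    visible = ~patch_mask.flatten()  # True = token kept
    return tokens[:, visible, :]

# Toy usage: the same mask is applied in both branches, which is the
# consistency that the bottom-up hybrid masking strategy maintains.
vol = torch.randn(1, 1, 64, 64, 64)
mask = make_patch_mask(vol.shape[2:], patch_size=16, mask_ratio=0.6)
cnn_input = mask_volume_for_cnn(vol, mask, patch_size=16)
tokens = torch.randn(1, mask.numel(), 384)  # stand-in patch embeddings
vit_input = keep_visible_tokens(tokens, mask)

In the full method, the hierarchical decoder with skip connections would then reconstruct the masked regions at multiple scales; the sketch only shows how a single mask keeps the two encoder branches consistent.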
Acknowledgments
Supported by the Natural Science Foundation of China under Grant 62271465, the Suzhou Basic Research Program under Grant SYG202338, and the Open Fund Project of Guangdong Academy of Medical Sciences, China (No. YKY-KF202206).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tang, F. et al. (2024). HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-training. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15011. Springer, Cham. https://doi.org/10.1007/978-3-031-72120-5_31
DOI: https://doi.org/10.1007/978-3-031-72120-5_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72119-9
Online ISBN: 978-3-031-72120-5
eBook Packages: Computer Science, Computer Science (R0)