Abstract
Existing training methods for medical image foundation models focus primarily on tasks such as image restoration, overlooking the inherent anatomical knowledge of the human body. The discrepancy between a foundation model's training task and its downstream tasks often necessitates fine-tuning for each specific application, and when the downstream training set is too small, fine-tuning can cause catastrophic forgetting in the foundation model. To address these issues, we propose a novel unsupervised training method for medical image foundation models. Our approach incorporates an anatomical embedding task that trains the model to generate an anatomically meaningful embedding for each voxel. To speed up training and accommodate large-scale models, we employ momentum contrast learning, which we further adapt to the anatomical embedding task. To improve the model's performance on specific targets, we introduce a region contrastive loss that uses a small set of segmentation labels (e.g., five samples) to identify the regions of interest during training. In our experiments, we pre-train the foundation model on 4,000 unlabeled abdominal CT scans, with few-shot segmentation of 13 abdominal organs as the downstream task. The results show significant improvements on the downstream segmentation task, particularly when segmentation annotations are scarce, compared with methods without pre-training and with similar foundation models. The trained models and the downstream training code are open-sourced at https://github.com/DlutMedimgGroup/Anatomy-Embedding-Foundation-Model.
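The abstract describes two contrastive components: a MoCo-style voxel-level embedding objective driven by a momentum encoder, and a region contrastive loss guided by a handful of segmentation labels. As a rough, non-authoritative sketch of how such losses are commonly written, here is a minimal PyTorch version. The function names, the temperature `tau`, the momentum coefficient `m`, and the assumption that corresponding voxels from two augmented views have already been sampled into `(N, C)` tensors are all illustrative choices, not the authors' implementation (their actual code is in the linked repository).

```python
import torch
import torch.nn.functional as F

def voxel_infonce_loss(q_feat: torch.Tensor, k_feat: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Voxel-wise InfoNCE between query-encoder and momentum-encoder features.

    q_feat, k_feat: (N, C) L2-normalized embeddings of N corresponding voxels
    sampled from two augmented views of the same CT volume. Row i of q_feat
    and row i of k_feat come from the same anatomical location (the positive
    pair); every other row serves as a negative.
    """
    logits = q_feat @ k_feat.t() / tau                    # (N, N) similarity matrix
    labels = torch.arange(q_feat.size(0), device=q_feat.device)
    return F.cross_entropy(logits, labels)                # diagonal entries are positives

@torch.no_grad()
def momentum_update(encoder_q: torch.nn.Module, encoder_k: torch.nn.Module, m: float = 0.999) -> None:
    """MoCo-style exponential moving average update of the key (momentum) encoder."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def region_contrastive_loss(feat: torch.Tensor, region: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Hypothetical region contrastive loss using a few labeled scans:
    voxel embeddings of the same organ are pulled together, those of
    different organs pushed apart.

    feat:   (N, C) L2-normalized voxel embeddings
    region: (N,) integer organ label for each sampled voxel
    """
    sim = feat @ feat.t() / tau                           # (N, N) pairwise similarities
    self_mask = torch.eye(feat.size(0), dtype=torch.bool, device=feat.device)
    pos = (region.unsqueeze(0) == region.unsqueeze(1)) & ~self_mask
    log_prob = F.log_softmax(sim.masked_fill(self_mask, -1e9), dim=1)
    n_pos = pos.sum(dim=1).clamp(min=1)                   # avoid division by zero
    return -(log_prob * pos.float()).sum(dim=1).div(n_pos).mean()
```

In a training loop, one would encode two augmented crops with the query and momentum encoders, sample matched voxel features, combine `voxel_infonce_loss` with `region_contrastive_loss` on the few labeled volumes, and call `momentum_update` after each optimizer step.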
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Acknowledgments
This work was supported in part by the National Key Research and Development Program (Nos. 2020YFB1711500, 2020YFB1711501, and 2020YFB1711503), the General Program of the National Natural Science Foundation of China (Nos. 81971693 and 61971445), the Dalian Key Laboratory of Digital Medicine for Critical Diseases, the Fundamental Research Funds for the Central Universities (Nos. DUT22YG229 and DUT22YG205), the Liaoning Key Lab of IC & BME System, and the Dalian Engineering Research Center for Artificial Intelligence in Medical Imaging.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhuang, M., Xu, R., Zhang, Q., Liu, A., Fan, X., Wang, H. (2025). Anatomical Embedding-Based Training Method for Medical Image Segmentation Foundation Models. In: Deng, Z., et al. (eds.) Foundation Models for General Medical AI. MedAGI 2024. Lecture Notes in Computer Science, vol. 15184. Springer, Cham. https://doi.org/10.1007/978-3-031-73471-7_15
DOI: https://doi.org/10.1007/978-3-031-73471-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73470-0
Online ISBN: 978-3-031-73471-7
eBook Packages: Computer Science, Computer Science (R0)