Abstract
Adapting large pre-trained foundation models, e.g., SAM, to medical image segmentation remains a significant challenge. A crucial step is formulating a series of specialized prompts that incorporate specific clinical instructions. Prior work has relied heavily on a single type of prompt per instance, requiring manual input of an ideally correct prompt, which is inefficient. To tackle this issue, we propose to utilize prompts of different granularity, sourced from the original images, to provide a broader scope of clinical insight. However, combining prompts of different types can be challenging due to potential conflicts between them. In response, we design a coarse-to-fine mechanism, referred to as curriculum prompting, that progressively integrates prompts of different types. Through extensive experiments on three public medical datasets across various modalities, we demonstrate the effectiveness of the proposed approach, which not only automates the prompt generation process but also yields superior performance compared to other SAM-based medical image segmentation methods. Code will be available at: https://github.com/AnnaZzz-zxq/Curriculum-Prompting.
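To make the coarse-to-fine idea concrete, below is a minimal sketch using the official segment-anything API (SamPredictor). It illustrates curriculum prompting in spirit only: a coarse box prompt is applied first, then a finer point prompt is combined with the coarse stage's mask logits. The checkpoint path, image file, box, and point coordinates are hypothetical placeholders, and this is not a reproduction of the paper's actual prompt-generation pipeline.

```python
# Minimal sketch of coarse-to-fine (curriculum-style) prompting with the
# official segment-anything API. Stage 1 segments with a coarse box prompt;
# stage 2 refines by adding a point prompt together with the stage-1 low-res
# mask logits, so the finer prompt builds on, rather than conflicts with,
# the coarser one. Paths and coordinates are hypothetical placeholders.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Load an image as RGB uint8 (set_image expects HxWx3 RGB).
image = cv2.cvtColor(cv2.imread("ct_slice.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Stage 1 (coarse): a bounding box, e.g. produced by an automatic detector.
box = np.array([120, 80, 260, 210])  # XYXY, hypothetical detector output
_, _, low_res_logits = predictor.predict(box=box, multimask_output=False)

# Stage 2 (fine): a foreground point, fed together with the stage-1
# prediction (mask_input takes the 1x256x256 logits from the coarse stage).
point = np.array([[190, 145]])  # hypothetical foreground click
label = np.array([1])           # 1 = foreground, 0 = background
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=label,
    box=box,
    mask_input=low_res_logits,
    multimask_output=False,
)
final_mask = masks[0]  # boolean HxW segmentation
```

Feeding the previous prediction back through mask_input is one simple way to make successive prompts cooperate rather than compete, which is the conflict the curriculum schedule is designed to avoid.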
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, X. et al. (2024). Curriculum Prompting Foundation Models for Medical Image Segmentation. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15012. Springer, Cham. https://doi.org/10.1007/978-3-031-72390-2_46
DOI: https://doi.org/10.1007/978-3-031-72390-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72389-6
Online ISBN: 978-3-031-72390-2
eBook Packages: Computer Science (R0)