Computer Science > Computer Vision and Pattern Recognition

arXiv:2310.01209 (cs)

[Submitted on 2 Oct 2023 (v1), last revised 3 Jul 2024 (this version, v2)]

Title: Self-distilled Masked Attention guided masked image modeling with noise Regularized Teacher (SMART) for medical image analysis

Authors: Jue Jiang, Aneesh Rangnekar, Chloe Min Seo Choi, Harini Veeraraghavan

Abstract: Pretraining vision transformers (ViT) with attention guided masked image modeling (MIM) has been shown to increase downstream accuracy for natural image analysis. The hierarchical shifted window (Swin) transformer, often used in medical image analysis, cannot use attention guided masking because it lacks an explicit [CLS] token, which is needed to compute the attention maps used for selective masking. We thus enhanced Swin with semantic class attention. We developed a co-distilled Swin transformer that incorporates a noisy momentum-updated teacher to guide selective masking for MIM. Our approach, called SeMantic Attention guided co-distillation with noisy teacher Regularized Swin TransFormer (SMARTFormer), was applied to analyze 3D computed tomography datasets with lung nodules and malignant lung cancers (LC). We also analyzed the impact of semantic attention and the noisy teacher on pretraining and downstream accuracy. SMARTFormer classified lesions (malignant versus benign) with an accuracy of 0.895 on 1000 nodules, predicted LC treatment response with an accuracy of 0.74, and achieved high accuracies even in limited data regimes. Pretraining with semantic attention and a noisy teacher improved the ability to distinguish semantically meaningful structures such as organs in an unsupervised clustering task and to localize abnormal structures such as tumors. Code and models will be made available through GitHub upon paper acceptance.
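The abstract names two mechanisms without giving details: a momentum-updated teacher that is perturbed with noise, and attention maps that select which patches to mask. The sketch below is only a toy illustration of those two ideas on plain Python lists. Everything specific in it is an assumption and not taken from the paper: the function names, the Gaussian form of the noise, the momentum value, and the policy of masking the most-attended patches.

```python
import random


def ema_update(teacher, student, momentum=0.99):
    """Momentum update: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def add_noise(weights, sigma, rng):
    """Assumed noise regularization: add Gaussian noise to each teacher weight."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def attention_guided_mask(attn, mask_ratio):
    """Assumed masking policy: mask the patches with the highest attention."""
    n_mask = round(mask_ratio * len(attn))
    # Patch indices sorted from most to least attended.
    ranked = sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)
    chosen = set(ranked[:n_mask])
    return [i in chosen for i in range(len(attn))]


rng = random.Random(0)
student = [1.0, 1.0, 1.0, 1.0]
teacher = [0.0, 0.0, 0.0, 0.0]

# One momentum step moves the teacher a fraction (1 - m) toward the student.
teacher = ema_update(teacher, student, momentum=0.9)
noisy_teacher = add_noise(teacher, sigma=0.01, rng=rng)

# Hypothetical per-patch attention scores, standing in for a teacher attention map.
attn = [0.05, 0.40, 0.10, 0.30, 0.15]
mask = attention_guided_mask(attn, mask_ratio=0.4)
```

With these toy scores, `mask` marks the two most-attended patches (indices 1 and 3) for reconstruction. The opposite policy, masking the least-attended patches, would only require dropping `reverse=True`; the abstract does not say which variant SMARTFormer uses.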

Comments:	Paper is under review at TMI
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.01209 [cs.CV]
	(or arXiv:2310.01209v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.01209

Submission history

From: Jue Jiang Dr.
[v1] Mon, 2 Oct 2023 13:53:55 UTC (10,243 KB)
[v2] Wed, 3 Jul 2024 11:49:33 UTC (23,657 KB)
