Computer Science > Computer Vision and Pattern Recognition

arXiv:2401.02317 (cs)

[Submitted on 4 Jan 2024 (v1), last revised 20 Mar 2024 (this version, v4)]

Title: BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Authors: Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma

Abstract: In this paper, we address the challenge of image resolution variation for the Segment Anything Model (SAM). SAM, known for its zero-shot generalizability, suffers performance degradation on datasets with varying image sizes. Previous approaches tend to resize the image to a fixed size or adopt structural modifications, hindering the preservation of SAM's rich prior knowledge. Moreover, such task-specific tuning requires a complete retraining of the model, which is costly and unacceptable for deployment in downstream tasks. Here, we reformulate this issue as a length extrapolation problem, where the token sequence length varies while the patch size stays the same for images of different sizes. To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions without requiring structural modifications. First, we introduce a new scaling factor that keeps the magnitude of the attention layer's dot-product values consistent when the token sequence length changes. Second, we present a bias-mode attention mask that lets each token prioritize neighboring information, mitigating the impact of untrained distant information. BA-SAM is effective in two scenarios: zero-shot and fine-tuning. Extensive evaluation on diverse datasets, including DIS5K, DUTS, ISIC, COD10K, and COCO, shows that it significantly mitigates performance degradation in the zero-shot setting and achieves state-of-the-art performance with minimal fine-tuning. Furthermore, we propose a generalized model and benchmark, showcasing BA-SAM's generalizability across all four datasets simultaneously. Code is available at this https URL
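
The abstract names two ingredients: a length-dependent scaling factor for the attention dot product and a bias-mode mask that favors nearby tokens. The pure-Python sketch below illustrates one plausible reading of both; it is not the authors' implementation. The exact formulas are not given in the abstract, so the log-ratio scaling log(n)/(log(n_train)·sqrt(d)), which reduces to the usual 1/sqrt(d) at the training length, and the ALiBi-style penalty proportional to Euclidean distance on the 2D patch grid are assumptions. All function names and the `slope` parameter are hypothetical.

```python
import math


def scaled_logit_factor(n_tokens: int, n_train: int, head_dim: int) -> float:
    # Assumed length-aware scale: equals 1/sqrt(d) when n_tokens == n_train,
    # and grows with log(n_tokens) so logits keep a comparable magnitude
    # as the sequence gets longer (requires n_train > 1).
    return math.log(n_tokens) / (math.log(n_train) * math.sqrt(head_dim))


def distance_bias(h: int, w: int, slope: float) -> list[list[float]]:
    # Hypothetical bias-mode mask: tokens lie on an h x w grid (row-major);
    # bias[i][j] = -slope * Euclidean distance between patches i and j,
    # so distant (possibly untrained) positions are down-weighted.
    coords = [(r, c) for r in range(h) for c in range(w)]
    return [[-slope * math.hypot(r1 - r2, c1 - c2) for (r2, c2) in coords]
            for (r1, c1) in coords]


def softmax(row: list[float]) -> list[float]:
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(q, k, v, h: int, w: int, n_train: int, slope: float = 0.5):
    # Single-head attention: softmax(scale * q.k + bias) @ v.
    n, d = len(q), len(q[0])
    if n != h * w:
        raise ValueError("number of tokens must equal h * w")
    scale = scaled_logit_factor(n, n_train, d)
    bias = distance_bias(h, w, slope)
    out = []
    for i in range(n):
        logits = [scale * sum(a * b for a, b in zip(q[i], k[j])) + bias[i][j]
                  for j in range(n)]
        weights = softmax(logits)
        out.append([sum(weights[j] * v[j][c] for j in range(n))
                    for c in range(len(v[0]))])
    return out
```

With all-zero queries, the logits reduce to the bias alone, so each token attends most strongly to itself and then to its grid neighbors. This is the "prioritize neighboring information" behavior the abstract describes, under the assumed mask form.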

Comments:	Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2401.02317 [cs.CV]
	(or arXiv:2401.02317v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2401.02317

Submission history

From: Qianyu Zhou
[v1] Thu, 4 Jan 2024 15:34:44 UTC (4,505 KB)
[v2] Mon, 8 Jan 2024 08:39:34 UTC (4,505 KB)
[v3] Tue, 19 Mar 2024 15:48:17 UTC (4,503 KB)
[v4] Wed, 20 Mar 2024 02:03:52 UTC (4,503 KB)
