SelfLoc: High Quality Unsupervised Object Localization with Self-Prompt SAM

Jiaheng Zhang¹⁵,
Xiandong Wang¹⁵,
Conghui Li¹⁵,
Longyi Chen¹⁵ &
…
Shengke Wang¹⁵

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15042)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Recently, methods based on self-supervised transformer features have demonstrated promising results in unsupervised object localization. However, obtaining high-quality semantic results remains a formidable challenge. Current approaches rely heavily on the similarity between patch-level features within an image and lack supervision from image-level information. Meanwhile, the Segment Anything Model (SAM) has demonstrated remarkable class-agnostic segmentation capabilities for arbitrary objects in images, given sparse prompts such as points. In this work, we propose SelfLoc, a simple yet effective self-supervised object localization method that integrates a self-prompted SAM. Specifically, a self-prompt generator is designed to automatically generate sparse prompts from an image's self-attention map. In addition, an image-wise integration module is developed to refine the coarse mask obtained from self-supervised features by leveraging the fine-grained segmentation results of SAM. Extensive experimental results demonstrate that the proposed method not only achieves state-of-the-art performance in unsupervised saliency detection and object discovery tasks, but also sets a new benchmark in unsupervised camouflaged object segmentation. The source code will be publicly available at https://github.com/Rogersiy/SelfLoc.
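
The abstract describes two steps: deriving sparse point prompts from a self-attention map, and using SAM's fine-grained masks to improve a coarse mask. The following is a minimal sketch of what such steps could look like, not the authors' implementation. The top-k selection rule, the 16-pixel patch size, the IoU-based mask choice, and all function names (`select_point_prompts`, `iou`, `pick_best_mask`) are assumptions made for illustration only.

```python
from typing import List, Tuple

Grid = List[List[float]]   # patch-level attention scores, one value per patch
Mask = List[List[int]]     # binary mask, 1 = foreground


def select_point_prompts(attn: Grid, k: int = 3,
                         patch_size: int = 16) -> List[Tuple[int, int]]:
    """Return (x, y) pixel centers of the k patches with the highest attention.

    ASSUMPTION: top-k selection and a 16-pixel patch are illustrative choices,
    not the rule used by SelfLoc.
    """
    scored = [
        (value, row, col)
        for row, line in enumerate(attn)
        for col, value in enumerate(line)
    ]
    # Highest attention first; ties broken by row, then column, for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    half = patch_size // 2
    return [(col * patch_size + half, row * patch_size + half)
            for _, row, col in scored[:k]]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)
    union = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x or y)
    return inter / union if union else 0.0


def pick_best_mask(coarse: Mask, candidates: List[Mask]) -> Mask:
    """Pick the candidate mask that best agrees with the coarse mask.

    ASSUMPTION: IoU agreement is a stand-in for the paper's integration module.
    """
    return max(candidates, key=lambda m: iou(coarse, m))


# Toy 3x3 attention map and masks (hypothetical values).
attn = [[0.1, 0.2, 0.1],
        [0.3, 0.9, 0.8],
        [0.1, 0.4, 0.2]]
points = select_point_prompts(attn, k=2)

coarse = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
whole_image = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
tight = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
best = pick_best_mask(coarse, [whole_image, tight])
```

In a real pipeline the selected points would be passed to SAM as foreground point prompts, and the candidate masks would come from SAM's output rather than being written by hand.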

Acknowledgments

This work is supported in part by the Science and Technology Program of Qingdao (24-1-8-cspz-22-nsh).

Author information

Authors and Affiliations

Ocean University of China, Qingdao, China
Jiaheng Zhang, Xiandong Wang, Conghui Li, Longyi Chen & Shengke Wang

Corresponding author

Correspondence to Shengke Wang.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Zhouchen Lin
Nankai University, Tianjin, China
Ming-Ming Cheng
Chinese Academy of Sciences, Beijing, China
Ran He
Xinjiang University, Ürümqi, Xinjiang, China
Kurban Ubul
Xinjiang University, Ürümqi, China
Wushouer Silamu
Peking University, Beijing, China
Hongbin Zha
Tsinghua University, Beijing, China
Jie Zhou
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu

About this paper

Cite this paper

Zhang, J., Wang, X., Li, C., Chen, L., Wang, S. (2025). SelfLoc: High Quality Unsupervised Object Localization with Self-Prompt SAM. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15042. Springer, Singapore. https://doi.org/10.1007/978-981-97-8858-3_35

DOI: https://doi.org/10.1007/978-981-97-8858-3_35
Published: 03 November 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-8857-6
Online ISBN: 978-981-97-8858-3
eBook Packages: Computer Science, Computer Science (R0)
