DOI: 10.1145/3664647.3680698
research-article

Channel-Spatial Support-Query Cross-Attention for Fine-Grained Few-Shot Image Classification

Published: 28 October 2024

Abstract

Few-shot fine-grained image classification aims to recognize subtle sub-classes within the same parent class using only a few labelled samples. The task is extremely challenging because large inter-class similarity, low intra-class similarity, and a scarcity of labelled samples occur together. To address these challenges, we propose a new Channel-Spatial Cross-Attention Module (CSCAM), which drives a model to extract discriminative fine-grained feature representations from only a few shots. CSCAM integrates a channel cross-attention module and a spatial cross-attention module, both of which attend across support and query samples. In addition, to suit the characteristics of fine-grained images, CSCAM includes a support averaging method that reduces the intra-class distance and increases the inter-class distance. Extensive experiments on four few-shot fine-grained classification datasets validate the effectiveness of CSCAM. Furthermore, CSCAM is a plug-and-play module that can be added to state-of-the-art methods for few-shot fine-grained image classification to improve their performance.
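The abstract names three ingredients: support averaging, channel cross-attention, and spatial cross-attention between support and query features. The sketch below illustrates one plausible reading of those ideas using generic scaled dot-product cross-attention in NumPy. It is not the authors' CSCAM implementation: the function names (`average_support`, `attend`, `channel_spatial_cross_attention`), the absence of learned projections, and the residual sum used to combine the two branches are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def average_support(support_set):
    """Average K support feature maps of one class: (K, C, H, W) -> (C, H, W)."""
    return support_set.mean(axis=0)


def attend(q_tokens, kv_tokens):
    """Scaled dot-product cross-attention without learned projections.

    q_tokens:  (Tq, D) tokens taken from the query image.
    kv_tokens: (Tk, D) tokens taken from the support side (keys and values).
    Returns (Tq, D): each output row is a convex combination of kv rows.
    """
    scores = q_tokens @ kv_tokens.T / np.sqrt(q_tokens.shape[1])
    return softmax(scores, axis=-1) @ kv_tokens


def channel_spatial_cross_attention(query, support_set):
    """Hypothetical combination of channel and spatial cross-attention.

    query:       (C, H, W) feature map of one query image.
    support_set: (K, C, H, W) feature maps of one class.
    """
    proto = average_support(support_set)
    c, h, w = query.shape
    q = query.reshape(c, h * w)
    s = proto.reshape(c, h * w)
    channel_out = attend(q, s)        # channels act as tokens: (C, H*W)
    spatial_out = attend(q.T, s.T).T  # positions act as tokens, back to (C, H*W)
    # Assumed fusion: residual sum of the query and both attended branches.
    return (q + channel_out + spatial_out).reshape(c, h, w)


rng = np.random.default_rng(0)
support_set = rng.normal(size=(5, 8, 3, 3))  # 5-shot, 8 channels, 3x3 map
query = rng.normal(size=(8, 3, 3))
refined = channel_spatial_cross_attention(query, support_set)
print(refined.shape)
```

Averaging the support maps first means attention is computed once per class rather than once per support image, which is one way a class-level representation could tighten intra-class spread. Whether the paper averages before or after attention is not stated in the abstract.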

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
October 2024
11719 pages
ISBN: 9798400706868
DOI: 10.1145/3664647
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. channel cross-attention
  2. few-shot learning
  3. fine-grained image classification
  4. spatial cross-attention

Qualifiers

  • Research-article

Conference

MM '24
MM '24: The 32nd ACM International Conference on Multimedia
October 28 - November 1, 2024
Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 86
  • Downloads (Last 12 months): 86
  • Downloads (Last 6 weeks): 43
Reflects downloads up to 07 Jan 2025
