Meta-Prompt Tuning Vision-Language Model for Multi-Label Few-Shot Image Recognition

Published: 21 October 2024
DOI: 10.1145/3627673.3679963

Abstract

Multi-label few-shot image recognition aims to identify multiple unseen objects from only a handful of examples. Recent methods typically tune pre-trained vision-language models with either a shared prompt or class-specific prompts, but both have drawbacks: a single shared prompt cannot fit all samples, especially when tasks are complex, while tuning a separate prompt for each class inevitably sacrifices generalization, so neither captures diverse visual knowledge. To address these issues, we propose to meta-tune a generalized prompt pool in which each prompt acts as an expert for multi-label few-shot image recognition. Specifically, we first construct a diverse prompt pool to handle complex samples and tasks effectively. We then design a meta-tuning strategy that learns meta-knowledge on source tasks and transfers it to target tasks, enhancing the generalization of the prompts. Extensive experiments on two widely used multi-label image recognition datasets demonstrate the effectiveness of our method.
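The two ingredients sketched in the abstract, a pool of prompt experts routed per image and an episodic meta-tuning stage, can be pictured with a short PyTorch sketch. Everything below is an assumption for illustration rather than the authors' implementation: the frozen backbone interface (`encode_image`, `encode_text_with_context`), the episode format, the soft routing over the pool, the first-order (Reptile-style) meta-update, and all hyper-parameters.

```python
# Minimal sketch, NOT the authors' code: a pool of learnable prompt "experts"
# routed per image over a frozen vision-language backbone, plus a first-order
# episodic meta-update. The backbone methods `encode_image` and
# `encode_text_with_context`, the episode format, and all hyper-parameters
# are assumptions made for illustration.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptPool(nn.Module):
    """Pool of learnable context prompts; each prompt acts as an expert."""

    def __init__(self, pool_size: int = 8, prompt_len: int = 16, dim: int = 512):
        super().__init__()
        self.prompts = nn.Parameter(0.02 * torch.randn(pool_size, prompt_len, dim))
        self.keys = nn.Parameter(0.02 * torch.randn(pool_size, dim))  # routing keys

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        # Soft routing: weight the experts by image-to-key cosine similarity.
        sim = F.normalize(image_feat, dim=-1) @ F.normalize(self.keys, dim=-1).t()
        weights = sim.softmax(dim=-1)                             # (B, pool_size)
        return torch.einsum("bp,pld->bld", weights, self.prompts)  # (B, L, dim)


def multilabel_logits(backbone, pool, images, class_tokens):
    """Multi-label scores: image features vs. prompt-conditioned class texts."""
    # class_tokens: tokenized class names for the frozen text encoder (assumed).
    with torch.no_grad():                               # backbone stays frozen
        img = backbone.encode_image(images)             # (B, dim), assumed API
    ctx = pool(img)                                     # (B, L, dim)
    scores = []
    for b in range(img.size(0)):
        # Prepend the routed context to every class name and re-encode the text.
        txt = backbone.encode_text_with_context(ctx[b], class_tokens)  # (C, dim)
        scores.append(F.normalize(img[b:b + 1], dim=-1) @ F.normalize(txt, dim=-1).t())
    return torch.cat(scores, dim=0) * 100.0             # (B, C) logits


def meta_tune_step(pool, backbone, episode, inner_lr=1e-2, inner_steps=3, meta_lr=0.1):
    """One first-order meta-update of the prompt pool on a source-task episode."""
    (sup_images, sup_labels), class_tokens = episode    # labels: float multi-hot
    fast = copy.deepcopy(pool)                          # task-specific copy
    inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                        # adapt to this task
        logits = multilabel_logits(backbone, fast, sup_images, class_tokens)
        loss = F.binary_cross_entropy_with_logits(logits, sup_labels)
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()
    with torch.no_grad():                               # move meta-parameters
        for p, q in zip(pool.parameters(), fast.parameters()):  # toward adapted ones
            p.add_(meta_lr * (q - p))
```

Soft routing over the whole pool is one simple choice; a hard top-k expert selection in the style of learning-to-prompt pools would slot in the same way, and the first-order update above merely stands in for whatever meta-objective the paper actually optimizes.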


    Published In

    CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
    October 2024, 5705 pages
    ISBN: 9798400704369
    DOI: 10.1145/3627673

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 21 October 2024


    Author Tags

    1. few-shot learning
    2. meta-prompt learning
    3. multi-label image recognition

    Qualifiers

    • Short-paper

    Funding Sources

    • Science and Technology Project of State Grid Corporation of China

    Conference

    CIKM '24

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
