DOI: 10.1145/3460426.3463641
research-article

Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning

Published: 01 September 2021

Abstract

Few-shot learning (FSL) is the task of learning to recognize previously unseen categories of images from a small number of training examples. This is a challenging task, as the available examples may not be enough to unambiguously determine which visual features are most characteristic of the considered categories. To alleviate this issue, we propose a method that additionally takes into account the names of the image classes. While the use of class names has already been explored in previous work, our approach differs in two key aspects. First, while previous work has aimed to directly predict visual prototypes from word embeddings, we found that better results can be obtained by treating visual and text-based prototypes separately. Second, we propose a simple strategy for learning class name embeddings using the BERT language model, which we found to substantially outperform the GloVe vectors that were used in previous work. We furthermore propose a strategy for dealing with the high dimensionality of these vectors, inspired by models for aligning cross-lingual word embeddings. We provide experiments on miniImageNet, CUB and tieredImageNet, showing that our approach consistently improves the state-of-the-art in metric-based FSL.
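
The abstract only sketches the approach, so the following minimal NumPy sketch illustrates the general idea rather than the authors' actual method: class-name embeddings (e.g., from BERT) are mapped into the visual feature space with a least-squares linear map, in the spirit of cross-lingual embedding alignment, and queries are scored against visual and text-based prototypes as two separate distance terms. All function names, array shapes, and the fusion weight alpha are illustrative assumptions.

```python
# Illustrative sketch (NOT the authors' code): prototypical-network-style scoring
# that keeps visual and text-based prototypes separate. Class-name embeddings are
# aligned to the visual feature space via a least-squares linear map, loosely in
# the spirit of cross-lingual embedding alignment. Shapes, names and `alpha` are
# assumptions made for this example.
import numpy as np

def fit_alignment(name_embs: np.ndarray, visual_protos: np.ndarray) -> np.ndarray:
    """Least-squares map W such that name_embs @ W approximates visual_protos.

    name_embs:     (C, d_text) class-name embeddings (e.g. from BERT)
    visual_protos: (C, d_vis)  per-class means of support-set visual features
    """
    W, *_ = np.linalg.lstsq(name_embs, visual_protos, rcond=None)
    return W  # shape (d_text, d_vis)

def score_queries(query_feats, visual_protos, name_embs, W, alpha=0.5):
    """Combine two separate distance terms per class: distance to the visual
    prototype and distance to the aligned text-based prototype."""
    text_protos = name_embs @ W                                    # (C, d_vis)
    d_vis = ((query_feats[:, None] - visual_protos[None]) ** 2).sum(-1)
    d_txt = ((query_feats[:, None] - text_protos[None]) ** 2).sum(-1)
    scores = -(alpha * d_vis + (1.0 - alpha) * d_txt)              # (Q, C)
    return scores.argmax(axis=1)

# Toy usage with random features (5-way episode, 768-dim BERT-like name embeddings).
rng = np.random.default_rng(0)
C, d_vis, d_text, Q = 5, 64, 768, 15
visual_protos = rng.normal(size=(C, d_vis))
name_embs = rng.normal(size=(C, d_text))
W = fit_alignment(name_embs, visual_protos)
print(score_queries(rng.normal(size=(Q, d_vis)), visual_protos, name_embs, W))
```

In a real few-shot episode, visual_protos would be class means of backbone features computed from the support images and name_embs would come from encoding the class names with BERT; the paper's specific alignment, dimensionality-reduction and fusion choices may differ from this sketch.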


      Published In

      ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
      August 2021
      715 pages
      ISBN: 9781450384636
      DOI: 10.1145/3460426

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 September 2021

      Author Tags

      1. bert
      2. few-shot learning
      3. metric-based learning
      4. multi-modal

      Qualifiers

      • Research-article

      Funding Sources

      • GENCI-IDRIS
      • ANR
      • Clinical Medicine Plus X - Young Scholars Project of Peking University, the Fundamental Research Funds for the Central Universities
      • the National Key R&D Program of China
      • Capital Health Development Scientific Research Project
      • Global Challenges Research Fund (GCRF)

      Conference

      ICMR '21

      Acceptance Rates

      Overall Acceptance Rate: 254 of 830 submissions, 31%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 27
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 30 Dec 2024

      Citations

      Cited By

      • (2024) Feature-weighted Multi-stage Bayesian Prototype for Few-shot Classification. Proceedings of the 6th ACM International Conference on Multimedia in Asia, 1-7. https://doi.org/10.1145/3696409.3700244. Online publication date: 3-Dec-2024.
      • (2024) FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification. Proceedings of the 32nd ACM International Conference on Multimedia, 1341-1350. https://doi.org/10.1145/3664647.3681427. Online publication date: 28-Oct-2024.
      • (2024) KNN Transformer with Pyramid Prompts for Few-Shot Learning. Proceedings of the 32nd ACM International Conference on Multimedia, 1082-1091. https://doi.org/10.1145/3664647.3680601. Online publication date: 28-Oct-2024.
      • (2024) Simple Semantic-Aided Few-Shot Learning. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 28588-28597. https://doi.org/10.1109/CVPR52733.2024.02701. Online publication date: 16-Jun-2024.
      • (2024) Bimodal semantic fusion prototypical network for few-shot classification. Information Fusion, Vol. 109, 102421. https://doi.org/10.1016/j.inffus.2024.102421. Online publication date: Sep-2024.
      • (2023) Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 216-226. https://doi.org/10.1145/3539618.3591667. Online publication date: 19-Jul-2023.
      • (2023) Semantic Guided Latent Parts Embedding for Few-Shot Learning. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5436-5446. https://doi.org/10.1109/WACV56688.2023.00541. Online publication date: Jan-2023.
      • (2023) Notice of Removal: Semantic Prompt for Few-Shot Image Recognition. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 23581-23591. https://doi.org/10.1109/CVPR52729.2023.02258. Online publication date: Jun-2023.
