DOI: 10.1145/3460426.3463641
research-article

Aligning Visual Prototypes with BERT Embeddings for Few-Shot Learning

Published: 01 September 2021

Abstract

Few-shot learning (FSL) is the task of learning to recognize previously unseen categories of images from a small number of training examples. This is a challenging task, as the available examples may not be enough to unambiguously determine which visual features are most characteristic of the considered categories. To alleviate this issue, we propose a method that additionally takes into account the names of the image classes. While the use of class names has already been explored in previous work, our approach differs in two key aspects. First, while previous work has aimed to directly predict visual prototypes from word embeddings, we found that better results can be obtained by treating visual and text-based prototypes separately. Second, we propose a simple strategy for learning class name embeddings using the BERT language model, which we found to substantially outperform the GloVe vectors that were used in previous work. We furthermore propose a strategy for dealing with the high dimensionality of these vectors, inspired by models for aligning cross-lingual word embeddings. We provide experiments on miniImageNet, CUB and tieredImageNet, showing that our approach consistently improves the state-of-the-art in metric-based FSL.
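
The abstract only sketches the approach, so the following minimal NumPy sketch illustrates the general idea rather than the authors' actual method: class-name embeddings (e.g., from BERT) are mapped into the visual feature space with a least-squares linear map, in the spirit of cross-lingual embedding alignment, and queries are scored against visual and text-based prototypes as two separate distance terms. All function names, array shapes, and the fusion weight alpha are illustrative assumptions.

```python
# Illustrative sketch (NOT the authors' code): prototypical-network-style scoring
# that keeps visual and text-based prototypes separate. Class-name embeddings are
# aligned to the visual feature space via a least-squares linear map, loosely in
# the spirit of cross-lingual embedding alignment. Shapes, names and `alpha` are
# assumptions made for this example.
import numpy as np

def fit_alignment(name_embs: np.ndarray, visual_protos: np.ndarray) -> np.ndarray:
    """Least-squares map W such that name_embs @ W approximates visual_protos.

    name_embs:     (C, d_text) class-name embeddings (e.g. from BERT)
    visual_protos: (C, d_vis)  per-class means of support-set visual features
    """
    W, *_ = np.linalg.lstsq(name_embs, visual_protos, rcond=None)
    return W  # shape (d_text, d_vis)

def score_queries(query_feats, visual_protos, name_embs, W, alpha=0.5):
    """Combine two separate distance terms per class: distance to the visual
    prototype and distance to the aligned text-based prototype."""
    text_protos = name_embs @ W                                    # (C, d_vis)
    d_vis = ((query_feats[:, None] - visual_protos[None]) ** 2).sum(-1)
    d_txt = ((query_feats[:, None] - text_protos[None]) ** 2).sum(-1)
    scores = -(alpha * d_vis + (1.0 - alpha) * d_txt)              # (Q, C)
    return scores.argmax(axis=1)

# Toy usage with random features (5-way episode, 768-dim BERT-like name embeddings).
rng = np.random.default_rng(0)
C, d_vis, d_text, Q = 5, 64, 768, 15
visual_protos = rng.normal(size=(C, d_vis))
name_embs = rng.normal(size=(C, d_text))
W = fit_alignment(name_embs, visual_protos)
print(score_queries(rng.normal(size=(Q, d_vis)), visual_protos, name_embs, W))
```

In a real few-shot episode, visual_protos would be class means of backbone features computed from the support images and name_embs would come from encoding the class names with BERT; the paper's specific alignment, dimensionality-reduction and fusion choices may differ from this sketch.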


      Published In

      ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
      August 2021
      715 pages
      ISBN: 9781450384636
      DOI: 10.1145/3460426

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 September 2021

      Author Tags

      1. bert
      2. few-shot learning
      3. metric-based learning
      4. multi-modal

      Qualifiers

      • Research-article

      Funding Sources

      • GENCI-IDRIS
      • ANR
      • Clinical Medicine Plus X - Young Scholars Project of Peking University, the Fundamental Research Funds for the Central Universities
      • the National Key R&D Program of China
      • Capital Health Development Scientific Research Project
      • Global Challenges Research Fund (GCRF)

      Conference

      ICMR '21

      Acceptance Rates

      Overall Acceptance Rate: 254 of 830 submissions, 31%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 27
      • Downloads (Last 6 weeks): 1
      Reflects downloads up to 30 Dec 2024

      Citations

      Cited By

      • (2024) Feature-weighted Multi-stage Bayesian Prototype for Few-shot Classification. Proceedings of the 6th ACM International Conference on Multimedia in Asia, 1-7. https://doi.org/10.1145/3696409.3700244. Online publication date: 3-Dec-2024.
      • (2024) FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification. Proceedings of the 32nd ACM International Conference on Multimedia, 1341-1350. https://doi.org/10.1145/3664647.3681427. Online publication date: 28-Oct-2024.
      • (2024) KNN Transformer with Pyramid Prompts for Few-Shot Learning. Proceedings of the 32nd ACM International Conference on Multimedia, 1082-1091. https://doi.org/10.1145/3664647.3680601. Online publication date: 28-Oct-2024.
      • (2024) Simple Semantic-Aided Few-Shot Learning. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 28588-28597. https://doi.org/10.1109/CVPR52733.2024.02701. Online publication date: 16-Jun-2024.
      • (2024) Bimodal semantic fusion prototypical network for few-shot classification. Information Fusion, Vol. 109, 102421. https://doi.org/10.1016/j.inffus.2024.102421. Online publication date: Sep-2024.
      • (2023) Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 216-226. https://doi.org/10.1145/3539618.3591667. Online publication date: 19-Jul-2023.
      • (2023) Semantic Guided Latent Parts Embedding for Few-Shot Learning. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5436-5446. https://doi.org/10.1109/WACV56688.2023.00541. Online publication date: Jan-2023.
      • (2023) Notice of Removal: Semantic Prompt for Few-Shot Image Recognition. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 23581-23591. https://doi.org/10.1109/CVPR52729.2023.02258. Online publication date: Jun-2023.
