Abstract
Few-shot learning methods enable models to recognize novel classes from one or only a few examples. Most existing methods use episodic training, which requires multiple trained models and/or variable memory. We propose a non-episodic model that combines a \(\beta\)-variational autoencoder (\(\beta\)-VAE) with a cosine similarity classifier (CSC). It is trained only once with a fixed memory requirement and can be used for all test settings. We have also created our own printed-character dataset with style attributes, covering six Indic scripts, to analyze the effect of the \(\beta\)-VAE’s disentangled content on few-shot classification accuracy. In our model, the encoder of the \(\beta\)-VAE generates both the content and style distributions, of which only the content is used by the CSC. The network is trained end-to-end, which enables the encoder to disentangle the content features needed for classification from the style attributes. After the training phase, the weights of novel classes are generated by normalizing the encoder’s content representation and are used for testing. On Omniglot, we obtain 98.91% classification accuracy in the 5-way 1-shot setting, which is on par with or better than state-of-the-art episodic methods. Our ablation study shows that the \(\beta\)-VAE with CSC network outperforms the CSC network alone on the Indic script dataset. Because the encoder feeds disentangled content features to the classifier, our proposed model, trained only once with a fixed memory requirement, gives results comparable to both episodic and non-episodic methods, and in a few settings it also outperforms the episodic methods.
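The weight-imprinting step described above (average the normalized content embeddings of each novel class's support examples, renormalize, and use the result as that class's classifier weight for cosine similarity scoring) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the \(\beta\)-VAE encoder is replaced by synthetic "content" vectors, and the function names, dimensions, and cosine scale factor are all hypothetical choices.

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    """Normalize rows to unit length (eps avoids division by zero)."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def imprint_weights(content, labels, n_classes):
    """Imprint one classifier weight per class: average the normalized
    content embeddings of that class's support examples, then renormalize."""
    W = np.zeros((n_classes, content.shape[1]))
    for c in range(n_classes):
        W[c] = l2_normalize(content[labels == c]).mean(axis=0)
    return l2_normalize(W)

def cosine_classify(content, W, scale=10.0):
    """Cosine similarity classifier: scaled dot product between
    normalized embeddings and normalized class weights."""
    logits = scale * l2_normalize(content) @ W.T
    return logits.argmax(axis=1)

# Toy 5-way 1-shot episode with synthetic content vectors standing in
# for the encoder's output (one latent direction per class).
rng = np.random.default_rng(0)
protos = rng.normal(size=(5, 16))
support = protos + 0.1 * rng.normal(size=(5, 16))  # 1 shot per class
labels = np.arange(5)
W = imprint_weights(support, labels, n_classes=5)
query = protos + 0.1 * rng.normal(size=(5, 16))
print(cosine_classify(query, W))
```

Because both the embeddings and the imprinted weights are unit-normalized, adding a novel class is just appending one more row to `W`; no retraining of the classifier is needed, which is what makes the single-training-run, fixed-memory setting possible.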
Data Availability
The UHIndicPCwS dataset that we have created will be made available from April 3rd, 2023 at https://scis.uohyd.ac.in/~chakcs/datasets/UHIndicPCwS.html. The other standard dataset that is used in this paper is also publicly available.
Notes
For higher values of \(N\) (\(> 50\)), we ran 100 test episodes due to resource limitations.
Funding
This work is part of research leading to a PhD degree and is currently not funded by any source. There are no conflicts of interest, and ethics approval is not applicable as the work is primarily in the software domain. The authors work at a university with an emphasis on postgraduate research and teaching; therefore, consent for participation and publication is not required and not applicable. The Indic dataset created as part of this work will be made available soon, after internal testing to eliminate any errors. The code was written by the research scholar and does not follow any software engineering standards; its release therefore requires substantial effort and may not happen soon. The first author is a research scholar and the second is her supervisor; the contributions are therefore equally shared.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Pattern Recognition and Machine Learning” guest edited by Ashish Ghosh, Monidipa Das and Anwesha Law.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Padma, V., Bhagvati, C. Non-episodic Variational Weight Imprinting for Few-Shot Learning. SN COMPUT. SCI. 4, 303 (2023). https://doi.org/10.1007/s42979-023-01732-1