Abstract
The paradigm of machine intelligence is moving from purely supervised learning to a more practical scenario in which abundant loosely related unlabeled data are available and labeled data are scarce. Most existing algorithms assume that the underlying task distribution is stationary. Here we consider a more realistic and challenging setting in which task distributions evolve over time. We name this problem Semi-supervised meta-learning with Evolving Task diStributions, abbreviated SETS. Two key challenges arise in this setting: (i) how to exploit unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data, and (ii) how to prevent catastrophic forgetting of previously learned task distributions caused by task distribution shift. We propose an OOD Robust and knowleDge presErved semi-supeRvised meta-learning approach (ORDER) to tackle these two challenges; the name reflects that the task distributions arrive sequentially in some order. Specifically, ORDER introduces a novel mutual information regularization to robustify the model against unlabeled OOD data and adopts an optimal transport regularization to retain previously learned knowledge in feature space. In addition, we evaluate our method on a very challenging benchmark: SETS on large-scale non-stationary semi-supervised task distributions consisting of (at least) 72K tasks. Through extensive experiments, we demonstrate that ORDER alleviates forgetting on evolving task distributions and is more robust to OOD data than strong related baselines.
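The abstract describes ORDER's two regularizers only at a high level, and the exact objective is not reproduced on this page. As a rough, hedged illustration of how such a combined loss might look, the sketch below pairs a common entropy-based surrogate for the mutual information term (an assumption, not necessarily the paper's formulation) with an entropic optimal transport (Sinkhorn) penalty between current features and features buffered from earlier task distributions. All names here (`sinkhorn_ot`, `order_style_loss`, `memory_feats`, `lam_mi`, `lam_ot`) are hypothetical.

```python
# Illustrative sketch only: not the paper's actual implementation of ORDER.
# Assumptions: the MI term is approximated by a standard entropy surrogate
# (maximize marginal entropy, minimize conditional entropy), and the
# knowledge-preservation term is an entropic OT cost to replayed features.
import torch
import torch.nn.functional as F

def sinkhorn_ot(x, y, eps=0.1, iters=50):
    """Entropic optimal transport cost between two feature batches
    with uniform sample weights, via Sinkhorn fixed-point iterations."""
    cost = torch.cdist(x, y, p=2) ** 2                 # pairwise squared distances
    K = torch.exp(-cost / eps)                         # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0), device=x.device)
    b = torch.full((y.size(0),), 1.0 / y.size(0), device=y.device)
    u = torch.ones_like(a)
    for _ in range(iters):                             # alternating scaling updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                    # transport plan
    return (P * cost).sum()

def order_style_loss(logits_s, labels_s, logits_u, feats, memory_feats,
                     lam_mi=0.1, lam_ot=0.1):
    """Supervised loss + MI surrogate on unlabeled data + OT forgetting penalty."""
    ce = F.cross_entropy(logits_s, labels_s)
    p = logits_u.softmax(dim=-1)
    # MI surrogate I(x; y) ~= H(marginal) - H(conditional): encourage confident
    # per-sample predictions with a diverse batch-level marginal; unlabeled OOD
    # samples that remain uncertain contribute little signal.
    cond_ent = -(p * p.clamp_min(1e-8).log()).sum(-1).mean()
    marg = p.mean(0)
    marg_ent = -(marg * marg.clamp_min(1e-8).log()).sum()
    mi_term = cond_ent - marg_ent                      # minimizing this maximizes the surrogate
    ot_term = sinkhorn_ot(feats, memory_feats)         # stay close to earlier features
    return ce + lam_mi * mi_term + lam_ot * ot_term
```

In training, `feats` would be the current model's embeddings of a mixed labeled/unlabeled batch and `memory_feats` embeddings replayed from a small buffer of earlier tasks; an OT penalty of this kind pulls the current feature distribution toward the remembered one without requiring class-aligned pairs.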
Acknowledgement
We thank all the anonymous reviewers for their thoughtful and insightful comments. This research was supported in part by NSF through grant IIS-1910492.