
Leveraging Task Variability in Meta-learning

  • Original Research
SN Computer Science

Abstract

Meta-learning (ML) utilizes meta-knowledge extracted from data to enable models to perform well on data they have not encountered before. Typically, this meta-knowledge is acquired from randomly sampled task batches, and a critical assumption in meta-learning is that all tasks in a batch contribute equally to the meta-knowledge. However, this assumption may not always hold. In this study, we explore the impact of weighting the tasks in a batch based on their contribution to the meta-knowledge. We achieve this by introducing a learnable “task attention module” that can be integrated into any episodic training pipeline. We demonstrate that our approach improves the quality of the meta-knowledge obtained on standard meta-learning benchmarks such as miniImagenet, FC100, and tieredImagenet, as well as on noisy and cross-domain few-shot benchmarks. Additionally, we conduct a comprehensive analysis of the proposed task attention module to gain insights into its operation.

Data availability

Not applicable.

Notes

  1. https://github.com/taskattention/task-attended-metalearning.git.

References

  1. Agarwal M, Yurochkin M, Sun Y. On sensitivity of meta-learning to support data. In: Advances in Neural Information Processing Systems. 2021.

  2. Aimen A, Sidheekh S, Madan V, et al. Stress testing of meta-learning approaches for few-shot learning. In: AAAI Workshop on Meta-Learning and MetaDL Challenge. 2021.

  3. Antoniou A, Edwards H, Storkey A. How to train your MAML. In: Seventh International Conference on Learning Representations. 2019.

  4. Arnold S, Dhillon G, Ravichandran A, et al. Uniform sampling over episode difficulty. In: Advances in Neural Information Processing Systems. 2021.

  5. Arnold SM, Mahajan P, Datta D, et al. learn2learn: a library for meta-learning research. CoRR. 2020.

  6. Bengio Y, Louradour J, Collobert R, et al. Curriculum learning. In: International Conference on Machine Learning, ACM International Conference Proceeding Series. 2009.

  7. Bronskill J, Massiceti D, Patacchiola M, et al. Memory efficient meta-learning with large images. In: Advances in Neural Information Processing Systems. 2021.

  8. Chang H, Learned-Miller EG, McCallum A. Active bias: Training more accurate neural networks by emphasizing high variance samples. In: Advances in Neural Information Processing Systems. 2017.

  9. Chen WY, Liu YC, Kira Z, et al. A closer look at few-shot classification. In: International Conference on Learning Representations. 2018.

  10. Dhillon GS, Chaudhari P, Ravichandran A, et al. A baseline for few-shot image classification. In: International Conference on Learning Representations. 2019.

  11. Dumoulin V, Houlsby N, Evci U, et al. A unified few-shot classification benchmark to compare transfer and meta learning approaches. In: Neural Information Processing Systems Datasets and Benchmarks Track. 2021.

  12. Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning. 2017.

  13. Guo Y, Codella NC, Karlinsky L, et al. A broader study of cross-domain few-shot learning. In: European Conference on Computer Vision. Springer; 2020.

  14. Gutierrez RL, Leonetti M. Information-theoretic task selection for meta-reinforcement learning. In: Advances in Neural Information Processing Systems. 2020.

  15. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

  16. Jamal MA, Qi G. Task agnostic meta-learning for few-shot learning. In: Computer Vision and Pattern Recognition. 2019.

  17. Jiang L, Zhou Z, Leung T, et al. Mentornet: learning data-driven curriculum for very deep neural networks on corrupted labels. In: International Conference on Machine Learning. 2018.

  18. Kaddour J, Sæmundsson S, Deisenroth MP. Probabilistic active meta-learning. In: Advances in Neural Information Processing Systems. 2020.

  19. Kahn H, Marshall AW. Methods of reducing sample size in Monte Carlo computations. Oper Res. 1953;1(5):263–78.

  20. Kingma DP, Ba J. Adam: a method for stochastic optimization. In: International Conference on Learning Representations. 2015.

  21. Kolesnikov A, Beyer L, Zhai X, et al. Big transfer (bit): general visual representation learning. In: European Conference on Computer Vision. Springer; 2020.

  22. Kumar MP, Packer B, Koller D. Self-paced learning for latent variable models. In: Advances in Neural Information Processing Systems. 2010.

  23. Li J, Luo X, Qiao M. On generalization error bounds of noisy gradient methods for non-convex learning. In: International Conference on Learning Representations. 2019.

  24. Li Z, Zhou F, Chen F, et al. Meta-SGD: learning to learn quickly for few-shot learning. arXiv preprint. 2017.

  25. Lin T, Goyal P, Girshick RB, et al. Focal loss for dense object detection. In: IEEE International Conference on Computer Vision. 2017.

  26. Liu B, Liu X, Jin X, et al. Conflict-averse gradient descent for multi-task learning. In: Advances in Neural Information Processing Systems. 2021a.

  27. Liu C, Wang Z, Sahoo D, et al. Adaptive task sampling for meta-learning. In: European Conference on Computer Vision. 2020.

  28. Liu EZ, Haghgoo B, Chen AS, et al. Just train twice: improving group robustness without training group information. In: International Conference on Machine Learning. 2021b.

  29. Oh J, Yoo H, Kim C, et al. Boil: towards representation change for few-shot learning. In: International Conference on Learning Representations. 2020.

  30. Oreshkin BN, López PR, Lacoste A. TADAM: task dependent adaptive metric for improved few-shot learning. In: Advances in Neural Information Processing Systems. 2018.

  31. Raghu A, Raghu M, Bengio S, et al. Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. In: International Conference on Learning Representations. 2020.

  32. Ravi S, Larochelle H. Optimization as a model for few-shot learning. In: International Conference on Learning Representations. 2017.

  33. Ren M, Triantafillou E, Ravi S, et al. Meta-learning for semi-supervised few-shot classification. In: International Conference on Learning Representations. 2018a.

  34. Ren M, Zeng W, Yang B, et al. Learning to reweight examples for robust deep learning. In: International Conference on Machine Learning. 2018b.

  35. Rusu AA, Rao D, Sygnowski J, et al. Meta-learning with latent embedding optimization. In: International Conference on Learning Representations. 2019.

  36. Shin J, Lee HB, Gong B, et al. Large-scale meta-learning with continual trajectory shifting. In: International Conference on Machine Learning. 2021.

  37. Shrivastava A, Gupta A, Girshick RB. Training region-based object detectors with online hard example mining. In: Conference on Computer Vision and Pattern Recognition. 2016.

  38. Sun Q, Liu Y, Chua T, et al. Meta-transfer learning for few-shot learning. In: Computer Vision and Pattern Recognition. 2019.

  39. Sun Q, Liu Y, Chen Z, Chua T, Schiele B, et al. Meta-transfer learning through hard tasks. IEEE Trans Pattern Anal Mach Intell. 2022;44(3):1443–56.

  40. Triantafillou E, Zhu T, Dumoulin V, et al. Meta-dataset: a dataset of datasets for learning to learn from few examples. In: International Conference on Learning Representations. 2019.

  41. Vinyals O, Blundell C, Lillicrap T, et al. Matching networks for one shot learning. In: Advances in Neural Information Processing Systems. 2016.

  42. Yao H, Wang Y, Wei Y, et al. Meta-learning with an adaptive task scheduler. In: Advances in Neural Information Processing Systems. 2021.

  43. Zhai X, Puigcerver J, Kolesnikov A, et al. A large-scale study of representation learning with the visual task adaptation benchmark. arXiv preprint. 2019.

  44. Zhao P, Zhang T. Stochastic optimization with importance sampling for regularized loss minimization. In: International Conference on Machine Learning. 2015.

Author information

Corresponding authors

Correspondence to Aroof Aimen or Narayanan C. Krishnan.

Ethics declarations

Conflict of interest

With the submission of this manuscript, we declare the following: we have no commercial or non-commercial conflicts of interest associated with this work; the manuscript has not been published elsewhere and is not under consideration by another journal; all authors have approved the manuscript and agreed to its submission to SN Computer Science (Special Issue: Research Trends in Computational Intelligence); and this work has no ethical implications, with all results reproducible and a link to the code provided in the manuscript. The support and resources provided by the ‘PARAM Shivay Facility’ under the National Supercomputing Mission, Government of India, at the Indian Institute of Technology, Varanasi, and by the Google TensorFlow Research award are gratefully acknowledged.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Research Trends in Computational Intelligence” guest edited by Anshul Verma, Pradeepika Verma, Vivek Kumar Singh and S. Karthikeyan.

Appendices

Appendix A. Preliminary

A.0.1. Meta-knowledge as an Optimal Initialization

When the meta-knowledge is a generic initialization of the model parameters learned through experience over various tasks, it is enforced to be close to each individual training task's optimal parameters. A model initialized with such an optimal prior quickly adapts to unseen tasks from the same distribution during meta-testing. MAML [12] employs a nested iterative process to learn the task-agnostic optimal prior \(\theta \). In the inner iterations, representing the task adaptation steps, \(\theta \) is separately fine-tuned for each meta-training task \(\mathcal {T}_i\) of a batch using \(D_i\) to obtain \(\phi _i\) through gradient descent on the train loss L with learning rate \(\alpha \). Specifically, \(\phi _i\) is initialized to \(\theta \) and updated T times using \(\phi _i \leftarrow \phi _i - \alpha \nabla _{\phi _i}L(\phi _i)\), resulting in the adapted model \(\phi _i^T\). In the outer loop, meta-knowledge is gathered by optimizing \(\theta \) over the loss \(L^*\) computed with the task-adapted model parameters \(\phi _i^T\) on the query dataset \(D^*_i\). Specifically, during meta-optimization, \(\theta \leftarrow \theta -\beta \nabla _{\theta } \sum _{i=1}^B L^*(\phi _i^T)\) for a task batch of size B and learning rate \(\beta \). MetaSGD [24] improves upon MAML by learning parameter-specific learning rates \(\varvec{\alpha }\) in addition to the optimal initialization, through a similar nested iterative procedure. Meta-knowledge is gathered by optimizing \(\theta \) and \(\varvec{\alpha }\) in the outer loop using the loss \(L^*\) computed on the query set \(D^*_i\). Specifically, during meta-optimization, \((\theta ,\varvec{\alpha }) \leftarrow (\theta ,\varvec{\alpha }) - \beta \nabla _{(\theta , \varvec{\alpha })} \sum _{i=1}^B L^*(\phi _i^T)\). Learning a dynamic learning rate for each parameter of the model makes MetaSGD faster and more generalizable than MAML: a single adaptation step is sufficient to adjust the model to a new task.
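The nested update above can be sketched with a toy example. This is a hedged illustration, not the authors' implementation: tasks are 1-D quadratics \(L_i(\phi ) = (\phi - c_i)^2\) with analytic gradients, and the outer update uses the first-order (FOMAML) approximation instead of differentiating through the inner loop.

```python
# Toy first-order MAML sketch on 1-D quadratic tasks L_i(phi) = (phi - c_i)^2,
# so gradients are analytic and no autodiff library is needed.

def adapt(theta, c, alpha=0.1, T=5):
    """Inner loop: T gradient-descent steps on the task loss, starting from theta."""
    phi = theta
    for _ in range(T):
        phi -= alpha * 2 * (phi - c)          # gradient of (phi - c)^2
    return phi

def meta_step(theta, centers, alpha=0.1, beta=0.5, T=5):
    """Outer loop: move theta along the mean query-loss gradient of the adapted models
    (first-order approximation: no gradient through the inner loop)."""
    grad = sum(2 * (adapt(theta, c, alpha, T) - c) for c in centers)
    return theta - beta * grad / len(centers)

theta = 0.0
for _ in range(100):
    theta = meta_step(theta, centers=[-1.0, 1.0, 3.0])
# theta converges towards 1.0, the initialization closest to all task optima
```

With task optima at -1, 1, and 3, the learned initialization settles at their mean, which is exactly the "close to each task's optimal parameters" behavior described above.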
The performance of MAML is attributed to the reuse of features across tasks rather than the rapid learning of new tasks [31]. Exploiting this characteristic, ANIL freezes the feature-backbone layers (\(1,\dots , l-1\)) and adapts only the classifier layer (l) in the inner loop T times. Specifically, during adaptation, \(\phi _i^l \leftarrow \phi _i^l - \alpha \nabla _{\phi _i^l} L(\phi _i^l)\). During meta-optimization, \(\theta ^{1,\dots , l} \leftarrow \theta ^{1,\dots , l} -\beta \nabla _{\theta ^{1,\dots , l}} \sum _{i=1}^B L^*(\phi _i^{lT})\), i.e., all layers are learned in the outer loop. Freezing the feature backbone during adaptation removes the overhead of computing gradients through the inner-loop gradients, so heavier backbones can be used for feature extraction. TAML [16] suggests that the optimal prior learned by MAML may still be biased towards some tasks. It proposes to reduce this bias and enforce equity among the tasks by explicitly minimizing the inequality among the performances of the tasks in a batch. The inequality, defined using statistical measures such as the Theil index, Atkinson index, generalized entropy index, or Gini coefficient over the performances of the tasks in a batch, is used as a regularizer while gathering the meta-knowledge. For the baseline comparison in our experiments, we use the Theil index for TAML owing to its best average results. Specifically, during meta-optimization, \(\theta \leftarrow \theta -\beta \nabla _{\theta } \left[ \sum _{i=1}^B L^*(\phi _i^T)+ \lambda \left\{ \dfrac{ L^*(\phi _i^0)}{\bar{L}^*(\phi _i^0)} \ln \dfrac{L^*(\phi _i^0)}{\bar{L}^*(\phi _i^0)}\right\} \right] \) (for TAML-Theil index), where B is the number of tasks in a batch, \(L^*(\phi _i^0)\) is the loss incurred by the initial model \(\phi _i^0\) on the query set \({D}^*_i\) of task \(\mathcal {T}_i\), and \(\bar{L}^{*}(\phi _i^0)\) is the average query loss of the initial model over a batch of tasks.
As TAML enforces equity of the optimal prior towards the meta-train tasks, it counters the adaptation, which leads to slow and unstable training that depends heavily on \(\lambda \).
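The Theil-index term in the regularizer above can be computed directly from the batch of query losses. A minimal sketch in plain Python, assuming the conventional 1/B normalization:

```python
import math

# Theil-index inequality measure over a batch of query losses. In TAML this
# quantity, scaled by lambda, is added to the meta-objective to penalize
# unequal performance across the tasks of a batch.
def theil_index(losses):
    B = len(losses)
    mean = sum(losses) / B
    return sum((l / mean) * math.log(l / mean) for l in losses) / B

theil_index([1.0, 1.0, 1.0])   # equal performance across tasks -> 0.0
theil_index([0.5, 1.0, 4.5])   # unequal performance -> positive value
```

The index is zero exactly when all tasks incur the same loss, so minimizing it pushes the prior towards equitable performance, which is also why it fights the adaptation as noted above.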

A.0.2. Meta-knowledge as a Parametric Optimizer

A regulated gradient-based optimizer gathers the task-specific and task-agnostic meta-knowledge to traverse the loss surfaces of tasks in the meta-train set during meta-training. A base model guided by such a learned parametric optimizer quickly finds the way to minima even for unseen tasks sampled from the same distribution during meta-testing. MetaLSTM [32] is a recurrent parametric optimizer \(\theta \) that mimics the gradient-based optimization of a base model \(\phi \). This recurrent optimizer is an LSTM [15] and is inherently capable of performing two-level learning due to its architecture. During adaptation of \(\phi _i\) on \(D_i\), \(\theta \) takes meta information of \(\phi _i\) characterized by its current loss L and gradients \(\nabla _{\phi _ {i}}(L)\) as input and outputs the next set of parameters for \(\phi _i\). This adaptation procedure is repeated T times resulting in the adapted base-model \(\phi _i^T\). Internally, the cell state of \(\theta \) corresponds to \(\phi _i\), and the cell state update for \(\theta \) resembles a learned and controlled gradient update. The emphasis on previous parameters and the current update is regulated by the learned forget and input gates respectively. While adapting \(\phi _i\) to \(D_i\), information about the trajectory on the loss surface across the adaptation steps is captured in the hidden states of \(\theta \), representing the task-specific knowledge. During meta-optimization, \(\theta \) is updated based on the loss of the adapted model \(L^*(\phi _i^T)\) computed on the query set \(D^*_i\) to garner the meta-knowledge across tasks. Specifically, during meta-optimization, \(\theta \leftarrow \theta -\beta \nabla _{\theta } L^*(\phi _i^T)\). MetaLSTM updates parametric optimizer \(\theta \) after adapting the base model \(\phi \) to each task. 
This causes \(\theta \) to follow the optima of all adapted base models, leading to an elongated and fluctuating optimization trajectory that is biased towards the last task. MetaLSTM++ [2] circumvents these issues by updating \(\theta \) with the aggregate query loss of the adapted models over a batch of tasks. Batch updates smoothen the optimization trajectory of \(\theta \) and eliminate its bias towards the last task. Specifically, during meta-optimization, \(\theta \leftarrow \theta -\beta \nabla _{\theta } \sum _{i=1}^B L^*(\phi _i^T)\).
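The cell-state update described above has the form of a gated gradient step. A hedged sketch for a single scalar parameter, with hand-fixed gate values standing in for the learned LSTM gates (in MetaLSTM the gates are produced by the learned recurrent optimizer from the current loss and gradient):

```python
import math

# MetaLSTM-style cell-state update for one scalar parameter:
#   phi_t = f_t * phi_{t-1} + i_t * (-grad),
# where f (forget gate) controls emphasis on the previous parameters and
# i (input gate) controls the step along the negative gradient.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def learned_update(param, grad, f_logit=5.0, i_logit=-2.0):
    f = sigmoid(f_logit)    # how much of the previous parameter to keep
    i = sigmoid(i_logit)    # how large a step to take along -grad
    return f * param - i * grad

# Drive phi towards the minimum of L(phi) = (phi - 2)^2
phi = 0.0
for _ in range(50):
    phi = learned_update(phi, 2 * (phi - 2))
# phi settles near 2, slightly short of it because the forget gate is < 1
```

Even with fixed gates the update behaves like a controlled gradient descent; learning the gates lets the optimizer modulate this behavior per parameter and per step.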

Appendix B. Details of Proposed Approach

Fig. 7

[Best viewed in color] Workflow of proposed training curriculum

We explain the proposed approach through Figs. 1 and 7, Algorithm 1, and the equations. We first sample a batch of B tasks from a random pool of data (Fig. 7-Label \(\textcircled {1}\)). For each task, the base-model \(\phi _i\) is adapted using the support data \(D_i\) for T time-steps (line 7 and lines 20–32 in Algorithm 1, Fig. 7-Label \(\textcircled {3}\)). Specifically, for initialization approaches the adaptation uses gradient descent on the train loss L (lines 22–26 in Algorithm 1, Fig. 7-GD), while for optimization approaches the current loss and gradients are input to the meta-model \(\theta \), which then outputs the updated base-model parameters (lines 27–31 in Algorithm 1, Fig. 7-PO). The meta-information (\(\mathcal {I}\)) corresponding to each task in the batch is then calculated (Fig. 7-Label \(\textcircled {4}\)); it comprises the loss, accuracy, loss-ratio, and gradient norm of the adapted models on the query data. This is given as input to the task attention module (Fig. 1-Label \(\textcircled {2}\), Fig. 7-Label \(\textcircled {5}\)), which outputs the attention vector (line 10 in Algorithm 1, Fig. 7-Label \(\textcircled {6}\)). The attention vector and test losses are used to update the meta-model parameters \(\theta \) according to Eq. (2) (line 11 in Algorithm 1, Fig. 1-Label \(\textcircled {4}\), Fig. 7-Label \(\textcircled {7}\)). A new batch of tasks is then sampled, and the base-models are adapted using the updated meta-model (lines 12–16 in Algorithm 1, Fig. 1-Label \(\textcircled {5}\)). The mean test loss over the adapted base-models is calculated and used to update the parameters of the task attention module \(\delta \) according to Eq. (3).
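The attention-weighting step can be made concrete with a hypothetical sketch: each task's meta-information vector [loss, accuracy, loss-ratio, gradient norm] is scored and softmax-normalized into an attention vector that weights the query losses. The scoring weights `delta` are fixed here purely for illustration; in the proposed method they are the learned parameters of the task attention module.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_meta_loss(meta_info, query_losses, delta):
    # One score per task: a linear function of its meta-information vector.
    scores = [sum(d * x for d, x in zip(delta, info)) for info in meta_info]
    attn = softmax(scores)                 # attention vector: one weight per task
    return sum(a * l for a, l in zip(attn, query_losses)), attn

# Hypothetical meta-information [loss, accuracy, loss-ratio, grad-norm] per task
meta_info = [[1.2, 0.40, 1.1, 0.8],
             [0.3, 0.90, 0.4, 0.2]]
loss, attn = weighted_meta_loss(meta_info, [1.2, 0.3], delta=[0.5, -1.0, 0.2, 0.1])
```

The meta-model update of Eq. (2) would then use this weighted loss in place of the uniform batch average, so tasks the module deems more informative contribute more to the meta-gradient.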

Appendix C. Experiments

C.0.1. Datasets Details

The miniImagenet dataset [41] comprises 600 color images of size 84 \(\times \) 84 from each of 100 classes sampled from the Imagenet dataset. The 100 classes are split into 64, 16, and 20 classes for meta-training, meta-validation, and meta-testing, respectively. miniImagenet-noisy [42] is constructed from the miniImagenet dataset with the additional constraint that tasks have noisy support labels and clean query labels. The noise in the support labels is introduced by symmetry flipping, and the default noise ratio is 0.6. The Fewshot Cifar 100 (FC100) dataset [30] has been created from the Cifar 100 object-classification dataset. It contains 600 color images of size 32 \(\times \) 32 for each of 100 classes grouped into 20 super-classes. Of the 100 classes, 60 classes belonging to 12 super-classes form the meta-train set, 20 classes from 4 super-classes the meta-validation set, and the rest the meta-test set. tieredImagenet [33] is a more challenging benchmark for few-shot image classification. It contains 779,165 color images sampled from 608 classes of Imagenet, grouped into 34 super-classes. These super-classes are divided into 20, 6, and 8 disjoint sets for meta-training, meta-validation, and meta-testing. Metadataset [40] comprises 10 freely available, diverse datasets—Aircraft, CUB-200-2011, Describable Textures, Fungi, ILSVRC-2012, MSCOCO, Omniglot, Quick Draw, Traffic Signs, and VGG Flower. We utilized the CUB-200, FGVC-Aircraft, Describable Textures, and Omniglot datasets from Metadataset. The VTAB dataset [43] is more diverse than Metadataset and was proposed to avoid overlap between the classes of its sub-datasets and the Imagenet dataset. VTAB comprises 19 datasets divided into three domains—Natural, Specialized, and Structured, depending on the type of images.
The Natural group contains the Caltech101, CIFAR100, DTD, Flowers102, Pets, Sun397, and SVHN sub-datasets, while the Specialized group consists of remote-sensing datasets like EuroSAT and Resisc45 and medical datasets like Retinopathy and Patch Camelyon. The Structured group contains object-counting and 3D-depth-prediction datasets like Clevr/count, Clevr/distance, dSprites/location, dSprites/orientation, SmallNORB/azimuth, SmallNORB/elevation, DMLab, and KITTI/distance. We considered Natural sub-datasets like DTD, CIFAR100, Flowers102, and SVHN, Specialized sub-datasets like EuroSAT and Resisc45, and Structured sub-datasets like dSprites_location and dSprites_orientation for the cross-domain experimentation. Following [11], we have kept Describable Textures as part of Metadataset and Flowers102 as a component of the VTAB dataset.

Fig. 8

Rank analysis of tasks with the maximum and minimum values of loss, loss-ratio, accuracy and gradient norm throughout the training of TA-MAML\(^*\) for the 5 way 1 shot setting on the miniImagenet dataset

C.0.2. Ablation Studies

We analyze the ranks of the tasks with the maximum and minimum values of loss, loss-ratio, accuracy, and gradient norm in a batch with respect to the attention weights throughout the meta-training of TA-MAML in the 5 way 1 shot and 5 shot settings on the miniImagenet dataset (Figs. 8, 9). Specifically, the highest-weighted task is given rank one, and the least-weighted task in a batch is given the last rank. We observe that the TA module does not assign the maximum weight to the tasks with the maximum or minimum values of test loss, loss ratio, gradient norm, or accuracy throughout meta-training. Thus, the TA module does not trivially learn to weight the tasks based on a single component of the meta-information but learns useful latent information from all the components to assign importance to the tasks in a batch.
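The rank statistic used in this analysis can be sketched with a small helper, assuming ranks are assigned per batch with rank 1 for the highest attention weight:

```python
# Assign rank 1 to the highest-weighted task in a batch and the last rank
# to the least-weighted task.
def attention_ranks(weights):
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    ranks = [0] * len(weights)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

attention_ranks([0.1, 0.5, 0.4])   # -> [3, 1, 2]
```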

Fig. 9

Rank analysis of tasks with the maximum and minimum values of loss, loss-ratio, accuracy and gradient norm throughout the training of TA-MAML\(^*\) for the 5 way 5 shot setting on the miniImagenet dataset

Table 7 Comparison of the few-shot classification performance of MAML and ANIL as reported in the original papers (denoted by #) and in re-implementations by others, on the miniImagenet dataset for the 5 way 1 and 5 shot settings
Fig. 10

Trend analysis of weighted loss across meta-training iterations for TA-MAML\(^*\) on 5 way 1 shot (left) and 5 shot (right) settings on miniImagenet dataset. Iterations are in thousands

C.0.3. Relation of Weights with Meta-information

In Fig. 10, we illustrate the trend of mean weighted loss across iterations for TA-MAML on 5 way 1 and 5 shot settings on miniImagenet dataset. The trend indicates that the average weighted loss decreases over the meta-training iterations. The shaded region represents a 95% confidence interval over 100 tasks.

C.0.4. Hyperparameter Details

The table below lists the tuned hyperparameters for every method, dataset, and setting. A setting n.k denotes an n-way k-shot configuration; ‘–’ marks hyperparameters that do not apply to a method. The MetaLSTM-family methods have no hand-tuned base learning rate, since the parametric optimizer itself produces the base-model updates.

Setting   Model                  Base lr   Meta lr   Attention lr   Lambda

miniImagenet

5.1       MAML                   0.5000    0.0030    –              –
5.1       TAML                   0.5000    0.0030    –              0.0748
5.1       TA-MAML\(^*\)          0.0763    0.0005    0.0004         –
5.1       MetaSGD                0.5000    0.0030    –              –
5.1       TA-MetaSGD\(^*\)       0.0529    0.0011    0.0004         –
5.1       MetaLSTM               –         0.0050    –              –
5.1       MetaLSTM++             –         0.0012    –              –
5.1       TA-MetaLSTM++\(^*\)    –         0.0012    0.0031         –
5.1       ANIL                   0.3000    0.0006    –              –
5.1       TA-ANIL\(^*\)          0.0763    0.0005    0.0004         –
5.5       MAML                   0.5000    0.0030    –              –
5.5       TAML                   0.5000    0.0030    –              0.7916
5.5       TA-MAML\(^*\)          0.0763    0.0005    0.0004         –
5.5       MetaSGD                0.5000    0.0030    –              –
5.5       TA-MetaSGD\(^*\)       0.0529    0.0011    0.0004         –
5.5       MetaLSTM               –         0.0050    –              –
5.5       MetaLSTM++             –         0.0012    –              –
5.5       TA-MetaLSTM++\(^*\)    –         0.0004    0.0001         –
5.5       ANIL                   0.3000    0.0006    –              –
5.5       TA-ANIL\(^*\)          0.0763    0.0005    0.0004         –
10.1      MAML                   0.5000    0.0030    –              –
10.1      TAML                   0.5000    0.0030    –              0.2631
10.1      TA-MAML\(^*\)          0.2551    0.0015    0.0001         –
10.1      MetaSGD                0.5000    0.0030    –              –
10.1      TA-MetaSGD\(^*\)       0.0627    0.0008    0.0013         –
10.1      MetaLSTM               –         0.0050    –              –
10.1      MetaLSTM++             –         0.0015    –              –
10.1      TA-MetaLSTM++\(^*\)    –         0.0009    0.0015         –
10.1      ANIL                   0.5000    0.0030    –              –
10.1      TA-ANIL\(^*\)          0.2551    0.0015    0.0001         –
10.5      MAML                   0.5000    0.0030    –              –
10.5      TAML                   0.5000    0.0030    –              0.0741
10.5      TA-MAML\(^*\)          0.2551    0.0015    0.0001         –
10.5      MetaSGD                0.5000    0.0030    –              –
10.5      TA-MetaSGD\(^*\)       0.0627    0.0008    0.0013         –
10.5      MetaLSTM               –         0.0050    –              –
10.5      MetaLSTM++             –         0.0036    –              –
10.5      TA-MetaLSTM++\(^*\)    –         0.0024    0.0002         –
10.5      ANIL                   0.5000    0.0030    –              –
10.5      TA-ANIL\(^*\)          0.2551    0.0015    0.0001         –

FC100

5.1       MAML                   0.5000    0.0030    –              –
5.1       TAML                   0.5000    0.0030    –              0.0164
5.1       TA-MAML\(^*\)          0.2826    0.0003    0.0024         –
5.1       MetaSGD                0.5000    0.0030    –              –
5.1       TA-MetaSGD\(^*\)       0.0349    0.0008    0.0001         –
5.1       MetaLSTM               –         0.0050    –              –
5.1       MetaLSTM++             –         0.0010    –              –
5.1       TA-MetaLSTM++\(^*\)    –         0.0002    0.0074         –
5.1       ANIL                   0.5000    0.0030    –              –
5.1       TA-ANIL\(^*\)          0.2826    0.0003    0.0024         –
5.5       MAML                   0.5000    0.0030    –              –
5.5       TAML                   0.5000    0.0030    –              0.0153
5.5       TA-MAML\(^*\)          0.2826    0.0003    0.0024         –
5.5       MetaSGD                0.5000    0.0030    –              –
5.5       TA-MetaSGD\(^*\)       0.0349    0.0008    0.0001         –
5.5       MetaLSTM               –         0.0050    –              –
5.5       MetaLSTM++             –         0.0002    –              –
5.5       TA-MetaLSTM++\(^*\)    –         0.0007    0.0003         –
5.5       ANIL                   0.5000    0.0030    –              –
5.5       TA-ANIL\(^*\)          0.2826    0.0003    0.0024         –
10.1      MAML                   0.5000    0.0030    –              –
10.1      TAML                   0.5000    0.0030    –              0.0794
10.1      TA-MAML\(^*\)          0.2353    0.0002    0.0001         –
10.1      MetaSGD                0.5000    0.0030    –              –
10.1      TA-MetaSGD\(^*\)       0.2583    0.0029    0.0007         –
10.1      MetaLSTM               –         0.0050    –              –
10.1      MetaLSTM++             –         0.0021    –              –
10.1      TA-MetaLSTM++\(^*\)    –         0.0005    0.0014         –
10.1      ANIL                   0.5000    0.0030    –              –
10.1      TA-ANIL\(^*\)          0.2826    0.0003    0.0024         –
10.5      MAML                   0.5000    0.0030    –              –
10.5      TAML                   0.5000    0.0030    –              0.0193
10.5      TA-MAML\(^*\)          0.2353    0.0002    0.0001         –
10.5      MetaSGD                0.5000    0.0030    –              –
10.5      TA-MetaSGD\(^*\)       0.2583    0.0029    0.0007         –
10.5      MetaLSTM               –         0.0050    –              –
10.5      MetaLSTM++             –         0.0004    –              –
10.5      TA-MetaLSTM++\(^*\)    –         0.0004    0.0090         –
10.5      ANIL                   0.5000    0.0030    –              –
10.5      TA-ANIL\(^*\)          0.2826    0.0003    0.0024         –

tieredImagenet

5.1       MAML                   0.5000    0.0030    –              –
5.1       TAML                   0.5000    0.0030    –              0.3978
5.1       TA-MAML\(^*\)          0.0261    0.0005    0.0015         –
5.1       MetaSGD                0.5000    0.0030    –              –
5.1       TA-MetaSGD\(^*\)       0.0944    0.0003    0.0002         –
5.1       MetaLSTM               –         0.0050    –              –
5.1       MetaLSTM++             –         0.0002    –              –
5.1       TA-MetaLSTM++\(^*\)    –         0.0010    0.0006         –
5.1       ANIL                   0.5000    0.0030    –              –
5.1       TA-ANIL\(^*\)          0.0261    0.0005    0.0015         –
5.5       MAML                   0.5000    0.0030    –              –
5.5       TAML                   0.5000    0.0030    –              0.7733
5.5       TA-MAML\(^*\)          0.0261    0.0005    0.0015         –
5.5       MetaSGD                0.5000    0.0030    –              –
5.5       TA-MetaSGD\(^*\)       0.0944    0.0003    0.0002         –
5.5       MetaLSTM               –         0.0050    –              –
5.5       MetaLSTM++             –         0.0009    –              –
5.5       TA-MetaLSTM++\(^*\)    –         0.0012    0.0001         –
5.5       ANIL                   0.5000    0.0030    –              –
5.5       TA-ANIL\(^*\)          0.0261    0.0005    0.0015         –
10.1      MAML                   0.5000    0.0030    –              –
10.1      TAML                   0.5000    0.0030    –              0.4752
10.1      TA-MAML\(^*\)          0.0821    0.0002    0.0006         –
10.1      MetaSGD                0.5000    0.0030    –              –
10.1      TA-MetaSGD\(^*\)       0.0512    0.0007    0.0018         –
10.1      MetaLSTM               –         0.0050    –              –
10.1      MetaLSTM++             –         0.0011    –              –
10.1      TA-MetaLSTM++\(^*\)    –         0.0018    0.0002         –
10.1      ANIL                   0.5000    0.0030    –              –
10.1      TA-ANIL\(^*\)          0.0821    0.0002    0.0006         –
10.5      MAML                   0.5000    0.0030    –              –
10.5      TAML                   0.5000    0.0030    –              0.2501
10.5      TA-MAML\(^*\)          0.0821    0.0002    0.0006         –
10.5      MetaSGD                0.5000    0.0030    –              –
10.5      TA-MetaSGD\(^*\)       0.0512    0.0007    0.0018         –
10.5      MetaLSTM               –         0.0050    –              –
10.5      MetaLSTM++             –         0.0024    –              –
10.5      TA-MetaLSTM++\(^*\)    –         0.0015    0.0019         –
10.5      ANIL                   0.5000    0.0030    –              –
10.5      TA-ANIL\(^*\)          0.0821    0.0002    0.0006         –

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aimen, A., Ladrecha, B., Sidheekh, S. et al. Leveraging Task Variability in Meta-learning. SN COMPUT. SCI. 4, 539 (2023). https://doi.org/10.1007/s42979-023-01951-6
