Abstract
Continual learning requires incremental compatibility with a sequence of tasks. However, the design of model architecture remains an open question: in general, learning all tasks with a shared set of parameters suffers from severe interference between tasks, while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) the discrepancy between task distributions, (2) the flatness of the loss landscape, and (3) the cover of the parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks that learn all incremental tasks in parallel, which naturally reduces the two errors by improving the three components of the upper bound. To strengthen this advantage, we encourage the sub-networks to cooperate by penalizing differences in the predictions made from their feature representations. With a fixed parameter budget, CoSCL improves a variety of representative continual learning approaches by a large margin (e.g., up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011, and 6.72% on Tiny-ImageNet) and achieves new state-of-the-art performance. Our code is available at https://github.com/lywang3081/CoSCL.
L. Wang and X. Zhang contributed equally.
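To make the cooperation idea concrete, the following is a minimal PyTorch sketch of a feature ensemble of narrow sub-networks trained with a penalty on the disagreement among their predictions. It is an illustration under our own assumptions, not the released implementation: the class names (SmallLearner, CoSCLSketch), the shared classification head, the KL-to-the-mean form of the cooperation penalty, and the 0.1 weight are all hypothetical choices made for this sketch.

```python
# Minimal sketch (assumed, not the authors' code): several narrow sub-networks
# extract features in parallel, the prediction is made from the ensembled
# features, and a cooperation penalty discourages per-learner predictions
# from diverging.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallLearner(nn.Module):
    """One narrow convolutional feature extractor (a 'small' continual learner)."""

    def __init__(self, width: int = 16, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(width * 16, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class CoSCLSketch(nn.Module):
    """Feature ensemble of several small learners sharing a classification head."""

    def __init__(self, num_learners: int = 5, feat_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.learners = nn.ModuleList(
            SmallLearner(feat_dim=feat_dim) for _ in range(num_learners)
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = [f(x) for f in self.learners]                    # parallel features
        logits_each = [self.head(h) for h in feats]              # per-learner predictions
        logits_ens = self.head(torch.stack(feats).mean(dim=0))   # feature-ensemble prediction
        return logits_ens, logits_each


def cooperation_loss(logits_each):
    """Penalize disagreement: average KL of each learner's prediction from the mean prediction."""
    probs = [F.softmax(z, dim=1) for z in logits_each]
    mean_p = torch.stack(probs).mean(dim=0)
    log_mean = mean_p.log()
    return sum(F.kl_div(log_mean, p, reduction="batchmean") for p in probs) / len(probs)


# One training step: task loss on the ensemble output plus the weighted cooperation penalty.
model = CoSCLSketch()
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
logits_ens, logits_each = model(x)
loss = F.cross_entropy(logits_ens, y) + 0.1 * cooperation_loss(logits_each)
loss.backward()
```

In practice, each sub-network would additionally carry the chosen continual learning regularizer (e.g., a weight-importance penalty such as EWC), so that the ensemble of small learners, rather than a single wide network, absorbs the sequence of tasks.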
Notes
1. In contrast to a single continual learning model with a wide network, we refer to such narrower sub-networks as "small" continual learners.
2. A concurrent work observed that a regular CNN architecture indeed achieves better continual learning performance than more advanced architectures such as ResNet and ViT with the same number of parameters [27].
3. Both are performed with a similar AlexNet-based architecture.
4. Here we only use feature ensemble (FE) with the ensemble cooperation loss (EC).
References
Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget. In: Proceedings of the European Conference on Computer Vision, pp. 139–154 (2018)
Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: lifelong learning with a network of experts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3366–3375 (2017)
Aso, Y., et al.: The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3, e04577 (2014)
Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79(1), 151–175 (2010)
Cha, J., et al.: SWAD: domain generalization by seeking flat minima. arXiv preprint arXiv:2102.08604 (2021)
Cha, S., Hsu, H., Hwang, T., Calmon, F., Moon, T.: CPR: classifier-projection regularization for continual learning. In: Proceedings of the International Conference on Learning Representations (2020)
Chaudhry, A., Dokania, P.K., Ajanthan, T., Torr, P.H.: Riemannian walk for incremental learning: understanding forgetting and intransigence. In: Proceedings of the European Conference on Computer Vision, pp. 532–547 (2018)
Chen, X., He, K.: Exploring simple siamese representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758 (2021)
Cohn, R., Morantte, I., Ruta, V.: Coordinated and compartmentalized neuromodulation shapes sensory processing in Drosophila. Cell 163(7), 1742–1755 (2015)
Delange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3366–3385 (2021)
Deng, D., Chen, G., Hao, J., Wang, Q., Heng, P.A.: Flattening sharpness for dynamic gradient projection memory benefits continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Dinh, L., Pascanu, R., Bengio, S., Bengio, Y.: Sharp minima can generalize for deep nets. In: Proceedings of the International Conference on Machine Learning, pp. 1019–1028. PMLR (2017)
Doan, T., Mirzadeh, S.I., Pineau, J., Farajtabar, M.: Efficient continual learning ensembles in neural network subspaces. arXiv preprint arXiv:2202.09826 (2022)
Fernando, C., et al.: PathNet: evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734 (2017)
Hu, D., et al.: How well self-supervised pre-training performs with streaming data? arXiv preprint arXiv:2104.12081 (2021)
Hurtado, J., Raymond, A., Soto, A.: Optimizing reusable knowledge for continual learning via metalearning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Jung, S., Ahn, H., Cha, S., Moon, T.: Continual learning with node-importance based adaptive group sparse regularization. arXiv e-prints, arXiv-2003 (2020)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical report, Citeseer (2009)
Liu, Y., Parisot, S., Slabaugh, G., Jia, X., Leonardis, A., Tuytelaars, T.: More classifiers, less forgetting: a generic multi-classifier paradigm for incremental learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12371, pp. 699–716. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58574-7_42
Long, M., Cao, Y., Wang, J., Jordan, M.: Learning transferable features with deep adaptation networks. In: Proceedings of the International Conference on Machine Learning, pp. 97–105. PMLR (2015)
Lopez-Paz, D., et al.: Gradient episodic memory for continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 6467–6476 (2017)
Madaan, D., Yoon, J., Li, Y., Liu, Y., Hwang, S.J.: Rethinking the representational continuity: towards unsupervised continual learning. arXiv preprint arXiv:2110.06976 (2021)
McAllester, D.A.: PAC-Bayesian model averaging. In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pp. 164–170 (1999)
McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)
Mirzadeh, S.I., Chaudhry, A., Hu, H., Pascanu, R., Gorur, D., Farajtabar, M.: Wide neural networks forget less catastrophically. arXiv preprint arXiv:2110.11526 (2021)
Mirzadeh, S.I., et al.: Architecture matters in continual learning. arXiv preprint arXiv:2202.00275 (2022)
Mirzadeh, S.I., Farajtabar, M., Gorur, D., Pascanu, R., Ghasemzadeh, H.: Linear mode connectivity in multitask and continual learning. arXiv preprint arXiv:2010.04495 (2020)
Mirzadeh, S.I., Farajtabar, M., Pascanu, R., Ghasemzadeh, H.: Understanding the role of training regimes in continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 7308–7320 (2020)
Modi, M.N., Shuai, Y., Turner, G.C.: The Drosophila mushroom body: from architecture to algorithm in a learning circuit. Annu. Rev. Neurosci. 43, 465–484 (2020)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Qin, Q., Hu, W., Peng, H., Zhao, D., Liu, B.: BNS: building network structures dynamically for continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Ramesh, R., Chaudhari, P.: Model zoo: a growing brain that learns continually. In: NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications (2021)
Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017)
Riemer, M., et al.: Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv preprint arXiv:1810.11910 (2018)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
Schwarz, J., et al.: Progress & compress: a scalable framework for continual learning. In: Proceedings of the International Conference on Machine Learning, pp. 4528–4537. PMLR (2018)
Serra, J., Suris, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: Proceedings of the International Conference on Machine Learning, pp. 4548–4557. PMLR (2018)
Shi, G., Chen, J., Zhang, W., Zhan, L.M., Wu, X.M.: Overcoming catastrophic forgetting in incremental few-shot learning by finding flat minima. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-UCSD birds-200-2011 dataset (2011)
Wang, L., Yang, K., Li, C., Hong, L., Li, Z., Zhu, J.: ORDisCo: effective and efficient usage of incremental unlabeled data for semi-supervised continual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5383–5392 (2021)
Wang, L., et al.: AFEC: active forgetting of negative transfer in continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Wang, L., et al.: Memory replay with data compression for continual learning. In: Proceedings of the International Conference on Learning Representations (2021)
Wen, Y., Tran, D., Ba, J.: BatchEnsemble: an alternative approach to efficient ensemble and lifelong learning. In: Proceedings of the International Conference on Learning Representations (2020)
Wortsman, M., Horton, M.C., Guestrin, C., Farhadi, A., Rastegari, M.: Learning neural network subspaces. In: Proceedings of the International Conference on Machine Learning, pp. 11217–11227. PMLR (2021)
Wortsman, M., et al.: Supermasks in superposition. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 15173–15184 (2020)
Yan, S., Xie, J., He, X.: DER: dynamically expandable representation for class incremental learning. arXiv preprint arXiv:2103.16788 (2021)
Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: Proceedings of the International Conference on Machine Learning, pp. 12310–12320. PMLR (2021)
Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: Proceedings of the International Conference on Machine Learning, pp. 3987–3995. PMLR (2017)
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2017YFA0700904, 2020AAA0106000, 2020AAA0104304, 2020AAA0106302, 2021YFB2701000), NSFC Projects (Nos. 62061136001, 62106123, 62076147, U19B2034, U1811461, U19A2081, 61972224), Beijing NSF Project (No. JQ19016), BNRist (BNR2022RC01006), Tsinghua-Peking Center for Life Sciences, Tsinghua Institute for Guo Qiang, Beijing Academy of Artificial Intelligence (BAAI), Tsinghua-OPPO Joint Research Center for Future Terminal Technology, the High Performance Computing Center, Tsinghua University, and China Postdoctoral Science Foundation (Nos. 2021T140377, 2021M701892).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, L., Zhang, X., Li, Q., Zhu, J., Zhong, Y. (2022). CoSCL: Cooperation of Small Continual Learners is Stronger Than a Big One. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13686. Springer, Cham. https://doi.org/10.1007/978-3-031-19809-0_15
DOI: https://doi.org/10.1007/978-3-031-19809-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19808-3
Online ISBN: 978-3-031-19809-0