Abstract
Continual learning requires incremental compatibility with a sequence of tasks. However, the design of model architecture remains an open question: in general, learning all tasks with a shared set of parameters suffers from severe interference between tasks, while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) the discrepancy between task distributions, (2) the flatness of the loss landscape, and (3) the cover of the parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks that learn all incremental tasks in parallel, which naturally reduces the two errors by improving the three components of the upper bound. To strengthen this advantage, we encourage the sub-networks to cooperate by penalizing differences in the predictions made from their feature representations. With a fixed parameter budget, CoSCL improves a variety of representative continual learning approaches by a large margin (e.g., up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011, and 6.72% on Tiny-ImageNet) and achieves new state-of-the-art performance. Our code is available at https://github.com/lywang3081/CoSCL.
L. Wang and X. Zhang contributed equally.
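To make the cooperation idea concrete, the following is a minimal PyTorch sketch of a feature ensemble of narrow sub-networks trained with a penalty on the disagreement among their predictions. It is an illustration under our own assumptions, not the released implementation: the class names (SmallLearner, CoSCLSketch), the shared classification head, the KL-to-the-mean form of the cooperation penalty, and the 0.1 weight are all hypothetical choices made for this sketch.

```python
# Minimal sketch (assumed, not the authors' code): several narrow sub-networks
# extract features in parallel, the prediction is made from the ensembled
# features, and a cooperation penalty discourages per-learner predictions
# from diverging.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallLearner(nn.Module):
    """One narrow convolutional feature extractor (a 'small' continual learner)."""

    def __init__(self, width: int = 16, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(width * 16, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class CoSCLSketch(nn.Module):
    """Feature ensemble of several small learners sharing a classification head."""

    def __init__(self, num_learners: int = 5, feat_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.learners = nn.ModuleList(
            SmallLearner(feat_dim=feat_dim) for _ in range(num_learners)
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feats = [f(x) for f in self.learners]                    # parallel features
        logits_each = [self.head(h) for h in feats]              # per-learner predictions
        logits_ens = self.head(torch.stack(feats).mean(dim=0))   # feature-ensemble prediction
        return logits_ens, logits_each


def cooperation_loss(logits_each):
    """Penalize disagreement: average KL of each learner's prediction from the mean prediction."""
    probs = [F.softmax(z, dim=1) for z in logits_each]
    mean_p = torch.stack(probs).mean(dim=0)
    log_mean = mean_p.log()
    return sum(F.kl_div(log_mean, p, reduction="batchmean") for p in probs) / len(probs)


# One training step: task loss on the ensemble output plus the weighted cooperation penalty.
model = CoSCLSketch()
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
logits_ens, logits_each = model(x)
loss = F.cross_entropy(logits_ens, y) + 0.1 * cooperation_loss(logits_each)
loss.backward()
```

In practice, each sub-network would additionally carry the chosen continual learning regularizer (e.g., a weight-importance penalty such as EWC), so that the ensemble of small learners, rather than a single wide network, absorbs the sequence of tasks.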
Notes
1. In contrast to a single continual learning model with a wide network, we refer to such narrower sub-networks as "small" continual learners.
2. A concurrent work observed that a regular CNN architecture indeed achieves better continual learning performance than more advanced architectures such as ResNet and ViT with the same number of parameters [27].
3. Both are performed with a similar AlexNet-based architecture.
4. Here we only use feature ensemble (FE) with the ensemble cooperation loss (EC).
References
Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: learning what (not) to forget. In: Proceedings of the European Conference on Computer Vision, pp. 139–154 (2018)
Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: lifelong learning with a network of experts. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3366–3375 (2017)
Aso, Y., et al.: The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3, e04577 (2014)
Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., Vaughan, J.W.: A theory of learning from different domains. Mach. Learn. 79(1), 151–175 (2010)
Cha, J., et al.: SWAD: domain generalization by seeking flat minima. arXiv preprint arXiv:2102.08604 (2021)
Cha, S., Hsu, H., Hwang, T., Calmon, F., Moon, T.: CPR: classifier-projection regularization for continual learning. In: Proceedings of the International Conference on Learning Representations (2020)
Chaudhry, A., Dokania, P.K., Ajanthan, T., Torr, P.H.: Riemannian walk for incremental learning: understanding forgetting and intransigence. In: Proceedings of the European Conference on Computer Vision, pp. 532–547 (2018)
Chen, X., He, K.: Exploring simple siamese representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15750–15758 (2021)
Cohn, R., Morantte, I., Ruta, V.: Coordinated and compartmentalized neuromodulation shapes sensory processing in Drosophila. Cell 163(7), 1742–1755 (2015)
Delange, M., et al.: A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. 44(7), 3366–3385 (2021)
Deng, D., Chen, G., Hao, J., Wang, Q., Heng, P.A.: Flattening sharpness for dynamic gradient projection memory benefits continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Dinh, L., Pascanu, R., Bengio, S., Bengio, Y.: Sharp minima can generalize for deep nets. In: Proceedings of the International Conference on Machine Learning, pp. 1019–1028. PMLR (2017)
Doan, T., Mirzadeh, S.I., Pineau, J., Farajtabar, M.: Efficient continual learning ensembles in neural network subspaces. arXiv preprint arXiv:2202.09826 (2022)
Fernando, C., et al.: PathNet: evolution channels gradient descent in super neural networks. arXiv preprint arXiv:1701.08734 (2017)
Hu, D., et al.: How well self-supervised pre-training performs with streaming data? arXiv preprint arXiv:2104.12081 (2021)
Hurtado, J., Raymond, A., Soto, A.: Optimizing reusable knowledge for continual learning via metalearning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Jung, S., Ahn, H., Cha, S., Moon, T.: Continual learning with node-importance based adaptive group sparse regularization. arXiv e-prints, arXiv-2003 (2020)
Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical report, Citeseer (2009)
Liu, Y., Parisot, S., Slabaugh, G., Jia, X., Leonardis, A., Tuytelaars, T.: More classifiers, less forgetting: a generic multi-classifier paradigm for incremental learning. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12371, pp. 699–716. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58574-7_42
Long, M., Cao, Y., Wang, J., Jordan, M.: Learning transferable features with deep adaptation networks. In: Proceedings of the International Conference on Machine Learning, pp. 97–105. PMLR (2015)
Lopez-Paz, D., et al.: Gradient episodic memory for continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, pp. 6467–6476 (2017)
Madaan, D., Yoon, J., Li, Y., Liu, Y., Hwang, S.J.: Rethinking the representational continuity: towards unsupervised continual learning. arXiv preprint arXiv:2110.06976 (2021)
McAllester, D.A.: PAC-Bayesian model averaging. In: Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pp. 164–170 (1999)
McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)
Mirzadeh, S.I., Chaudhry, A., Hu, H., Pascanu, R., Gorur, D., Farajtabar, M.: Wide neural networks forget less catastrophically. arXiv preprint arXiv:2110.11526 (2021)
Mirzadeh, S.I., et al.: Architecture matters in continual learning. arXiv preprint arXiv:2202.00275 (2022)
Mirzadeh, S.I., Farajtabar, M., Gorur, D., Pascanu, R., Ghasemzadeh, H.: Linear mode connectivity in multitask and continual learning. arXiv preprint arXiv:2010.04495 (2020)
Mirzadeh, S.I., Farajtabar, M., Pascanu, R., Ghasemzadeh, H.: Understanding the role of training regimes in continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 7308–7320 (2020)
Modi, M.N., Shuai, Y., Turner, G.C.: The Drosophila mushroom body: from architecture to algorithm in a learning circuit. Annu. Rev. Neurosci. 43, 465–484 (2020)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Qin, Q., Hu, W., Peng, H., Zhao, D., Liu, B.: BNS: building network structures dynamically for continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Ramesh, R., Chaudhari, P.: Model zoo: a growing brain that learns continually. In: NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and Applications (2021)
Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017)
Riemer, M., et al.: Learning to learn without forgetting by maximizing transfer and minimizing interference. arXiv preprint arXiv:1810.11910 (2018)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
Schwarz, J., et al.: Progress & compress: a scalable framework for continual learning. In: Proceedings of the International Conference on Machine Learning, pp. 4528–4537. PMLR (2018)
Serra, J., Suris, D., Miron, M., Karatzoglou, A.: Overcoming catastrophic forgetting with hard attention to the task. In: Proceedings of the International Conference on Machine Learning, pp. 4548–4557. PMLR (2018)
Shi, G., Chen, J., Zhang, W., Zhan, L.M., Wu, X.M.: Overcoming catastrophic forgetting in incremental few-shot learning by finding flat minima. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-UCSD birds-200-2011 dataset (2011)
Wang, L., Yang, K., Li, C., Hong, L., Li, Z., Zhu, J.: ORDisCo: effective and efficient usage of incremental unlabeled data for semi-supervised continual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5383–5392 (2021)
Wang, L., et al.: AFEC: active forgetting of negative transfer in continual learning. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 34 (2021)
Wang, L., et al.: Memory replay with data compression for continual learning. In: Proceedings of the International Conference on Learning Representations (2021)
Wen, Y., Tran, D., Ba, J.: BatchEnsemble: an alternative approach to efficient ensemble and lifelong learning. In: Proceedings of the International Conference on Learning Representations (2020)
Wortsman, M., Horton, M.C., Guestrin, C., Farhadi, A., Rastegari, M.: Learning neural network subspaces. In: Proceedings of the International Conference on Machine Learning, pp. 11217–11227. PMLR (2021)
Wortsman, M., et al.: Supermasks in superposition. In: Proceedings of the Advances in Neural Information Processing Systems, vol. 33, pp. 15173–15184 (2020)
Yan, S., Xie, J., He, X.: DER: dynamically expandable representation for class incremental learning. arXiv preprint arXiv:2103.16788 (2021)
Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: Proceedings of the International Conference on Machine Learning, pp. 12310–12320. PMLR (2021)
Zenke, F., Poole, B., Ganguli, S.: Continual learning through synaptic intelligence. In: Proceedings of the International Conference on Machine Learning, pp. 3987–3995. PMLR (2017)
Acknowledgements
This work was supported by the National Key Research and Development Program of China (2017YFA0700904, 2020AAA0106000, 2020AAA0104304, 2020AAA0106302, 2021YFB2701000), NSFC Projects (Nos. 62061136001, 62106123, 62076147, U19B2034, U1811461, U19A2081, 61972224), Beijing NSF Project (No. JQ19016), BNRist (BNR2022RC01006), Tsinghua-Peking Center for Life Sciences, Tsinghua Institute for Guo Qiang, Beijing Academy of Artificial Intelligence (BAAI), Tsinghua-OPPO Joint Research Center for Future Terminal Technology, the High Performance Computing Center, Tsinghua University, and China Postdoctoral Science Foundation (Nos. 2021T140377, 2021M701892).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, L., Zhang, X., Li, Q., Zhu, J., Zhong, Y. (2022). CoSCL: Cooperation of Small Continual Learners is Stronger Than a Big One. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13686. Springer, Cham. https://doi.org/10.1007/978-3-031-19809-0_15
DOI: https://doi.org/10.1007/978-3-031-19809-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19808-3
Online ISBN: 978-3-031-19809-0