Abstract
Knowledge distillation (KD) has shown promise in transferring knowledge from a larger teacher model to a smaller student model. However, a well-known phenomenon in knowledge distillation is that student performance degrades when the teacher-student gap becomes large. We argue that this degradation is mainly attributable to two gaps: the capacity gap and the knowledge gap. In this paper, we introduce Elastic Student Knowledge Distillation (ESKD), a method that combines an Elastic Architecture and an Elastic Learning strategy to bridge both gaps. The Elastic Architecture temporarily increases the number of student parameters during training and reverts to the original size at inference, improving the model's learning ability without adding inference cost. The Elastic Learning strategy introduces a mask matrix and progressive learning, which help the student absorb the teacher's intricate knowledge and act as a regularizer. Extensive experiments on the CIFAR-100 and ImageNet datasets demonstrate that ESKD outperforms existing methods while preserving computational efficiency.
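To make the Elastic Architecture idea concrete, the following is a minimal sketch of RepVGG-style structural re-parameterization, which the abstract's "increase parameters during training, revert at inference" description suggests. It is not the authors' implementation; the block name ElasticBlock and the 3x3 + 1x1 branch choice are illustrative assumptions. A temporary parallel branch adds capacity during training and is algebraically folded into a single convolution before deployment, so the inference-time model is unchanged in size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticBlock(nn.Module):
    """Illustrative sketch: extra 1x1 branch at training time,
    folded into one 3x3 conv for inference (not the paper's exact design)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv3 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 1)   # temporary extra capacity
        self.deployed = None                        # single conv used after folding

    def forward(self, x):
        if self.deployed is not None:
            return F.relu(self.deployed(x))
        return F.relu(self.conv3(x) + self.conv1(x))

    @torch.no_grad()
    def reparameterize(self):
        # Convolution is linear, so the two branches can be merged exactly:
        # pad the 1x1 kernel to 3x3 and add it to the 3x3 kernel; sum the biases.
        w = self.conv3.weight + F.pad(self.conv1.weight, [1, 1, 1, 1])
        b = self.conv3.bias + self.conv1.bias
        self.deployed = nn.Conv2d(self.conv3.in_channels,
                                  self.conv3.out_channels, 3, padding=1)
        self.deployed.weight.copy_(w)
        self.deployed.bias.copy_(b)

block = ElasticBlock(16, 32)
x = torch.randn(2, 16, 8, 8)
y_train = block(x)
block.reparameterize()
y_deploy = block(x)
print(torch.allclose(y_train, y_deploy, atol=1e-5))  # True: same function, single-branch inference
```

The folding step is exact because the sum of two convolutions applied to the same input equals one convolution with the summed (shape-aligned) kernels, which is what allows training-time capacity to be removed without changing the deployed model's outputs.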
Acknowledgements
This work was supported by the Natural Science Foundation of Guangdong (2023A1515012073), the National Natural Science Foundation of China (No. 62006083), and the South China Normal University Student Innovation and Entrepreneurship Training Program (No. 202321004).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, H., Chen, Z., Zhou, J., Li, S. (2023). Reducing the Teacher-Student Gap via Elastic Student. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, AM., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science, vol. 14117. Springer, Cham. https://doi.org/10.1007/978-3-031-40283-8_37
DOI: https://doi.org/10.1007/978-3-031-40283-8_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40282-1
Online ISBN: 978-3-031-40283-8
eBook Packages: Computer Science, Computer Science (R0)