
Reducing the Teacher-Student Gap via Elastic Student

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14117)


Abstract

Knowledge distillation (KD) has shown promise in transferring knowledge from a larger teacher model to a smaller student model. Nevertheless, a prevalent phenomenon in knowledge distillation is that student performance decreases when the teacher-student gap becomes large. We contend that this degradation in transfer from teacher to student is predominantly attributable to two gaps, namely the capacity gap and the knowledge gap. In this paper, we introduce Elastic Student Knowledge Distillation (ESKD), a method that combines an Elastic Architecture and Elastic Learning to bridge the two gaps. The Elastic Architecture temporarily increases the number of the student's parameters during training and reverts to the original size at inference, improving the learning ability of the model without increasing inference cost. The Elastic Learning strategy introduces a mask matrix and a progressive learning schedule that help the student absorb the teacher's intricate knowledge while also providing a regularization effect. We conducted extensive experiments on the CIFAR-100 and ImageNet datasets, demonstrating that ESKD outperforms existing methods while preserving computational efficiency.
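The abstract describes the two components only at a high level, so the following is a minimal, hypothetical PyTorch sketch of how such behaviour could be realised: a block that trains with an extra parallel branch and folds it into a single convolution before inference (structural re-parameterization), together with a distillation loss in which a random mask hides part of the teacher signal and the kept fraction grows over training. The names ElasticBlock and masked_kd_loss, the branch choices, and the masking scheme are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticBlock(nn.Module):
    """Training-time block with an extra 1x1 branch that is folded into a
    single 3x3 convolution for inference, so the deployed model keeps its
    original parameter count (hypothetical sketch, not the paper's code)."""

    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=True)  # training-only branch
        self.fused = None  # single conv used after fusion

    def forward(self, x):
        if self.fused is not None:                      # inference: one conv
            return F.relu(self.fused(x))
        return F.relu(self.conv3(x) + self.conv1(x))    # training: two branches

    @torch.no_grad()
    def fuse(self):
        # Place the 1x1 kernel at the centre of the 3x3 kernel and add it,
        # which is equivalent to summing the two branch outputs.
        k = self.conv3.weight.clone()
        k[:, :, 1:2, 1:2] += self.conv1.weight
        b = self.conv3.bias + self.conv1.bias
        self.fused = nn.Conv2d(k.size(1), k.size(0), 3, padding=1, bias=True)
        self.fused.weight.copy_(k)
        self.fused.bias.copy_(b)

def masked_kd_loss(student_logits, teacher_logits, keep_ratio, T=4.0):
    """KL distillation loss in which a random per-sample mask hides part of
    the teacher's signal; keep_ratio is raised towards 1.0 during training
    so the student gradually sees the full teacher knowledge."""
    mask = (torch.rand(student_logits.size(0), device=student_logits.device)
            < keep_ratio).float()
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)  # per-sample KL
    return (kl * mask).sum() / mask.sum().clamp(min=1.0) * (T * T)

In such a sketch, the training loop would call fuse() on every block once training ends and would ramp keep_ratio from a small value towards 1.0 across epochs; both schedules are assumptions about how the elastic behaviour described above might be implemented, not details taken from the paper.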

Acknowledgements

This work was supported by the Natural Science Foundation of Guangdong (2023A1515012073), the National Natural Science Foundation of China (No. 62006083), and the South China Normal University Student Innovation and Entrepreneurship Training Program (No. 202321004).

Author information

Correspondence to Shuangyin Li.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, H., Chen, Z., Zhou, J., Li, S. (2023). Reducing the Teacher-Student Gap via Elastic Student. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, AM., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science (LNAI), vol. 14117. Springer, Cham. https://doi.org/10.1007/978-3-031-40283-8_37

  • DOI: https://doi.org/10.1007/978-3-031-40283-8_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40282-1

  • Online ISBN: 978-3-031-40283-8

  • eBook Packages: Computer Science (R0)
