Abstract
Early exiting is an effective paradigm for improving the inference efficiency of deep networks. By constructing classifiers with varying resource demands (the exits), such networks allow easy samples to be output at early exits, removing the need to execute deeper layers. While existing works mainly focus on the architectural design of multi-exit networks, the training strategies for such models are largely left unexplored. Current state-of-the-art models treat all samples equally during training, ignoring the early-exiting behavior at test time and thus leaving a gap between training and testing. In this paper, we propose to bridge this gap by sample weighting. Intuitively, easy samples, which generally exit early during inference, should contribute more to training the early classifiers; hard samples, which mostly exit from deeper layers, should instead be emphasized by the later classifiers. We therefore adopt a weight prediction network to weight the loss of each training sample at every exit. The weight prediction network and the backbone model are jointly optimized under a meta-learning framework with a novel optimization objective. By bringing the adaptive inference behavior into the training phase, we show that the proposed weighting mechanism consistently improves the trade-off between classification accuracy and inference efficiency. Code is available at https://github.com/LeapLabTHU/L2W-DEN.
Y. Han and Y. Pu—Equal contribution.
Z. Lai—Work done during an internship at Tsinghua University.
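To make the idea in the abstract concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation, which is in the linked repository) of how a small weight-prediction network could re-weight the per-sample loss at each exit of a toy multi-exit model. All module names, shapes, and the toy backbone are illustrative assumptions, and the meta-learning update of the weight network is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: a toy multi-exit backbone producing logits at several exits,
# and a small weight-prediction MLP that maps each sample's per-exit loss to a
# weight in (0, 1) used to re-weight that loss.
class TinyMultiExitNet(nn.Module):
    def __init__(self, num_classes=10, num_exits=3):
        super().__init__()
        self.stages = nn.ModuleList([nn.Linear(32, 32) for _ in range(num_exits)])
        self.exits = nn.ModuleList([nn.Linear(32, num_classes) for _ in range(num_exits)])

    def forward(self, x):
        logits = []
        for stage, exit_head in zip(self.stages, self.exits):
            x = F.relu(stage(x))
            logits.append(exit_head(x))
        return logits  # list of [B, num_classes] tensors, one per exit


class WeightPredictionNet(nn.Module):
    """Maps a per-sample scalar loss to a weight in (0, 1)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, per_sample_loss):
        return torch.sigmoid(self.mlp(per_sample_loss.unsqueeze(-1))).squeeze(-1)


def weighted_multi_exit_loss(backbone, weight_net, x, y):
    # Per-exit, per-sample cross-entropy, re-weighted by the predicted weights.
    total = 0.0
    for logits in backbone(x):
        loss_i = F.cross_entropy(logits, y, reduction="none")  # [B]
        w_i = weight_net(loss_i.detach())                       # [B]; detach so the
        total = total + (w_i * loss_i).mean()                   # weight input carries no grad
    return total


if __name__ == "__main__":
    backbone, weight_net = TinyMultiExitNet(), WeightPredictionNet()
    x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
    loss = weighted_multi_exit_loss(backbone, weight_net, x, y)
    loss.backward()  # in the paper, the weight net is instead updated via a meta objective
    print(float(loss))
```

In the actual method, the weight-prediction network would not be trained on this loss directly; it is optimized in a bi-level (meta-learning) loop so that the weighted training of the backbone improves a separate objective reflecting the early-exiting behavior.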
Notes
1. We set \(q=0.5\) in training; the proportion of samples output at the 5 exits therefore follows an exponential distribution of [0.52, 0.26, 0.13, 0.06, 0.03].
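As a quick sanity check of the footnote's numbers, the proportions above are simply the powers \(q, q^2, \dots, q^5\) rescaled to sum to one; the snippet below (an illustrative helper, not from the paper) reproduces them.

```python
# Reproduce the footnote's exit proportions: powers of q = 0.5, normalized to sum to 1.
q, num_exits = 0.5, 5
raw = [q ** (k + 1) for k in range(num_exits)]
props = [p / sum(raw) for p in raw]
print([round(p, 2) for p in props])  # -> [0.52, 0.26, 0.13, 0.06, 0.03]
```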
Acknowledgement
This work is supported in part by the National Key R&D Program of China (2020AAA0105200), the National Natural Science Foundation of China under Grant 62022048, the Guoqiang Institute of Tsinghua University, and the Beijing Academy of Artificial Intelligence. We also appreciate the generous donation of computing resources by High-Flyer AI.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Han, Y. et al. (2022). Learning to Weight Samples for Dynamic Early-Exiting Networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13671. Springer, Cham. https://doi.org/10.1007/978-3-031-20083-0_22
DOI: https://doi.org/10.1007/978-3-031-20083-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20082-3
Online ISBN: 978-3-031-20083-0
eBook Packages: Computer Science, Computer Science (R0)