
Fast Conditional Network Compression Using Bayesian HyperNetworks

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Research Track (ECML PKDD 2021)

Abstract

We introduce a conditional compression problem and propose a fast framework for tackling it. The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts, e.g., a context involving only a subset of classes or a context where only limited compute resources are available. To solve this, we propose an efficient Bayesian framework to compress a given large network into a much smaller size tailored to meet each contextual requirement. We employ a hypernetwork to parameterize the posterior distribution of weights given conditional inputs and minimize a variational objective of this Bayesian neural network. To further reduce the network sizes, we propose a new input-output group sparsity factorization of weights to encourage more sparsity in the generated weights. Our methods can quickly generate compressed networks with significantly smaller sizes than baseline methods.
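To make the idea concrete, here is a minimal PyTorch sketch, not the authors' implementation: the module names, shapes, unit-Gaussian prior, and KL weight are illustrative assumptions. It shows how a hypernetwork can map a conditioning vector, e.g. a binary mask over the classes of interest, to a variational posterior over a target layer's weights, trained with an ELBO-style objective.

# Minimal sketch (assumed names and shapes) of a conditional Bayesian hypernetwork.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondWeightHyperNet(nn.Module):
    """Maps a context vector to the mean and log-variance of a target layer's weights."""
    def __init__(self, ctx_dim, in_dim, out_dim, hidden=128):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.body = nn.Sequential(nn.Linear(ctx_dim, hidden), nn.ReLU())
        self.mu_head = nn.Linear(hidden, in_dim * out_dim)
        self.logvar_head = nn.Linear(hidden, in_dim * out_dim)

    def forward(self, ctx):
        h = self.body(ctx)
        mu = self.mu_head(h).view(self.out_dim, self.in_dim)
        logvar = self.logvar_head(h).view(self.out_dim, self.in_dim)
        return mu, logvar

def sample_weights(mu, logvar):
    # Reparameterisation trick: w = mu + sigma * eps.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def kl_to_unit_gaussian(mu, logvar):
    # KL(q(w | ctx) || N(0, I)), summed over all weights (prior is an assumption here).
    return 0.5 * torch.sum(torch.exp(logvar) + mu ** 2 - 1.0 - logvar)

# Toy usage: compress a 20 -> 5 classifier layer for a context needing only classes 0-2.
ctx = torch.zeros(1, 5); ctx[0, :3] = 1.0              # binary mask over classes
hyper = CondWeightHyperNet(ctx_dim=5, in_dim=20, out_dim=5)
x, y = torch.randn(8, 20), torch.randint(0, 3, (8,))    # dummy batch
mu, logvar = hyper(ctx)
w = sample_weights(mu, logvar)
logits = F.linear(x, w)
loss = F.cross_entropy(logits, y) + 1e-3 * kl_to_unit_gaussian(mu, logvar)
loss.backward()  # gradients flow into the hypernetwork parameters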


Notes

  1. We assume the bias, if it exists, can be absorbed into the weight matrix by augmenting the input with a constant one.

  2. Better thresholds can be chosen by visual inspection of the log dropout rates (a toy illustration of this pruning rule follows these notes).

  3. This involves adding a generative network for the condition posterior given all layer masks and then maximizing its log-likelihood.

  4. We also tested BC-IO-GNJ on LeNet-5-Caffe and obtained better performance.

  5. In general, the condition can be sampled randomly from \(\text {Bernoulli}(\frac{1}{2})\) during training.
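The following toy snippet illustrates notes 2 and 5; it is a sketch under assumed shapes and threshold, not the paper's code. Weights whose log dropout rate exceeds a fixed threshold are pruned, and the conditioning vector is resampled from Bernoulli(1/2) at each training step.

# Toy illustration (assumed threshold and shapes) of log-dropout-rate pruning
# and random condition sampling.
import torch

log_alpha = torch.randn(5, 20)             # per-weight log dropout rates (illustrative values)
keep_mask = (log_alpha < 3.0).float()       # prune weights whose log dropout rate >= 3
sparsity = 1.0 - keep_mask.mean().item()
print(f"pruned {sparsity:.1%} of the weights")

# Random conditions during training: each class is switched on with probability 1/2.
ctx = torch.bernoulli(0.5 * torch.ones(1, 5))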



Acknowledgments

This research was a collaboration between the Commonwealth of Australia (represented by the Department of Defence) and Deakin University, through a Defence Science Partnerships agreement.

Author information

Corresponding author

Correspondence to Phuoc Nguyen.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Nguyen, P. et al. (2021). Fast Conditional Network Compression Using Bayesian HyperNetworks. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_20


  • DOI: https://doi.org/10.1007/978-3-030-86523-8_20


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86522-1

  • Online ISBN: 978-3-030-86523-8

  • eBook Packages: Computer Science, Computer Science (R0)
