Abstract
We introduce a conditional compression problem and propose a fast framework for tackling it. The problem is how to quickly compress a pretrained large neural network into optimal smaller networks given target contexts, e.g., a context involving only a subset of classes or a context where only limited compute resources are available. To solve this, we propose an efficient Bayesian framework that compresses a given large network into a much smaller size tailored to each contextual requirement. We employ a hypernetwork to parameterize the posterior distribution of weights given the conditional inputs and minimize a variational objective of this Bayesian neural network. To further reduce the network size, we propose a new input-output group-sparsity factorization of the weights that encourages greater sparsity in the generated weights. Our method can quickly generate compressed networks with significantly smaller sizes than baseline methods.
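To make the approach concrete, the sketch below illustrates the core idea of generating a conditional weight posterior with a hypernetwork. It is a minimal, hypothetical illustration rather than the paper's architecture: PyTorch, the layer sizes, and all names (`ConditionalHyperNet`, `z_in`, `z_out`) are our own assumptions, and the KL term of the variational objective is omitted.

```python
# Minimal, hypothetical sketch of a conditional Bayesian hypernetwork.
# PyTorch and every name below are illustrative assumptions, not the paper's
# implementation; the KL term of the variational objective is omitted.
import torch
import torch.nn as nn

class ConditionalHyperNet(nn.Module):
    """Maps a condition vector (e.g. a binary class mask) to a Gaussian
    posterior over one target layer's weights, with per-input and
    per-output scales that encourage input-output group sparsity."""

    def __init__(self, cond_dim, in_dim, out_dim, hidden=128):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.trunk = nn.Sequential(nn.Linear(cond_dim, hidden), nn.ReLU())
        self.w_mu = nn.Linear(hidden, in_dim * out_dim)       # posterior mean
        self.w_logvar = nn.Linear(hidden, in_dim * out_dim)   # posterior log-variance
        self.z_in = nn.Linear(hidden, in_dim)    # per-input-group scale
        self.z_out = nn.Linear(hidden, out_dim)  # per-output-group scale

    def forward(self, cond):
        h = self.trunk(cond)
        mu = self.w_mu(h).view(self.out_dim, self.in_dim)
        logvar = self.w_logvar(h).view(self.out_dim, self.in_dim)
        # Reparameterization trick: draw weights from the conditional posterior.
        w = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Input-output factorization: rows/columns whose scales shrink toward
        # zero can later be pruned from the generated network.
        w = self.z_out(h).unsqueeze(1) * w * self.z_in(h).unsqueeze(0)
        return w

# Usage: condition on a subset of 10 classes; during training the mask can be
# sampled from Bernoulli(1/2), as note 5 below suggests.
hyper = ConditionalHyperNet(cond_dim=10, in_dim=784, out_dim=300)
cond = torch.bernoulli(0.5 * torch.ones(10))
w = hyper(cond)
x = torch.randn(32, 784)
logits = x @ w.t()  # use the generated weights in the target layer
```

In this sketch, input or output groups whose scales are driven toward zero correspond to rows and columns that can be removed entirely, which is the intuition behind the input-output group-sparsity factorization.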
Notes
- 1.
We assume that the bias, if it exists, can be absorbed into the weight matrix by augmenting the input with a constant one (see the identity after these notes).
- 2.
Better thresholds can be chosen by visual inspection of the log dropout rates.
- 3.
This involves adding a generative network for the posterior of the condition given all layer masks, and then maximizing its log-likelihood.
- 4.
We also tested BC-IO-GNJ on LeNet-5-Caffe and obtained better performance.
- 5.
In general, the condition can be sampled randomly from \(\text{Bernoulli}(\frac{1}{2})\) during training.
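For concreteness, the bias-absorption identity referenced in note 1 is the standard one (not specific to this paper): \( Wx + b = \begin{bmatrix} W & b \end{bmatrix} \begin{bmatrix} x \\ 1 \end{bmatrix} \), so the bias becomes an ordinary column of the augmented weight matrix once the input is extended with a constant one.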
Acknowledgments
This research was a collaboration between the Commonwealth of Australia (represented by the Department of Defence) and Deakin University, through a Defence Science Partnerships agreement.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Nguyen, P. et al. (2021). Fast Conditional Network Compression Using Bayesian HyperNetworks. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science(), vol 12977. Springer, Cham. https://doi.org/10.1007/978-3-030-86523-8_20
DOI: https://doi.org/10.1007/978-3-030-86523-8_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86522-1
Online ISBN: 978-3-030-86523-8
eBook Packages: Computer Science, Computer Science (R0)