DOI: 10.5555/3045118.3045290
Article

Weight uncertainty in neural networks

Published: 06 July 2015

Abstract

We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.
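The core mechanics of Bayes by Backprop can be illustrated on a deliberately tiny problem. The sketch below is an assumption-laden simplification, not the paper's implementation: it uses a single regression weight, a single fixed Gaussian prior (the paper uses a scale mixture of two Gaussians), unit observation noise, and hand-coded gradients. The variational posterior q(w) = N(mu, sigma^2) is trained by minimising the variational free energy, i.e. KL(q || prior) plus the expected negative log-likelihood, via the reparameterisation w = mu + softplus(rho) * eps.

```python
import numpy as np

# Minimal one-weight sketch of Bayes by Backprop (illustration only;
# the paper trains full networks with a scale-mixture prior).
rng = np.random.default_rng(0)

# Synthetic data: y = 2x + noise, so q(w) should concentrate near w = 2.
N = 200
x = rng.uniform(-1.0, 1.0, N)
y = 2.0 * x + 0.1 * rng.normal(size=N)

mu, rho = 0.0, -3.0      # variational posterior q(w) = N(mu, sigma^2)
prior_sigma = 1.0        # single fixed Gaussian prior (simplifying assumption)
lr = 0.05

for _ in range(2000):
    sigma = np.log1p(np.exp(rho))       # softplus keeps sigma positive
    eps = rng.normal()
    w = mu + sigma * eps                # reparameterised weight sample

    # Gradient of the total negative log-likelihood (unit observation noise).
    dnll_dw = -np.sum((y - w * x) * x)

    # Closed-form gradients of KL(q || prior) for two Gaussians.
    dkl_dmu = mu / prior_sigma**2
    dkl_dsigma = -1.0 / sigma + sigma / prior_sigma**2

    # Chain rule through w = mu + softplus(rho) * eps.
    dsigma_drho = 1.0 / (1.0 + np.exp(-rho))
    mu -= lr * (dnll_dw + dkl_dmu) / N
    rho -= lr * (dnll_dw * eps + dkl_dsigma) * dsigma_drho / N
```

After training, mu sits near the data-generating weight and sigma reflects the remaining posterior uncertainty; sampling w at test time yields an ensemble of predictions, which is the mechanism behind the Thompson-sampling-style exploration in the paper's bandit experiments.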


Cited By

  • (2024) Functional Wasserstein variational policy optimization. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 3893-3911. 10.5555/3702676.3702858.
  • (2024) Functional Wasserstein bridge inference for Bayesian deep learning. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 3791-3815. 10.5555/3702676.3702853.
  • (2024) Uncertainty estimation with recursive feature machines. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 1408-1437. 10.5555/3702676.3702742.
  • (2024) Low-cost high-power membership inference attacks. Proceedings of the 41st International Conference on Machine Learning, pages 58244-58282. 10.5555/3692070.3694473.
  • (2024) Uncertainty estimation by density aware evidential deep learning. Proceedings of the 41st International Conference on Machine Learning, pages 57217-57243. 10.5555/3692070.3694431.
  • (2024) Position. Proceedings of the 41st International Conference on Machine Learning, pages 39556-39586. 10.5555/3692070.3693672.
  • (2024) Variational linearized Laplace approximation for Bayesian deep learning. Proceedings of the 41st International Conference on Machine Learning, pages 38815-38836. 10.5555/3692070.3693643.
  • (2024) Learning in deep factor graphs with Gaussian belief propagation. Proceedings of the 41st International Conference on Machine Learning, pages 37141-37163. 10.5555/3692070.3693577.
  • (2024) Reparameterized importance sampling for robust variational Bayesian neural networks. Proceedings of the 41st International Conference on Machine Learning, pages 32680-32690. 10.5555/3692070.3693396.
  • (2024) BiLLM. Proceedings of the 41st International Conference on Machine Learning, pages 20023-20042. 10.5555/3692070.3692876.


Published In

cover image Guide Proceedings
ICML'15: Proceedings of the 32nd International Conference on Machine Learning - Volume 37
July 2015
2558 pages

Publisher

JMLR.org

