DOI: 10.5555/3045118.3045290
Article

Weight uncertainty in neural networks

Published: 06 July 2015

Abstract

We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.
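The core mechanics of Bayes by Backprop can be illustrated on a deliberately tiny problem. The sketch below is an assumption-laden simplification, not the paper's implementation: it uses a single regression weight, a single fixed Gaussian prior (the paper uses a scale mixture of two Gaussians), unit observation noise, and hand-coded gradients. The variational posterior q(w) = N(mu, sigma^2) is trained by minimising the variational free energy, i.e. KL(q || prior) plus the expected negative log-likelihood, via the reparameterisation w = mu + softplus(rho) * eps.

```python
import numpy as np

# Minimal one-weight sketch of Bayes by Backprop (illustration only;
# the paper trains full networks with a scale-mixture prior).
rng = np.random.default_rng(0)

# Synthetic data: y = 2x + noise, so q(w) should concentrate near w = 2.
N = 200
x = rng.uniform(-1.0, 1.0, N)
y = 2.0 * x + 0.1 * rng.normal(size=N)

mu, rho = 0.0, -3.0      # variational posterior q(w) = N(mu, sigma^2)
prior_sigma = 1.0        # single fixed Gaussian prior (simplifying assumption)
lr = 0.05

for _ in range(2000):
    sigma = np.log1p(np.exp(rho))       # softplus keeps sigma positive
    eps = rng.normal()
    w = mu + sigma * eps                # reparameterised weight sample

    # Gradient of the total negative log-likelihood (unit observation noise).
    dnll_dw = -np.sum((y - w * x) * x)

    # Closed-form gradients of KL(q || prior) for two Gaussians.
    dkl_dmu = mu / prior_sigma**2
    dkl_dsigma = -1.0 / sigma + sigma / prior_sigma**2

    # Chain rule through w = mu + softplus(rho) * eps.
    dsigma_drho = 1.0 / (1.0 + np.exp(-rho))
    mu -= lr * (dnll_dw + dkl_dmu) / N
    rho -= lr * (dnll_dw * eps + dkl_dsigma) * dsigma_drho / N
```

After training, mu sits near the data-generating weight and sigma reflects the remaining posterior uncertainty; sampling w at test time yields an ensemble of predictions, which is the mechanism behind the Thompson-sampling-style exploration in the paper's bandit experiments.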


Cited By

  • (2024) Functional Wasserstein variational policy optimization. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 3893-3911. 10.5555/3702676.3702858.
  • (2024) Functional Wasserstein bridge inference for Bayesian deep learning. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 3791-3815. 10.5555/3702676.3702853.
  • (2024) Uncertainty estimation with recursive feature machines. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, pages 1408-1437. 10.5555/3702676.3702742.
  • (2024) Low-cost high-power membership inference attacks. Proceedings of the 41st International Conference on Machine Learning, pages 58244-58282. 10.5555/3692070.3694473.
  • (2024) Uncertainty estimation by density aware evidential deep learning. Proceedings of the 41st International Conference on Machine Learning, pages 57217-57243. 10.5555/3692070.3694431.
  • (2024) Position. Proceedings of the 41st International Conference on Machine Learning, pages 39556-39586. 10.5555/3692070.3693672.
  • (2024) Variational linearized Laplace approximation for Bayesian deep learning. Proceedings of the 41st International Conference on Machine Learning, pages 38815-38836. 10.5555/3692070.3693643.
  • (2024) Learning in deep factor graphs with Gaussian belief propagation. Proceedings of the 41st International Conference on Machine Learning, pages 37141-37163. 10.5555/3692070.3693577.
  • (2024) Reparameterized importance sampling for robust variational Bayesian neural networks. Proceedings of the 41st International Conference on Machine Learning, pages 32680-32690. 10.5555/3692070.3693396.
  • (2024) BiLLM. Proceedings of the 41st International Conference on Machine Learning, pages 20023-20042. 10.5555/3692070.3692876.


Published In

cover image Guide Proceedings
ICML'15: Proceedings of the 32nd International Conference on Machine Learning - Volume 37
July 2015
2558 pages

Publisher

JMLR.org

