
Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics

Published: 01 February 2012

Abstract

We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we consider the situation where the model probability density function is unnormalized: the model is specified only up to the partition function, which normalizes the model so that it integrates to one for any choice of the parameters but is often impossible to obtain in closed form. Gibbs distributions, Markov networks, and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation therefore cannot be used without resorting to numerical approximations, which are often computationally expensive. We propose a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities.
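The estimation principle can be illustrated with a small, self-contained sketch: we fit an unnormalized one-dimensional Gaussian, treating the log-normalizer c as a free parameter, by logistic regression of data against a known noise sample. This is our own illustrative code (NumPy/SciPy; names such as log_pm, log_pn, and neg_J are ours), not an implementation from the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Observed data: a 1-D Gaussian whose normalizer we pretend is unknown.
true_mu, true_sigma = 1.0, 0.5
x = rng.normal(true_mu, true_sigma, size=5000)

# Noise sample of equal size from a known density p_n (a wide Gaussian).
noise_sigma = 2.0
y = rng.normal(0.0, noise_sigma, size=5000)

def log_pn(u):
    # Fully normalized log-density of the noise distribution.
    return -0.5 * (u / noise_sigma) ** 2 - 0.5 * np.log(2 * np.pi * noise_sigma ** 2)

def log_pm(u, mu, log_prec, c):
    # Unnormalized model: a quadratic log-density plus a free offset c.
    # NCE estimates c like any other parameter; at the optimum it
    # converges to the negative log partition function.
    return -0.5 * np.exp(log_prec) * (u - mu) ** 2 + c

def neg_J(theta):
    mu, log_prec, c = theta
    # G(u) = log p_m(u) - log p_n(u); sigmoid(G(u)) is the posterior
    # probability that u is a data point rather than a noise point.
    Gx = log_pm(x, mu, log_prec, c) - log_pn(x)
    Gy = log_pm(y, mu, log_prec, c) - log_pn(y)
    # Negative logistic-regression log-likelihood, written with
    # logaddexp for numerical stability: -log sigmoid(z) = logaddexp(0, -z).
    return np.mean(np.logaddexp(0.0, -Gx)) + np.mean(np.logaddexp(0.0, Gy))

mu_hat, log_prec_hat, c_hat = minimize(neg_J, x0=np.zeros(3)).x
sigma_hat = np.exp(-0.5 * log_prec_hat)
# For a normalized Gaussian, c should approach -0.5 * log(2*pi*sigma^2).
c_true = -0.5 * np.log(2 * np.pi * true_sigma ** 2)
print(mu_hat, sigma_hat, c_hat, c_true)
```

Minimizing neg_J recovers both the shape parameters and the normalizer; no integral over the model density is ever computed, which is what makes the approach attractive when the partition function is intractable.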


Cited By

• (2024) Causal representation learning made identifiable by grouping of observational variables. Proceedings of the 41st International Conference on Machine Learning, pages 36249-36293. DOI: 10.5555/3692070.3693546
• (2024) Transitivity-preserving graph representation learning for bridging local connectivity and role-based similarity. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, pages 12456-12465. DOI: 10.1609/aaai.v38i11.29138
• (2024) A Remote Sensing Image Classification Based on Contrast Learning with Similarity Fusion. Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning, pages 222-228. DOI: 10.1145/3697467.3697646
• (2024) Heterogeneous Information Crossing on Graphs for Session-Based Recommender Systems. ACM Transactions on the Web, 18(2):1-24. DOI: 10.1145/3572407
• (2024) Graphical Modeling for Multi-Source Domain Adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3):1727-1741. DOI: 10.1109/TPAMI.2022.3172372
• (2024) FD-GAN: Generalizable and Robust Forgery Detection via Generative Adversarial Networks. International Journal of Computer Vision, 132(12):5801-5819. DOI: 10.1007/s11263-024-02136-1
• (2023) L-C2ST. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 56384-56410. DOI: 10.5555/3666122.3668582
• (2023) Provable benefits of annealing for estimating normalizing constants. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 45945-45970. DOI: 10.5555/3666122.3668114
• (2023) Compositional foundation models for hierarchical planning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 22304-22325. DOI: 10.5555/3666122.3667101
• (2023) Approximate Stein classes for truncated density estimation. Proceedings of the 40th International Conference on Machine Learning, pages 37066-37090. DOI: 10.5555/3618408.3619952


        Published In

        cover image The Journal of Machine Learning Research
        The Journal of Machine Learning Research  Volume 13, Issue 1
        January 2012
        3712 pages
        ISSN:1532-4435
        EISSN:1533-7928
        Issue’s Table of Contents

Publisher

JMLR.org


Author Tags

1. computation
2. estimation
3. natural image statistics
4. partition function
5. unnormalized models


