
Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics

Published: 01 February 2012

Abstract

We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we consider the situation where the model probability density function is unnormalized: the model is specified only up to the partition function, which normalizes the model so that it integrates to one for any choice of the parameters but is often impossible to obtain in closed form. Gibbs distributions, Markov networks, and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation therefore cannot be used without resorting to numerical approximations, which are often computationally expensive. We propose a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities.
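The estimation principle can be illustrated with a small, self-contained sketch: we fit an unnormalized one-dimensional Gaussian, treating the log-normalizer c as a free parameter, by logistic regression of data against a known noise sample. This is our own illustrative code (NumPy/SciPy; names such as log_pm, log_pn, and neg_J are ours), not an implementation from the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Observed data: a 1-D Gaussian whose normalizer we pretend is unknown.
true_mu, true_sigma = 1.0, 0.5
x = rng.normal(true_mu, true_sigma, size=5000)

# Noise sample of equal size from a known density p_n (a wide Gaussian).
noise_sigma = 2.0
y = rng.normal(0.0, noise_sigma, size=5000)

def log_pn(u):
    # Fully normalized log-density of the noise distribution.
    return -0.5 * (u / noise_sigma) ** 2 - 0.5 * np.log(2 * np.pi * noise_sigma ** 2)

def log_pm(u, mu, log_prec, c):
    # Unnormalized model: a quadratic log-density plus a free offset c.
    # NCE estimates c like any other parameter; at the optimum it
    # converges to the negative log partition function.
    return -0.5 * np.exp(log_prec) * (u - mu) ** 2 + c

def neg_J(theta):
    mu, log_prec, c = theta
    # G(u) = log p_m(u) - log p_n(u); sigmoid(G(u)) is the posterior
    # probability that u is a data point rather than a noise point.
    Gx = log_pm(x, mu, log_prec, c) - log_pn(x)
    Gy = log_pm(y, mu, log_prec, c) - log_pn(y)
    # Negative logistic-regression log-likelihood, written with
    # logaddexp for numerical stability: -log sigmoid(z) = logaddexp(0, -z).
    return np.mean(np.logaddexp(0.0, -Gx)) + np.mean(np.logaddexp(0.0, Gy))

mu_hat, log_prec_hat, c_hat = minimize(neg_J, x0=np.zeros(3)).x
sigma_hat = np.exp(-0.5 * log_prec_hat)
# For a normalized Gaussian, c should approach -0.5 * log(2*pi*sigma^2).
c_true = -0.5 * np.log(2 * np.pi * true_sigma ** 2)
print(mu_hat, sigma_hat, c_hat, c_true)
```

Minimizing neg_J recovers both the shape parameters and the normalizer; no integral over the model density is ever computed, which is what makes the approach attractive when the partition function is intractable.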


Cited By

• (2024) Causal representation learning made identifiable by grouping of observational variables. Proceedings of the 41st International Conference on Machine Learning, pages 36249-36293. DOI: 10.5555/3692070.3693546
• (2024) Transitivity-preserving graph representation learning for bridging local connectivity and role-based similarity. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, pages 12456-12465. DOI: 10.1609/aaai.v38i11.29138
• (2024) A Remote Sensing Image Classification Based on Contrast Learning with Similarity Fusion. Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning, pages 222-228. DOI: 10.1145/3697467.3697646
• (2024) Heterogeneous Information Crossing on Graphs for Session-Based Recommender Systems. ACM Transactions on the Web, 18(2):1-24. DOI: 10.1145/3572407
• (2024) Graphical Modeling for Multi-Source Domain Adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(3):1727-1741. DOI: 10.1109/TPAMI.2022.3172372
• (2024) FD-GAN: Generalizable and Robust Forgery Detection via Generative Adversarial Networks. International Journal of Computer Vision, 132(12):5801-5819. DOI: 10.1007/s11263-024-02136-1
• (2023) L-C2ST. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 56384-56410. DOI: 10.5555/3666122.3668582
• (2023) Provable benefits of annealing for estimating normalizing constants. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 45945-45970. DOI: 10.5555/3666122.3668114
• (2023) Compositional foundation models for hierarchical planning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 22304-22325. DOI: 10.5555/3666122.3667101
• (2023) Approximate Stein classes for truncated density estimation. Proceedings of the 40th International Conference on Machine Learning, pages 37066-37090. DOI: 10.5555/3618408.3619952


        Published In

        cover image The Journal of Machine Learning Research
        The Journal of Machine Learning Research  Volume 13, Issue 1
        January 2012
        3712 pages
        ISSN:1532-4435
        EISSN:1533-7928
        Issue’s Table of Contents

Publisher

JMLR.org


Author Tags

1. computation
2. estimation
3. natural image statistics
4. partition function
5. unnormalized models


