[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Hyper-Parameter-Free Generative Modelling with Deep Boltzmann Trees

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11907))

  • 1554 Accesses

Abstract

Deep neural networks achieve state-of-the-art results in various classification and synthetic data generation tasks. However, only little is known about why depth improves a model. We investigate the structure of stochastic deep neural works, also known as Deep Boltzmann Machines, to shed some light on this issue. While the best known results postulate an exponential dependence between the number of visible units and the depth of the model, we show that the required depth is upper bounded by the longest path in the underlying junction tree, which is at most linear in the number of visible units. Moreover, we show that the conditional independence structure of any categorical Deep Boltzmann Machine contains a sub-tree that allows the consistent estimation of the full joint probability mass function of all visible units. We connect our results to \(l_1\)-regularized maximum-likelihood estimation and Chow-Liu trees. Based on our theoretical findings, we present a new tractable version of Deep Boltzmann Machines, namely the Deep Boltzmann Tree (DBT). We provide a hyper-parameter-free algorithm for learning the DBT from data, and propose a new initialization method to enforce convergence to good solutions. Our findings provide some theoretical evidence for why a deep model might be beneficial. Experimental results on benchmark data show, that the DBT is a theoretical sound alternative to likelihood-free generative models.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 67.99
Price includes VAT (United Kingdom)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 84.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    It turns out that DBTs consist of exactly two hidden layers. While this kind of depth seems rather “shallow”, original work on DBMs [20] define the DBM as a restricted Boltzmann machine which has more than one hidden layer. Thus, to be consistent with the common terminology, we decided to denote our proposed model as “deep”.

  2. 2.

    We have to stress that this is not a hyper-parameter of the DBT. Moreover, 0.05 is not “tuned” either as it is the default value in Chordalysis.

References

  1. Ba, J., Caruana, R.: Do deep nets really need to be deep? In: Advances in Neural Information Processing Systems (NIPS), pp. 2654–2662 (2014)

    Google Scholar 

  2. Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized margin bounds for neural networks. In: Advances in Neural Information Processing Systems (NIPS) 30, pp. 6241–6250 (2017)

    Google Scholar 

  3. Bucila, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 535–541 (2006)

    Google Scholar 

  4. Bulatov, A., Grohe, M.: The complexity of partition functions. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 294–306. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27836-8_27

    Chapter  Google Scholar 

  5. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)

    Article  MathSciNet  MATH  Google Scholar 

  6. Dempster, A.P., Laird, M.N., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–38 (1977)

    MathSciNet  MATH  Google Scholar 

  7. Fulkerson, D.R., Gross, O.A.: Incidence matrices and interval graphs. Pac. J. Math. 15(3), 835–855 (1965)

    Article  MathSciNet  MATH  Google Scholar 

  8. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 249–256 (2010)

    Google Scholar 

  9. Golowich, N., Rakhlin, A., Shamir, O.: Size-independent sample complexity of neural networks. In: Conference On Learning Theory (COLT), pp. 297–299 (2018)

    Google Scholar 

  10. Goodfellow, I.J., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems (NIPS), pp. 2672–2680 (2014)

    Google Scholar 

  11. Hammersley, J.M., Clifford, P.: Markov fields on finite graphs and lattices. Unpublished manuscript (1971)

    Google Scholar 

  12. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 630–645. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_38

    Chapter  Google Scholar 

  13. Li, Y., Turner, R.E.: Gradient estimators for implicit models. In: International Conference on Learning Representations (ICLR) (2018)

    Google Scholar 

  14. Montavon, G., Braun, M.L., Müller, K.: Deep Boltzmann machines as feed-forward hierarchies. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 798–804 (2012)

    Google Scholar 

  15. Montúfar, G.: Deep narrow Boltzmann machines are universal approximators. In: International Conference on Learning Representations (ICLR) (2015)

    Google Scholar 

  16. Montúfar, G., Morton, J.: Discrete restricted Boltzmann machines. J. Mach. Learn. Res. 16, 653–672 (2015)

    MathSciNet  MATH  Google Scholar 

  17. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/\(k^2\)). Sov. Math. Dokl. 27(2), 372–376 (1983)

    MATH  Google Scholar 

  18. Neyshabur, B., Tomioka, R., Srebro, N.: Norm-based capacity control in neural networks. In: Conference on Learning Theory (COLT), pp. 1376–1401 (2015)

    Google Scholar 

  19. Papandreou, G., Yuille, A.L.: Perturb-and-MAP random fields: using discrete optimization to learn and sample from energy models. In: IEEE International Conference on Computer Vision (ICCV), pp. 193–200 (2011)

    Google Scholar 

  20. Salakhutdinov, R., Hinton, G.E.: Deep Boltzmann machines. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 448–455 (2009)

    Google Scholar 

  21. Salakhutdinov, R., Larochelle, H.: Efficient learning of deep Boltzmann machines. In: Artificial Intelligence and Statistics (AISTATS), pp. 693–700 (2010)

    Google Scholar 

  22. Shazeer, N., et al.: Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. CoRR abs/1701.06538 (2017)

    Google Scholar 

  23. Valiant, L.G.: The complexity of enumeration and reliability problems. SIAM J. Comput. 8(3), 410–421 (1979)

    Article  MathSciNet  MATH  Google Scholar 

  24. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

    Article  MATH  Google Scholar 

  25. Warde-Farley, D., Bengio, Y.: Improving generative adversarial networks with denoising feature matching. In: International Conference on Learning Representations (ICLR) (2017)

    Google Scholar 

  26. Webb, G.I., Petitjean, F.: A multiple test correction for streams and cascades of statistical hypothesis tests. In: International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1225–1264 (2016)

    Google Scholar 

  27. Yang, E., Lozano, A.C., Ravikumar, P.: Elementary estimators for graphical models. In: Advances in Neural Information Processing Systems (NIPS) 27, pp. 2159–2167 (2014)

    Google Scholar 

Download references

Acknowledgments

This research has been funded by the Federal Ministry of Education and Research of Germany as part of the competence center for machine learning ML2R (01S18038A).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nico Piatkowski .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 167 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Piatkowski, N. (2020). Hyper-Parameter-Free Generative Modelling with Deep Boltzmann Trees. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science(), vol 11907. Springer, Cham. https://doi.org/10.1007/978-3-030-46147-8_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-46147-8_25

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46146-1

  • Online ISBN: 978-3-030-46147-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics