Abstract
In this paper, we propose a method for learning representation layers with squashing activation functions in deep artificial neural networks that directly addresses the vanishing gradient problem. The proposed solution is derived by solving the maximum likelihood estimator for the components of the posterior representation, which are approximately Beta-distributed, formulated in the context of variational inference. This approach not only improves the performance of deep neural networks with squashing activation functions on some of the hidden layers, including in discriminative learning, but can also be employed to produce sparse codes.
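The central quantity mentioned in the abstract, a maximum likelihood fit of a Beta distribution to squashed (sigmoid) activations, can be illustrated with a short self-contained sketch. The snippet below is only an illustration of that building block, not the paper's training procedure: the Gaussian pre-activation model and the use of scipy.stats.beta.fit are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Illustrative sketch only: fit a Beta distribution by maximum likelihood
# to the sigmoid activations of a single hidden unit. The Gaussian
# pre-activation model and the use of scipy.stats.beta.fit are assumptions
# made for this example, not the procedure derived in the paper.
rng = np.random.default_rng(seed=0)
pre_activations = rng.normal(loc=0.0, scale=2.0, size=10_000)
activations = 1.0 / (1.0 + np.exp(-pre_activations))  # squashing non-linearity

# Maximum likelihood estimate of the Beta shape parameters; location and
# scale are fixed so the support stays on (0, 1).
alpha_hat, beta_hat, _, _ = stats.beta.fit(activations, floc=0.0, fscale=1.0)
print(f"MLE Beta parameters: alpha={alpha_hat:.3f}, beta={beta_hat:.3f}")
```

In a trained network, such fitted shape parameters could serve as a diagnostic of how closely a layer's activation distribution follows a Beta law; the paper derives its learning rule from this estimator, whereas the snippet only shows the estimation step.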
Acknowledgements
This work was supported by the National Authority for Scientific Research and Innovation, and by the Ministry of European Funds, through the Competitiveness Operational Programme 2014-2020, POC-A.1-A.1.1.4-E-2015 [Grant number: 40/02.09.2016, ID: P_37_778, to RT]. We also gratefully acknowledge the NVIDIA Corporation for the donation of a Titan Xp GPU and the Microsoft Corporation for a one-year Azure Research Sponsorship. We thank Dmitri Toren for his help in preparing Figure 1.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Constantinescu, V., Chiru, C., Boloni, T. et al. Learning flat representations with artificial neural networks. Appl Intell 51, 2456–2470 (2021). https://doi.org/10.1007/s10489-020-02032-4