
Distribution-specific hardness of learning neural networks

Published: 01 January 2018

Abstract

Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over all data distributions. In this paper, we take a more nuanced view, and consider whether specific assumptions on the "niceness" of the input distribution, or "niceness" of the target function (e.g., in terms of smoothness, non-degeneracy, incoherence, random choice of parameters, etc.), are sufficient to guarantee learnability using gradient-based methods. We provide evidence that neither class of assumptions alone is sufficient: On the one hand, for any member of a class of "nice" target functions, there are difficult input distributions. On the other hand, we identify a family of simple target functions, which are difficult to learn even if the input distribution is "nice". To prove our results, we develop some tools which may be of independent interest, such as extending Fourier-based hardness techniques developed in the context of statistical queries (Blum et al., 1994), from the Boolean cube to Euclidean space and to more general classes of functions.
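
To make the setting concrete, here is a minimal sketch (in Python/NumPy) of the kind of learning problem the abstract describes: a one-hidden-layer ReLU network trained by plain gradient descent to fit a simple target function under a "nice" standard Gaussian input distribution. The dimensions, step size, and the particular periodic target below are illustrative assumptions, not the paper's exact construction.

```python
# Illustrative sketch only: gradient-based training of a one-hidden-layer
# ReLU network under distributional and target assumptions of the kind the
# abstract discusses. All numerical choices below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, width, n, steps, lr = 50, 64, 2000, 500, 0.01

# "Nice" input distribution: standard Gaussian in R^d (an illustrative choice).
X = rng.standard_normal((n, d))

# Simple target: a periodic function of a random one-dimensional projection
# (illustrative; not the paper's exact hard family).
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
y = np.cos(2 * np.pi * (X @ v))

# One-hidden-layer ReLU network with random initialization.
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

for t in range(steps):
    Z = X @ W.T                          # pre-activations, shape (n, width)
    H = np.maximum(Z, 0.0)               # ReLU activations
    pred = H @ a                         # network outputs, shape (n,)
    residual = pred - y
    loss = 0.5 * np.mean(residual ** 2)  # mean squared error

    # Gradients of the loss with respect to output and hidden-layer weights.
    grad_a = H.T @ residual / n
    grad_W = ((residual[:, None] * (Z > 0) * a[None, :]).T @ X) / n

    a -= lr * grad_a
    W -= lr * grad_W

    if t % 100 == 0:
        print(f"step {t:4d}  loss {loss:.4f}")
```

The question studied in the paper is whether, for such combinations of distributional and target assumptions, procedures of this form are guaranteed to make progress; the sketch only instantiates the training setup, not the hardness argument.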

References

[1] Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In ICML, 2014.
[2] Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. Provable bounds for learning some deep representations. In ICML, 2014.
[3] Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, 1994.
[4] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.
[5] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
[6] Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. In AISTATS, 2015.
[7] Amit Daniely and Shai Shalev-Shwartz. Complexity theoretic limitations on learning DNF's. In COLT, 2016.
[8] Amit Daniely, Roy Frostig, and Yoram Singer. Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity. arXiv preprint arXiv:1602.05897, 2016.
[9] David Donoho and Iain Johnstone. Projection-based approximation and a duality with kernel methods. The Annals of Statistics, pages 58-106, 1989.
[10] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.
[11] Vitaly Feldman, Cristobal Guzman, and Santosh Vempala. Statistical query algorithms for stochastic convex optimization. arXiv preprint arXiv:1512.09170, 2015.
[12] Elad Hazan, Kfir Levy, and Shai Shalev-Shwartz. Beyond convexity: Stochastic quasiconvex optimization. In NIPS, 2015.
[13] John K. Hunter and Bruno Nachtergaele. Applied analysis. World Scientific Publishing, 2001.
[14] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
[15] Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. Beating the perils of nonconvexity: Guaranteed training of neural networks using tensor methods. arXiv preprint arXiv:1506.08473, 2015.
[16] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983-1006, 1998.
[17] Adam Klivans and Alexander Sherstov. Cryptographic hardness for learning intersections of halfspaces. Journal of Computer and System Sciences, 75(1):2-12, 2009.
[18] Adam R. Klivans and Pravesh Kothari. Embedding hard learning problems into Gaussian space. In APPROX/RANDOM, 2014.
[19] Roi Livni, Shai Shalev-Shwartz, and Ohad Shamir. On the computational efficiency of training neural networks. In NIPS, 2014.
[20] Peter McCullagh and John Nelder. Generalized linear models. CRC Press, 1989.
[21] Yurii Nesterov. Minimization methods for nonsmooth convex and quasiconvex functions. Matekon, 29:519-531, 1984.
[22] Itay Safran and Ohad Shamir. On the quality of the initial basin in overspecified neural networks. In ICML, 2016.
[23] Le Song, Santosh Vempala, John Wilmes, and Bo Xie. On the complexity of learning neural networks. arXiv preprint arXiv:1707.04615, 2017.
[24] Daniel Soudry and Yair Carmon. No bad local minima: Data independent training error guarantees for multilayer neural networks. arXiv preprint arXiv:1605.08361, 2016.
[25] Yuchen Zhang, Jason Lee, Martin Wainwright, and Michael Jordan. Learning halfspaces and neural networks with random initialization. arXiv preprint arXiv:1511.07948, 2015.




    Published In

    The Journal of Machine Learning Research, Volume 19, Issue 1
    January 2018
    3249 pages
    ISSN: 1532-4435
    EISSN: 1533-7928

    Publisher

    JMLR.org

    Publication History

    Revised: 01 March 2018
    Published: 01 January 2018
    Published in JMLR Volume 19, Issue 1

    Author Tags

    1. computational hardness
    2. distributional assumptions
    3. gradient-based methods
    4. neural networks

    Qualifiers

    • Article


    Cited By

    • (2024) Formal Privacy Proof of Data Encoding: The Possibility and Impossibility of Learnable Encryption. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 10.1145/3658644.3670277, pages 1834-1848. Online publication date: 2-Dec-2024.
    • (2023) Computational complexity of learning neural networks. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3669456, pages 76272-76297. Online publication date: 10-Dec-2023.
    • (2023) On single index models beyond Gaussian data. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3666569, pages 10210-10222. Online publication date: 10-Dec-2023.
    • (2022) Annihilation of spurious minima in two-layer ReLU networks. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3602989, pages 37510-37523. Online publication date: 28-Nov-2022.
    • (2022) On the non-universality of deep learning. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3601520, pages 17188-17201. Online publication date: 28-Nov-2022.
    • (2022) Hardness of noise-free learning for two-hidden-layer neural networks. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3601048, pages 10709-10724. Online publication date: 28-Nov-2022.
    • (2021) On the cryptographic hardness of learning single periodic neurons. Proceedings of the 35th International Conference on Neural Information Processing Systems, 10.5555/3540261.3542527, pages 29602-29615. Online publication date: 6-Dec-2021.
    • (2021) Analytic study of families of spurious minima in two-layer ReLU neural networks. Proceedings of the 35th International Conference on Neural Information Processing Systems, 10.5555/3540261.3541423, pages 15162-15174. Online publication date: 6-Dec-2021.
    • (2021) Why lottery ticket wins? A theoretical perspective of sample complexity on pruned neural networks. Proceedings of the 35th International Conference on Neural Information Processing Systems, 10.5555/3540261.3540468, pages 2707-2720. Online publication date: 6-Dec-2021.
    • (2020) Fast learning of graph neural networks with guaranteed generalizability. Proceedings of the 37th International Conference on Machine Learning, 10.5555/3524938.3525983, pages 11268-11277. Online publication date: 13-Jul-2020.
