
Distribution-specific hardness of learning neural networks

Published: 01 January 2018

Abstract

Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over all data distributions. In this paper, we take a more nuanced view, and consider whether specific assumptions on the "niceness" of the input distribution, or "niceness" of the target function (e.g., in terms of smoothness, non-degeneracy, incoherence, random choice of parameters, etc.), are sufficient to guarantee learnability using gradient-based methods. We provide evidence that neither class of assumptions alone is sufficient: On the one hand, for any member of a class of "nice" target functions, there are difficult input distributions. On the other hand, we identify a family of simple target functions, which are difficult to learn even if the input distribution is "nice". To prove our results, we develop some tools which may be of independent interest, such as extending Fourier-based hardness techniques developed in the context of statistical queries (Blum et al., 1994), from the Boolean cube to Euclidean space and to more general classes of functions.
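
To make the setting concrete, here is a minimal sketch (in Python/NumPy) of the kind of learning problem the abstract describes: a one-hidden-layer ReLU network trained by plain gradient descent to fit a simple target function under a "nice" standard Gaussian input distribution. The dimensions, step size, and the particular periodic target below are illustrative assumptions, not the paper's exact construction.

```python
# Illustrative sketch only: gradient-based training of a one-hidden-layer
# ReLU network under distributional and target assumptions of the kind the
# abstract discusses. All numerical choices below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, width, n, steps, lr = 50, 64, 2000, 500, 0.01

# "Nice" input distribution: standard Gaussian in R^d (an illustrative choice).
X = rng.standard_normal((n, d))

# Simple target: a periodic function of a random one-dimensional projection
# (illustrative; not the paper's exact hard family).
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
y = np.cos(2 * np.pi * (X @ v))

# One-hidden-layer ReLU network with random initialization.
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

for t in range(steps):
    Z = X @ W.T                          # pre-activations, shape (n, width)
    H = np.maximum(Z, 0.0)               # ReLU activations
    pred = H @ a                         # network outputs, shape (n,)
    residual = pred - y
    loss = 0.5 * np.mean(residual ** 2)  # mean squared error

    # Gradients of the loss with respect to output and hidden-layer weights.
    grad_a = H.T @ residual / n
    grad_W = ((residual[:, None] * (Z > 0) * a[None, :]).T @ X) / n

    a -= lr * grad_a
    W -= lr * grad_W

    if t % 100 == 0:
        print(f"step {t:4d}  loss {loss:.4f}")
```

The question studied in the paper is whether, for such combinations of distributional and target assumptions, procedures of this form are guaranteed to make progress; the sketch only instantiates the training setup, not the hardness argument.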

References

[1] Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In ICML, 2014.
[2] Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. Provable bounds for learning some deep representations. In ICML, 2014.
[3] Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, 1994.
[4] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, 2013.
[5] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
[6] Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. In AISTATS, 2015.
[7] Amit Daniely and Shai Shalev-Shwartz. Complexity theoretic limitations on learning DNF's. In COLT, 2016.
[8] Amit Daniely, Roy Frostig, and Yoram Singer. Toward deeper understanding of neural networks: The power of initialization and a dual view on expressivity. arXiv preprint arXiv:1602.05897, 2016.
[9] David Donoho and Iain Johnstone. Projection-based approximation and a duality with kernel methods. The Annals of Statistics, pages 58-106, 1989.
[10] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.
[11] Vitaly Feldman, Cristobal Guzman, and Santosh Vempala. Statistical query algorithms for stochastic convex optimization. arXiv preprint arXiv:1512.09170, 2015.
[12] Elad Hazan, Kfir Levy, and Shai Shalev-Shwartz. Beyond convexity: Stochastic quasiconvex optimization. In NIPS, 2015.
[13] John K. Hunter and Bruno Nachtergaele. Applied analysis. World Scientific Publishing, 2001.
[14] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
[15] Majid Janzamin, Hanie Sedghi, and Anima Anandkumar. Beating the perils of nonconvexity: Guaranteed training of neural networks using tensor methods. arXiv preprint arXiv:1506.08473, 2015.
[16] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983-1006, 1998.
[17] Adam Klivans and Alexander Sherstov. Cryptographic hardness for learning intersections of halfspaces. Journal of Computer and System Sciences, 75(1):2-12, 2009.
[18] Adam R. Klivans and Pravesh Kothari. Embedding hard learning problems into Gaussian space. In APPROX/RANDOM, 2014.
[19] Roi Livni, Shai Shalev-Shwartz, and Ohad Shamir. On the computational efficiency of training neural networks. In NIPS, 2014.
[20] Peter McCullagh and John Nelder. Generalized linear models. CRC Press, 1989.
[21] Yurii Nesterov. Minimization methods for nonsmooth convex and quasiconvex functions. Matekon, 29:519-531, 1984.
[22] Itay Safran and Ohad Shamir. On the quality of the initial basin in overspecified neural networks. In ICML, 2016.
[23] Le Song, Santosh Vempala, John Wilmes, and Bo Xie. On the complexity of learning neural networks. arXiv preprint arXiv:1707.04615, 2017.
[24] Daniel Soudry and Yair Carmon. No bad local minima: Data independent training error guarantees for multilayer neural networks. arXiv preprint arXiv:1605.08361, 2016.
[25] Yuchen Zhang, Jason Lee, Martin Wainwright, and Michael Jordan. Learning halfspaces and neural networks with random initialization. arXiv preprint arXiv:1511.07948, 2015.




    Published In

    The Journal of Machine Learning Research, Volume 19, Issue 1
    January 2018
    3249 pages
    ISSN: 1532-4435
    EISSN: 1533-7928

    Publisher

    JMLR.org

    Publication History

    Revised: 01 March 2018
    Published: 01 January 2018
    Published in JMLR Volume 19, Issue 1

    Author Tags

    1. computational hardness
    2. distributional assumptions
    3. gradient-based methods
    4. neural networks

    Qualifiers

    • Article


    Cited By

    • (2024) Formal Privacy Proof of Data Encoding: The Possibility and Impossibility of Learnable Encryption. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 10.1145/3658644.3670277, pages 1834-1848. Online publication date: 2-Dec-2024.
    • (2023) Computational complexity of learning neural networks. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3669456, pages 76272-76297. Online publication date: 10-Dec-2023.
    • (2023) On single index models beyond Gaussian data. Proceedings of the 37th International Conference on Neural Information Processing Systems, 10.5555/3666122.3666569, pages 10210-10222. Online publication date: 10-Dec-2023.
    • (2022) Annihilation of spurious minima in two-layer ReLU networks. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3602989, pages 37510-37523. Online publication date: 28-Nov-2022.
    • (2022) On the non-universality of deep learning. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3601520, pages 17188-17201. Online publication date: 28-Nov-2022.
    • (2022) Hardness of noise-free learning for two-hidden-layer neural networks. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10.5555/3600270.3601048, pages 10709-10724. Online publication date: 28-Nov-2022.
    • (2021) On the cryptographic hardness of learning single periodic neurons. Proceedings of the 35th International Conference on Neural Information Processing Systems, 10.5555/3540261.3542527, pages 29602-29615. Online publication date: 6-Dec-2021.
    • (2021) Analytic study of families of spurious minima in two-layer ReLU neural networks. Proceedings of the 35th International Conference on Neural Information Processing Systems, 10.5555/3540261.3541423, pages 15162-15174. Online publication date: 6-Dec-2021.
    • (2021) Why lottery ticket wins? A theoretical perspective of sample complexity on pruned neural networks. Proceedings of the 35th International Conference on Neural Information Processing Systems, 10.5555/3540261.3540468, pages 2707-2720. Online publication date: 6-Dec-2021.
    • (2020) Fast learning of graph neural networks with guaranteed generalizability. Proceedings of the 37th International Conference on Machine Learning, 10.5555/3524938.3525983, pages 11268-11277. Online publication date: 13-Jul-2020.
