DOI: 10.5555/3540261.3542527

On the cryptographic hardness of learning single periodic neurons

Published: 10 June 2024

Abstract

We show a simple reduction which demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. More precisely, our reduction shows that any polynomial-time algorithm (not necessarily gradient-based) for learning such functions under small noise implies a polynomial-time quantum algorithm for solving worst-case lattice problems, whose hardness forms the foundation of lattice-based cryptography. Our core hard family of functions, which are well-approximated by one-layer neural networks, takes the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works which demonstrate their hardness against gradient-based (Shamir '18) and Statistical Query (SQ) algorithms (Song et al. '17). We show that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomial-time algorithms, beyond gradient-based and SQ algorithms, under the aforementioned cryptographic assumptions. Moreover, we demonstrate the necessity of noise in the hardness result by designing a polynomial-time algorithm for learning certain families of such functions under exponentially small adversarial noise. Our proposed algorithm is not a gradient-based or an SQ algorithm, but is rather based on the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. Furthermore, in the absence of noise, this algorithm can be directly applied to solve CLWE detection (Bruna et al. '21) and phase retrieval with an optimal sample complexity of d + 1 samples. In the former case, this improves upon the quadratic-in-d sample complexity required in (Bruna et al. '21).
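As a concrete illustration of the learning setup the abstract describes, one representative member of the hard family can be written as follows; the specific cosine nonlinearity, the Gaussian label noise, and the polynomial scalings below are assumptions made here for concreteness, not the paper's exact parametrization:

\[
x \sim \mathcal{N}(0, I_d), \qquad y = g(\langle w, x \rangle) + \xi, \qquad \text{e.g. } g(t) = \cos(2\pi \gamma t), \quad \xi \sim \mathcal{N}(0, \sigma^2),
\]

with a hidden unit direction \(\|w\|_2 = 1\), a frequency \(\gamma\) growing polynomially in \(d\), and a polynomially small noise level \(\sigma\). The reduction asserts that a polynomial-time learner for this class, given polynomially many i.i.d. pairs \((x, y)\), would yield a polynomial-time quantum algorithm for worst-case lattice problems.

On the algorithmic side, the sketch below illustrates only the underlying LLL primitive, not the paper's algorithm: how lattice basis reduction recovers a small unknown integer relation among real numbers, the kind of exact-recovery step that LLL-based procedures in the spirit of [26, 38, 60] rely on. The function name, the scaling factor R, and the toy inputs are hypothetical choices for illustration, and the fpylll bindings are assumed to be available.

from fpylll import IntegerMatrix, LLL  # LLL via the fpylll bindings (assumed installed)

def integer_relation(alphas, R=10**12):
    """Search for small integers m_1, ..., m_n with sum_i m_i * alphas[i] close to 0.

    Lagarias-Odlyzko-style embedding: row i of the basis is (e_i, round(R * alphas[i])).
    A lattice vector with coefficients m equals (m_1, ..., m_n, R * sum_i m_i * alphas[i]
    plus rounding error), so a small relation shows up as an unusually short vector,
    which LLL brings to the front of the reduced basis.
    """
    n = len(alphas)
    B = IntegerMatrix(n, n + 1)
    for i, a in enumerate(alphas):
        B[i, i] = 1                      # identity block carries the coefficients
        B[i, n] = round(R * a)           # scaled reals penalize non-relations
    LLL.reduction(B)                     # in-place LLL reduction
    return [B[0, j] for j in range(n)]   # coefficients of the first reduced vector

# Toy usage: by construction 3*a0 - 2*a1 + a2 = 0, so we expect +/-(3, -2, 1).
a0, a1 = 0.7326984125, 1.912345
a2 = -(3 * a0 - 2 * a1)
print(integer_relation([a0, a1, a2]))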

Supplementary Material

Additional material (3540261.3542527_supp.pdf)
Supplemental material.

References

[1]
Emmanuel Abbe and Colin Sandon. Poly-time universality and limitations of deep learning, 2020.
[2]
Dorit Aharonov and Oded Regev. Lattice problems in NP ∩ coNP. J. ACM, 52(5):749–765, September 2005.
[3]
Gorjan Alagic, Jacob Alperin-Sheriff, Daniel Apon, David Cooper, Quynh Dang, John Kelsey, Yi-Kai Liu, Carl Miller, Dustin Moody, Rene Peralta, Ray Perlner, Angela Robinson, and Daniel Smith-Tone. Status Report on the Second Round of the NIST Post-Quantum Cryptography Standardization Process. Technical Report NIST Internal or Interagency Report (NISTIR) 8309, National Institute of Standards and Technology, July 2020.
[4]
Zeyuan Allen-Zhu, Yuanzhi Li, and Yingyu Liang. Learning and generalization in overparameterized neural networks, going beyond two layers, 2018.
[5]
Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457–466, 1998.
[6]
Alexandr Andoni, Daniel Hsu, Kevin Shi, and Xiaorui Sun. Correspondence retrieval. In COLT, volume 65 of Proceedings of Machine Learning Research, pages 105–126. PMLR, 2017.
[7]
Gerard Ben Arous, Reza Gheissari, and Aukosh Jagannath. Online stochastic gradient descent on non-convex losses from high-dimensional inference, 2021.
[8]
Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, and Lenka Zdeborová. The committee machine: Computational to statistical gaps in learning a two-layers neural network. Journal of Statistical Mechanics: Theory and Experiment, 2019(12):124023, 2019.
[9]
Afonso S. Bandeira, Amelia Perry, and Alexander S. Wein. Notes on computational-to-statistical gaps: predictions using statistical physics, 2018.
[10]
B. Barak, S. B. Hopkins, J. Kelner, P. Kothari, A. Moitra, and A. Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 428–437, 2016.
[11]
Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. Optimal errors and phase transitions in high-dimensional generalized linear models. Proceedings of the National Academy of Sciences, 116(12):5451–5460, 2019.
[12]
Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC '94, pages 253–262, 1994.
[13]
Joan Bruna, Oded Regev, Min Jae Song, and Yi Tang. Continuous LWE. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, 2021.
[14]
Alon Brutzkus, Amir Globerson, Eran Malach, and Shai Shalev-Shwartz. SGD learns over-parameterized networks that provably generalize on linearly separable data, 2017.
[15]
Amit Daniely and Gal Vardi. From local pseudorandom generators to hardness of learning, 2021.
[16]
Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R. Klivans, and Mahdi Soltanolkotabi. Approximation schemes for ReLU regression. In Conference on Learning Theory, pages 1452–1485. PMLR, 2020.
[17]
Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, and Nikos Zarifis. Algorithms and SQ lower bounds for PAC learning one-hidden-layer ReLU networks. In Conference on Learning Theory, pages 1514–1539. PMLR, 2020.
[18]
David L. Donoho, Arian Maleki, and Andrea Montanari. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.
[19]
Léo Ducas, Tancrède Lepoint, Vadim Lyubashevsky, Peter Schwabe, Gregor Seiler, and Damien Stehlé. CRYSTALS-Dilithium: Digital signatures from module lattices. Cryptology ePrint Archive, Report 2017/633, 2017.
[20]
Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S. Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. J. ACM, 64(2), 2017.
[21]
Spencer Frei, Yuan Cao, and Quanquan Gu. Agnostic learning of a single neuron with gradient descent, 2020.
[22]
Alan M. Frieze. On the Lagarias-Odlyzko algorithm for the subset sum problem. SIAM J. Comput., 15:536–539, 1986.
[23]
David Gamarnik and Ilias Zadik. High dimensional linear regression with binary coefficients: Mean squared error and a phase transition. In Conference on Learning Theory (COLT), 2017.
[24]
David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense subgraphs and the overlap gap property, 2019.
[25]
David Gamarnik and Ilias Zadik. Sparse high-dimensional linear regression: Algorithmic barriers and a local search algorithm, 2019.
[26]
David Gamarnik, Eren C. Kizildağ, and Ilias Zadik. Inference in high-dimensional linear regression via lattice basis reduction and integer relation detection, 2019.
[27]
Rong Ge, Jason D Lee, and Tengyu Ma. Learning one-hidden-layer neural networks with landscape design, 2017.
[28]
Surbhi Goel, Sushrut Karmalkar, and Adam Klivans. Time/accuracy tradeoffs for learning a ReLU with respect to Gaussian marginals. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[29]
Surbhi Goel, Aravind Gollakota, Zhihan Jin, Sushrut Karmalkar, and Adam Klivans. Super-polynomial lower bounds for learning one-layer neural networks using gradient descent. In International Conference on Machine Learning, pages 3587–3596. PMLR, 2020.
[30]
Mark Jerrum. Large cliques elude the metropolis process. Random Structures & Algorithms, 3(4):347–359, 1992.
[31]
Sham M Kakade, Varun Kanade, Ohad Shamir, and Adam Kalai. Efficient learning of generalized linear and single index models with isotonic regression. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.
[32]
Ravi Kannan. Improved algorithms for integer programming and related lattice problems. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, STOC '83, page 193–206, New York, NY, USA, 1983. Association for Computing Machinery.
[33]
Michael Kearns. Efficient noise-tolerant learning from statistical queries. J. ACM, 45(6):983–1006, November 1998.
[34]
Michael Kearns and Leslie Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. J. ACM, 41(1):67–95, January 1994.
[35]
Michael Kharitonov. Cryptographic hardness of distribution-specific learning. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, STOC '93, page 372–381, New York, NY, USA, 1993. Association for Computing Machinery.
[36]
Adam R Klivans and Alexander A Sherstov. Cryptographic hardness for learning intersections of halfspaces. Journal of Computer and System Sciences, 75(1):2–12, 2009.
[37]
Dmitriy Kunisky, Alexander S. Wein, and Afonso S. Bandeira. Notes on computational hardness of hypothesis testing: Predictions using the low-degree likelihood ratio, 2019.
[38]
J. C. Lagarias and A. M. Odlyzko. Solving low-density subset sum problems. J. ACM, 32(1):229–246, January 1985.
[39]
Jeffrey C. Lagarias. Knapsack public key cryptosystems and Diophantine approximation. In Advances in Cryptology, pages 3–23. Springer, 1984.
[40]
Arjen Klaas Lenstra, Hendrik Willem Lenstra, and László Lovász. Factoring polynomials with rational coefficients. Mathematische Annalen, 261(4):515–534, 1982.
[41]
Antoine Maillard, Bruno Loureiro, Florent Krzakala, and Lenka Zdeborová. Phase retrieval in high dimensions: Statistical and computational phase transitions, 2020.
[42]
Daniele Micciancio and Shafi Goldwasser. Complexity of Lattice Problems: A Cryptographic Perspective. The Springer International Series in Engineering and Computer Science. Springer US, 2002.
[43]
Daniele Micciancio and Oded Regev. Lattice-based Cryptography, pages 147–191. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[44]
Marlene Müller. Generalized Linear Models, pages 205–228. Springer Berlin Heidelberg, Berlin, Heidelberg, 2000.
[45]
J. A. Nelder and R. W. M. Wedderburn. Generalized linear models. Journal of the Royal Statistical Society. Series A (General), 135(3):370–384, 1972.
[46]
Phong Q. Nguyen and Damien Stehlé. An LLL algorithm with quadratic complexity. SIAM Journal on Computing, 39(3):874–903, 2009.
[47]
Sundeep Rangan. Generalized approximate message passing for estimation with random linear mixing. In 2011 IEEE International Symposium on Information Theory Proceedings, pages 2168–2172. IEEE, 2011.
[48]
Galen Reeves, Jiaming Xu, and Ilias Zadik. All-or-nothing phenomena: From single-letter to high dimensions. In 2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 654–658, 2019.
[49]
Oded Regev. On lattices, learning with errors, random linear codes, and cryptography. In STOC '05, pages 84–93, 2005.
[50]
Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, and Lenka Zdeborová. Marvels and pitfalls of the Langevin algorithm in noisy high-dimensional inference. Phys. Rev. X, 10:011057, 2020.
[51]
Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, USA, 2014.
[52]
Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah. Failures of gradient-based deep learning. In International Conference on Machine Learning, pages 3067–3075. PMLR, 2017.
[53]
Adi Shamir. A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystem. In 23rd Annual Symposium on Foundations of Computer Science (SFCS 1982), pages 145–152. IEEE, 1982.
[54]
Ohad Shamir. Distribution-specific hardness of learning neural networks. J. Mach. Learn. Res., 19(1):1135–1163, 2018.
[55]
Denis Simon. Selected Applications of LLL in Number Theory, pages 265–282. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.
[56]
Mahdi Soltanolkotabi. Learning ReLUs via gradient descent, 2017.
[57]
Le Song, Santosh Vempala, John Wilmes, and Bo Xie. On the complexity of learning neural networks. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[58]
Balázs Szörényi. Characterizing statistical query learning: Simplified notions and proofs. In Ricard Gavaldà, Gábor Lugosi, Thomas Zeugmann, and Sandra Zilles, editors, Algorithmic Learning Theory, pages 186–200, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
[59]
Martin J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55(5):2183–2202, 2009.
[60]
Ilias Zadik and David Gamarnik. High dimensional linear regression using lattice basis reduction. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
[61]
Kai Zhong, Zhao Song, Prateek Jain, Peter L Bartlett, and Inderjit S Dhillon. Recovery guarantees for one-hidden-layer neural networks, 2017.

Published In

NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems
December 2021
30517 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

