DOI: 10.1145/3564246.3585206
research-article
Open access

A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity

Published: 02 June 2023

Abstract

A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of testable learning, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians.
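
To make the model concrete, here is a schematic statement of the tester-learner guarantee sketched above; the notation is introduced here for illustration and is not taken verbatim from the paper. A tester $T$ and learner $A$ for a concept class $\mathcal{C}$ with target marginal $D^*$ (for instance, the standard Gaussian) should satisfy, for samples $S$ drawn from an unknown labeled distribution $D_{\mathcal{XY}}$ with marginal $D_{\mathcal{X}}$:

\[
\text{(soundness)}\quad \Pr\big[\,T(S) \text{ accepts and } \mathrm{err}_{D_{\mathcal{XY}}}(A(S)) > \mathrm{opt}(\mathcal{C}, D_{\mathcal{XY}}) + \epsilon\,\big] \le \delta,
\]
\[
\text{(completeness)}\quad D_{\mathcal{X}} = D^* \;\Longrightarrow\; \Pr\big[\,T(S) \text{ accepts}\,\big] \ge 1 - \delta,
\]

where $\mathrm{opt}(\mathcal{C}, D_{\mathcal{XY}})$ denotes the error of the best classifier in $\mathcal{C}$. Completeness forces the test to accept the target distribution, while soundness forbids it from certifying a distribution on which the learner would fail.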
In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree sandwiching polynomials, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions.
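
For readers unfamiliar with the terminology, the following is the standard notion of sandwiching polynomials, paraphrased from the pseudorandomness literature rather than quoted from this paper: a concept $f$ admits degree-$d$, $\epsilon$-sandwiching polynomials with respect to a target distribution $D^*$ if there exist degree-$d$ polynomials $p_{\mathrm{down}}$ and $p_{\mathrm{up}}$ such that

\[
p_{\mathrm{down}}(x) \;\le\; f(x) \;\le\; p_{\mathrm{up}}(x) \quad \text{for all } x,
\qquad
\mathbb{E}_{x \sim D^*}\big[p_{\mathrm{up}}(x) - p_{\mathrm{down}}(x)\big] \;\le\; \epsilon.
\]

Since expectations of degree-$d$ polynomials are determined by moments of degree at most $d$, any distribution whose low-degree moments approximately match those of $D^*$ behaves almost like $D^*$ with respect to such concepts, which is what makes a moment-matching tester natural in this setting.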
Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.
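
For reference, the Rademacher complexity in question is the standard quantity

\[
\mathfrak{R}_m(\mathcal{C}) \;=\; \mathbb{E}_{x_1,\dots,x_m \sim D}\,\mathbb{E}_{\sigma \sim \{\pm 1\}^m}\left[\, \sup_{f \in \mathcal{C}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(x_i) \right],
\]

and the classical uniform convergence bound (see, e.g., Bartlett and Mendelson [10]) states that, with high probability over an i.i.d. sample of size $m$, every $f \in \mathcal{C}$ has empirical error within roughly $2\mathfrak{R}_m(\mathcal{C})$, plus a lower-order confidence term, of its true error. The characterization above says that, information-theoretically, this same quantity governs testable learning, whereas ordinary distribution-specific agnostic learning can sometimes succeed with fewer samples than uniform convergence requires.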

References

[1]
Noga Alon, Oded Goldreich, and Yishay Mansour. 2003. Almost k-wise independence versus k-wise independence. Inform. Process. Lett., 88, 3 (2003), 107–110.
[2]
D. Bakry and Michel Émery. 1985. Diffusions hypercontractives. In Séminaire de probabilités, XIX, 1983/84 (Lecture Notes in Math., Vol. 1123). Springer, Berlin, 177–206. https://doi.org/10.1007/BFb0075847
[3]
Ainesh Bakshi, Ilias Diakonikolas, Samuel B. Hopkins, Daniel Kane, Sushrut Karmalkar, and Pravesh K. Kothari. 2020. Outlier-Robust Clustering of Gaussians and Other Non-Spherical Mixtures. In 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16-19, 2020, Sandy Irani (Ed.). IEEE, 149–159. https://doi.org/10.1109/FOCS46700.2020.00023
[4]
Ainesh Bakshi and Pravesh Kothari. 2020. Outlier-Robust Clustering of Non-Spherical Mixtures. CoRR, abs/2005.02970 (2020). arXiv:2005.02970
[5]
Ainesh Bakshi and Pravesh K. Kothari. 2021. List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, Dániel Marx (Ed.). SIAM, 1279–1297. https://doi.org/10.1137/1.9781611976465.78
[6]
Peter L Bartlett. 2014. UC Berkeley CS281B/Stat241B: Statistical Learning Theory, Lecture 7. URL: https://www.stat.berkeley.edu/~bartlett/courses/2014spring-cs281bstat241b/lectures/07-notes.pdf
[7]
Peter L Bartlett, Stéphane Boucheron, and Gábor Lugosi. 2002. Model selection and error estimation. Machine Learning, 48, 1 (2002), 85–113.
[8]
Peter L Bartlett, Olivier Bousquet, and Shahar Mendelson. 2005. Local Rademacher complexities. The Annals of Statistics, 33, 4 (2005), 1497–1537.
[9]
Peter L Bartlett, Philip M Long, Gábor Lugosi, and Alexander Tsigler. 2020. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117, 48 (2020), 30063–30070.
[10]
Peter L Bartlett and Shahar Mendelson. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3, Nov (2002), 463–482.
[11]
Peter L Bartlett, Andrea Montanari, and Alexander Rakhlin. 2021. Deep learning: a statistical viewpoint. Acta Numerica, 30 (2021), 87–201.
[12]
Louay MJ Bazzi. 2009. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput., 38, 6 (2009), 2220–2272.
[13]
Mikhail Belkin. 2021. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numerica, 30 (2021), 203–248.
[14]
Gyora M Benedek and Alon Itai. 1991. Learnability with respect to fixed distributions. Theoretical Computer Science, 86, 2 (1991), 377–389.
[15]
Lucien Birgé and Pascal Massart. 1997. From model selection to adaptive estimation. In Festschrift for Lucien Le Cam. Springer, 55–87.
[16]
Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi. 2003. Introduction to statistical learning theory. In Summer school on machine learning. 169–207.
[17]
Mark Braverman. 2010. Polylogarithmic independence fools AC^0 circuits. Journal of the ACM (JACM), 57, 5 (2010), 1–10.
[18]
Nader H Bshouty and Christino Tamon. 1996. On the Fourier spectrum of monotone functions. Journal of the ACM (JACM), 43, 4 (1996), 747–770.
[19]
Amit Daniely. 2016. Complexity theoretic limitations on learning halfspaces. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing. 105–117.
[20]
Amit Daniely and Shai Shalev-Shwartz. 2016. Complexity theoretic limitations on learning DNF’s. In Conference on Learning Theory. 815–830.
[21]
Anindya De, Shivam Nadimpalli, and Rocco Servedio. 2022. Convex Influences. Innovations in Theoretical Computer Science.
[22]
Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A Servedio, and Emanuele Viola. 2010. Bounded independence fools halfspaces. SIAM J. Comput., 39, 8 (2010), 3441–3462.
[23]
Ilias Diakonikolas, Daniel Kane, and Nikos Zarifis. 2020. Near-optimal sq lower bounds for agnostically learning halfspaces and relus under gaussian marginals. Advances in Neural Information Processing Systems, 33 (2020), 13586–13596.
[24]
Ilias Diakonikolas, Daniel M Kane, and Jelani Nelson. 2010. Bounded independence fools degree-2 threshold functions. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. 11–20.
[25]
Ilias Diakonikolas, Daniel M Kane, Thanasis Pittas, and Nikos Zarifis. 2021. The optimality of polynomial regression for agnostic learning under gaussian marginals in the SQ model. In Conference on Learning Theory. 1552–1584.
[26]
Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu. 2012. Agnostic learning of monomials by halfspaces is hard. SIAM J. Comput., 41, 6 (2012), 1558–1590.
[27]
Yu R Gabovich. 1981. Stability of the characterization of the multivariate normal distribution in the Skitovich-Darmois theorem. Journal of Soviet Mathematics, 16, 5 (1981), 1341–1349.
[28]
Surbhi Goel, Aravind Gollakota, and Adam Klivans. 2020. Statistical-query lower bounds via functional gradients. Advances in Neural Information Processing Systems, 33 (2020), 2147–2158.
[29]
Venkatesan Guruswami and Prasad Raghavendra. 2009. Hardness of learning halfspaces with noise. SIAM J. Comput., 39, 2 (2009), 742–765.
[30]
Prahladh Harsha and Srikanth Srinivasan. 2019. On polynomial approximations to AC^0. Random Structures & Algorithms, 54, 2 (2019), 289–303.
[31]
Trevor Hastie, Andrea Montanari, Saharon Rosset, and Ryan J Tibshirani. 2022. Surprises in high-dimensional ridgeless least squares interpolation. The Annals of Statistics, 50, 2 (2022), 949–986.
[32]
Misha Ivkov and Pravesh K. Kothari. 2022. List-decodable covariance estimation. In STOC ’22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, Stefano Leonardi and Anupam Gupta (Eds.). ACM, 1276–1283. https://doi.org/10.1145/3519935.3520006
[33]
Adam Tauman Kalai, Adam R Klivans, Yishay Mansour, and Rocco A Servedio. 2008. Agnostically learning halfspaces. SIAM J. Comput., 37, 6 (2008), 1777–1805.
[34]
Daniel Kane, Adam Klivans, and Raghu Meka. 2013. Learning halfspaces under log-concave densities: Polynomial approximations and moment matching. In Conference on Learning Theory. 522–545.
[35]
Daniel M Kane. 2011. The Gaussian surface area and noise sensitivity of degree-d polynomial threshold functions. Computational Complexity, 20, 2 (2011), 389–412.
[36]
Sushrut Karmalkar, Adam R. Klivans, and Pravesh Kothari. 2019. List-decodable Linear Regression. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.). 7423–7432. https://proceedings.neurips.cc/paper/2019/hash/7f5fc754c7af0a6370c9bf91314e79f4-Abstract.html
[37]
Michael Kearns and Leslie Valiant. 1994. Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM (JACM), 41, 1 (1994), 67–95.
[38]
Michael J Kearns, Robert E Schapire, and Linda M Sellie. 1992. Toward efficient agnostic learning. In Proceedings of the fifth annual workshop on Computational learning theory. 341–352.
[39]
LB Klebanov and ST Rachev. 1996. Proximity of probability measures with common marginals in a finite number of directions. Lecture Notes-Monograph Series, 162–174.
[40]
Adam Klivans and Raghu Meka. 2013. Moment-matching polynomials. arXiv preprint arXiv:1301.0820.
[41]
Adam R Klivans, Ryan O’Donnell, and Rocco A Servedio. 2008. Learning geometric concepts via Gaussian surface area. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science. 541–550.
[42]
Adam R Klivans and Alexander A Sherstov. 2009. Cryptographic hardness for learning intersections of halfspaces. J. Comput. System Sci., 75, 1 (2009), 2–12.
[43]
Vladimir Koltchinskii. 2001. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47, 5 (2001), 1902–1914.
[44]
Vladimir Koltchinskii. 2006. Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34, 6 (2006), 2593–2656.
[45]
Vladimir Koltchinskii and Dmitriy Panchenko. 2000. Rademacher processes and bounding the risk of function learning. In High dimensional probability II. Springer, 443–457.
[46]
Pravesh K Kothari and Roi Livni. 2018. Improper learning by refuting. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018).
[47]
Gil Kur and Alexander Rakhlin. 2021. On the Minimal Error of Empirical Risk Minimization. In Conference on Learning Theory. 2849–2852.
[48]
Michel Ledoux. 2001. The Concentration of Measure Phenomenon (Mathematical Surveys and Monographs). American Mathematical Society.
[49]
Nathan Linial, Yishay Mansour, and Noam Nisan. 1993. Constant depth circuits, Fourier transform, and learnability. Journal of the ACM (JACM), 40, 3 (1993), 607–620.
[50]
Vaishnavh Nagarajan and J Zico Kolter. 2019. Uniform convergence may be unable to explain generalization in deep learning. Advances in Neural Information Processing Systems, 32 (2019).
[51]
Svetlozar T Rachev, Lev B Klebanov, Stoyan V Stoyanov, and Frank Fabozzi. 2013. The methods of distances in the theory of probability and statistics. Vol. 10. Springer.
[52]
Prasad Raghavendra and Morris Yau. 2020. List Decodable Learning via Sum of Squares. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5-8, 2020, Shuchi Chawla (Ed.). SIAM, 161–180. https://doi.org/10.1137/1.9781611975994.10
[53]
Prasad Raghavendra and Morris Yau. 2020. List Decodable Subspace Recovery. arxiv:2002.03004.
[54]
Herbert Robbins. 1955. A remark on Stirling’s formula. The American Mathematical Monthly, 62, 1 (1955), 26–29.
[55]
Ronitt Rubinfeld and Arsen Vasilyan. 2022. Testing distributional assumptions of learning algorithms. arXiv preprint arXiv:2204.07196v1 (April 2022).
[56]
Ronitt Rubinfeld and Arsen Vasilyan. 2022. Testing distributional assumptions of learning algorithms (v2). Private communication.
[57]
Adrien Saumard and Jon A Wellner. 2014. Log-concavity and strong log-concavity: a review. Statistics surveys, 8 (2014), 45.
[58]
Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding machine learning: From theory to algorithms. Cambridge University Press.
[59]
Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. 2010. Learnability, stability and uniform convergence. The Journal of Machine Learning Research, 11 (2010), 2635–2670.
[60]
Alexander Shapiro. 2001. On duality theory of conic linear problems. In Semi-infinite programming. Springer, 135–165.
[61]
Avishay Tal. 2017. Tight bounds on the Fourier spectrum of AC^0. In 32nd Computational Complexity Conference (CCC 2017).
[62]
A.B. Tsybakov. 2008. Introduction to Nonparametric Estimation. Springer New York. ISBN 9780387790527.
[63]
Salil Vadhan. 2017. On learning vs. refutation. In Conference on Learning Theory. 1835–1848.
[64]
Sara A Van de Geer. 2000. Empirical Processes in M-estimation. Vol. 6. Cambridge University Press.
[65]
V.N. Vapnik. 1998. Statistical Learning Theory. Wiley. ISBN 9780471030034.
[66]
Roman Vershynin. 2018. High-dimensional probability: An introduction with applications in data science. Vol. 47. Cambridge University Press.
[67]
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2021. Understanding deep learning (still) requires rethinking generalization. Commun. ACM, 64, 3 (2021), 107–115.
[68]
Vladimir Mikhailovich Zolotarev. 1984. Probability metrics. Theory of Probability & Its Applications, 28, 2 (1984), 278–302.

Published In

STOC 2023: Proceedings of the 55th Annual ACM Symposium on Theory of Computing
June 2023
1926 pages
ISBN:9781450399135
DOI:10.1145/3564246
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. PAC learning
  2. Rademacher complexity
  3. generalization
  4. moment matching
  5. sandwiching polynomials

Qualifiers

  • Research-article

Conference

STOC '23

Acceptance Rates

Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
