Abstract
We prove new lower bounds for learning intersections of halfspaces, one of the most important concept classes in computational learning theory. Our main result is that any statistical-query algorithm for learning the intersection of \(\sqrt{n}\) halfspaces in n dimensions must make \(2^{\varOmega (\sqrt{n})}\) queries. This is the first non-trivial lower bound on the statistical query dimension for this concept class (the previous best lower bound was n Ω(log n)). Our lower bound holds even for intersections of low-weight halfspaces. In the latter case, it is nearly tight.
We also show that the intersection of two majorities (low-weight halfspaces) cannot be computed by a polynomial threshold function (PTF) with fewer than n Ω(log n/log log n) monomials. This is the first super-polynomial lower bound on the PTF length of this concept class, and is nearly optimal. For intersections of k=ω(log n) low-weight halfspaces, we improve our lower bound to \(\min\{2^{\varOmega (\sqrt{n})},n^{\varOmega (k/\log k)}\},\) which too is nearly optimal. As a consequence, intersections of even two halfspaces are not computable by polynomial-weight PTFs, the most expressive class of functions known to be efficiently learnable via Jackson’s Harmonic Sieve algorithm. Finally, we report our progress on the weak learnability of intersections of halfspaces under the uniform distribution.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Alekhnovich, M., Braverman, M., Feldman, V., Klivans, A., & Pitassi, T. (2004). Learnability and automatizability. In Proceedings of the 45th annual symposium on foundations of computer science (FOCS).
Aspnes, J., Beigel, R., Furst, M. L., & Rudich, S. (1994). The expressive power of voting polynomials. Combinatorica, 14(2), 135–148.
Beigel, R., Reingold, N., & Spielman, D. A. (1995). PP is closed under intersection. Journal of Computer and System Sciences, 50(2), 191–202.
Blum, A., Frieze, A., Kannan, R., & Vempala, S. (1997). A polynomial time algorithm for learning noisy linear threshold functions. Algorithmica, 22(1/2), 35–52.
Blum, A., Furst, M., Jackson, J., Kearns, M., Mansour, Y., & Rudich, S. (1994). Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th annual ACM symposium on theory of computing (STOC) (pp. 253–262). New York: ACM.
Blum, A., Kalai, A., & Wasserman, H. (2003). Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM, 50(4), 506–519.
Bruck, J. (1990). Harmonic analysis of polynomial threshold functions. SIAM Journal on Discrete Mathematics, 3(2), 168–177.
Bshouty, N. H., & Tamon, C. (1996). On the Fourier spectrum of monotone functions. Journal of the ACM, 43(4), 747–770.
Feldman, V., Gopalan, P., Khot, S., & Ponnuswami, A. K. (2006). New results for learning noisy parities and halfspaces. In Proceedings of the 47th annual symposium on foundations of computer science (FOCS) (pp. 563–574).
Jackson, J. C. (1995). The harmonic sieve: a novel application of Fourier analysis to machine learning theory and practice. PhD thesis, Carnegie Mellon University.
Kearns, M. (1993). Efficient noise-tolerant learning from statistical queries. In Proceedings of the 25th annual ACM symposium on theory of computing (STOC) (pp. 392–401). New York: ACM.
Klivans, A., O’Donnell, R., & Servedio, R. (2004). Learning intersections and thresholds of halfspaces. Journal of Computer and System Sciences, 68(4), 808–840.
Klivans, A., & Servedio, R. (2004). Learning intersections of halfspaces with a margin. In Proceedings of the 17th annual conference on learning theory (pp. 348–362).
Klivans, A. R., & Sherstov, A. A. (2006). Cryptographic hardness for learning intersections of halfspaces. In Proceedings of the 47th annual symposium on foundations of computer science (FOCS) (pp. 553–562), October 2006.
Krause, M., & Pudlák, P. (1997). On the computational power of depth-2 circuits with threshold and modulo gates. Theoretical Computer Science, 174(1–2), 137–156.
Kushilevitz, E., & Mansour, Y. (1993). Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6), 1331–1348.
Kwek, S., & Pitt, L. (1998). PAC learning intersections of halfspaces with membership queries. Algorithmica, 22(1/2), 53–75.
Macwilliams, F. J., & Sloane, N. J. A. (1977). The theory of error correcting codes. Amsterdam: North-Holland.
Minsky, M. L., & Papert, S. A. (1988). Perceptrons: expanded edition. Cambridge: MIT.
O’Donnell, R. (2003). Computational applications of noise sensitivity. PhD thesis, Massachusetts Institute of Technology.
O’Donnell, R., & Servedio, R. A. (2003). New degree bounds for polynomial threshold functions. In Proceedings of the 25th annual ACM symposium on theory of computing (STOC) (pp. 325–334). New York: ACM.
O’Neil, P. (1971). Hyperplane cuts of an n-cube. Discrete Mathematics, 1(2), 193–195.
Saks, M. E. (1993). Slicing the hypercube. Surveys in Combinatorics, 1993, 211–255.
Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.
Vempala, S. (1997). A random sampling based algorithm for learning the intersection of halfspaces. In Proceedings of the 38th annual symposium on foundations of computer science (FOCS) (pp. 508–513).
Yang, K. (2005). New lower bounds for statistical query learning. Journal of Computer and System Sciences, 70(4), 485–509.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editors: Hans Ulrich Simon, Gabor Lugosi, Avrim Blum.
Rights and permissions
About this article
Cite this article
Klivans, A.R., Sherstov, A.A. Unconditional lower bounds for learning intersections of halfspaces. Mach Learn 69, 97–114 (2007). https://doi.org/10.1007/s10994-007-5010-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-007-5010-1