
PAC Analogues of Perceptron and Winnow Via Boosting the Margin

Published: 01 May 2002

Abstract

We describe a novel family of PAC model algorithms for learning linear threshold functions. The new algorithms work by boosting a simple weak learner and exhibit sample complexity bounds remarkably similar to those of known online algorithms such as Perceptron and Winnow, thus suggesting that these well-studied online algorithms in some sense correspond to instances of boosting. We show that the new algorithms can be viewed as natural PAC analogues of the online p-norm algorithms which have recently been studied by Grove, Littlestone, and Schuurmans (1997, Proceedings of the Tenth Annual Conference on Computational Learning Theory (pp. 171–183)) and Gentile and Littlestone (1999, Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 1–11)). As special cases of the algorithm, by taking p = 2 and p = ∞ we obtain natural boosting-based PAC analogues of Perceptron and Winnow respectively. The p = ∞ case of our algorithm can also be viewed as a generalization (with an improved sample complexity bound) of Jackson and Craven's PAC-model boosting-based algorithm for learning “sparse perceptrons” (Jackson & Craven, 1996, Advances in neural information processing systems 8, MIT Press). The analysis of the generalization error of the new algorithms relies on techniques from the theory of large margin classification.
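
To make the general recipe in the abstract concrete, the following is a minimal sketch (Python with NumPy) of boosting a very simple weak learner, here single-coordinate "literal" hypotheses in the spirit of the sparse-perceptron setting of Jackson and Craven (1996), and outputting the weighted-majority vote, which is itself a linear threshold function. The AdaBoost-style reweighting, the function names (weak_learner, boost_sparse_perceptron), and the toy data are illustrative assumptions only; the paper's algorithms use a different, p-norm-based construction and carry the sample complexity guarantees discussed above.

```python
import numpy as np

def weak_learner(X, y, w):
    """Hypothetical weak learner: return the single signed coordinate (a
    "literal") with the largest weighted correlation with the labels."""
    corr = X.T @ (w * y)                  # weighted correlation of each coordinate with y
    j = int(np.argmax(np.abs(corr)))
    s = 1.0 if corr[j] >= 0 else -1.0
    return j, s

def boost_sparse_perceptron(X, y, rounds=50):
    """Illustrative AdaBoost-style loop (an assumption, not the paper's exact
    booster): combine single-coordinate weak hypotheses into a weighted-majority
    vote, i.e. a linear threshold function sign(X @ coef)."""
    m, n = X.shape
    w = np.full(m, 1.0 / m)               # distribution over the examples
    coef = np.zeros(n)                    # accumulated voting weight per coordinate
    for _ in range(rounds):
        j, s = weak_learner(X, y, w)
        h = s * X[:, j]                   # weak hypothesis predictions in {-1, +1}
        eps = float(np.clip(w[h != y].sum(), 1e-12, 1 - 1e-12))  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)
        coef[j] += s * alpha              # add this weak hypothesis to the vote
        w *= np.exp(-alpha * y * h)       # reweight: emphasize misclassified examples
        w /= w.sum()
    return coef

# Toy usage: examples in {-1,+1}^20 labeled by a sparse linear threshold function.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(500, 20))
y = np.sign(X[:, 0] + X[:, 3] + X[:, 7] + 0.5)
coef = boost_sparse_perceptron(X, y)
print("training accuracy:", float(np.mean(np.sign(X @ coef) == y)))
```

Boosting literal-valued weak hypotheses in this way produces a vote over relatively few coordinates, which is the intuition behind the sparse-perceptron connection; the paper's p = ∞ case gives this kind of construction a rigorous, improved sample complexity bound.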

References

[1]
Alon, N., Spencer, J., & Erdős, P. (1992). The probabilistic method. New York: Wiley-Interscience.
[2]
Angluin, D. (1988). Queries and concept learning. Machine Learning, 2, 319-342.
[3]
Auer, P., & Warmuth, M. (1995). Tracking the best disjunction. In Proceedings of the 36th Symposium on Foundations of Computer Science (pp. 312-321).
[4]
Bartlett, P., & Shawe-Taylor, J. (1999). Generalization performance of support vector machines and other pattern classifiers. In B. Scholkopf, C. J. C. Burges, & A. J. Smola (Eds.), Advances in Kernel methods--Support vector learning. Cambridge, MA: MIT Press.
[5]
Baum, E. (1990). The perceptron algorithm is fast for nonmalicious distributions. Neural Computation, 2:2, 248-260.
[6]
Blum, A., Frieze, A., Kannan, R., & Vempala, S. (1996). A polynomial time algorithm for learning noisy linear threshold functions. In Proceedings of the 37th Symposium on Foundations of Computer Science (pp. 330-338).
[7]
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36:4, 929-965.
[8]
Bylander, T. (1998). Worst-case analysis of the perceptron and exponentiated update algorithms. Artificial Intelligence, 106:2, 335-352.
[9]
Cohen, E. (1997). Learning noisy perceptrons by a perceptron in polynomial time. In Proceedings of the 38th Symposium on Foundations of Computer Science (pp. 514-523).
[10]
Freund, Y. (1992). An improved boosting algorithm and its implications on learning complexity. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (pp. 391-398).
[11]
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121:2, 256-285.
[12]
Freund, Y., & Schapire, R. (1996). Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory (pp. 325-332).
[13]
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:1, 119-139.
[14]
Freund, Y., & Schapire, R. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37:3, 277-296.
[15]
Gentile, C., & Littlestone, N. (1999). The robustness of the p-norm algorithms. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 1-11).
[16]
Grove, A., Littlestone, N., & Schuurmans, D. (1997). General convergence results for linear discriminant updates. In Proceedings of the Tenth Annual Conference on Computational Learning Theory (pp. 171-183).
[17]
Goldmann, M., Håstad, J., & Razborov, A. (1992). Majority gates vs. general weighted threshold gates. Computational Complexity, 2, 277-300.
[18]
Haussler, D. (1988). Space efficient learning algorithms. Technical Report UCSC-CRL-88-2, University of California, Santa Cruz.
[19]
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13-30.
[20]
Jackson, J., & Craven, M. (1996). Learning sparse perceptrons. In Advances in neural information processing systems 8. Cambridge, MA: MIT Press.
[21]
Ji, C., & Ma, S. (1997). Combinations of weak classifiers. IEEE Transactions on Neural Networks, 8:1, 32-42.
[22]
Kearns, M., Li, M., Pitt, L., & Valiant, L. (1987). Recent results on boolean concept learning. In Proceedings of the Fourth International Workshop on Machine Learning (pp. 337-352).
[23]
Kearns, M., & Mansour, Y. (1996). On the boosting ability of top-down decision tree learning algorithms. In Proceedings of the 28th Symposium on Theory of Computing (pp. 459-468).
[24]
Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning boolean formulae and finite automata. J. ACM, 41:1, 67-95.
[25]
Kivinen, J., Warmuth, M., & Auer, P. (1997). The perceptron algorithm versus winnow: Linear vs. logarithmic mistake bounds when few input variables are relevant. Artificial Intelligence, 97:1/2, 325-343.
[26]
Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285-318.
[27]
Littlestone, N. (1989). Mistake bounds and logarithmic linear-threshold learning algorithms. Ph.D. thesis, Technical Report UCSC-CRL-89-11, University of California, Santa Cruz.
[28]
Littlestone, N. (1989). From online to batch learning. In Proceedings of the Second Annual Workshop on Computational Learning Theory (pp. 269-284).
[29]
Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear-threshold learning using winnow. In Proceedings of the Fourth Annual Conference on Computational Learning Theory (pp. 147-156).
[30]
Long, P. (1994). Halfspace learning, linear programming, and nonmalicious distributions. Information Processing Letters, 51, 245-250.
[31]
Maass, W., & Turan, G. (1994). How fast can a threshold gate learn? In S. J. Hanson, G. Drastal, & R. Rivest (Eds.), Computational learning theory and natural learning systems: Volume I: Constraints and prospects. Cambridge, MA: MIT Press.
[32]
Schapire, R. (1990). The strength of weak learnability. Machine Learning, 5:2, 197-227.
[33]
Schapire, R. (1999). Drifting games. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 114-124).
[34]
Servedio, R. (1999). On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 296-307).
[35]
Schapire, R., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26:5, 1651-1686.
[36]
Schapire, R., & Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (pp. 80-91).
[37]
Schmitt, M. (1998). Identification criteria and lower bounds for Perceptron-like learning rules. Neural Computation, 10, 235-250.
[38]
Taylor, A., & Mann, W. (1972). Advanced calculus. New York: John Wiley & Sons.
[39]
Vapnik, V. (1998). Statistical learning theory. New York: John Wiley & Sons.

Cited By

  • (2022) Reproducibility in learning. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pp. 818-831. DOI: 10.1145/3519935.3519973. Online publication date: 9-Jun-2022.
  • (2009) Learning Halfspaces with Malicious Noise. The Journal of Machine Learning Research, 10, pp. 2715-2740. DOI: 10.5555/1577069.1755877. Online publication date: 1-Dec-2009.
  • (2009) Learn++.NC. IEEE Transactions on Neural Networks, 20:1, pp. 152-168. DOI: 10.1109/TNN.2008.2008326. Online publication date: 1-Jan-2009.

Published In

Machine Learning, Volume 47, Issue 2-3
May-June 2002
163 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. boosting
  2. linear threshold functions
  3. probably approximately correct learning

Qualifiers

  • Article
