
PAC Analogues of Perceptron and Winnow Via Boosting the Margin

Published: 01 May 2002

Abstract

We describe a novel family of PAC model algorithms for learning linear threshold functions. The new algorithms work by boosting a simple weak learner and exhibit sample complexity bounds remarkably similar to those of known online algorithms such as Perceptron and Winnow, thus suggesting that these well-studied online algorithms in some sense correspond to instances of boosting. We show that the new algorithms can be viewed as natural PAC analogues of the online p-norm algorithms which have recently been studied by Grove, Littlestone, and Schuurmans (1997, Proceedings of the Tenth Annual Conference on Computational Learning Theory (pp. 171–183)) and Gentile and Littlestone (1999, Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 1–11)). As special cases of the algorithm, by taking p = 2 and p = ∞ we obtain natural boosting-based PAC analogues of Perceptron and Winnow respectively. The p = ∞ case of our algorithm can also be viewed as a generalization (with an improved sample complexity bound) of Jackson and Craven's PAC-model boosting-based algorithm for learning “sparse perceptrons” (Jackson & Craven, 1996, Advances in neural information processing systems 8, MIT Press). The analysis of the generalization error of the new algorithms relies on techniques from the theory of large margin classification.
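
To make the general recipe in the abstract concrete, the following is a minimal sketch (Python with NumPy) of boosting a very simple weak learner, here single-coordinate "literal" hypotheses in the spirit of the sparse-perceptron setting of Jackson and Craven (1996), and outputting the weighted-majority vote, which is itself a linear threshold function. The AdaBoost-style reweighting, the function names (weak_learner, boost_sparse_perceptron), and the toy data are illustrative assumptions only; the paper's algorithms use a different, p-norm-based construction and carry the sample complexity guarantees discussed above.

```python
import numpy as np

def weak_learner(X, y, w):
    """Hypothetical weak learner: return the single signed coordinate (a
    "literal") with the largest weighted correlation with the labels."""
    corr = X.T @ (w * y)                  # weighted correlation of each coordinate with y
    j = int(np.argmax(np.abs(corr)))
    s = 1.0 if corr[j] >= 0 else -1.0
    return j, s

def boost_sparse_perceptron(X, y, rounds=50):
    """Illustrative AdaBoost-style loop (an assumption, not the paper's exact
    booster): combine single-coordinate weak hypotheses into a weighted-majority
    vote, i.e. a linear threshold function sign(X @ coef)."""
    m, n = X.shape
    w = np.full(m, 1.0 / m)               # distribution over the examples
    coef = np.zeros(n)                    # accumulated voting weight per coordinate
    for _ in range(rounds):
        j, s = weak_learner(X, y, w)
        h = s * X[:, j]                   # weak hypothesis predictions in {-1, +1}
        eps = float(np.clip(w[h != y].sum(), 1e-12, 1 - 1e-12))  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)
        coef[j] += s * alpha              # add this weak hypothesis to the vote
        w *= np.exp(-alpha * y * h)       # reweight: emphasize misclassified examples
        w /= w.sum()
    return coef

# Toy usage: examples in {-1,+1}^20 labeled by a sparse linear threshold function.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 1.0], size=(500, 20))
y = np.sign(X[:, 0] + X[:, 3] + X[:, 7] + 0.5)
coef = boost_sparse_perceptron(X, y)
print("training accuracy:", float(np.mean(np.sign(X @ coef) == y)))
```

Boosting literal-valued weak hypotheses in this way produces a vote over relatively few coordinates, which is the intuition behind the sparse-perceptron connection; the paper's p = ∞ case gives this kind of construction a rigorous, improved sample complexity bound.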

References

[1]
Alon, N., Spencer, J., & Erdős, P. (1992). The probabilistic method. New York: Wiley-Interscience.
[2]
Angluin, D. (1988). Queries and concept learning. Machine Learning, 2, 319-342.
[3]
Auer, P., & Warmuth, M. (1995). Tracking the best disjunction. In Proceedings of the 36th Symposium on Foundations of Computer Science (pp. 312-321).
[4]
Bartlett, P., & Shawe-Taylor, J. (1999). Generalization performance of support vector machines and other pattern classifiers. In B. Scholkopf, C. J. C. Burges, & A. J. Smola (Eds.), Advances in Kernel methods--Support vector learning. Cambridge, MA: MIT Press.
[5]
Baum, E. (1990). The perceptron algorithm is fast for nonmalicious distributions. Neural Computation, 2:2, 248-260.
[6]
Blum, A., Frieze, A., Kannan, R., & Vempala, S. (1996). A polynomial time algorithm for learning noisy linear threshold functions. In Proceedings of the 37th Symposium on Foundations of Computer Science (pp. 330-338).
[7]
Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36:4, 929-965.
[8]
Bylander, T. (1998). Worst-case analysis of the perceptron and exponentiated update algorithms. Artificial Intelligence, 106:2, 335-352.
[9]
Cohen, E. (1997). Learning noisy perceptrons by a perceptron in polynomial time. In Proceedings of the 38th Symposium on Foundations of Computer Science (pp. 514-523).
[10]
Freund, Y. (1992). An improved boosting algorithm and its implications on learning complexity. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (pp. 391-398).
[11]
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121:2, 256-285.
[12]
Freund, Y., & Schapire, R. (1996). Game theory, on-line prediction and boosting. In Proceedings of the Ninth Annual Conference on Computational Learning Theory (pp. 325-332).
[13]
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:1, 119-139.
[14]
Freund, Y., & Schapire, R. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37:3, 277-296.
[15]
Gentile, C., & Littlestone, N. (1999). The robustness of the p-norm algorithms. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 1-11).
[16]
Grove, A., Littlestone, N., & Schuurmans, D. (1997). General convergence results for linear discriminant updates. In Proceedings of the Tenth Annual Conference on Computational Learning Theory (pp. 171-183).
[17]
Goldmann, M., Håstad, J., & Razborov, A. (1992). Majority gates vs. general weighted threshold gates. Computational Complexity, 2, 277-300.
[18]
Haussler, D. (1988). Space efficient learning algorithms. Technical Report UCSC-CRL-88-2, University of California, Santa Cruz.
[19]
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13-30.
[20]
Jackson, J., & Craven, M. (1996). Learning sparse perceptrons. In Advances in neural information processing systems 8. Cambridge, MA: MIT Press.
[21]
Ji, C., & Ma, S. (1997). Combinations of weak classifiers. IEEE Transactions on Neural Networks, 8:1, 32-42.
[22]
Kearns, M., Li, M., Pitt, L., & Valiant, L. (1987). Recent results on boolean concept learning. In Proceedings of the Fourth International Workshop on Machine Learning (pp. 337-352).
[23]
Kearns, M., & Mansour, Y. (1996). On the boosting ability of top-down decision tree learning algorithms. In Proceedings of the 28th Symposium on Theory of Computing (pp. 459-468).
[24]
Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning boolean formulae and finite automata. J. ACM, 41:1, 67-95.
[25]
Kivinen, J., Warmuth, M., & Auer, P. (1997). The perceptron algorithm versus winnow: Linear vs. logarithmic mistake bounds when few input variables are relevant. Artificial Intelligence, 97:1/2, 325-343.
[26]
Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285-318.
[27]
Littlestone, N. (1989). Mistake bounds and logarithmic linear-threshold learning algorithms. Ph.D. thesis, Technical Report UCSC-CRL-89-11, University of California, Santa Cruz.
[28]
Littlestone, N. (1989). From online to batch learning. In Proceedings of the Second Annual Workshop on Computational Learning Theory (pp. 269-284).
[29]
Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear-threshold learning using winnow. In Proceedings of the Fourth Annual Conference on Computational Learning Theory (pp. 147-156).
[30]
Long, P. (1994). Halfspace learning, linear programming, and nonmalicious distributions. Information Processing Letters, 51, 245-250.
[31]
Maass, W., & Turan, G. (1994). How fast can a threshold gate learn? In S. J. Hanson, G. Drastal, & R. Rivest (Eds.), Computational learning theory and natural learning systems: Volume I: Constraints and prospects. Cambridge, MA: MIT Press.
[32]
Schapire, R. (1990). The strength of weak learnability. Machine Learning, 5:2, 197-227.
[33]
Schapire, R. (1999). Drifting games. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 114-124).
[34]
Servedio, R. (1999). On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 296-307).
[35]
Schapire, R., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26:5, 1651-1686.
[36]
Schapire, R., & Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (pp. 80-91).
[37]
Schmitt, M. (1998). Identification criteria and lower bounds for Perceptron-like learning rules. Neural Computation, 10, 235-250.
[38]
Taylor, A., & Mann, W. (1972). Advanced calculus. New York: John Wiley & Sons.
[39]
Vapnik, V. (1998). Statistical learning theory. New York: John Wiley & Sons.

Cited By

  • (2022) Reproducibility in learning. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pp. 818-831. DOI: 10.1145/3519935.3519973. Online publication date: 9-Jun-2022.
  • (2009) Learning Halfspaces with Malicious Noise. The Journal of Machine Learning Research, 10, pp. 2715-2740. DOI: 10.5555/1577069.1755877. Online publication date: 1-Dec-2009.
  • (2009) Learn++.NC. IEEE Transactions on Neural Networks, 20:1, pp. 152-168. DOI: 10.1109/TNN.2008.2008326. Online publication date: 1-Jan-2009.

Published In

Machine Learning, Volume 47, Issue 2-3
May-June 2002
163 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. boosting
  2. linear threshold functions
  3. probably approximately correct learning

Qualifiers

  • Article
