Abstract
We present a new generalization bound in which the use of unlabeled examples yields a better ratio between training-set size and the resulting classifier's quality, thus reducing the number of labeled examples needed to achieve a given quality. This is achieved by requiring the algorithms that generate the classifiers to agree on the unlabeled examples. The extent of the improvement depends on the diversity of the learners: a more diverse group of learners yields a larger improvement, whereas two copies of a single algorithm give no advantage at all. As a proof of concept, we apply the resulting algorithm, named AgreementBoost, to a web classification problem, obtaining a reduction of up to 40% in the number of labeled examples.
A full version of this paper is available at http://www.illc.uva.nl/Publications/ResearchReports/MoL-2005-02.text.pdf
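For intuition, the following Python sketch illustrates the agreement idea with two boosted stump learners. It is not the paper's AgreementBoost, which derives its updates by gradient descent on a cost functional containing a disagreement (variance) term over the unlabeled sample; here each learner's round simply also fits unlabeled examples pseudo-labeled by the other learner, so the two ensembles are pushed to agree. All names (fit_stump, agreement_boost_sketch, lam) and the weighting scheme are illustrative assumptions, not the paper's method.

# Illustrative sketch only -- NOT the paper's AgreementBoost. We merely
# mimic the agreement idea: each learner also fits unlabeled points
# pseudo-labeled by the other learner, with weights that grow where the
# two ensembles currently disagree.
import numpy as np

def fit_stump(X, y, w):
    # Weighted decision stump: pick (feature, threshold, sign) with least weighted error.
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.sign(X[:, j] - thr + 1e-12)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, s)
    return best, best_err

def predict_stump(stump, X):
    j, thr, s = stump
    return s * np.sign(X[:, j] - thr + 1e-12)

def agreement_boost_sketch(X_lab, y, X_unl, rounds=10, lam=0.5):
    # Two boosted ensembles; `lam` (an assumption) trades labeled loss vs. agreement.
    ensembles = [[], []]
    F = [np.zeros(len(X_lab)), np.zeros(len(X_lab))]  # margins on labeled data
    G = [np.zeros(len(X_unl)), np.zeros(len(X_unl))]  # margins on unlabeled data
    X_all = np.vstack([X_lab, X_unl])
    for _ in range(rounds):
        for i in (0, 1):
            other = 1 - i
            w_lab = np.exp(-y * F[i])                    # AdaBoost-style labeled weights
            pseudo = np.where(G[other] >= 0, 1.0, -1.0)  # other learner's current vote
            w_unl = lam * np.exp(-pseudo * G[i])         # large where the two disagree
            y_all = np.concatenate([y, pseudo])
            w_all = np.concatenate([w_lab, w_unl])
            w_all /= w_all.sum()
            stump, err = fit_stump(X_all, y_all, w_all)
            err = np.clip(err, 1e-10, 1.0 - 1e-10)
            alpha = 0.5 * np.log((1.0 - err) / err)
            ensembles[i].append((stump, alpha))
            F[i] += alpha * predict_stump(stump, X_lab)
            G[i] += alpha * predict_stump(stump, X_unl)
    return ensembles

# Tiny synthetic demo: few labeled points, many unlabeled ones.
rng = np.random.default_rng(0)
X_lab = rng.normal(size=(30, 2))
y = np.where(X_lab[:, 0] > 0.0, 1.0, -1.0)
X_unl = rng.normal(size=(300, 2))
ensembles = agreement_boost_sketch(X_lab, y, X_unl, rounds=5)

Note that with two copies of the same weak learner on the same data the disagreement weights vanish, mirroring the abstract's observation that identical learners gain nothing from the agreement constraint.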
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leskes, B. (2005). The Value of Agreement, a New Boosting Algorithm. In: Auer, P., Meir, R. (eds) Learning Theory. COLT 2005. Lecture Notes in Computer Science, vol 3559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11503415_7
DOI: https://doi.org/10.1007/11503415_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26556-6
Online ISBN: 978-3-540-31892-7