DOI: 10.1145/775047.775090
Article

Exploiting unlabeled data in ensemble methods

Published: 23 July 2002

Abstract

An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles based on both labeled and unlabeled data. ASSEMBLE alternates between assigning "pseudo-classes" to the unlabeled data using the existing ensemble and constructing the next base classifier using both the labeled and pseudo-labeled data. Mathematically, this intuitive algorithm corresponds to maximizing the classification margin in hypothesis space as measured on both the labeled and unlabeled data. Unlike alternative approaches, ASSEMBLE does not require a semi-supervised learning method for the base classifier. ASSEMBLE can be used in conjunction with any cost-sensitive classification algorithm for both two-class and multi-class problems. ASSEMBLE using decision trees won the NIPS 2001 Unlabeled Data Competition. In addition, strong results on several benchmark datasets using both decision trees and neural networks support the proposed method.
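
The alternation described in the abstract is compact enough to sketch. Below is a minimal two-class illustration in Python, using scikit-learn decision trees as the base classifier and AdaBoost-style exponential margin weights; it is a sketch of the idea under those assumptions, not the authors' implementation. The names assemble_fit and assemble_predict, the unlabeled_weight cost, and the labeled-data-only initialization are hypothetical choices made for illustration; the paper's cost-sensitive weighting, initialization, and multi-class handling differ in detail.

    # Minimal ASSEMBLE-style sketch (hypothetical interface, two-class case).
    # Labels are assumed to be in {-1, +1}.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def assemble_fit(X_lab, y_lab, X_unl, n_rounds=10, unlabeled_weight=0.5):
        X = np.vstack([X_lab, X_unl])
        n_lab = len(X_lab)
        # Fixed per-example costs: unlabeled points count less than labeled ones.
        cost = np.concatenate([np.ones(n_lab),
                               np.full(len(X_unl), unlabeled_weight)])
        # Seed the ensemble on labeled data alone (an assumption for this
        # sketch; the paper seeds the pseudo-labels differently).
        h0 = DecisionTreeClassifier(max_depth=3).fit(X_lab, y_lab)
        learners, alphas = [h0], [1.0]
        F = h0.predict(X).astype(float)      # current ensemble score F(x)
        for _ in range(n_rounds):
            # Step 1: assign pseudo-classes to the unlabeled points using
            # the existing ensemble.
            pseudo = np.where(F[n_lab:] >= 0, 1.0, -1.0)
            y = np.concatenate([y_lab, pseudo])
            # Step 2: reweight every example by its margin cost exp(-y F(x))
            # and fit the next base classifier on labeled + pseudo-labeled data.
            w = cost * np.exp(-y * F)
            w /= w.sum()
            h = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=w)
            pred = h.predict(X).astype(float)
            err = w[pred != y].sum()
            if err >= 0.5:                   # no better than chance: stop
                break
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
            F += alpha * pred
            learners.append(h)
            alphas.append(alpha)
        return learners, alphas

    def assemble_predict(X, learners, alphas):
        score = sum(a * h.predict(X).astype(float)
                    for h, a in zip(learners, alphas))
        return np.where(score >= 0, 1, -1)

A typical call would be learners, alphas = assemble_fit(X_lab, y_lab, X_unl) followed by assemble_predict(X_test, learners, alphas); the exponential reweighting of both labeled and pseudo-labeled examples is what ties the alternation to margin maximization in hypothesis space.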




Published In

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN: 1-58113-567-X
DOI: 10.1145/775047

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. boosting
  2. classification
  3. ensemble learning
  4. semi-supervised learning


Acceptance Rates

KDD '02 paper acceptance rate: 44 of 307 submissions (14%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)


