An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

Thomas G. Dietterich¹

Abstract

Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
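The contrast between manipulating the training data (bagging) and randomizing the learner's internal decisions can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the abstract: one-level decision stumps stand in for C4.5, randomization is modeled as a uniform random choice among the `top_k` lowest-error candidate splits, and all function names are hypothetical. Boosting is omitted for brevity.

```python
import random
from collections import Counter


def stump_candidates(X, y):
    """List every usable (errors, feature, threshold, left_label, right_label) split, best first."""
    candidates = []
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send examples to both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            candidates.append((errors, f, t, left_label, right_label))
    candidates.sort(key=lambda c: c[0])
    return candidates


def fit_stump(X, y, rng=None, top_k=1):
    """Fit a stump; with rng given, pick uniformly among the top_k best splits (assumed scheme)."""
    candidates = stump_candidates(X, y)
    if not candidates:
        # No valid split (e.g. a bootstrap sample of identical rows): predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    pool = candidates[:top_k]
    _, f, t, left_label, right_label = pool[0] if rng is None else rng.choice(pool)
    return lambda row: left_label if row[f] <= t else right_label


def bagging(X, y, n_members, rng):
    """Train each member deterministically on a bootstrap replicate of the training set."""
    n = len(X)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n examples with replacement
        members.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return members


def randomization(X, y, n_members, rng, top_k=3):
    """Train each member on the full training set, randomizing only the split choice."""
    return [fit_stump(X, y, rng=rng, top_k=top_k) for _ in range(n_members)]


def vote(members, row):
    """Combine the members by unweighted majority vote."""
    return Counter(m(row) for m in members).most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    rng = random.Random(0)
    print(vote(bagging(X, y, 11, rng), [3.0]))
    print(vote(randomization(X, y, 11, rng), [3.0]))
```

In this sketch, diversity among bagged members comes only from the resampled data, whereas diversity among randomized members comes only from the split choice; both ensembles are combined by the same unweighted vote.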

References

Ali, K. M. (1995). A comparison of methods for learning and combining evidence from multiple models. Technical Report 95–47, Department of Information and Computer Science, University of California, Irvine.
Ali, K. M. & Pazzani, M. J. (1996). Error reduction through learning multiple descriptions. Machine Learning, 24(3), 173–202.
Bauer, E. & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1/2), 105–139.
Breiman, L. (1994). Heuristics of instability and stabilization in model selection. Technical Report 416, Department of Statistics, University of California, Berkeley, CA.
Breiman, L. (1996a). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (1996b). Bias, variance, and arcing classifiers. Technical Report 460, Department of Statistics, University of California, Berkeley, CA.
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7), 1895–1924.
Dietterich, T. G. & Kong, E. B. (1995). Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical Report, Department of Computer Science, Oregon State University, Corvallis, Oregon. Available from ftp://ftp.cs.orst.edu/pub/tgd/papers/tr-bias.ps.gz.
Freund, Y. & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning (pp. 148–156). Morgan Kaufmann.
Kohavi, R. & Kunz, C. (1997). Option decision trees with majority votes. In Proceedings of the Fourteenth International Conference on Machine Learning (pp. 161–169). San Francisco, CA: Morgan Kaufmann.
Kohavi, R., Sommerfield, D., & Dougherty, J. (1997). Data mining using MLC++, a machine learning library in C++. International Journal on Artificial Intelligence Tools, 6(4), 537–566.
Maclin, R. & Opitz, D. (1997). An empirical evaluation of bagging and boosting. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (pp. 546–551). Cambridge, MA: AAAI Press/MIT Press.
Margineantu, D. D. & Dietterich, T. G. (1997). Pruning adaptive boosting. In Proc. 14th International Conference on Machine Learning (pp. 211–218). Morgan Kaufmann.
Merz, C. J. & Murphy, P. M. (1996). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco, CA.
Quinlan, J. R. (1996). Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725–730). Cambridge, MA: AAAI Press/MIT Press.

Author information

Authors and Affiliations

Department of Computer Science, Oregon State University, Corvallis, OR, 97331, USA
Thomas G. Dietterich

About this article

Cite this article

Dietterich, T.G. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning 40, 139–157 (2000). https://doi.org/10.1023/A:1007607513941

Issue Date: August 2000
DOI: https://doi.org/10.1023/A:1007607513941
