Learning classification trees

Wray Buntine¹

Abstract

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987) and CART (Breiman et al., 1984) show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.

References

Bahl, L., Brown, P., de Souza, P. and Mercer, R. (1989) A tree-based language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 1001–1008.
Berger, J. O. (1985) Statistical Decision Theory and Bayesian Analysis, Springer-Verlag, New York.
Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984) Classification and Regression Trees, Wadsworth, Belmont.
Buntine, W. (1991a) Some experiments with learning classification trees. Technical report, NASA Ames Research Center. In preparation.
Buntine, W. (1991b) A theory of learning classification rules. PhD thesis, University of Technology, Sydney.
Buntine, W. and Caruana, R. (1991) Introduction to IND and recursive partitioning. Technical Report FIA-91-28, RIACS and NASA Ames Research Center, Moffett Field, CA.
Buntine, W. and Weigend, A. (1991) Bayesian back-propagation. Complex Systems, 5, 603–643.
Carter, C. and Catlett, J. (1987) Assessing credit card applications using machine learning. IEEE Expert, 2, 71–79.
Catlett, J. (1991) Megainduction: machine learning on very large databases. PhD thesis, University of Sydney.
Cestnik, B., Kononenko, I. and Bratko, I. (1987) ASSISTANT86: A knowledge-elicitation tool for sophisticated users, in Progress in Machine Learning: Proceedings of EWSL-87, Bratko, I. and Lavrač, N. (eds), Sigma Press, Wilmslow, pp. 31–45.
Chou, P. (1991) Optimal partitioning for classification and regression trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13.
Clark, P. and Niblett, T. (1989) The CN2 induction algorithm. Machine Learning, 3, 261–283.
Crawford, S. (1989) Extensions to the CART algorithm. International Journal of Man-Machine Studies, 31, 197–217.
Henrion, M. (1990) Towards efficient inference in multiply connected belief networks, in Influence Diagrams, Belief Nets and Decision Analysis, Oliver, R. and Smith, J. (eds), Wiley, New York, pp. 385–407.
Kwok, S. and Carter, C. (1990) Multiple decision trees, in Uncertainty in Artificial Intelligence 4, Shachter, R., Levitt, T., Kanal, L. and Lemmer, J. (eds), North-Holland, Amsterdam.
Lee, P. (1989) Bayesian Statistics: An Introduction, Oxford University Press, New York.
Michie, D., Bain, M. and Hayes-Michie, J. (1990) Cognitive models from subcognitive skills, in Knowledge-based Systems for Industrial Control, McGhee, J., Grimble, M. and Mowforth, P. (eds), Peter Peregrinus, Stevenage.
Mingers, J. (1989a) An empirical comparison of pruning methods for decision-tree induction. Machine Learning, 4, 227–243.
Mingers, J. (1989b) An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3, 319–342.
Pagallo, G. and Haussler, D. (1990) Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.
Press, S. (1989) Bayesian Statistics, Wiley, New York.
Quinlan, J. (1986) Induction of decision trees. Machine Learning, 1, 81–106.
Quinlan, J. (1988) Simplifying decision trees, in Knowledge Acquisition for Knowledge-Based Systems, Gaines, B. and Boose, J. (eds), Academic Press, London, pp. 239–252.
Quinlan, J., Compton, P., Horn, K. and Lazarus, L. (1987) Inductive knowledge acquisition: A case study, in Applications of Expert Systems, Quinlan, J. (ed.), Addison-Wesley, London.
Quinlan, J. and Rivest, R. (1989) Inferring decision trees using the minimum description length principle. Information and Computation, 80, 227–248.
Ripley, B. (1987) An introduction to statistical pattern recognition, in Interactions in Artificial Intelligence and Statistical Methods, Unicom, Gower Technical Press, Aldershot, pp. 176–187.
Rissanen, J. (1989) Stochastic Complexity in Statistical Enquiry, World Scientific, Section 7.2.
Rodriguez, C. (1990) Objective Bayesianism and geometry, in Maximum Entropy and Bayesian Methods, Fougère, P. (ed.), Kluwer, Dordrecht.
Stewart, L. (1987) Hierarchical Bayesian analysis using Monte Carlo integration: computing posterior distributions when there are many possible models. The Statistician, 36, 211–219.
Utgoff, P. (1989) Incremental induction of decision trees. Machine Learning, 4, 161–186.
Wallace, C. and Patrick, J. (1991) Coding decision trees. Technical Report 151, Monash University, Melbourne, submitted to Machine Learning.
Weiss, S., Galen, R. and Tadepalli, P. (1990) Maximizing the predictive value of production rules. Artificial Intelligence, 45, 47–71.

Author information

Authors and Affiliations

RIACS & NASA Ames Research Center, Mail Stop 269-2, Moffett Field, CA 94035, USA
Wray Buntine

About this article

Cite this article

Buntine, W. Learning classification trees. Stat Comput 2, 63–73 (1992). https://doi.org/10.1007/BF01889584

Received: 15 January 1991
Accepted: 15 November 1991
Issue Date: June 1992
DOI: https://doi.org/10.1007/BF01889584
