Abstract
This article reports an empirical investigation of the accuracy of rules that classify examples on the basis of a single attribute. On most datasets studied, the best of these very simple rules is as accurate as the rules induced by the majority of machine learning systems. The article explores the implications of this finding for machine learning research and applications.
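For concreteness, the kind of single-attribute rule studied here (often called a "1-rule" or one-level decision tree) can be sketched as follows. This is a minimal illustration of the idea only, not the paper's full 1R system, which additionally handles missing values and discretizes continuous attributes; the function name `one_rule` and the toy data are invented for this example.

```python
from collections import Counter, defaultdict

def one_rule(examples, labels, n_attributes):
    """Learn a single-attribute ('1R'-style) rule: for each attribute,
    map each of its values to the majority class among the training
    examples having that value, then keep the attribute whose rule
    makes the fewest errors on the training data."""
    best = None
    for a in range(n_attributes):
        # Count class frequencies for each observed value of attribute a.
        counts = defaultdict(Counter)
        for x, y in zip(examples, labels):
            counts[x[a]][y] += 1
        # The candidate rule predicts the majority class for each value.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(y != rule[x[a]] for x, y in zip(examples, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best  # (attribute index, value -> class mapping, training errors)

# Toy usage: classify by whichever single attribute predicts best.
X = [('sunny', 'hot'), ('sunny', 'mild'), ('rain', 'mild'), ('rain', 'cool')]
y = ['no', 'no', 'yes', 'yes']
attr, rule, errs = one_rule(X, y, n_attributes=2)
print(attr, rule, errs)  # attribute 0 separates the two classes perfectly
```

The paper's empirical point is that rules no more elaborate than this mapping are, on most of the commonly used benchmark datasets, nearly as accurate as the output of far more complex learners.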
Cite this article
Holte, R.C. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning 11, 63–90 (1993). https://doi.org/10.1023/A:1022631118932