Abstract
This article reports an empirical investigation of the accuracy of rules that classify examples on the basis of a single attribute. On most datasets studied, the best of these very simple rules is as accurate as the rules induced by the majority of machine learning systems. The article explores the implications of this finding for machine learning research and applications.
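For concreteness, the kind of single-attribute rule studied here (often called a "1-rule" or one-level decision tree) can be sketched as follows. This is a minimal illustration of the idea only, not the paper's full 1R system, which additionally handles missing values and discretizes continuous attributes; the function name `one_rule` and the toy data are invented for this example.

```python
from collections import Counter, defaultdict

def one_rule(examples, labels, n_attributes):
    """Learn a single-attribute ('1R'-style) rule: for each attribute,
    map each of its values to the majority class among the training
    examples having that value, then keep the attribute whose rule
    makes the fewest errors on the training data."""
    best = None
    for a in range(n_attributes):
        # Count class frequencies for each observed value of attribute a.
        counts = defaultdict(Counter)
        for x, y in zip(examples, labels):
            counts[x[a]][y] += 1
        # The candidate rule predicts the majority class for each value.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(y != rule[x[a]] for x, y in zip(examples, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best  # (attribute index, value -> class mapping, training errors)

# Toy usage: classify by whichever single attribute predicts best.
X = [('sunny', 'hot'), ('sunny', 'mild'), ('rain', 'mild'), ('rain', 'cool')]
y = ['no', 'no', 'yes', 'yes']
attr, rule, errs = one_rule(X, y, n_attributes=2)
print(attr, rule, errs)  # attribute 0 separates the two classes perfectly
```

The paper's empirical point is that rules no more elaborate than this mapping are, on most of the commonly used benchmark datasets, nearly as accurate as the output of far more complex learners.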
Cite this article
Holte, R.C. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning 11, 63–90 (1993). https://doi.org/10.1023/A:1022631118932