
ROC 'n' rule learning: towards a better understanding of covering algorithms

Published: 01 January 2005

Abstract

This paper provides an analysis of the behavior of separate-and-conquer or covering rule learning algorithms by visualizing their evaluation metrics and their dynamics in coverage space, a variant of ROC space. Our results show that most commonly used metrics, including accuracy, weighted relative accuracy, entropy, and Gini index, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes. Furthermore, our results show that stopping and filtering criteria like CN2's significance test focus on identifying significant deviations from random classification, which does not necessarily avoid overfitting. We also identify a problem with Foil's MDL-based encoding length restriction, which proves to be largely equivalent to a variable threshold on the recall of the rule. In general, we interpret these results as evidence that, contrary to common conception, pre-pruning heuristics are not very well understood and deserve more investigation.
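
For concreteness, the sketch below writes out common textbook formulations of the quantities discussed above for a rule covering p of P positive and n of N negative training examples: precision, a cost-weighted difference between covered positives and negatives, weighted relative accuracy, the m-estimate, and the likelihood ratio statistic behind CN2-style significance testing. This is an illustrative sketch, not code from the paper; the function names, the cost parameter c, and the default value of m are assumptions.

import math

# Illustrative sketch (not from the paper): standard formulations of rule
# evaluation metrics for a rule covering p of P positive and n of N negative
# training examples. All names and default parameter values are assumptions.

def precision(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n) if p + n > 0 else 0.0

def cost_weighted_difference(p, n, c=0.5):
    """Linear trade-off between covered positives and negatives;
    c encodes the assumed relative cost (c = 0.5 orders rules like accuracy)."""
    return c * p - (1 - c) * n

def weighted_relative_accuracy(p, n, P, N):
    """Coverage times the gain of the rule's precision over the default precision."""
    return (p + n) / (P + N) * (precision(p, n) - P / (P + N))

def m_estimate(p, n, P, N, m=2.0):
    """Precision smoothed towards the class prior; m (arbitrary default here)
    controls how strongly the estimate is pulled towards P / (P + N)."""
    return (p + m * P / (P + N)) / (p + n + m)

def likelihood_ratio_statistic(p, n, P, N):
    """Likelihood ratio statistic as used in CN2-style significance testing:
    compares the class distribution of the covered examples (p, n) with the
    distribution expected under random coverage; large values suggest a rule
    that deviates significantly from random classification."""
    def term(observed, expected):
        return observed * math.log(observed / expected) if observed > 0 else 0.0
    covered = p + n
    expected_p = covered * P / (P + N)
    expected_n = covered * N / (P + N)
    return 2 * (term(p, expected_p) + term(n, expected_n))

if __name__ == "__main__":
    P, N = 100, 200   # class distribution of the training set
    p, n = 40, 10     # examples covered by a candidate rule
    print("precision:                 ", precision(p, n))
    print("cost-weighted difference:  ", cost_weighted_difference(p, n, c=0.5))
    print("weighted relative accuracy:", weighted_relative_accuracy(p, n, P, N))
    print("m-estimate:                ", m_estimate(p, n, P, N, m=2.0))
    print("likelihood ratio statistic:", likelihood_ratio_statistic(p, n, P, N))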

References

[1] Bradley, A. P. (1996). ROC curves and the χ2 test. Pattern Recognition Letters, 17, 287-294.
[2] Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Pacific Grove, CA: Wadsworth & Brooks.
[3] Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27:4, 349-370.
[4] Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In L. Aiello (Ed.), Proceedings of the 9th European Conference on Artificial Intelligence (ECAI-90) (pp. 147-150). Stockholm, Sweden: Pitman.
[5] Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Y. Kodratoff (Ed.), Proceedings of the 5th European Working Session on Learning (EWSL-91) (pp. 151-163), Vol. 482 of Lecture Notes in Artificial Intelligence. Porto, Portugal: Springer-Verlag.
[6] Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3:4, 261-283.
[7] Cohen, W. (1995). Fast effective rule induction. In A. Prieditis & S. Russell (Eds.), Proceedings of the 12th International Conference on Machine Learning (ML-95) (pp. 115-123). Lake Tahoe, CA: Morgan Kaufmann.
[8] Cohen, W., & Singer, Y. (1999). A simple, fast, and effective rule learner. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99) (pp. 335-342). Orlando, FL: AAAI/MIT Press.
[9] De Raedt, L., & Van Laer, W. (1995). Inductive constraint logic. In K. Jantke, T. Shinohara, & T. Zeugmann (Eds.), Proceedings of the 5th Workshop on Algorithmic Learning Theory (ALT-95) (pp. 80-94), Vol. 997 of Lecture Notes in Artificial Intelligence. Fukuoka, Japan: Springer-Verlag.
[10] Džeroski, S., & Bratko, I. (1992). Handling noise in Inductive Logic Programming. In S. Muggleton & K. Furukawa (Eds.), Proceedings of the 2nd International Workshop on Inductive Logic Programming (ILP-92) (pp. 109-125). Tokyo, Japan.
[11] Ferri, C., Flach, P., & Hernández, J. (2002). Learning decision trees using the area under the ROC curve. In C. Sammut & A. Hoffmann (Eds.), Proceedings of the 19th International Conference on Machine Learning (ICML-02) (pp. 139-146). Sydney, Australia: Morgan Kaufmann.
[12] Flach, P. (2003). The geometry of ROC space: Using ROC isometrics to understand machine learning metrics. In T. Fawcett & N. Mishra (Eds.), Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 194-201). Washington, DC: AAAI Press.
[13] Flach, P., & Wu, S. (2003). Repairing concavities in ROC curves. In J. Rossiter & T. Martin (Eds.), Proceedings of the 2003 UK Workshop on Computational Intelligence (pp. 38-44). University of Bristol.
[14] Fürnkranz, J. (1994). FOSSIL: A robust relational learner. In F. Bergadano & L. De Raedt (Eds.), Proceedings of the 7th European Conference on Machine Learning (ECML-94) (pp. 122-137), Vol. 784 of Lecture Notes in Artificial Intelligence. Catania, Italy: Springer-Verlag.
[15] Fürnkranz, J. (1997). Pruning algorithms for rule learning. Machine Learning, 27:2, 139-171.
[16] Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review, 13:1, 3-54.
[17] Fürnkranz, J. (2002). Round robin classification. Journal of Machine Learning Research, 2, 721-747.
[18] Fürnkranz, J., & Flach, P. (2003). An analysis of rule evaluation metrics. In T. Fawcett & N. Mishra (Eds.), Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 202-209). Washington, DC: AAAI Press.
[19] Fürnkranz, J., & Flach, P. (2004). An analysis of stopping and filtering criteria for rule learning. In J.-F. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi (Eds.), Proceedings of the 14th European Conference on Machine Learning (ECML-04), Vol. 3201 of Lecture Notes in Artificial Intelligence. Pisa, Italy: Springer-Verlag.
[20] Fürnkranz, J., & Widmer, G. (1994). Incremental reduced error pruning. In W. Cohen & H. Hirsh (Eds.), Proceedings of the 11th International Conference on Machine Learning (ML-94) (pp. 70-77). New Brunswick, NJ: Morgan Kaufmann.
[21] Gamberger, D., & Lavrač, N. (2002). Expert-guided subgroup discovery: Methodology and application. Journal of Artificial Intelligence Research, 17, 501-527.
[22] Giordana, A., & Sale, C. (1992). Learning structured concepts using genetic algorithms. In D. Sleeman & P. Edwards (Eds.), Proceedings of the 9th International Workshop on Machine Learning (ML-92) (pp. 169-178). Edinburgh, United Kingdom: Morgan Kaufmann.
[23] Jovanoski, V., & Lavrač, N. (2001). Classification rule learning with APRIORI-C. In P. Brazdil & A. Jorge (Eds.), Proceedings of the 10th Portuguese Conference on Artificial Intelligence (EPIA-01) (pp. 44-51), Vol. 2258 of Lecture Notes in Artificial Intelligence. Porto, Portugal: Springer-Verlag.
[24] Kaufman, K., & Michalski, R. (2000). An adjustable rule learner for pattern discovery using the AQ methodology. Journal of Intelligent Information Systems, 14:2/3, 199-216.
[25] Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining (pp. 249-271). AAAI Press.
[26] Lavrač, N., Flach, P., & Zupan, B. (1999). Rule evaluation measures: A unifying view. In S. Džeroski & P. Flach (Eds.), Proceedings of the 9th International Workshop on Inductive Logic Programming (ILP-99) (pp. 174-185), Vol. 1634 of Lecture Notes in Artificial Intelligence. Bled, Slovenia: Springer-Verlag.
[27] Lavrač, N., Kavšek, B., Flach, P., & Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153-188.
[28] Lavrač, N., Cestnik, B., & Džeroski, S. (1992a). Search heuristics in empirical Inductive Logic Programming. In Logical Approaches to Machine Learning, Workshop Notes of the 10th European Conference on AI. Vienna, Austria.
[29] Lavrač, N., Cestnik, B., & Džeroski, S. (1992b). Use of heuristics in empirical Inductive Logic Programming. In S. Muggleton & K. Furukawa (Eds.), Proceedings of the 2nd International Workshop on Inductive Logic Programming (ILP-92). Tokyo, Japan.
[30] Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In R. Agrawal, P. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 80-86). AAAI Press.
[31] Liu, B., Ma, Y., & Wong, C.-K. (2000). Improving an exhaustive search based rule learner. In D. Zighed, J. Komorowski, & J. Zytkow (Eds.), Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-00) (pp. 504-509), Vol. 2258 of Lecture Notes in Artificial Intelligence. Lyon, France: Springer-Verlag.
[32] Michalski, R. (1969). On the quasi-minimal solution of the covering problem. In Proceedings of the 5th International Symposium on Information Processing (FCIP-69) (pp. 125-128), Vol. A3 (Switching Circuits). Bled, Yugoslavia.
[33] Mittenecker, E. (1983). Planung und statistische Auswertung von Experimenten, 10th edition. Vienna, Austria: Verlag Franz Deuticke. (In German)
[34] Mladenić, D. (1993). Combinatorial optimization in inductive concept learning. In Proceedings of the 10th International Conference on Machine Learning (ML-93) (pp. 205-211). Amherst, MA: Morgan Kaufmann.
[35] Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, 13:3/4, 245-286.
[36] Oates, T., & Jensen, D. (1998). Large data sets lead to overly complex models: An explanation and a solution. In R. Agrawal, P. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 294-298). AAAI Press.
[37] Pagallo, G., & Haussler, D. (1990). Boolean feature discovery in empirical learning. Machine Learning, 5:1, 71-99.
[38] Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning, 42:3, 203-231.
[39] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1:1, 81-106.
[40] Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5:3, 239-266.
[41] Quinlan, J. R., & Cameron-Jones, R. (1995). Induction of logic programs: FOIL and related systems. New Generation Computing, 13:3/4, 287-312.
[42] Swets, J., Dawes, R., & Monahan, J. (2000). Better decisions through science. Scientific American, 283:4, 82-87.
[43] Todorovski, L., Flach, P., & Lavrač, N. (2000). Predictive performance of weighted relative accuracy. In D. Zighed, J. Komorowski, & J. Zytkow (Eds.), Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD-00) (pp. 255-264), Vol. 2258 of Lecture Notes in Artificial Intelligence. Lyon, France: Springer-Verlag.
[44] van Rijsbergen, C. (1979). Information retrieval, 2nd edition. London, UK: Butterworths.
[45] Vilalta, R., & Oblinger, D. (2000). A quantification of distance-bias between evaluation metrics in classification. In P. Langley (Ed.), Proceedings of the 17th International Conference on Machine Learning (ICML-00) (pp. 1087-1094). Stanford, CA: Morgan Kaufmann.
[46] Webb, G. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 5, 431-465.
[47] Weiss, S., & Indurkhya, N. (1991). Reduced complexity rule induction. In J. Mylopoulos & R. Reiter (Eds.), Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI-91) (pp. 678-684). Sydney, Australia: Morgan Kaufmann.



Published In

Machine Learning, Volume 58, Issue 1
January 2005
89 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 January 2005

Author Tags

  1. ROC analysis
  2. coverage space
  3. inductive rule learning
  4. rule evaluation metrics

Qualifiers

  • Article


Cited By

  • (2019) Extreme value correction. Machine Learning, 108:2, 297-329. DOI: 10.1007/s10994-018-5731-3. Online publication date: 1-Feb-2019.
  • (2018) RDF shape induction using knowledge base profiling. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1952-1959. DOI: 10.1145/3167132.3167341. Online publication date: 9-Apr-2018.
  • (2017) Visualizing the behavior and some symmetry properties of Bayesian confirmation measures. Data Mining and Knowledge Discovery, 31:3, 739-773. DOI: 10.1007/s10618-016-0487-5. Online publication date: 1-May-2017.
  • (2016) Tuning for software analytics. Information and Software Technology, 76:C, 135-146. DOI: 10.1016/j.infsof.2016.04.017. Online publication date: 1-Aug-2016.
  • (2015) To tune or not to tune. Data Mining and Knowledge Discovery, 29:1, 237-272. DOI: 10.1007/s10618-013-0339-5. Online publication date: 1-Jan-2015.
  • (2015) Efficient algorithms for finding optimal binary features in numeric and nominal labeled data. Knowledge and Information Systems, 42:2, 465-492. DOI: 10.1007/s10115-013-0714-y. Online publication date: 1-Feb-2015.
  • (2014) A Covering Classification Rule Induction Approach for Big Datasets. Proceedings of the 2014 IEEE/ACM International Symposium on Big Data Computing, 45-53. DOI: 10.1109/BDC.2014.17. Online publication date: 8-Dec-2014.
  • (2014) Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Mining and Knowledge Discovery, 28:4, 1004-1045. DOI: 10.1007/s10618-013-0326-x. Online publication date: 1-Jul-2014.
  • (2013) Redefinition of Decision Rules Based on the Importance of Elementary Conditions Evaluation. Fundamenta Informaticae, 123:2, 171-197. DOI: 10.5555/2594874.2594877. Online publication date: 1-Apr-2013.
  • (2013) CHIRA - Convex Hull Based Iterative Algorithm of Rules Aggregation. Fundamenta Informaticae, 123:2, 143-170. DOI: 10.5555/2594874.2594876. Online publication date: 1-Apr-2013.