
ROC 'n' rule learning: towards a better understanding of covering algorithms

Published: 01 January 2005

Abstract

This paper provides an analysis of the behavior of separate-and-conquer or covering rule learning algorithms by visualizing their evaluation metrics and their dynamics in coverage space, a variant of ROC space. Our results show that most commonly used metrics, including accuracy, weighted relative accuracy, entropy, and Gini index, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes. Furthermore, our results show that stopping and filtering criteria like CN2's significance test focus on identifying significant deviations from random classification, which does not necessarily avoid overfitting. We also identify a problem with Foil's MDL-based encoding length restriction, which proves to be largely equivalent to a variable threshold on the recall of the rule. In general, we interpret these results as evidence that, contrary to common conception, pre-pruning heuristics are not very well understood and deserve more investigation.
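
For concreteness, the sketch below writes out common textbook formulations of the quantities discussed above for a rule covering p of P positive and n of N negative training examples: precision, a cost-weighted difference between covered positives and negatives, weighted relative accuracy, the m-estimate, and the likelihood ratio statistic behind CN2-style significance testing. This is an illustrative sketch, not code from the paper; the function names, the cost parameter c, and the default value of m are assumptions.

import math

# Illustrative sketch (not from the paper): standard formulations of rule
# evaluation metrics for a rule covering p of P positive and n of N negative
# training examples. All names and default parameter values are assumptions.

def precision(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n) if p + n > 0 else 0.0

def cost_weighted_difference(p, n, c=0.5):
    """Linear trade-off between covered positives and negatives;
    c encodes the assumed relative cost (c = 0.5 orders rules like accuracy)."""
    return c * p - (1 - c) * n

def weighted_relative_accuracy(p, n, P, N):
    """Coverage times the gain of the rule's precision over the default precision."""
    return (p + n) / (P + N) * (precision(p, n) - P / (P + N))

def m_estimate(p, n, P, N, m=2.0):
    """Precision smoothed towards the class prior; m (arbitrary default here)
    controls how strongly the estimate is pulled towards P / (P + N)."""
    return (p + m * P / (P + N)) / (p + n + m)

def likelihood_ratio_statistic(p, n, P, N):
    """Likelihood ratio statistic as used in CN2-style significance testing:
    compares the class distribution of the covered examples (p, n) with the
    distribution expected under random coverage; large values suggest a rule
    that deviates significantly from random classification."""
    def term(observed, expected):
        return observed * math.log(observed / expected) if observed > 0 else 0.0
    covered = p + n
    expected_p = covered * P / (P + N)
    expected_n = covered * N / (P + N)
    return 2 * (term(p, expected_p) + term(n, expected_n))

if __name__ == "__main__":
    P, N = 100, 200   # class distribution of the training set
    p, n = 40, 10     # examples covered by a candidate rule
    print("precision:                 ", precision(p, n))
    print("cost-weighted difference:  ", cost_weighted_difference(p, n, c=0.5))
    print("weighted relative accuracy:", weighted_relative_accuracy(p, n, P, N))
    print("m-estimate:                ", m_estimate(p, n, P, N, m=2.0))
    print("likelihood ratio statistic:", likelihood_ratio_statistic(p, n, P, N))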

References

[1] Bradley, A. P. (1996). ROC curves and the χ2 test. Pattern Recognition Letters, 17, 287-294.
[2] Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Pacific Grove, CA: Wadsworth & Brooks.
[3] Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27:4, 349-370.
[4] Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In L. Aiello (Ed.), Proceedings of the 9th European Conference on Artificial Intelligence (ECAI-90) (pp. 147-150). Stockholm, Sweden: Pitman.
[5] Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Y. Kodratoff (Ed.), Proceedings of the 5th European Working Session on Learning (EWSL-91) (pp. 151-163), Vol. 482 of Lecture Notes in Artificial Intelligence. Porto, Portugal: Springer-Verlag.
[6] Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3:4, 261-283.
[7] Cohen, W. (1995). Fast effective rule induction. In A. Prieditis & S. Russell (Eds.), Proceedings of the 12th International Conference on Machine Learning (ML-95) (pp. 115-123). Lake Tahoe, CA: Morgan Kaufmann.
[8] Cohen, W., & Singer, Y. (1999). A simple, fast, and effective rule learner. In Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99) (pp. 335-342). Orlando, FL: AAAI/MIT Press.
[9] De Raedt, L., & Van Laer, W. (1995). Inductive constraint logic. In K. Jantke, T. Shinohara, & T. Zeugmann (Eds.), Proceedings of the 5th Workshop on Algorithmic Learning Theory (ALT-95) (pp. 80-94), Vol. 997 of Lecture Notes in Artificial Intelligence. Fukuoka, Japan: Springer-Verlag.
[10] Džeroski, S., & Bratko, I. (1992). Handling noise in Inductive Logic Programming. In S. Muggleton & K. Furukawa (Eds.), Proceedings of the 2nd International Workshop on Inductive Logic Programming (ILP-92) (pp. 109-125). Tokyo, Japan.
[11] Ferri, C., Flach, P., & Hernández, J. (2002). Learning decision trees using the area under the ROC curve. In C. Sammut & A. Hoffmann (Eds.), Proceedings of the 19th International Conference on Machine Learning (ICML-02) (pp. 139-146). Sydney, Australia: Morgan Kaufmann.
[12] Flach, P. (2003). The geometry of ROC space: Using ROC isometrics to understand machine learning metrics. In T. Fawcett & N. Mishra (Eds.), Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 194-201). Washington, DC: AAAI Press.
[13] Flach, P., & Wu, S. (2003). Repairing concavities in ROC curves. In J. Rossiter & T. Martin (Eds.), Proceedings of the 2003 UK Workshop on Computational Intelligence (pp. 38-44). University of Bristol.
[14] Fürnkranz, J. (1994). FOSSIL: A robust relational learner. In F. Bergadano & L. De Raedt (Eds.), Proceedings of the 7th European Conference on Machine Learning (ECML-94) (pp. 122-137), Vol. 784 of Lecture Notes in Artificial Intelligence. Catania, Italy: Springer-Verlag.
[15] Fürnkranz, J. (1997). Pruning algorithms for rule learning. Machine Learning, 27:2, 139-171.
[16] Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review, 13:1, 3-54.
[17] Fürnkranz, J. (2002). Round robin classification. Journal of Machine Learning Research, 2, 721-747.
[18] Fürnkranz, J., & Flach, P. (2003). An analysis of rule evaluation metrics. In T. Fawcett & N. Mishra (Eds.), Proceedings of the 20th International Conference on Machine Learning (ICML-03) (pp. 202-209). Washington, DC: AAAI Press.
[19] Fürnkranz, J., & Flach, P. (2004). An analysis of stopping and filtering criteria for rule learning. In J.-F. Boulicaut, F. Esposito, F. Giannotti, & D. Pedreschi (Eds.), Proceedings of the 14th European Conference on Machine Learning (ECML-04), Vol. 3201 of Lecture Notes in Artificial Intelligence. Pisa, Italy: Springer-Verlag.
[20] Fürnkranz, J., & Widmer, G. (1994). Incremental reduced error pruning. In W. Cohen & H. Hirsh (Eds.), Proceedings of the 11th International Conference on Machine Learning (ML-94) (pp. 70-77). New Brunswick, NJ: Morgan Kaufmann.
[21] Gamberger, D., & Lavrač, N. (2002). Expert-guided subgroup discovery: Methodology and application. Journal of Artificial Intelligence Research, 17, 501-527.
[22] Giordana, A., & Sale, C. (1992). Learning structured concepts using genetic algorithms. In D. Sleeman & P. Edwards (Eds.), Proceedings of the 9th International Workshop on Machine Learning (ML-92) (pp. 169-178). Edinburgh, United Kingdom: Morgan Kaufmann.
[23] Jovanoski, V., & Lavrač, N. (2001). Classification rule learning with APRIORI-C. In P. Brazdil & A. Jorge (Eds.), Proceedings of the 10th Portuguese Conference on Artificial Intelligence (EPIA-01) (pp. 44-51), Vol. 2258 of Lecture Notes in Artificial Intelligence. Porto, Portugal: Springer-Verlag.
[24] Kaufman, K., & Michalski, R. (2000). An adjustable rule learner for pattern discovery using the AQ methodology. Journal of Intelligent Information Systems, 14:2/3, 199-216.
[25] Klösgen, W. (1996). Explora: A multipattern and multistrategy discovery assistant. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining (pp. 249-271). AAAI Press.
[26] Lavrač, N., Flach, P., & Zupan, B. (1999). Rule evaluation measures: A unifying view. In S. Džeroski & P. Flach (Eds.), Proceedings of the 9th International Workshop on Inductive Logic Programming (ILP-99) (pp. 174-185), Vol. 1634 of Lecture Notes in Artificial Intelligence. Bled, Slovenia: Springer-Verlag.
[27] Lavrač, N., Kavšek, B., Flach, P., & Todorovski, L. (2004). Subgroup discovery with CN2-SD. Journal of Machine Learning Research, 5, 153-188.
[28] Lavrač, N., Cestnik, B., & Džeroski, S. (1992a). Search heuristics in empirical Inductive Logic Programming. In Logical Approaches to Machine Learning, Workshop Notes of the 10th European Conference on AI. Vienna, Austria.
[29] Lavrač, N., Cestnik, B., & Džeroski, S. (1992b). Use of heuristics in empirical Inductive Logic Programming. In S. Muggleton & K. Furukawa (Eds.), Proceedings of the 2nd International Workshop on Inductive Logic Programming (ILP-92). Tokyo, Japan.
[30] Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In R. Agrawal, P. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 80-86). AAAI Press.
[31] Liu, B., Ma, Y., & Wong, C.-K. (2000). Improving an exhaustive search based rule learner. In D. Zighed, J. Komorowski, & J. Zytkow (Eds.), Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-00) (pp. 504-509), Vol. 2258 of Lecture Notes in Artificial Intelligence. Lyon, France: Springer-Verlag.
[32] Michalski, R. (1969). On the quasi-minimal solution of the covering problem. In Proceedings of the 5th International Symposium on Information Processing (FCIP-69) (pp. 125-128), Vol. A3 (Switching Circuits). Bled, Yugoslavia.
[33] Mittenecker, E. (1983). Planung und statistische Auswertung von Experimenten, 10th edition. Vienna, Austria: Verlag Franz Deuticke. (In German)
[34] Mladenić, D. (1993). Combinatorial optimization in inductive concept learning. In Proceedings of the 10th International Conference on Machine Learning (ML-93) (pp. 205-211). Amherst, MA: Morgan Kaufmann.
[35] Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, 13:3/4, 245-286.
[36] Oates, T., & Jensen, D. (1998). Large data sets lead to overly complex models: An explanation and a solution. In R. Agrawal, P. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 294-298). AAAI Press.
[37] Pagallo, G., & Haussler, D. (1990). Boolean feature discovery in empirical learning. Machine Learning, 5:1, 71-99.
[38] Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning, 42:3, 203-231.
[39] Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1:1, 81-106.
[40] Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5:3, 239-266.
[41] Quinlan, J. R., & Cameron-Jones, R. (1995). Induction of logic programs: FOIL and related systems. New Generation Computing, 13:3/4, 287-312.
[42] Swets, J., Dawes, R., & Monahan, J. (2000). Better decisions through science. Scientific American, 283:4, 82-87.
[43] Todorovski, L., Flach, P., & Lavrač, N. (2000). Predictive performance of weighted relative accuracy. In D. Zighed, J. Komorowski, & J. Zytkow (Eds.), Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD-00) (pp. 255-264), Vol. 2258 of Lecture Notes in Artificial Intelligence. Lyon, France: Springer-Verlag.
[44] van Rijsbergen, C. (1979). Information retrieval, 2nd edition. London, UK: Butterworths.
[45] Vilalta, R., & Oblinger, D. (2000). A quantification of distance-bias between evaluation metrics in classification. In P. Langley (Ed.), Proceedings of the 17th International Conference on Machine Learning (ICML-00) (pp. 1087-1094). Stanford, CA: Morgan Kaufmann.
[46] Webb, G. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 5, 431-465.
[47] Weiss, S., & Indurkhya, N. (1991). Reduced complexity rule induction. In J. Mylopoulos & R. Reiter (Eds.), Proceedings of the 12th International Joint Conference on Artificial Intelligence (IJCAI-91) (pp. 678-684). Sydney, Australia: Morgan Kaufmann.



Published In

Machine Learning, Volume 58, Issue 1
January 2005
89 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 January 2005

Author Tags

  1. ROC analysis
  2. coverage space
  3. inductive rule learning
  4. rule evaluation metrics

Qualifiers

  • Article


Cited By

  • (2019) Extreme value correction. Machine Learning, 108:2, 297-329. DOI: 10.1007/s10994-018-5731-3. Online publication date: 1-Feb-2019.
  • (2018) RDF shape induction using knowledge base profiling. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, 1952-1959. DOI: 10.1145/3167132.3167341. Online publication date: 9-Apr-2018.
  • (2017) Visualizing the behavior and some symmetry properties of Bayesian confirmation measures. Data Mining and Knowledge Discovery, 31:3, 739-773. DOI: 10.1007/s10618-016-0487-5. Online publication date: 1-May-2017.
  • (2016) Tuning for software analytics. Information and Software Technology, 76:C, 135-146. DOI: 10.1016/j.infsof.2016.04.017. Online publication date: 1-Aug-2016.
  • (2015) To tune or not to tune. Data Mining and Knowledge Discovery, 29:1, 237-272. DOI: 10.1007/s10618-013-0339-5. Online publication date: 1-Jan-2015.
  • (2015) Efficient algorithms for finding optimal binary features in numeric and nominal labeled data. Knowledge and Information Systems, 42:2, 465-492. DOI: 10.1007/s10115-013-0714-y. Online publication date: 1-Feb-2015.
  • (2014) A Covering Classification Rule Induction Approach for Big Datasets. Proceedings of the 2014 IEEE/ACM International Symposium on Big Data Computing, 45-53. DOI: 10.1109/BDC.2014.17. Online publication date: 8-Dec-2014.
  • (2014) Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Mining and Knowledge Discovery, 28:4, 1004-1045. DOI: 10.1007/s10618-013-0326-x. Online publication date: 1-Jul-2014.
  • (2013) Redefinition of Decision Rules Based on the Importance of Elementary Conditions Evaluation. Fundamenta Informaticae, 123:2, 171-197. DOI: 10.5555/2594874.2594877. Online publication date: 1-Apr-2013.
  • (2013) CHIRA - Convex Hull Based Iterative Algorithm of Rules Aggregation. Fundamenta Informaticae, 123:2, 143-170. DOI: 10.5555/2594874.2594876. Online publication date: 1-Apr-2013.