
Using AUC and Accuracy in Evaluating Learning Algorithms

Published: 01 March 2005

Abstract

The area under the ROC (Receiver Operating Characteristic) curve, or simply AUC, has been used in medical diagnosis since the 1970s. It has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments have been given as to why AUC should be preferred over accuracy. In this paper, we establish formal criteria for comparing two different measures for learning algorithms, and we show both theoretically and empirically that AUC is a better measure than accuracy (in a precisely defined sense). We then reevaluate well-established claims in machine learning that were based on accuracy, using AUC instead, and obtain interesting and surprising new results. For example, it has been well established and widely accepted that Naive Bayes and decision trees are very similar in predictive accuracy. We show, however, that Naive Bayes is significantly better than decision trees in terms of AUC. The conclusions drawn in this paper may have a significant impact on machine learning and data mining applications.
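The distinction the abstract draws can be illustrated with the pairwise-ranking view of AUC: it equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (the Wilcoxon-Mann-Whitney statistic). The sketch below uses small made-up score vectors (not data from the paper) to show two classifiers with identical accuracy but different AUC, i.e., AUC can discriminate where accuracy cannot:

```python
def auc(scores, labels):
    # AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    # (positive, negative) pairs ranked correctly; ties count as 0.5.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    # Accuracy only looks at which side of the threshold each score falls.
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

labels   = [1, 1, 1, 0, 0, 0]
# Hypothetical scores: both misclassify the same two examples at
# threshold 0.5, so their accuracies are equal (4/6) ...
scores_a = [0.9, 0.8, 0.45, 0.55, 0.3, 0.2]
scores_b = [0.9, 0.8, 0.10, 0.55, 0.3, 0.2]

# ... but classifier A ranks the mistaken positive (0.45) just below
# the mistaken negative (0.55), while B ranks it below every negative,
# so AUC separates them: A gets 8/9, B gets 6/9.
print(accuracy(scores_a, labels), auc(scores_a, labels))
print(accuracy(scores_b, labels), auc(scores_b, labels))
```

This is only a toy illustration of the measure's discriminating power, not the formal criteria the paper develops.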




Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 17, Issue 3, March 2005, 143 pages

Publisher

IEEE Educational Activities Department, United States


Author Tags

  1. AUC of ROC
  2. Evaluation of learning algorithms
  3. ROC
  4. Accuracy

Qualifiers

  • Research-article
