Abstract
Classifier error is the product of model bias and data variance. While it is important to understand the bias introduced by the choice of learning algorithm, it is equally important to understand the variability of data over time, since even the One True Model might perform poorly when training and evaluation samples diverge. The ability to identify distributional divergence is therefore critical to pinpointing when fracture points in classifier performance will occur, particularly since contemporary evaluation methods such as tenfold cross-validation and hold-out are poor predictors under divergence. This article implements a comprehensive evaluation framework to proactively detect breakpoints in classifiers’ predictions and shifts in data distributions through a series of statistical tests. We outline and utilize three scenarios under which data changes: sample selection bias, covariate shift, and shifting class priors. We evaluate the framework with a variety of classifiers and datasets.
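As a concrete illustration of the kind of statistical check the abstract describes (a minimal sketch only, not the paper's actual test battery: the helper detect_covariate_shift and the choice of a per-feature two-sample Kolmogorov-Smirnov test via scipy.stats.ks_2samp are illustrative assumptions), one can compare each feature's training and evaluation distributions and flag those that diverge significantly:

# Minimal sketch: flag covariate shift with a per-feature
# two-sample Kolmogorov-Smirnov test between training and evaluation data.
import numpy as np
from scipy.stats import ks_2samp

def detect_covariate_shift(X_train, X_eval, alpha=0.05):
    """Return (feature index, KS statistic, p-value) for features whose
    train/eval marginal distributions differ at significance level alpha."""
    shifted = []
    for j in range(X_train.shape[1]):
        res = ks_2samp(X_train[:, j], X_eval[:, j])
        if res.pvalue < alpha:  # reject "same distribution" at level alpha
            shifted.append((j, res.statistic, res.pvalue))
    return shifted

# Example: inject a mean shift into one feature of the evaluation sample
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
X_eval = rng.normal(size=(500, 3))
X_eval[:, 1] += 0.8
print(detect_covariate_shift(X_train, X_eval))

A univariate screen of this kind only inspects marginal distributions; as the abstract notes, the framework relies on a series of statistical tests covering both the data distributions and the classifiers' predictions.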
Cite this article
Cieslak, D.A., Chawla, N.V. A framework for monitoring classifiers’ performance: when and why failure occurs?. Knowl Inf Syst 18, 83–108 (2009). https://doi.org/10.1007/s10115-008-0139-1