Abstract
Classifier error is the product of model bias and data variance. While it is important to understand the bias introduced by the choice of learning algorithm, it is equally important to understand the variability of data over time, since even the One True Model might perform poorly when training and evaluation samples diverge. The ability to identify distributional divergence is therefore critical to pinpointing when fracture points in classifier performance will occur, particularly since contemporary evaluation methods such as tenfold cross-validation and hold-out are poor predictors under divergence. This article implements a comprehensive evaluation framework to proactively detect breakpoints in classifiers’ predictions and shifts in data distributions through a series of statistical tests. We outline and utilize three scenarios under which data changes: sample selection bias, covariate shift, and shifting class priors. We evaluate the framework with a variety of classifiers and datasets.
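As a concrete illustration of the kind of statistical check the abstract describes (a minimal sketch only, not the paper's actual test battery: the helper detect_covariate_shift and the choice of a per-feature two-sample Kolmogorov-Smirnov test via scipy.stats.ks_2samp are illustrative assumptions), one can compare each feature's training and evaluation distributions and flag those that diverge significantly:

# Minimal sketch: flag covariate shift with a per-feature
# two-sample Kolmogorov-Smirnov test between training and evaluation data.
import numpy as np
from scipy.stats import ks_2samp

def detect_covariate_shift(X_train, X_eval, alpha=0.05):
    """Return (feature index, KS statistic, p-value) for features whose
    train/eval marginal distributions differ at significance level alpha."""
    shifted = []
    for j in range(X_train.shape[1]):
        res = ks_2samp(X_train[:, j], X_eval[:, j])
        if res.pvalue < alpha:  # reject "same distribution" at level alpha
            shifted.append((j, res.statistic, res.pvalue))
    return shifted

# Example: inject a mean shift into one feature of the evaluation sample
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
X_eval = rng.normal(size=(500, 3))
X_eval[:, 1] += 0.8
print(detect_covariate_shift(X_train, X_eval))

A univariate screen of this kind only inspects marginal distributions; as the abstract notes, the framework relies on a series of statistical tests covering both the data distributions and the classifiers' predictions.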
Cite this article
Cieslak, D.A., Chawla, N.V. A framework for monitoring classifiers’ performance: when and why failure occurs?. Knowl Inf Syst 18, 83–108 (2009). https://doi.org/10.1007/s10115-008-0139-1