
A framework for monitoring classifiers’ performance: when and why failure occurs?

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Classifier error is the product of model bias and data variance. While it is important to understand the bias introduced by the choice of learning algorithm, it is equally important to understand the variability in data over time, since even the One True Model might perform poorly when training and evaluation samples diverge. The ability to identify distributional divergence is therefore critical for pinpointing when fracture points in classifier performance will occur, particularly since contemporary evaluation methods such as tenfold cross-validation and hold-out are poor predictors of performance under divergent conditions. This article implements a comprehensive evaluation framework to proactively detect breakpoints in classifiers’ predictions and shifts in data distributions through a series of statistical tests. We outline and utilize three scenarios under which data changes: sample selection bias, covariate shift, and shifting class priors. We evaluate the framework with a variety of classifiers and datasets.
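
As a rough illustration of the kind of statistical test such a framework can apply, the sketch below compares the training and evaluation distributions of each feature with a two-sample Kolmogorov–Smirnov test and flags features whose distributions appear to have drifted. This is a minimal sketch, not the framework described in the article; the flag_drift helper, the per-feature testing strategy, and the alpha threshold are illustrative assumptions.

    # Minimal sketch: per-feature two-sample Kolmogorov-Smirnov test to flag
    # distribution shift between a training sample and an evaluation sample.
    # Illustrative stand-in only, not the paper's actual framework.
    import numpy as np
    from scipy.stats import ks_2samp

    def flag_drift(X_train, X_eval, alpha=0.01):
        """Return (feature index, KS statistic, p-value) for features whose
        train/eval distributions differ at the assumed significance level."""
        drifted = []
        for j in range(X_train.shape[1]):
            stat, p_value = ks_2samp(X_train[:, j], X_eval[:, j])
            if p_value < alpha:
                drifted.append((j, stat, p_value))
        return drifted

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X_train = rng.normal(0.0, 1.0, size=(1000, 3))
        X_eval = X_train.copy()
        X_eval[:, 1] += 0.5  # inject a covariate shift in feature 1
        for j, stat, p in flag_drift(X_train, X_eval):
            print(f"feature {j}: KS statistic={stat:.3f}, p-value={p:.3g}")

In this toy run only the shifted feature should be flagged; a monitoring framework of the kind the abstract describes would combine such per-feature signals with tests on the classifier's prediction distribution.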

Author information

Correspondence to Nitesh V. Chawla.

About this article

Cite this article

Cieslak, D.A., Chawla, N.V. A framework for monitoring classifiers’ performance: when and why failure occurs?. Knowl Inf Syst 18, 83–108 (2009). https://doi.org/10.1007/s10115-008-0139-1
