DOI: 10.1145/3394486.3403210
Research article
Open access

A Geometric Approach to Predicting Bounds of Downstream Model Performance

Published: 20 August 2020

Abstract

This paper presents the motivation and methodology for incorporating model application criteria into baseline analysis. We focus on the interplay between two common measures, mean square error (MSE) and accuracy, as it relates to perceived model performance. MSE is a common aggregate measure of the performance of predictive regression models, and its advantages are numerous: in particular, MSE is agnostic to the choice of model, provided that the set of possible outcome values is defined on an appropriate metric space. In practice, decisions on how to subsequently use a trained model are based on its predictive performance relative to a baseline that does not use the input features, colloquially a "random model". However, a model's relative MSE gains over the baseline do not guarantee commensurate gains when the model is deployed in downstream applications, systems, or processes. This paper demonstrates one derivation of a distribution that qualifies MSE performance for multi-class decision-making systems requiring a certain level of accuracy. The model error is qualified through comparison to relevant, application-tied baselines suited to evaluating individual outcome performance criteria.
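To make the abstract's central claim concrete, here is a minimal sketch (not the paper's derivation; the synthetic data, the constant bias of 0.6, and the noise scale of 0.6 are all illustrative assumptions) showing that two predictors with identical MSE, both well below that of a featureless mean-predicting baseline, can differ sharply in accuracy once their outputs are discretized into classes:

```python
# Sketch: identical MSE gains over a featureless baseline need not translate
# into identical accuracy gains in a downstream multi-class decision.
# All numbers here are synthetic assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Ordinal outcomes in {0, 1, 2}; a prediction counts as "correct" after
# rounding (and clipping) to the nearest class.
y = rng.integers(0, 3, size=100_000).astype(float)

preds = {
    "baseline (predict mean)": np.full_like(y, y.mean()),
    "model A (constant +0.6 bias)": y + 0.6,
    "model B (zero-mean noise, sd 0.6)": y + rng.normal(0.0, 0.6, y.shape),
}

def mse(y_true, y_hat):
    return float(np.mean((y_true - y_hat) ** 2))

def accuracy(y_true, y_hat):
    # Discretize continuous predictions into the three classes.
    return float(np.mean(np.clip(np.rint(y_hat), 0, 2) == y_true))

for name, y_hat in preds.items():
    print(f"{name:34s} MSE={mse(y, y_hat):.3f}  accuracy={accuracy(y, y_hat):.3f}")
```

On this synthetic task, both models cut the baseline's MSE of about 0.667 down to about 0.36, yet the biased predictor merely matches the baseline's one-in-three accuracy (its predictions sit on the wrong side of almost every class boundary), while the noisy predictor more than doubles it. This is exactly the gap the paper targets: aggregate MSE gains alone cannot certify downstream decision performance.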



Information & Contributors

Information

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020, 3664 pages
ISBN: 9781450379984
DOI: 10.1145/3394486
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. classification
  2. error metrics
  3. interpretability
  4. machine learning
  5. social systems

Qualifiers

  • Research article

Funding Sources

  • Eunice Kennedy Shriver National Institute of Child Health and Human Development
  • Robert Wood Johnson Foundation

Conference

KDD '20

Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)
