Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models

Published: 14 August 2022

Abstract

With the increasing adoption of machine learning (ML) models and systems in high-stakes settings across different industries, guaranteeing a model's performance after deployment has become crucial. Monitoring models in production is a critical aspect of ensuring their continued performance and reliability. We present Amazon SageMaker Model Monitor, a fully managed service that continuously monitors the quality of machine learning models hosted on Amazon SageMaker. Our system automatically detects data, concept, bias, and feature attribution drift in models in real time and provides alerts so that model owners can take corrective action and thereby maintain high-quality models. We describe the key requirements obtained from customers, the system design and architecture, and the methodology for detecting different types of drift. Further, we provide quantitative evaluations followed by use cases, insights, and lessons learned from more than two years of production deployment.
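The data-drift idea summarized above can be illustrated with a minimal sketch: compare live feature values against a baseline distribution and raise an alert when a distribution-distance statistic exceeds a threshold. The function names (`ks_statistic`, `detect_drift`) and the 0.2 threshold below are illustrative assumptions, not the SageMaker Model Monitor API.

```python
import bisect

def ks_statistic(baseline, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs."""
    b, l = sorted(baseline), sorted(live)

    def ecdf(sorted_vals, x):
        # Fraction of values <= x.
        return bisect.bisect_right(sorted_vals, x) / len(sorted_vals)

    return max(abs(ecdf(b, x) - ecdf(l, x)) for x in set(b) | set(l))

def detect_drift(baseline, live, threshold=0.2):
    """Return (drifted, statistic) for one numeric feature;
    the threshold is an arbitrary illustrative choice."""
    stat = ks_statistic(baseline, live)
    return stat > threshold, stat

# Example: live traffic shifted by +5 relative to the baseline.
baseline = [float(i % 10) for i in range(100)]
live = [v + 5.0 for v in baseline]
drifted, stat = detect_drift(baseline, live)  # drifted is True here
```

In a production setting, the baseline statistics would be computed once from training or validation data and the comparison run on a schedule over captured inference traffic, which matches the continuous-monitoring workflow the abstract describes.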




    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022, 5033 pages
    ISBN: 9781450393850
    DOI: 10.1145/3534678

    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. MLOps
    2. amazon sagemaker
    3. bias & fairness in ML
    4. drift detection
    5. feature attribution
    6. real-time model monitoring

    Qualifiers

    • Research-article

    Conference

    KDD '22

    Acceptance Rates

    Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (last 12 months): 938
    • Downloads (last 6 weeks): 98

    Reflects downloads up to 23 Dec 2024

    Cited By

    • (2024) A Pipeline for Monitoring and Maintaining a Text Classification Tool in Production. Anais do LI Seminário Integrado de Software e Hardware (SEMISH 2024), 133-144. DOI: 10.5753/semish.2024.2438
    • (2024) A Review of Big Data Analytics and Artificial Intelligence in Industry 5.0 for Smart Decision-Making. Human-Centered Approaches in Industry 5.0, 24-47. DOI: 10.4018/979-8-3693-2647-3.ch002
    • (2024) Insights on Implementing a Metrics Baseline for Post-Deployment AI Container Monitoring. Proceedings of the 2024 International Conference on Software and Systems Processes, 46-55. DOI: 10.1145/3666015.3666018
    • (2024) Towards Runtime Monitoring for Responsible Machine Learning using Model-driven Engineering. Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems, 195-202. DOI: 10.1145/3640310.3674092
    • (2024) Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 550-561. DOI: 10.1145/3637528.3671959
    • (2024) Navigating Data-Centric Artificial Intelligence With DC-Check: Advances, Challenges, and Opportunities. IEEE Transactions on Artificial Intelligence 5(6), 2589-2603. DOI: 10.1109/TAI.2023.3345805
    • (2024) Enhancing well-being in modern education: A comprehensive eHealth proposal for managing stress and anxiety based on machine learning. Internet of Things 25, 101055. DOI: 10.1016/j.iot.2023.101055
    • (2024) Model driven engineering for machine learning components. Information and Software Technology 169. DOI: 10.1016/j.infsof.2024.107423
    • (2023) Multivariate Time-Series Forecasting: A Review of Deep Learning Methods in Internet of Things Applications to Smart Cities. Smart Cities 6(5), 2519-2552. DOI: 10.3390/smartcities6050114
    • (2023) Online Data Drift Detection for Anomaly Detection Services based on Deep Learning towards Multivariate Time Series. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), 1-11. DOI: 10.1109/QRS60937.2023.00011
