Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models

Published: 14 August 2022

Abstract

With the increasing adoption of machine learning (ML) models and systems in high-stakes settings across different industries, guaranteeing a model's performance after deployment has become crucial. Monitoring models in production is a critical aspect of ensuring their continued performance and reliability. We present Amazon SageMaker Model Monitor, a fully managed service that continuously monitors the quality of machine learning models hosted on Amazon SageMaker. Our system automatically detects data, concept, bias, and feature attribution drift in models in real time and provides alerts so that model owners can take corrective action and thereby maintain high-quality models. We describe the key requirements obtained from customers, the system design and architecture, and the methodology for detecting different types of drift. Further, we provide quantitative evaluations followed by use cases, insights, and lessons learned from more than two years of production deployment.
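The data-drift idea summarized above can be illustrated with a minimal sketch: compare live feature values against a baseline distribution and raise an alert when a distribution-distance statistic exceeds a threshold. The function names (`ks_statistic`, `detect_drift`) and the 0.2 threshold below are illustrative assumptions, not the SageMaker Model Monitor API.

```python
import bisect

def ks_statistic(baseline, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs."""
    b, l = sorted(baseline), sorted(live)

    def ecdf(sorted_vals, x):
        # Fraction of values <= x.
        return bisect.bisect_right(sorted_vals, x) / len(sorted_vals)

    return max(abs(ecdf(b, x) - ecdf(l, x)) for x in set(b) | set(l))

def detect_drift(baseline, live, threshold=0.2):
    """Return (drifted, statistic) for one numeric feature;
    the threshold is an arbitrary illustrative choice."""
    stat = ks_statistic(baseline, live)
    return stat > threshold, stat

# Example: live traffic shifted by +5 relative to the baseline.
baseline = [float(i % 10) for i in range(100)]
live = [v + 5.0 for v in baseline]
drifted, stat = detect_drift(baseline, live)  # drifted is True here
```

In a production setting, the baseline statistics would be computed once from training or validation data and the comparison run on a schedule over captured inference traffic, which matches the continuous-monitoring workflow the abstract describes.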




    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022, 5033 pages
    ISBN: 9781450393850
    DOI: 10.1145/3534678

    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. MLOps
    2. amazon sagemaker
    3. bias & fairness in ML
    4. drift detection
    5. feature attribution
    6. real-time model monitoring

    Qualifiers

    • Research-article

    Conference

    KDD '22

    Acceptance Rates

    Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)


    Article Metrics

    • Downloads (last 12 months): 938
    • Downloads (last 6 weeks): 98

    Reflects downloads up to 23 Dec 2024

    Cited By

    • (2024) A Pipeline for Monitoring and Maintaining a Text Classification Tool in Production. Anais do LI Seminário Integrado de Software e Hardware (SEMISH 2024), 133-144. DOI: 10.5753/semish.2024.2438
    • (2024) A Review of Big Data Analytics and Artificial Intelligence in Industry 5.0 for Smart Decision-Making. Human-Centered Approaches in Industry 5.0, 24-47. DOI: 10.4018/979-8-3693-2647-3.ch002
    • (2024) Insights on Implementing a Metrics Baseline for Post-Deployment AI Container Monitoring. Proceedings of the 2024 International Conference on Software and Systems Processes, 46-55. DOI: 10.1145/3666015.3666018
    • (2024) Towards Runtime Monitoring for Responsible Machine Learning using Model-driven Engineering. Proceedings of the ACM/IEEE 27th International Conference on Model Driven Engineering Languages and Systems, 195-202. DOI: 10.1145/3640310.3674092
    • (2024) Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 550-561. DOI: 10.1145/3637528.3671959
    • (2024) Navigating Data-Centric Artificial Intelligence With DC-Check: Advances, Challenges, and Opportunities. IEEE Transactions on Artificial Intelligence 5(6), 2589-2603. DOI: 10.1109/TAI.2023.3345805
    • (2024) Enhancing well-being in modern education: A comprehensive eHealth proposal for managing stress and anxiety based on machine learning. Internet of Things 25, 101055. DOI: 10.1016/j.iot.2023.101055
    • (2024) Model driven engineering for machine learning components. Information and Software Technology 169. DOI: 10.1016/j.infsof.2024.107423
    • (2023) Multivariate Time-Series Forecasting: A Review of Deep Learning Methods in Internet of Things Applications to Smart Cities. Smart Cities 6(5), 2519-2552. DOI: 10.3390/smartcities6050114
    • (2023) Online Data Drift Detection for Anomaly Detection Services based on Deep Learning towards Multivariate Time Series. 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS), 1-11. DOI: 10.1109/QRS60937.2023.00011
