DOI: 10.1145/3493700.3493768
tutorial

End-to-end Machine Learning using Kubeflow

Published: 08 January 2022

Abstract

Data scientists are usually adept at deriving valuable insights from data by applying appropriate machine learning algorithms. However, they are usually not skilled in developing or operating production-level software, which is the domain of ML operators. To move from initial experiments to production-grade systems, the code needs to run at scale on large, realistic data sets, and to run both on on-premises equipment and on public clouds. Additionally, the entire process needs to be part of a Software Development Lifecycle (SDLC) that accounts for some flavour of continuous integration/continuous delivery (CI/CD).
In this tutorial, attendees will learn about the components of an end-to-end ML system and will get hands-on experience with model training, hyperparameter tuning, and model deployment. The tutorial is based on Kubeflow, a widely used open-source (Apache License 2.0) machine learning toolkit for Kubernetes. The related code and examples can be accessed from a public GitHub repository.
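As a rough illustration of the three stages named above (training, hyperparameter tuning, deployment), the following is a minimal sketch in plain Python. It does not use Kubeflow or any of its APIs and is not taken from the tutorial's repository; the data set and the function names `train`, `tune`, and `deploy` are hypothetical, chosen only to show how the stages hand results to one another.

```python
"""Toy train -> tune -> deploy flow. Plain Python only; no Kubeflow APIs are used."""
from typing import Callable, List, Tuple

# Hypothetical synthetic data following y = 2x + 1 (stands in for a real data set).
DATA: List[Tuple[float, float]] = [(float(x), 2.0 * x + 1.0) for x in range(10)]


def train(learning_rate: float, epochs: int = 2000) -> Tuple[float, float]:
    """Training stage: fit y = w*x + b with batch gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(DATA)
    for _ in range(epochs):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in DATA) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in DATA) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(params: Tuple[float, float]) -> float:
    """Mean squared error of a (w, b) pair on DATA."""
    w, b = params
    return sum((w * x + b - y) * (w * x + b - y) for x, y in DATA) / len(DATA)


def tune(candidates: List[float]) -> Tuple[float, Tuple[float, float]]:
    """Tuning stage: grid search, one training trial per learning rate, keep the lowest loss."""
    trials = [(lr, train(lr)) for lr in candidates]
    return min(trials, key=lambda trial: mse(trial[1]))


def deploy(params: Tuple[float, float]) -> Callable[[float], float]:
    """Deployment stage (stand-in): wrap the fitted parameters in a prediction function."""
    w, b = params
    return lambda x: w * x + b


if __name__ == "__main__":
    best_lr, best_params = tune([0.001, 0.01, 0.03])
    predict = deploy(best_params)
    print(f"best learning rate: {best_lr}, prediction for x=10: {predict(10.0):.3f}")
```

In a production setting each stage would typically run as a separate containerized step on a cluster rather than as a function call in one process; the sketch only shows the data flow between stages.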

      Published In

      CODS-COMAD '22: Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)
      January 2022
      357 pages
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Kubeflow
      2. ML Operations
      3. MLOps
      4. Machine Learning

      Qualifiers

      • Tutorial
      • Research
      • Refereed limited

      Conference

      CODS-COMAD 2022

      Article Metrics

      • Downloads (Last 12 months): 120
      • Downloads (Last 6 weeks): 6
      Reflects downloads up to 11 Dec 2024

      Cited By

      • (2024) PARMA: a Platform Architecture to enable Automated, Reproducible, and Multi-party Assessments of AI Trustworthiness. Proceedings of the 2nd International Workshop on Responsible AI Engineering, pp. 20-27. DOI: 10.1145/3643691.3648585. Online publication date: 16-Apr-2024
      • (2024) An MLOps Framework to Data-Driven Modelling of Digital Twins with an Application to Virtual Test Rigs. Advances in Conceptual Modeling, pp. 71-86. DOI: 10.1007/978-3-031-75599-6_5. Online publication date: 26-Oct-2024
      • (2023) Neural Rendering in the Cloud with Tensor Processing Unit. 2023 IEEE XXX International Conference on Electronics, Electrical Engineering and Computing (INTERCON), pp. 1-7. DOI: 10.1109/INTERCON59652.2023.10326073. Online publication date: 2-Nov-2023
      • (2023) The Robustness of Machine Learning Models Using MLSecOps: A Case Study On Delivery Service Forecasting. 2023 14th International Conference on Information & Communication Technology and System (ICTS), pp. 265-270. DOI: 10.1109/ICTS58770.2023.10330833. Online publication date: 4-Oct-2023
      • (2023) MLOps Challenges in Industry 4.0. SN Computer Science 4:6. DOI: 10.1007/s42979-023-02282-2. Online publication date: 28-Oct-2023
      • (2023) From Source Code to Model Service: A Framework’s Perspective. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, pp. 1355-1362. DOI: 10.1007/978-3-031-20738-9_147. Online publication date: 30-Jan-2023
      • (2022) MLOps. Proceedings of the 1st Workshop on Software Engineering for Responsible AI, pp. 45-49. DOI: 10.1145/3526073.3527591. Online publication date: 19-May-2022
