DOI: 10.1145/3209889.3209897
Short paper

Accelerating Human-in-the-loop Machine Learning: Challenges and Opportunities

Published: 15 June 2018

Abstract

Development of machine learning (ML) workflows is a tedious process of iterative experimentation: developers repeatedly modify their workflows until they attain the desired accuracy. We describe our vision for a "human-in-the-loop" ML system that accelerates this process: by intelligently tracking changes and intermediate results over time, such a system can enable rapid iteration, quick and responsive feedback, introspection and debugging, and background execution and automation. Finally, we describe Helix, our preliminary attempt at such a system, which has already achieved speedups of up to 10x over competing systems on typical iterative workflows.
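The abstract's central idea, tracking changes and intermediate results across iterations so that unchanged work need not be redone, can be illustrated with a minimal memoization sketch. It is not Helix's implementation: the `IterativeRunner` class, the step-tuple format, and the bytecode-plus-parameters fingerprint are hypothetical choices made only for illustration. Each step's fingerprint folds in its upstream fingerprints, so an edit to one step invalidates that step and everything downstream of it, while earlier steps are reused.

```python
import hashlib
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical step format: (name, function, keyword parameters, upstream step names).
Step = Tuple[str, Callable[..., Any], Dict[str, Any], List[str]]


class IterativeRunner:
    """Hypothetical sketch: reuse a step's stored output across iterations
    unless its code, its parameters, or any upstream step changed."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}  # fingerprint -> intermediate result

    @staticmethod
    def _fingerprint(fn: Callable[..., Any], params: Dict[str, Any],
                     upstream: List[str]) -> str:
        # Sketch only: ignores closures, default arguments, and the code of
        # helper functions that the step calls.
        code = fn.__code__
        h = hashlib.sha256()
        h.update(code.co_code)                    # step logic (bytecode)
        h.update(repr(code.co_consts).encode())   # literals used by the step
        h.update(repr(code.co_names).encode())    # global names it references
        h.update(repr(sorted(params.items())).encode())
        for fp in upstream:                       # an upstream edit invalidates
            h.update(fp.encode())                 # everything downstream of it
        return h.hexdigest()

    def run(self, steps: List[Step]) -> Tuple[Dict[str, Any], Dict[str, str]]:
        """Execute steps in the given (topological) order. Returns each step's
        output and whether it was 'computed' or 'reused' this iteration."""
        outputs: Dict[str, Any] = {}
        fps: Dict[str, str] = {}
        status: Dict[str, str] = {}
        for name, fn, params, deps in steps:
            fp = self._fingerprint(fn, params, [fps[d] for d in deps])
            fps[name] = fp
            if fp in self._store:
                status[name] = "reused"
            else:
                self._store[fp] = fn(*(outputs[d] for d in deps), **params)
                status[name] = "computed"
            outputs[name] = self._store[fp]
        return outputs, status


# Toy three-step workflow: load -> featurize -> train.
def load(n: int) -> List[int]:
    return list(range(n))


def featurize(rows: List[int], scale: int) -> List[int]:
    return [r * scale for r in rows]


def train(feats: List[int], reg: float) -> float:
    return sum(feats) / (len(feats) + reg)


def workflow(scale: int, reg: float) -> List[Step]:
    return [
        ("load", load, {"n": 4}, []),
        ("features", featurize, {"scale": scale}, ["load"]),
        ("model", train, {"reg": reg}, ["features"]),
    ]


if __name__ == "__main__":
    runner = IterativeRunner()
    # Three developer iterations: tweak reg, then tweak scale.
    for scale, reg in [(2, 0), (2, 4), (3, 4)]:
        out, status = runner.run(workflow(scale, reg))
        print(status, out["model"])
```

In the second iteration only `model` is recomputed; in the third, `features` and `model` are recomputed while `load` is reused. This sketch stores every intermediate result; a practical system would also have to decide which intermediates are worth keeping, since storing everything can cost more than recomputing some of it.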



Information

Published In

DEEM'18: Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning
June 2018
63 pages
ISBN:9781450358286
DOI:10.1145/3209889
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Funding Sources

  • National Science Foundation Graduate Research Fellowship Program

Conference

SIGMOD/PODS '18

Acceptance Rates

DEEM'18 paper acceptance rate: 10 of 16 submissions (63%)
Overall acceptance rate: 44 of 67 submissions (66%)


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 339
  • Downloads (last 6 weeks): 26
Reflects downloads up to 01 Mar 2025

Cited By
  • (2025) Data-Driven Approach for a Continuous Information Flow in a Closed-Loop Supply Chain. Sustainable Manufacturing as a Driver for Growth, pp. 509-515. DOI: 10.1007/978-3-031-77429-4_56. Online publication date: 7 Jan 2025.
  • (2024) Research evolution of metal organic frameworks: A scientometric approach with human-in-the-loop. Journal of Data and Information Science 9(3), pp. 44-64. DOI: 10.2478/jdis-2024-0019. Online publication date: 19 Jul 2024.
  • (2024) Application of Convolutional Neural Network for Seismic Event Classification: Impact of Dataset Quality, Distribution, and Human-in-the-Loop Feedback. Bulletin of the Seismological Society of America. DOI: 10.1785/0120240179. Online publication date: 31 Dec 2024.
  • (2024) Systematic review using a spiral approach with machine learning. Systematic Reviews 13(1). DOI: 10.1186/s13643-023-02421-z. Online publication date: 17 Jan 2024.
  • (2024) Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach. ACM Computing Surveys 57(5), pp. 1-35. DOI: 10.1145/3708497. Online publication date: 18 Dec 2024.
  • (2024) A Caching-based Framework for Scalable Temporal Graph Neural Network Training. ACM Transactions on Database Systems 50(1), pp. 1-46. DOI: 10.1145/3705894. Online publication date: 25 Nov 2024.
  • (2024) Optimizing Data Analytics Workflows through User-driven Experimentation. Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI, pp. 253-255. DOI: 10.1145/3644815.3644971. Online publication date: 14 Apr 2024.
  • (2024) A Meta-Bayesian Approach for Rapid Online Parametric Optimization for Wrist-based Interactions. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1-38. DOI: 10.1145/3613904.3642071. Online publication date: 11 May 2024.
  • (2024) SC-AIRL: Share-Critic in Adversarial Inverse Reinforcement Learning for Long-Horizon Task. IEEE Robotics and Automation Letters 9(4), pp. 3179-3186. DOI: 10.1109/LRA.2024.3366023. Online publication date: Apr 2024.
  • (2024) Exploring Skin Potential Signals in Electrodermal Activity: Identifying Key Features for Attention State Differentiation. IEEE Access 12, pp. 100832-100847. DOI: 10.1109/ACCESS.2024.3406932. Online publication date: 2024.