
research-article

Open access

Clockwork: A Delay-Based Global Scheduling Framework for More Consistent Landing Times in the Data Warehouse

Authors:

Martin Valdez-Vivas,

Josh Metzler

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Pages 3627 - 3637

https://doi.org/10.1145/3447548.3467119

Published: 14 August 2021

Abstract

Recurring batch data pipelines are a staple of the modern enterprise-scale data warehouse. As a data warehouse scales to support more products and services, a growing number of interdependent pipelines running at various cadences can give rise to periodic resource bottlenecks for the cluster. This resource contention results in pipelines starting at unpredictable times each day and consequently variable landing times for the data artifacts they produce. The variability gets compounded by the dependency structure of the workload, and the resulting unpredictability can disrupt the project workstreams which consume this data. We present Clockwork, a delay-based global scheduling framework for data pipelines which improves landing time stability by spreading out tasks throughout the day. Whereas most scheduling algorithms optimize for makespan or average job completion times, Clockwork's execution plan optimizes for stability in task completion times while also targeting predefined pipeline SLOs. We present this new problem formulation and design a list scheduling algorithm based on its analytic properties. We also discuss how we estimate the resource requirements for our recurring pipelines, and the architecture for integrating Clockwork with Dataswarm, Facebook's existing data workflow management service. Online experiments comparing this novel scheduling algorithm and a previously proposed greedy procrastinating heuristic show tasks complete almost an hour earlier on average, while exhibiting lower landing time variance and producing significantly less competition for resources in the cluster.
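The abstract describes Clockwork only at a high level. The Python sketch below is a rough illustration of the general delay-based idea: hold tasks back from congested periods while still finishing them before a deadline. It is not the paper's algorithm. The objective here (minimizing peak planned load, used as a stand-in for Clockwork's landing-time-stability objective), the earliest-deadline-first priority, the integer time-slot model, and every name (`Task`, `delay_schedule`, the example pipelines) are hypothetical assumptions. Runtimes and resource demands are assumed to be precomputed estimates.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Task:
    duration: int  # estimated runtime in time slots; must be >= 1
    demand: float  # estimated resource demand while the task runs
    deadline: int  # hypothetical SLO: slot by which the task should finish
    deps: list[str] = field(default_factory=list)  # upstream task names


def delay_schedule(tasks: dict[str, Task], horizon: int) -> dict[str, int]:
    """Assign a start slot to each task, delaying work into less-loaded slots.

    List scheduling: ready tasks are taken in earliest-deadline-first order,
    and each is placed at the start slot in [earliest, latest] that minimizes
    the peak planned load over its run (ties go to the earlier slot).
    """
    load = [0.0] * horizon  # planned cluster load per slot
    start: dict[str, int] = {}
    waiting = {name: len(t.deps) for name, t in tasks.items()}
    children: dict[str, list[str]] = {name: [] for name in tasks}
    for name, t in tasks.items():
        for dep in t.deps:
            children[dep].append(name)

    ready = [(t.deadline, name) for name, t in tasks.items() if waiting[name] == 0]
    heapq.heapify(ready)
    while ready:
        _, name = heapq.heappop(ready)
        t = tasks[name]
        # A task may not start before all of its upstream tasks have finished.
        earliest = max((start[d] + tasks[d].duration for d in t.deps), default=0)
        if earliest + t.duration > horizon:
            raise ValueError(f"task {name!r} does not fit within the horizon")
        # Latest start that still meets the SLO; if the SLO is already
        # unreachable, the range collapses to starting as soon as possible.
        latest = max(earliest, min(t.deadline, horizon) - t.duration)
        best = min(
            range(earliest, latest + 1),
            key=lambda s: (max(load[s:s + t.duration]) + t.demand, s),
        )
        start[name] = best
        for s in range(best, best + t.duration):
            load[s] += t.demand
        for child in children[name]:
            waiting[child] -= 1
            if waiting[child] == 0:
                heapq.heappush(ready, (tasks[child].deadline, child))

    if len(start) != len(tasks):
        raise ValueError("dependency cycle detected")
    return start


# Hypothetical pipelines: a chain ingest -> clean -> report, plus an independent backfill.
tasks = {
    "ingest": Task(duration=2, demand=4.0, deadline=4),
    "clean": Task(duration=2, demand=3.0, deadline=8, deps=["ingest"]),
    "report": Task(duration=1, demand=2.0, deadline=24, deps=["clean"]),
    "backfill": Task(duration=3, demand=3.0, deadline=24),
}
plan = delay_schedule(tasks, horizon=24)
print(plan)  # {'ingest': 0, 'clean': 2, 'backfill': 4, 'report': 7}
```

In this toy instance, `backfill` is delayed from slot 0 to slot 4 so that it no longer overlaps `ingest` and `clean`. The planned peak load drops from 7 (if every task started as soon as possible) to 4, and every deadline is still met. The cost is that tasks with loose deadlines start later. Clockwork's actual formulation optimizes the stability of task completion times, which this sketch does not model.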


Index Terms

1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Process management
          1. Scheduling
    2. Software system structures
      1. Abstraction, modeling and modularity
      2. Distributed systems organizing principles
2. Theory of computation
  1. Design and analysis of algorithms
    1. Approximation algorithms analysis
      1. Scheduling algorithms
  2. Theory and algorithms for application domains


Information

Published In


KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

August 2021

4259 pages

ISBN:9781450383325

DOI:10.1145/3447548

General Chairs:
Feida Zhu, Singapore Management University
Beng Chin Ooi, National University of Singapore
Chunyan Miao, Nanyang Technological University

Program Chairs:
Haixun Wang,
Iryna Skrypnyk,
Wynne Hsu,
Sanjay Chawla

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 August 2021


Qualifiers

Research-article

Conference

KDD '21

Sponsor:

KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 14 - 18, 2021

Virtual Event, Singapore

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 656
Downloads (last 12 months): 133
Downloads (last 6 weeks): 14

Reflects downloads up to 13 Dec 2024
