DOI: 10.1145/3447548.3467119
research-article
Open access

Clockwork: A Delay-Based Global Scheduling Framework for More Consistent Landing Times in the Data Warehouse

Published: 14 August 2021

Abstract

Recurring batch data pipelines are a staple of the modern enterprise-scale data warehouse. As a data warehouse scales to support more products and services, a growing number of interdependent pipelines running at various cadences can give rise to periodic resource bottlenecks for the cluster. This resource contention results in pipelines starting at unpredictable times each day and consequently variable landing times for the data artifacts they produce. The variability gets compounded by the dependency structure of the workload, and the resulting unpredictability can disrupt the project workstreams which consume this data. We present Clockwork, a delay-based global scheduling framework for data pipelines which improves landing time stability by spreading out tasks throughout the day. Whereas most scheduling algorithms optimize for makespan or average job completion times, Clockwork's execution plan optimizes for stability in task completion times while also targeting predefined pipeline SLOs. We present this new problem formulation and design a list scheduling algorithm based on its analytic properties. We also discuss how we estimate the resource requirements for our recurring pipelines, and the architecture for integrating Clockwork with Dataswarm, Facebook's existing data workflow management service. Online experiments comparing this novel scheduling algorithm and a previously proposed greedy procrastinating heuristic show tasks complete almost an hour earlier on average, while exhibiting lower landing time variance and producing significantly less competition for resources in the cluster.
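
The idea of delaying task starts to relieve resource contention can be illustrated with a small sketch. This is not Clockwork's algorithm, whose objective and list-scheduling rules are defined in the paper itself; it is a generic greedy list scheduler under assumed simplifications: time is split into integer slots, each task has a fixed duration and a single resource demand, and the cluster has one capacity cap. The function name `delay_schedule` and all parameters are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def delay_schedule(tasks, deps, capacity, horizon):
    """Greedy delay-based placement (illustrative only).

    tasks: {name: (duration_in_slots, resource_demand)}
    deps:  {name: set of upstream task names}
    Returns {name: start_slot}.
    """
    usage = defaultdict(int)  # slot -> resource units already committed
    start, finish = {}, {}
    graph = {name: deps.get(name, set()) for name in tasks}
    # Visit tasks so that every upstream task is placed before its consumers.
    for name in TopologicalSorter(graph).static_order():
        duration, demand = tasks[name]
        # A task may not start before all of its upstream tasks have finished.
        t = max((finish[d] for d in graph[name]), default=0)
        while True:
            if t + duration > horizon:
                raise ValueError(f"{name} does not fit within the horizon")
            slots = range(t, t + duration)
            if all(usage[s] + demand <= capacity for s in slots):
                break
            t += 1  # delay the start by one slot and try again
        for s in range(t, t + duration):
            usage[s] += demand
        start[name], finish[name] = t, t + duration
    return start


# Hypothetical workload: "a" and "b" cannot overlap under a cap of 4 units,
# and "c" consumes the output of "a".
tasks = {"a": (2, 3), "b": (2, 3), "c": (1, 2)}
deps = {"c": {"a"}}
plan = delay_schedule(tasks, deps, capacity=4, horizon=10)
print(plan)
```

In this toy workload the scheduler staggers the two heavy tasks instead of letting them compete for the same slots, which is the intuition behind spreading work throughout the day. A stability-oriented scheduler such as the one the abstract describes would additionally account for completion-time variance and pipeline SLOs, which this sketch ignores.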


Published In

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
August 2021
4259 pages
ISBN: 9781450383325
DOI: 10.1145/3447548
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. completion time stability
  2. data pipeline scheduling
  3. data warehouse management
  4. delay-based scheduling
  5. global cluster scheduling
  6. systems data science

Qualifiers

  • Research-article

Conference

KDD '21
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 656
  • Downloads (Last 12 months): 133
  • Downloads (Last 6 weeks): 14

Reflects downloads up to 13 Dec 2024
