Elasecutor: Elastic Executor Scheduling in Data Analytics Systems

Published: 15 April 2021

Abstract

Modern data analytics systems use long-running executors to run an application's entire DAG. Executors exhibit salient time-varying resource requirements. Yet, existing schedulers simply reserve resources for executors statically, and use the peak resource demand to guide executor placement. This leads to low utilization and poor application performance. We present Elasecutor, a novel executor scheduler for data analytics systems. Elasecutor dynamically allocates and explicitly sizes resources to executors over time according to their predicted time-varying resource demands. Rather than placing executors using their peak demand, Elasecutor strategically assigns them to machines based on a concept called dominant remaining resource to minimize resource fragmentation. Elasecutor further adaptively reprovisions resources in order to tolerate inaccurate demand prediction, and reschedules tasks when a machine lacks sufficient resources for reprovisioning. Testbed evaluation on a 35-node cluster with our Spark-based prototype implementation shows that Elasecutor reduces makespan by more than 36% on average and improves cluster utilization by up to 55% compared with existing work.
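
To make the placement idea concrete, below is a minimal sketch of dominant-remaining-resource (DRR) placement. It is our own illustration under simplifying assumptions, not the paper's implementation: each executor is reduced to a single demand vector over resource types rather than a predicted time-varying profile, and the fits/drr/place helpers and the machine layout are hypothetical names invented for the example.

```python
# A minimal, hypothetical sketch of placement by "dominant remaining
# resource" (DRR). This is NOT Elasecutor's exact algorithm: the paper
# uses predicted time-varying demand profiles, while this sketch
# simplifies each executor to one demand vector over resource types
# (e.g. CPU cores, memory in GB). All names are illustrative.
from typing import Dict, List, Optional


def fits(cap: List[float], used: List[float], dem: List[float]) -> bool:
    """True if the executor fits: no resource exceeds machine capacity."""
    return all(u + d <= c for c, u, d in zip(cap, used, dem))


def drr(cap: List[float], used: List[float], dem: List[float]) -> float:
    """Dominant remaining resource: the largest remaining fraction,
    across resource types, after hypothetically placing the executor."""
    return max((c - u - d) / c for c, u, d in zip(cap, used, dem))


def place(dem: List[float],
          machines: List[Dict[str, List[float]]]) -> Optional[int]:
    """Assign the executor to the feasible machine with the smallest
    DRR, leaving the cluster's spare capacity less fragmented."""
    best, best_drr = None, float("inf")
    for i, m in enumerate(machines):
        if fits(m["cap"], m["used"], dem):
            score = drr(m["cap"], m["used"], dem)
            if score < best_drr:
                best, best_drr = i, score
    return best


machines = [
    {"cap": [16.0, 64.0], "used": [12.0, 20.0]},  # machine 0: CPU-tight
    {"cap": [16.0, 64.0], "used": [4.0, 48.0]},   # machine 1: memory-tight
]
# An executor needing 2 cores and 8 GB lands on machine 0 (DRR 0.5625
# beats machine 1's 0.625), preserving machine 1's larger spare CPU.
print(place([2.0, 8.0], machines))  # -> 0
```

In this best-fit-style reading, minimizing the dominant remaining fraction packs each executor onto the machine it fills most tightly along its bottleneck dimension; the real system additionally evaluates the fit per time slot against each executor's predicted demand profile.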



Published In

IEEE/ACM Transactions on Networking, Volume 29, Issue 2 (April 2021), 475 pages

Publisher

IEEE Press

Qualifiers

  • Research-article


Cited By

  • Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms. Journal of Grid Computing, vol. 22, no. 1, Mar. 2024. https://doi.org/10.1007/s10723-024-09756-4
  • Efficient Time-Series Data Delivery in IoT With Xender. IEEE Transactions on Mobile Computing, vol. 23, no. 5, pp. 4777–4792, 2023. https://doi.org/10.1109/TMC.2023.3296608
  • Energy-aware scheduling for spark job based on deep reinforcement learning in cloud. Computing, vol. 105, no. 8, pp. 1717–1743, Aug. 2023. https://doi.org/10.1007/s00607-023-01171-z
  • DRL-based and Bsld-Aware Job Scheduling for Apache Spark Cluster in Hybrid Cloud Computing Environments. Journal of Grid Computing, vol. 20, no. 4, Dec. 2022. https://doi.org/10.1007/s10723-022-09630-1
