Elasecutor: Elastic Executor Scheduling in Data Analytics Systems

Published: 15 April 2021

Abstract

Modern data analytics systems use long-running executors to run an application's entire DAG. Executors exhibit salient time-varying resource requirements. Yet, existing schedulers simply reserve resources for executors statically, and use the peak resource demand to guide executor placement. This leads to low utilization and poor application performance. We present Elasecutor, a novel executor scheduler for data analytics systems. Elasecutor dynamically allocates and explicitly sizes resources to executors over time according to their predicted time-varying resource demands. Rather than placing executors using their peak demand, Elasecutor strategically assigns them to machines based on a concept called dominant remaining resource to minimize resource fragmentation. Elasecutor further adaptively reprovisions resources in order to tolerate inaccurate demand prediction, and reschedules tasks when a machine lacks sufficient resources for reprovisioning. Testbed evaluation on a 35-node cluster with our Spark-based prototype implementation shows that Elasecutor reduces makespan by more than 36% on average and improves cluster utilization by up to 55% compared with existing work.
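
To make the placement idea concrete, below is a minimal sketch of dominant-remaining-resource (DRR) placement. It is our own illustration under simplifying assumptions, not the paper's implementation: each executor is reduced to a single demand vector over resource types rather than a predicted time-varying profile, and the fits/drr/place helpers and the machine layout are hypothetical names invented for the example.

```python
# A minimal, hypothetical sketch of placement by "dominant remaining
# resource" (DRR). This is NOT Elasecutor's exact algorithm: the paper
# uses predicted time-varying demand profiles, while this sketch
# simplifies each executor to one demand vector over resource types
# (e.g. CPU cores, memory in GB). All names are illustrative.
from typing import Dict, List, Optional


def fits(cap: List[float], used: List[float], dem: List[float]) -> bool:
    """True if the executor fits: no resource exceeds machine capacity."""
    return all(u + d <= c for c, u, d in zip(cap, used, dem))


def drr(cap: List[float], used: List[float], dem: List[float]) -> float:
    """Dominant remaining resource: the largest remaining fraction,
    across resource types, after hypothetically placing the executor."""
    return max((c - u - d) / c for c, u, d in zip(cap, used, dem))


def place(dem: List[float],
          machines: List[Dict[str, List[float]]]) -> Optional[int]:
    """Assign the executor to the feasible machine with the smallest
    DRR, leaving the cluster's spare capacity less fragmented."""
    best, best_drr = None, float("inf")
    for i, m in enumerate(machines):
        if fits(m["cap"], m["used"], dem):
            score = drr(m["cap"], m["used"], dem)
            if score < best_drr:
                best, best_drr = i, score
    return best


machines = [
    {"cap": [16.0, 64.0], "used": [12.0, 20.0]},  # machine 0: CPU-tight
    {"cap": [16.0, 64.0], "used": [4.0, 48.0]},   # machine 1: memory-tight
]
# An executor needing 2 cores and 8 GB lands on machine 0 (DRR 0.5625
# beats machine 1's 0.625), preserving machine 1's larger spare CPU.
print(place([2.0, 8.0], machines))  # -> 0
```

In this best-fit-style reading, minimizing the dominant remaining fraction packs each executor onto the machine it fills most tightly along its bottleneck dimension; the real system additionally evaluates the fit per time slot against each executor's predicted demand profile.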



Published In

IEEE/ACM Transactions on Networking, Volume 29, Issue 2 (April 2021), 475 pages

Publisher

IEEE Press

Qualifiers

  • Research-article


Cited By

  • Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms. Journal of Grid Computing, vol. 22, no. 1, Mar. 2024. https://doi.org/10.1007/s10723-024-09756-4
  • Efficient Time-Series Data Delivery in IoT With Xender. IEEE Transactions on Mobile Computing, vol. 23, no. 5, pp. 4777–4792, 2023. https://doi.org/10.1109/TMC.2023.3296608
  • Energy-aware scheduling for spark job based on deep reinforcement learning in cloud. Computing, vol. 105, no. 8, pp. 1717–1743, Aug. 2023. https://doi.org/10.1007/s00607-023-01171-z
  • DRL-based and Bsld-Aware Job Scheduling for Apache Spark Cluster in Hybrid Cloud Computing Environments. Journal of Grid Computing, vol. 20, no. 4, Dec. 2022. https://doi.org/10.1007/s10723-022-09630-1
