[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article
Public Access

Towards Optimality in Parallel Scheduling

Published: 19 December 2017 Publication History

Abstract

To keep pace with Moore's law, chip designers have focused on increasing the number of cores per chip rather than single core performance. In turn, modern jobs are often designed to run on any number of cores. However, to effectively leverage these multi-core chips, one must address the question of how many cores to assign to each job. Given that jobs receive sublinear speedups from additional cores, there is an obvious tradeoff: allocating more cores to an individual job reduces the job's runtime, but in turn decreases the efficiency of the overall system. We ask how the system should schedule jobs across cores so as to minimize the mean response time over a stream of incoming jobs.
To answer this question, we develop an analytical model of jobs running on a multi-core machine. We prove that EQUI, a policy which continuously divides cores evenly across jobs, is optimal when all jobs follow a single speedup curve and have exponentially distributed sizes. EQUI requires jobs to change their level of parallelization while they run. Since this is not possible for all workloads, we consider a class of "fixed-width" policies, which choose a single level of parallelization, k, to use for all jobs. We prove that, surprisingly, it is possible to achieve EQUI's performance without requiring jobs to change their levels of parallelization by using the optimal fixed level of parallelization, k*. We also show how to analytically derive the optimal k* as a function of the system load, the speedup curve, and the job size distribution.
In the case where jobs may follow different speedup curves, finding a good scheduling policy is even more challenging. In particular, we find that policies like EQUI which performed well in the case of a single speedup function now perform poorly. We propose a very simple policy, GREEDY*, which performs near-optimally when compared to the numerically-derived optimal policy.

References

[1]
I. Adan, G. J. J. A. N. van Houtum, and J. van der Wal. 1994. Upper and lower bounds for the waiting time in the symmetric shortest queue system. Annals of Operations Research 48 (1994), 197--217.
[2]
K. Agrawal, J. Li, K. Lu, and B. Moseley. 2016. Scheduling Parallelizable Jobs Online to Minimize the Maximum Flow Time. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '16). ACM, New York, NY, USA, 195--205.
[3]
G. Ananthanarayanan, M. C. Hung, X. Ren, I. Stoica, A. Wierman, and M. Yu. 2014. Grass: Trimming stragglers in approximation analytics. (2014).
[4]
S. V. Anastasiadis and K. C. Sevcik. 1997. Parallel Application Scheduling on Networks of Workstations. J. Parallel and Distrib. Comput. 43 (1997), 109 -- 124.
[5]
F. Baskett, K. M. Chandy, R. Muntz, and F. G. Palacios. 1975. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. J. ACM 22 (1975), 248--260.
[6]
C. Bienia, S. Kumar, J. P. Singh, and K. Li. 2008. The PARSEC Benchmark Suite: Characterization and ArchitecturalImplications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT '08). ACM, New York, NY, USA, 72--81.
[7]
T. Bonald and A. Proutière. 2002. Insensitivity in processor-sharing networks. Performance Evaluation 49 (2002), 193--209.
[8]
A. Bušić, I. Vliegen, and A. Scheller-Wolf. 2012. Comparing Markov chains: aggregation and precedence relations applied to sets of states, with applications to assemble-to-order systems. Mathematics of Operations Research 37 (2012), 259--287.
[9]
S. Chaitanya, B. Urgaonkar, and A. Sivasubramaniam. 2008. Qdsl: a queuing model for systems with differential service levels. ACM SIGMETRICS Performance Evaluation Review 36, 1 (2008), 289--300.
[10]
W. Cirne and F. Berman. 2002. Using Moldability to Improve the Performance of Supercomputer Jobs. J. Parallel and Distrib. Comput. 62 (2002), 1571--1601.
[11]
J. Edmonds. 1999. Scheduling in the dark. Theoretical Computer Science 235 (1999), 109--141.
[12]
J. Edmonds and K. Pruhs. 2009. Scalably scheduling processes with arbitrary speedup curves. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09). ACM, New York, NY, USA, 685--692.
[13]
D. G. Feitelson, L. Rudolph, U. Schwiegelshohn, K. C. Sevcik, and P. Wong. 1997. Theory and Practice in Parallel Job Scheduling. In Proceedings of the International Workshop on Job Scheduling Strategies for Parallel Processing (IPPS '97). Springer-Verlag, London, UK, 1--34. http://dl.acm.org/citation.cfm?id=646378.689517
[14]
A. Gandhi, M. Harchol-Balter, R. Das, and C. Lefurgy. 2009. Optimal power allocation in server farms. In ACM SIGMETRICS Performance Evaluation Review, Vol. 37. ACM, 157--168.
[15]
V. Gupta, M. Harchol-Balter, K. Sigman, and W. Whitt. 2007. Analysis of join-the-shortest-queue routing for web server farms. Performance Evaluation 64 (2007), 1062--1081.
[16]
M. Harchol-Balter. 2013. Performance Modeling and Design of Computer Systems: Queueing Theory in Action. Cambridge University Press.
[17]
M. Harchol-Balter, A. Scheller-Wolf, and A. R. Young. 2009. Surprising results on task assignment in server farms with high-variability workloads. ACM SIGMETRICS Performance Evaluation Review 37 (2009), 287--298.
[18]
M. D. Hill and M. R. Marty. 2008. Amdahl's Law in the Multicore Era. Computer 41 (2008), 33--38.
[19]
K.-C. Huang, T.-C. Huang, Y.-H. Tung, and P.-Z. Shih. 2013. Effective Processor Allocation for Moldable Jobs with Application Speedup Model. In Advances in Intelligent Systems and Applications - Volume 2. Springer, 563--572.
[20]
L. Kleinrock. 1976. Queueing Systems, Volume II: Computer Applications. Wiley, New York.
[21]
S.-S. Ko and R. F. Serfozo. 2004. Response times in M/M/s fork-join networks. Advances in Applied Probability 36 (2004), 854--871.
[22]
G. M. Koole. 2006. Monotonicity in Markov reward and decision chains: Theory and applications. Foundations and Trends in Stochastic Systems 1 (2006), 1--76.
[23]
S. A. Lippman. 1973. Semi-Markov decision processes with unbounded rewards. Management Science 19 (1973), 717--731.
[24]
Y. Lu, Q. Xie, G. Kliot, A. Geller, J. R. Larus, and A. Greenberg. 2011. Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services. Performance Evaluation 68 (2011), 1056--1071.
[25]
J. McCool, M. Robison, and A. Reinders. 2012. Structured Parallel Programming: Patterns for Efficient Computation. Elsevier.
[26]
R. D. Nelson and T. K. Philips. 1993. An Approximation for the Mean Response Time for Shortest Queue Routing with General Interarrival and Service Times. Performance Evaluation 17 (1993), 123--139.
[27]
M. L. Puterman. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Chichester.
[28]
X. Ren, G. Ananthanarayanan, A. Wierman, and M. Yu. 2015. Hopper: Decentralized speculation-aware cluster scheduling at scale. ACM SIGCOMM Computer Communication Review 45, 4 (2015), 379--392.
[29]
Z. Scully, G. Blelloch, M. Harchol-Balter, and A. Scheller-Wolf. 2017. Optimally Scheduling Jobs with Multiple Tasks. In Proceedings of the ACM Workshop on Mathematical Performance Modeling and Analysis.
[30]
S. Srinivasan, S. Krishnamoorthy, and P. Sadayappan. 2003. A Robust Scheduling Strategy for Moldable Scheduling of Parallel Jobs. In Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER '03). 92--99.
[31]
J. N. Tsitsiklis and K. Xu. 2011. On the power of (even a little) centralization in distributed processing. ACM SIGMETRICS Performance Evaluation Review 39 (2011), 121--132.
[32]
Y. Xu, A. Scheller-Wolf, and K. P. Sycara. 2015. The Benefit of Introducing Variability in Single-Server Queues with Application to Quality-Based Service Domains. Operations Research 63 (2015), 233--246.
[33]
X. Zhan, Y. Bao, C. Bienia, and K. Li. 2017. PARSEC3.0: A Multicore Benchmark Suite with Network Stacks and SPLASH-2X. ACM SIGARCH Computer Architecture News 44 (2017), 1--16.

Cited By

View all
  • (2023)A reinforcement learning algorithm for scheduling parallel processors with identical speedup functionsMachine Learning with Applications10.1016/j.mlwa.2023.10048513(100485)Online publication date: Sep-2023
  • (2022)Speed Scaling with Multiple Servers under a Sum-Power ConstraintACM SIGMETRICS Performance Evaluation Review10.1145/3529113.352912849:3(45-50)Online publication date: 25-Mar-2022
  • (2022)WCFS: a new framework for analyzing multiserver systemsQueueing Systems10.1007/s11134-022-09848-6102:1-2(143-174)Online publication date: 28-Jul-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems
Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 1, Issue 2
December 2017
480 pages
EISSN:2476-1249
DOI:10.1145/3175501
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 December 2017
Published in POMACS Volume 1, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. multicore architecture
  2. operating systems
  3. performance modeling
  4. queueing theory

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)71
  • Downloads (Last 6 weeks)8
Reflects downloads up to 12 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)A reinforcement learning algorithm for scheduling parallel processors with identical speedup functionsMachine Learning with Applications10.1016/j.mlwa.2023.10048513(100485)Online publication date: Sep-2023
  • (2022)Speed Scaling with Multiple Servers under a Sum-Power ConstraintACM SIGMETRICS Performance Evaluation Review10.1145/3529113.352912849:3(45-50)Online publication date: 25-Mar-2022
  • (2022)WCFS: a new framework for analyzing multiserver systemsQueueing Systems10.1007/s11134-022-09848-6102:1-2(143-174)Online publication date: 28-Jul-2022
  • (2021)Optimal Scheduling of Parallel Jobs With Unknown Service RequirementsHandbook of Research on Methodologies and Applications of Supercomputing10.4018/978-1-7998-7156-9.ch003(18-40)Online publication date: 2021
  • (2021)Open problems in queueing theory inspired by datacenter computingQueueing Systems: Theory and Applications10.1007/s11134-020-09684-697:1-2(3-37)Online publication date: 1-Feb-2021
  • (2020)Optimal Resource Allocation for Elastic and Inelastic JobsProceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3350755.3400265(75-87)Online publication date: 6-Jul-2020
  • (2020)Performance Modeling of Parallel Loops on Multi-Socket Platforms Using Queueing SystemsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2019.293817231:2(318-331)Online publication date: 1-Feb-2020
  • (2020)EXPPO: EXecution Performance Profiling and Optimization for CPS Co-simulation-as-a-Service2020 IEEE 23rd International Symposium on Real-Time Distributed Computing (ISORC)10.1109/ISORC49007.2020.00040(184-191)Online publication date: May-2020
  • (2020)Energy Consumption by Servers under Unknown Service DemandElectronic Notes in Theoretical Computer Science10.1016/j.entcs.2020.09.017353(21-38)Online publication date: Nov-2020
  • (2019)heSRPTACM SIGMETRICS Performance Evaluation Review10.1145/3374888.337489647:2(18-20)Online publication date: 4-Dec-2019
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media