[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1531743.1531752acmconferencesArticle/Chapter ViewAbstractPublication PagescfConference Proceedingsconference-collections
research-article

A study on optimally co-scheduling jobs of different lengths on chip multiprocessors

Published: 18 May 2009 Publication History

Abstract

Cache sharing in Chip Multiprocessors brings cache contention among corunning processes, which often causes considerable degradation of program performance and system fairness. Recent studies have seen the effectiveness of job co-scheduling in alleviating the contention. But finding optimal schedules is challenging. Previous explorations tackle the problem under highly constrained settings. In this work, we show that relaxing those constraints, particularly the assumptions on job lengths and reschedulings, increases the complexity of the problem significantly. Subsequently, we propose a series of algorithms to compute or approximate the optimal schedules in the more general setting.
Specifically, we propose an A*-based approach to accelerating the search for optimal schedules by as much as several orders of magnitude. For large problems, we design and evaluate two approximation algorithms, A*-cluster and local-matching algorithms, to effectively approximate the optimal schedules with good accuracy and high scalability. This study contributes better understanding to the optimal co-scheduling problem, facilitates the evaluation of co-scheduling systems, and offers insights and techniques for the development of practical co-scheduling algorithms.

References

[1]
Gnu linear programming kit. Available at http://www.gnu.org/software/glpk/glpk.html.
[2]
Stanford parallel applications for shared memory (splash) benchmark. Available at http://www-flash.stanford.edu/SPLASH/.
[3]
James R. Bulpin and Ian A. Pratt. Hyper-threading aware process scheduling heuristics. In 2005 USENIX Annual Technical Conference, pages 103--106, 2005.
[4]
D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), 2005.
[5]
P. Denning. Thrashing: Its causes and prevention. In Proceedings of the AFIPS 1968 Fall Joint Computer Conference, volume 33, pages 915--922, 1968.
[6]
M. DeVuyst, R. Kumar, and D.M. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a cmp of smt processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.
[7]
J. Edmonds. Maximum matching and a polyhedron with 0,1-vertices. Journal of Research of the National Bureau of Standards B, 69B:125--130, 1965.
[8]
Ali El-Moursy, R. Garg, D.H. Albonesi, and S. Dwarkadas. Compatible phase co-scheduling on a cmp of multi-threaded processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.
[9]
A. Fedorova, M. Seltzer, and M.D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2007.
[10]
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2001.
[11]
L.R. Hsu, S.K. Reinhardt, R. Lyer, and S. Makineni. Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.
[12]
J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S.W. Keckler. A nuca substrate for flexible cmp cache sharing. In Proceedings of International Conference on Supercomputing, pages 31--40, 2005.
[13]
Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT), 2008.
[14]
S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2004.
[15]
R. Kumar, D.M. Tullsen, and N.P. Jouppi. Core architecture optimization for heterogeneous chip multiprocessors. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.
[16]
Nakijima and Pallipadi. Enhancements for hyperthreading technology in the operating system seeking the optimal scheduling. In Proceedings of USENIX Annual Technical Conference, 2002.
[17]
S. Parekh, S. Eggers, H. Levy, and J. Lo. Thread-sensitive scheduling for smt processors. Technical Report 2000-04-02, University of Washington, June 2000.
[18]
M.K. Qureshi and Y.N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the International Symposium on Microarchitecture, 2006.
[19]
N. Rafique, W. Lim, and M. Thottethodi. Architectural support for operating system-driven cmp cache management. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.
[20]
S. Russell and P. Norvig. Artificial Intelligence. Prentice Hall, 2002.
[21]
A. Settle, J. L. Kihm, A. Janiszewski, and D.A. Connors. Architectural support for enhanced smt job scheduling. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, pages 63--73, 2004.
[22]
X. Shen, J. Shaw, B. Meeker, and C. Ding. Locality approximation using time. In Proceedings of the ACM SIGPLAN Conference on Principles of Programming Languages (POPL), 2007.
[23]
A. Snavely and D.M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. In Proceedings of ASPLOS, 2000.
[24]
A. Snavely, D.M. Tullsen, and G. Voelker. Symbiotic job scheduling with priorities for a simultaneous multithreading processor. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2002.
[25]
G.E. Suh, S. Devadas, and L. Rudolph. Analytical cache models with applications to cache partitioning. In Proceedings of the 15th international conference on Supercomputing, 2001.
[26]
G.E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture, 2002.
[27]
N. Tuck and D.M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.
[28]
X. Zhang, S. Dwarkadas, G. Folkmanis, and K. Shen. Processor hardware counter statistics as a first-class system resource. In Proceedings of the 11th Workshop on Hot Topics in Operating Systems, 2007.

Cited By

View all
  • (2023)Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00023(185-196)Online publication date: 31-Oct-2023
  • (2022)Online energy-efficient fair scheduling for heterogeneous multi-cores considering shared resource contentionThe Journal of Supercomputing10.1007/s11227-021-04159-878:6(7729-7748)Online publication date: 3-Jan-2022
  • (2019)Co-scheduling HPC workloads on cache-partitioned CMP platformsInternational Journal of High Performance Computing Applications10.1177/109434201984695633:6(1221-1239)Online publication date: 1-Nov-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CF '09: Proceedings of the 6th ACM conference on Computing frontiers
May 2009
238 pages
ISBN:9781605584133
DOI:10.1145/1531743
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 May 2009

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. a*-search
  2. cache contention
  3. cmp scheduling
  4. co-scheduling
  5. perfect matching

Qualifiers

  • Research-article

Conference

CF '09
Sponsor:
CF '09: Computing Frontiers Conference
May 18 - 20, 2009
Ischia, Italy

Acceptance Rates

CF '09 Paper Acceptance Rate 26 of 113 submissions, 23%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Upcoming Conference

CF '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)1
Reflects downloads up to 10 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00023(185-196)Online publication date: 31-Oct-2023
  • (2022)Online energy-efficient fair scheduling for heterogeneous multi-cores considering shared resource contentionThe Journal of Supercomputing10.1007/s11227-021-04159-878:6(7729-7748)Online publication date: 3-Jan-2022
  • (2019)Co-scheduling HPC workloads on cache-partitioned CMP platformsInternational Journal of High Performance Computing Applications10.1177/109434201984695633:6(1221-1239)Online publication date: 1-Nov-2019
  • (2019)Adaptive Thread Scheduling in Chip MultiprocessorsInternational Journal of Parallel Programming10.1007/s10766-019-00637-yOnline publication date: 14-May-2019
  • (2018)Co-scheduling Amdahl applications on cache-partitioned systemsInternational Journal of High Performance Computing Applications10.1177/109434201771080632:1(123-138)Online publication date: 1-Jan-2018
  • (2018)Co-Scheduling HPC Workloads on Cache-Partitioned CMP Platforms2018 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER.2018.00052(348-358)Online publication date: Sep-2018
  • (2018)Resolving the GPU responsiveness dilemma through program transformationsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-016-6206-y12:3(545-559)Online publication date: 1-Jun-2018
  • (2017)Co-Scheduling Algorithms for Cache-Partitioned Systems2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW.2017.60(874-883)Online publication date: May-2017
  • (2017)Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2017.124(967-977)Online publication date: May-2017
  • (2017)Capacity planning for cluster tools in the semiconductor industryInternational Journal of Production Economics10.1016/j.ijpe.2017.01.005194(167-180)Online publication date: Dec-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media