More Web Proxy on the site http://driver.im/

research-article

A study on optimally co-scheduling jobs of different lengths on chip multiprocessors

Authors:

Xipeng ShenAuthors Info & Claims

CF '09: Proceedings of the 6th ACM conference on Computing frontiers

Pages 41 - 50

https://doi.org/10.1145/1531743.1531752

Published: 18 May 2009 Publication History

Abstract

Cache sharing in Chip Multiprocessors brings cache contention among corunning processes, which often causes considerable degradation of program performance and system fairness. Recent studies have seen the effectiveness of job co-scheduling in alleviating the contention. But finding optimal schedules is challenging. Previous explorations tackle the problem under highly constrained settings. In this work, we show that relaxing those constraints, particularly the assumptions on job lengths and reschedulings, increases the complexity of the problem significantly. Subsequently, we propose a series of algorithms to compute or approximate the optimal schedules in the more general setting.

Specifically, we propose an A*-based approach to accelerating the search for optimal schedules by as much as several orders of magnitude. For large problems, we design and evaluate two approximation algorithms, A*-cluster and local-matching algorithms, to effectively approximate the optimal schedules with good accuracy and high scalability. This study contributes better understanding to the optimal co-scheduling problem, facilitates the evaluation of co-scheduling systems, and offers insights and techniques for the development of practical co-scheduling algorithms.

References

[1]

Gnu linear programming kit. Available at http://www.gnu.org/software/glpk/glpk.html.

[2]

Stanford parallel applications for shared memory (splash) benchmark. Available at http://www-flash.stanford.edu/SPLASH/.

[3]

James R. Bulpin and Ian A. Pratt. Hyper-threading aware process scheduling heuristics. In 2005 USENIX Annual Technical Conference, pages 103--106, 2005.

Digital Library

[4]

D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA), 2005.

Digital Library

[5]

P. Denning. Thrashing: Its causes and prevention. In Proceedings of the AFIPS 1968 Fall Joint Computer Conference, volume 33, pages 915--922, 1968.

Digital Library

[6]

M. DeVuyst, R. Kumar, and D.M. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a cmp of smt processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.

Digital Library

[7]

J. Edmonds. Maximum matching and a polyhedron with 0,1-vertices. Journal of Research of the National Bureau of Standards B, 69B:125--130, 1965.

[8]

Ali El-Moursy, R. Garg, D.H. Albonesi, and S. Dwarkadas. Compatible phase co-scheduling on a cmp of multi-threaded processors. In Proceedings of International Parallel and Distribute Processing Symposium (IPDPS), 2006.

Digital Library

[9]

A. Fedorova, M. Seltzer, and M.D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2007.

Digital Library

[10]

T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2001.

[11]

L.R. Hsu, S.K. Reinhardt, R. Lyer, and S. Makineni. Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.

Digital Library

[12]

J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S.W. Keckler. A nuca substrate for flexible cmp cache sharing. In Proceedings of International Conference on Supercomputing, pages 31--40, 2005.

Digital Library

[13]

Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques (PACT), 2008.

Digital Library

[14]

S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2004.

Digital Library

[15]

R. Kumar, D.M. Tullsen, and N.P. Jouppi. Core architecture optimization for heterogeneous chip multiprocessors. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.

Digital Library

[16]

Nakijima and Pallipadi. Enhancements for hyperthreading technology in the operating system seeking the optimal scheduling. In Proceedings of USENIX Annual Technical Conference, 2002.

Digital Library

[17]

S. Parekh, S. Eggers, H. Levy, and J. Lo. Thread-sensitive scheduling for smt processors. Technical Report 2000-04-02, University of Washington, June 2000.

[18]

M.K. Qureshi and Y.N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the International Symposium on Microarchitecture, 2006.

Digital Library

[19]

N. Rafique, W. Lim, and M. Thottethodi. Architectural support for operating system-driven cmp cache management. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, 2006.

Digital Library

[20]

S. Russell and P. Norvig. Artificial Intelligence. Prentice Hall, 2002.

Digital Library

[21]

A. Settle, J. L. Kihm, A. Janiszewski, and D.A. Connors. Architectural support for enhanced smt job scheduling. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, pages 63--73, 2004.

Digital Library

[22]

X. Shen, J. Shaw, B. Meeker, and C. Ding. Locality approximation using time. In Proceedings of the ACM SIGPLAN Conference on Principles of Programming Languages (POPL), 2007.

Digital Library

[23]

A. Snavely and D.M. Tullsen. Symbiotic jobscheduling for a simultaneous multithreading processor. In Proceedings of ASPLOS, 2000.

Digital Library

[24]

A. Snavely, D.M. Tullsen, and G. Voelker. Symbiotic job scheduling with priorities for a simultaneous multithreading processor. In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2002.

Digital Library

[25]

G.E. Suh, S. Devadas, and L. Rudolph. Analytical cache models with applications to cache partitioning. In Proceedings of the 15th international conference on Supercomputing, 2001.

Digital Library

[26]

G.E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture, 2002.

Digital Library

[27]

N. Tuck and D.M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.

Digital Library

[28]

X. Zhang, S. Dwarkadas, G. Folkmanis, and K. Shen. Processor hardware counter statistics as a first-class system resource. In Proceedings of the 11th Workshop on Hot Topics in Operating Systems, 2007.

Digital Library

Cited By

Saroliya UArima ELiu DSchulz M(2023)Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00023(185-196)Online publication date: 31-Oct-2023
https://doi.org/10.1109/CLUSTER52292.2023.00023
Salami BNoori HNaghibzadeh M(2022)Online energy-efficient fair scheduling for heterogeneous multi-cores considering shared resource contentionThe Journal of Supercomputing10.1007/s11227-021-04159-878:6(7729-7748)Online publication date: 3-Jan-2022
https://doi.org/10.1007/s11227-021-04159-8
Dongarra JTourancheau BAupy GBenoit AGoglin BPottier LRobert Y(2019)Co-scheduling HPC workloads on cache-partitioned CMP platformsInternational Journal of High Performance Computing Applications10.1177/109434201984695633:6(1221-1239)Online publication date: 1-Nov-2019
https://dl.acm.org/doi/10.1177/1094342019846956
Show More Cited By

Index Terms

A study on optimally co-scheduling jobs of different lengths on chip multiprocessors
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Multiprocessing / multiprogramming / multitasking
        Scheduling

Recommendations

Analysis and approximation of optimal co-scheduling on chip multiprocessors
PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Cache sharing among processors is important for Chip Multiprocessors to reduce inter-thread latency, but also brings cache contention, degrading program performance considerably. Recent studies have shown that job co-scheduling can effectively alleviate ...
The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions

In Chip Multiprocessors (CMPs) architecture, it is common that multiple cores share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling is an approach to tackling the problem by assigning ...
Scheduling jobs under decreasing linear deterioration

This paper considers the scheduling problems under decreasing linear deterioration. Deterioration of a job means that its processing time is a function of its execution start time. Optimal algorithms are presented respectively for single machine ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CF '09: Proceedings of the 6th ACM conference on Computing frontiers

May 2009

238 pages

ISBN:9781605584133

DOI:10.1145/1531743

General Chairs:
Gearold Johnson
Colorado State University, USA
,
Cartsen Trinitis
TU München, Germany
,
Program Chairs:
Georgi N. Gaydadjiev
TU Delft, The Nederland
,
Alex Veidenbaum
University of California, USA

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 May 2009

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

CF '09

Sponsor:

CF '09: Computing Frontiers Conference

May 18 - 20, 2009

Ischia, Italy

Acceptance Rates

CF '09 Paper Acceptance Rate 26 of 113 submissions, 23%;

Overall Acceptance Rate 273 of 785 submissions, 35%

Upcoming Conference

CF '25

Sponsor:
sigmicro

22nd ACM International Conference on Computing Frontiers

May 28 - 30, 2025

Cagliari , Italy

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

31
Total Citations
View Citations
344
Total Downloads

Downloads (Last 12 months)3
Downloads (Last 6 weeks)1

Reflects downloads up to 10 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Saroliya UArima ELiu DSchulz M(2023)Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach2023 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER52292.2023.00023(185-196)Online publication date: 31-Oct-2023
https://doi.org/10.1109/CLUSTER52292.2023.00023
Salami BNoori HNaghibzadeh M(2022)Online energy-efficient fair scheduling for heterogeneous multi-cores considering shared resource contentionThe Journal of Supercomputing10.1007/s11227-021-04159-878:6(7729-7748)Online publication date: 3-Jan-2022
https://doi.org/10.1007/s11227-021-04159-8
Dongarra JTourancheau BAupy GBenoit AGoglin BPottier LRobert Y(2019)Co-scheduling HPC workloads on cache-partitioned CMP platformsInternational Journal of High Performance Computing Applications10.1177/109434201984695633:6(1221-1239)Online publication date: 1-Nov-2019
https://dl.acm.org/doi/10.1177/1094342019846956
Akturk IOzturk O(2019)Adaptive Thread Scheduling in Chip MultiprocessorsInternational Journal of Parallel Programming10.1007/s10766-019-00637-yOnline publication date: 14-May-2019
https://doi.org/10.1007/s10766-019-00637-y
Aupy GBenoit ADai SPottier LRaghavan PRobert YShantharam M(2018)Co-scheduling Amdahl applications on cache-partitioned systemsInternational Journal of High Performance Computing Applications10.1177/109434201771080632:1(123-138)Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.1177/1094342017710806
Aupy GBenoit AGoglin BPottier LRobert Y(2018)Co-Scheduling HPC Workloads on Cache-Partitioned CMP Platforms2018 IEEE International Conference on Cluster Computing (CLUSTER)10.1109/CLUSTER.2018.00052(348-358)Online publication date: Sep-2018
https://doi.org/10.1109/CLUSTER.2018.00052
Zhu QWu BShen XShen KShen LWang Z(2018)Resolving the GPU responsiveness dilemma through program transformationsFrontiers of Computer Science: Selected Publications from Chinese Universities10.1007/s11704-016-6206-y12:3(545-559)Online publication date: 1-Jun-2018
https://dl.acm.org/doi/10.1007/s11704-016-6206-y
Aupy GBenoit APottier LRaghavan PRobert YShantharam M(2017)Co-Scheduling Algorithms for Cache-Partitioned Systems2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)10.1109/IPDPSW.2017.60(874-883)Online publication date: May-2017
https://doi.org/10.1109/IPDPSW.2017.60
Zhu QWu BShen XShen LWang Z(2017)Co-Run Scheduling with Power Cap on Integrated CPU-GPU Systems2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS.2017.124(967-977)Online publication date: May-2017
https://doi.org/10.1109/IPDPS.2017.124
Romauch MHartl R(2017)Capacity planning for cluster tools in the semiconductor industryInternational Journal of Production Economics10.1016/j.ijpe.2017.01.005194(167-180)Online publication date: Dec-2017
https://doi.org/10.1016/j.ijpe.2017.01.005
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents