More Web Proxy on the site http://driver.im/

research-article

Collaborative scheduling of DAG structured computations on multicore processors

Authors:

Viktor K. PrasannaAuthors Info & Claims

CF '10: Proceedings of the 7th ACM international conference on Computing frontiers

Pages 63 - 72

https://doi.org/10.1145/1787275.1787287

Published: 17 May 2010 Publication History

Abstract

Many computational solutions can be expressed as directed acyclic graphs (DAGs), in which the nodes represent tasks to be executed. A fundamental challenge in parallel computing is to schedule such DAGs onto multicore processors while preserving the precedence constraints. In this paper, we propose a lightweight scheduling method for DAG structured computations on multicore processors. We distribute the scheduling activities across the cores and let the schedulers collaborate with each other to balance the workload. In addition, we develop a software lock-free local task list for the scheduler to reduce the scheduling overhead. We experimentally evaluated the proposed method by comparing with various baseline methods on state-of-the-art multicore processors. For a representative set of DAG structured computations from both synthetic and real problems, the proposed scheduler with lock-free local task lists achieved 15.12x average speedup on a platform with four quadcore processors, compared to 8.77x achieved by lock-based baseline methods. The observed scheduling overhead of the proposed scheduler was less than 1% of the overall execution time.

References

[1]

I. Ahmad, Y.-K. Kwok, and M.-Y. Wu. Analysis, evaluation, and comparison of algorithms for scheduling task graphs on parallel processors. In Proceedings of the 1996 International Symposium on Parallel Architectures, Algorithms and Networks, pages 207--213, 1996.

Digital Library

[2]

I. Ahmad, S. Ranka, and S. Khan. Using game theory for scheduling tasks on multi-core processors for simultaneous optimization of performance and energy. In Intl. Sym. on Parallel Dist. Proc., pages 1--6, 2008.

[3]

S. Alarm, R. Barrett, J. Kuehn, P. Roth, and J. Vetter. Characterization of scientific workloads on systems with multicore processors. In IEEE International Symposium on Workload Characterization, pages 225--236, 2006.

[4]

J. H. Anderson and S. Ramamurthy. Lock-free and practical doubly linked list-based deques using single-word compare-and-swap. In Proceedings of the 8th International Conference On Principles Of Distributed Systems, pages 240--255, 2005.

Digital Library

[5]

O. Beaumont, L. Carter, J. Ferrante, A. Legr, L. Marchal, and Y. Robert. Centralized versus distributed schedulers for multiple bag-of-task applications. In International Parallel and Distributed Processing Symposium (IPDPS), pages 1--10, 2006.

Digital Library

[6]

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: Anefficient multithreaded runtime system. Technical report, Cambridge, 1996.

Digital Library

[7]

H.-L. Chen and C.-T. King. Eager scheduling with lazy retry in multiprocessors. Future Gener. Comput. Syst., 17(3):215--226, 2000.

Digital Library

[8]

E. G. Coffman. Computer and Job-Shop Scheduling Theory. John Wiley and Sons, New York, NY, 1976.

[9]

G. Cong and D. A. Bader. Designing irregular parallel logorithms with mutual exclusion and lock-free protocols. Journal of Parallel and Distributed Computing, 66:854--866, 2006.

Digital Library

[10]

M. De Vuyst, R. Kumar, and D. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors. In Intl. Sym. on Parallel Dist. Proc., pages 1--6, 2006.

[11]

M. R. Garey and D. S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. 1990.

Digital Library

[12]

Intel Threading Building Blocks. http://www.threadingbuldingblocks.org/.

[13]

T. S. Jaakkola and M. I. Jordan. Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10(1):291--322, 1999.

Digital Library

[14]

J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia. Scheduling dense linear algebra operations on multicore processors. Concurr. Comput.: Pract. Exper., 22(1):15--44, 2010.

Digital Library

[15]

Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, 31(4):406--471, 1999.

Digital Library

[16]

J. Liu, D. M. Nicol, and K. Tan. Lock-free scheduling of logical processes in parallel simulation. In Proceedings of the 2000 Parallel and Distributed Simulation Conference, pages 22--31, 2001.

Digital Library

[17]

OpenMP Application Programming Interface. http://www.openmp.org/.

[18]

C. Papadimitriou and M. Yannakakis. Towards an architecture-independent analysis of parallel algorithms. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 510--513, 1988.

Digital Library

[19]

M. S. Squillante and R. D. Nelson. Analysis of task migration in shared memory multiprocessor scheduling. In ACM Conf. on the Measurement and Modeling of Comp. Syst., pages 143--155, 1991.

Digital Library

[20]

X. Tang and G. R. Gao. Automatically partitioning threads for multithreaded architectures. J. Parallel Distrib. Comput., 58(2):159--189, 1999.

Digital Library

[21]

Y. Xia and V. K. Prasanna. Scalable node-level computation kernels for parallel exact inference. IEEE Trans. Comput., 59(1):103--115, 2010.

Digital Library

[22]

H. Zhao and R. Sakellariou. Scheduling multiple DAGs onto heterogeneous systems. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pages 1--12, 2006.

Digital Library

Cited By

Meng DFeng ZRen ZLiu Y(2023)Transitive Reduction Approach to Large-Scale Parallel Machine Rescheduling Problem With Controllable Processing Times, Precedence Constraints and Random Machine BreakdownIEEE Access10.1109/ACCESS.2023.323863911(7727-7738)Online publication date: 2023
https://doi.org/10.1109/ACCESS.2023.3238639
La Fratta PKogge P(2013)Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2013.07.01173:12(1551-1562)Online publication date: 1-Dec-2013
https://dl.acm.org/doi/10.1016/j.jpdc.2013.07.011
Ma NXia YPrasanna V(2012)Task Parallel Implementation of Belief Propagation in Factor GraphsProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum10.1109/IPDPSW.2012.238(1944-1953)Online publication date: 21-May-2012
https://dl.acm.org/doi/10.1109/IPDPSW.2012.238

Index Terms

Collaborative scheduling of DAG structured computations on multicore processors
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling
2. Theory of computation
  1. Models of computation
    1. Concurrency
      1. Parallel computing models

Recommendations

Hierarchical scheduling of DAG structured computations on manycore processors with dynamic thread grouping
JSSPP'10: Proceedings of the 15th international conference on Job scheduling strategies for parallel processing

Many computational solutions can be expressed as directed acyclic graphs (DAGs) with weighted nodes. In parallel computing, scheduling such DAGs onto manycore processors remains a fundamental challenge, since synchronization across dozens of threads and ...
Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors

Asymmetric multicore processors have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest for ...
A Collaborative Multi-factor Scheduler for Asymmetric Multicore Processors
PACT '19: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques

Asymmetric multicore processors (AMP) are necessary for extracting performance in an era of limited power budget and dark silicon. We have efficient symmetric schedulers, efficient asymmetric schedulers for single-threaded workloads, and efficient ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

CF '10: Proceedings of the 7th ACM international conference on Computing frontiers

May 2010

370 pages

ISBN:9781450300445

DOI:10.1145/1787275

General Chair:
Nancy M. Amato
Texas A&M University, USA
,
Program Chairs:
Hubertus Franke
IBM Research, USA
,
Paul H.J. Kelly
Imperial College London, UK

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2010

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

CF'10

Sponsor:

SIGMICRO

CF'10: Computing Frontiers Conference

May 17 - 19, 2010

Bertinoro, Italy

Acceptance Rates

CF '10 Paper Acceptance Rate 30 of 113 submissions, 27%;

Overall Acceptance Rate 273 of 785 submissions, 35%

Upcoming Conference

CF '25

Sponsor:
sigmicro

22nd ACM International Conference on Computing Frontiers

May 28 - 30, 2025

Cagliari , Italy

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
249
Total Downloads

Downloads (Last 12 months)2
Downloads (Last 6 weeks)0

Reflects downloads up to 11 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Meng DFeng ZRen ZLiu Y(2023)Transitive Reduction Approach to Large-Scale Parallel Machine Rescheduling Problem With Controllable Processing Times, Precedence Constraints and Random Machine BreakdownIEEE Access10.1109/ACCESS.2023.323863911(7727-7738)Online publication date: 2023
https://doi.org/10.1109/ACCESS.2023.3238639
La Fratta PKogge P(2013)Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2013.07.01173:12(1551-1562)Online publication date: 1-Dec-2013
https://dl.acm.org/doi/10.1016/j.jpdc.2013.07.011
Ma NXia YPrasanna V(2012)Task Parallel Implementation of Belief Propagation in Factor GraphsProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum10.1109/IPDPSW.2012.238(1944-1953)Online publication date: 21-May-2012
https://dl.acm.org/doi/10.1109/IPDPSW.2012.238

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents