[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1787275.1787287acmconferencesArticle/Chapter ViewAbstractPublication PagescfConference Proceedingsconference-collections
research-article

Collaborative scheduling of DAG structured computations on multicore processors

Published: 17 May 2010 Publication History

Abstract

Many computational solutions can be expressed as directed acyclic graphs (DAGs), in which the nodes represent tasks to be executed. A fundamental challenge in parallel computing is to schedule such DAGs onto multicore processors while preserving the precedence constraints. In this paper, we propose a lightweight scheduling method for DAG structured computations on multicore processors. We distribute the scheduling activities across the cores and let the schedulers collaborate with each other to balance the workload. In addition, we develop a software lock-free local task list for the scheduler to reduce the scheduling overhead. We experimentally evaluated the proposed method by comparing with various baseline methods on state-of-the-art multicore processors. For a representative set of DAG structured computations from both synthetic and real problems, the proposed scheduler with lock-free local task lists achieved 15.12x average speedup on a platform with four quadcore processors, compared to 8.77x achieved by lock-based baseline methods. The observed scheduling overhead of the proposed scheduler was less than 1% of the overall execution time.

References

[1]
I. Ahmad, Y.-K. Kwok, and M.-Y. Wu. Analysis, evaluation, and comparison of algorithms for scheduling task graphs on parallel processors. In Proceedings of the 1996 International Symposium on Parallel Architectures, Algorithms and Networks, pages 207--213, 1996.
[2]
I. Ahmad, S. Ranka, and S. Khan. Using game theory for scheduling tasks on multi-core processors for simultaneous optimization of performance and energy. In Intl. Sym. on Parallel Dist. Proc., pages 1--6, 2008.
[3]
S. Alarm, R. Barrett, J. Kuehn, P. Roth, and J. Vetter. Characterization of scientific workloads on systems with multicore processors. In IEEE International Symposium on Workload Characterization, pages 225--236, 2006.
[4]
J. H. Anderson and S. Ramamurthy. Lock-free and practical doubly linked list-based deques using single-word compare-and-swap. In Proceedings of the 8th International Conference On Principles Of Distributed Systems, pages 240--255, 2005.
[5]
O. Beaumont, L. Carter, J. Ferrante, A. Legr, L. Marchal, and Y. Robert. Centralized versus distributed schedulers for multiple bag-of-task applications. In International Parallel and Distributed Processing Symposium (IPDPS), pages 1--10, 2006.
[6]
R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: Anefficient multithreaded runtime system. Technical report, Cambridge, 1996.
[7]
H.-L. Chen and C.-T. King. Eager scheduling with lazy retry in multiprocessors. Future Gener. Comput. Syst., 17(3):215--226, 2000.
[8]
E. G. Coffman. Computer and Job-Shop Scheduling Theory. John Wiley and Sons, New York, NY, 1976.
[9]
G. Cong and D. A. Bader. Designing irregular parallel logorithms with mutual exclusion and lock-free protocols. Journal of Parallel and Distributed Computing, 66:854--866, 2006.
[10]
M. De Vuyst, R. Kumar, and D. Tullsen. Exploiting unbalanced thread scheduling for energy and performance on a CMP of SMT processors. In Intl. Sym. on Parallel Dist. Proc., pages 1--6, 2006.
[11]
M. R. Garey and D. S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. 1990.
[12]
Intel Threading Building Blocks. http://www.threadingbuldingblocks.org/.
[13]
T. S. Jaakkola and M. I. Jordan. Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10(1):291--322, 1999.
[14]
J. Kurzak, H. Ltaief, J. Dongarra, and R. M. Badia. Scheduling dense linear algebra operations on multicore processors. Concurr. Comput.: Pract. Exper., 22(1):15--44, 2010.
[15]
Y.-K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, 31(4):406--471, 1999.
[16]
J. Liu, D. M. Nicol, and K. Tan. Lock-free scheduling of logical processes in parallel simulation. In Proceedings of the 2000 Parallel and Distributed Simulation Conference, pages 22--31, 2001.
[17]
OpenMP Application Programming Interface. http://www.openmp.org/.
[18]
C. Papadimitriou and M. Yannakakis. Towards an architecture-independent analysis of parallel algorithms. In Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 510--513, 1988.
[19]
M. S. Squillante and R. D. Nelson. Analysis of task migration in shared memory multiprocessor scheduling. In ACM Conf. on the Measurement and Modeling of Comp. Syst., pages 143--155, 1991.
[20]
X. Tang and G. R. Gao. Automatically partitioning threads for multithreaded architectures. J. Parallel Distrib. Comput., 58(2):159--189, 1999.
[21]
Y. Xia and V. K. Prasanna. Scalable node-level computation kernels for parallel exact inference. IEEE Trans. Comput., 59(1):103--115, 2010.
[22]
H. Zhao and R. Sakellariou. Scheduling multiple DAGs onto heterogeneous systems. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pages 1--12, 2006.

Cited By

View all
  • (2023)Transitive Reduction Approach to Large-Scale Parallel Machine Rescheduling Problem With Controllable Processing Times, Precedence Constraints and Random Machine BreakdownIEEE Access10.1109/ACCESS.2023.323863911(7727-7738)Online publication date: 2023
  • (2013)Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2013.07.01173:12(1551-1562)Online publication date: 1-Dec-2013
  • (2012)Task Parallel Implementation of Belief Propagation in Factor GraphsProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum10.1109/IPDPSW.2012.238(1944-1953)Online publication date: 21-May-2012

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CF '10: Proceedings of the 7th ACM international conference on Computing frontiers
May 2010
370 pages
ISBN:9781450300445
DOI:10.1145/1787275
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2010

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. collaborative scheduling
  2. dag structured computations
  3. lock free structures
  4. task sharing

Qualifiers

  • Research-article

Conference

CF'10
Sponsor:
CF'10: Computing Frontiers Conference
May 17 - 19, 2010
Bertinoro, Italy

Acceptance Rates

CF '10 Paper Acceptance Rate 30 of 113 submissions, 27%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Upcoming Conference

CF '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Transitive Reduction Approach to Large-Scale Parallel Machine Rescheduling Problem With Controllable Processing Times, Precedence Constraints and Random Machine BreakdownIEEE Access10.1109/ACCESS.2023.323863911(7727-7738)Online publication date: 2023
  • (2013)Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2013.07.01173:12(1551-1562)Online publication date: 1-Dec-2013
  • (2012)Task Parallel Implementation of Belief Propagation in Factor GraphsProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum10.1109/IPDPSW.2012.238(1944-1953)Online publication date: 21-May-2012

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media