Article

Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP

Authors:

Hironori Kasahara,

Kazuhisa Ishizaka

LCPC '00: Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers

Pages 189 - 207

Published: 10 August 2000

Abstract

This paper proposes a simple and efficient implementation method for a hierarchical coarse grain task parallel processing scheme on an SMP machine. The OSCAR multigrain parallelizing compiler automatically generates parallelized code including OpenMP directives, and its performance is evaluated on a commercial SMP machine. Coarse grain task parallel processing is important for improving the effective performance of a wide range of multiprocessor systems, from a single chip multiprocessor to a high performance computer, beyond the limit of loop parallelism. The proposed scheme decomposes a Fortran program into coarse grain tasks, analyzes parallelism among the tasks by "Earliest Executable Condition Analysis" considering control and data dependencies, either statically schedules the coarse grain tasks to threads or generates dynamic task scheduling code that assigns the tasks to threads, and generates OpenMP Fortran source code for an SMP machine. The thread parallel code using OpenMP generated by the OSCAR compiler forks threads only once at the beginning of the program and joins them only once at the end, even though the program is processed in parallel based on the hierarchical coarse grain task parallel processing concept. The performance of the scheme is evaluated on an 8-processor SMP machine, the IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend of the OSCAR multigrain compiler. The evaluation shows that the OSCAR compiler combined with the IBM XL Fortran compiler version 5.1 gives 1.5 to 3 times larger speedup than the native XL Fortran compiler for SPEC 95fp SWIM, TOMCATV, HYDRO2D, MGRID and Perfect Benchmarks ARC2D.
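
The static scheduling and the single fork/join described in the abstract can be illustrated with a minimal Python sketch. It is not the OSCAR compiler's algorithm or its generated OpenMP Fortran code. The task graph `TASKS`, the cost estimates, and the helper names `static_schedule` and `run_forked_once` are hypothetical. Only data dependences are modeled (the control dependences covered by Earliest Executable Condition Analysis are omitted), and a simple greedy list scheduler and Python threads stand in for the compiler's static scheduling and for OpenMP threads.

```python
import threading

# Hypothetical macro-task graph: task -> (estimated cost, predecessor tasks).
# Only data dependences are modeled in this sketch.
TASKS = {
    "MT1": (2, set()),
    "MT2": (3, {"MT1"}),
    "MT3": (4, {"MT1"}),
    "MT4": (1, {"MT2", "MT3"}),
}


def static_schedule(tasks, num_threads):
    """Greedy list scheduling: put each ready task on the thread where it can start earliest."""
    finish = {}                          # task -> estimated finish time
    free_at = [0] * num_threads          # thread -> time at which it becomes idle
    plan = [[] for _ in range(num_threads)]
    remaining = dict(tasks)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = sorted(t for t, (_, preds) in remaining.items()
                       if preds.issubset(finish))
        if not ready:
            raise ValueError("dependence cycle in task graph")
        for t in ready:
            cost, preds = remaining.pop(t)
            data_ready = max((finish[p] for p in preds), default=0)
            tid = min(range(num_threads),
                      key=lambda i: max(free_at[i], data_ready))
            start = max(free_at[tid], data_ready)
            finish[t] = free_at[tid] = start + cost
            plan[tid].append(t)
    return plan


def run_forked_once(tasks, plan):
    """Start one thread per plan entry once, run the assigned tasks, join once."""
    done = {t: threading.Event() for t in tasks}
    order, lock = [], threading.Lock()

    def worker(my_tasks):
        for t in my_tasks:
            for p in tasks[t][1]:
                done[p].wait()           # wait until each predecessor has finished
            with lock:
                order.append(t)          # stand-in for the task body
            done[t].set()                # signal completion to dependent tasks

    threads = [threading.Thread(target=worker, args=(lst,)) for lst in plan]
    for th in threads:
        th.start()                       # the only fork
    for th in threads:
        th.join()                        # the only join
    return order


if __name__ == "__main__":
    plan = static_schedule(TASKS, 2)
    print(plan)
    print(run_forked_once(TASKS, plan))
```

With two threads, this toy graph places MT1, MT2 and MT4 on one thread and MT3 on the other. The threads are created once and synchronize between tasks through completion flags rather than by forking and joining again for each task.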

Index Terms: Theory of computation > Models of computation > Concurrency

Published In

LCPC '00: Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers

August 2000

382 pages

ISBN: 3540428623

Publisher

Springer-Verlag

Berlin, Heidelberg

Bibliometrics & Citations

Total Citations: 10
Total Downloads: 0 (last 12 months: 0; last 6 weeks: 0; reflects downloads up to 17 Jan 2025)

Cited By

Saad M, Palmieri R, Ravindran B (2019). Lerna. ACM Transactions on Storage, 15:1 (1-24). Online publication date: 22-Mar-2019
https://dl.acm.org/doi/10.1145/3310368

Ramos P, Souza G, Soares D, Araújo G, Pereira F, Evripidou S, Stenström P, O'Boyle M (2018). Automatic annotation of tasks in structured code. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (1-13). Online publication date: 1-Nov-2018
https://dl.acm.org/doi/10.1145/3243176.3243200

Saad M, Palmieri R, Ravindran B, Breitgand D, Yadgar G, Porter D, Eyal I (2018). Lerna. Proceedings of the 11th ACM International Systems and Storage Conference (37-48). Online publication date: 4-Jun-2018
https://dl.acm.org/doi/10.1145/3211890.3211897

Hayashi A, Wada Y, Watanabe T, Sekiguchi T, Mase M, Shirako J, Kimura K, Kasahara H (2010). Parallelizing compiler framework and API for power reduction and software productivity of real-time heterogeneous multicores. Proceedings of the 23rd international conference on Languages and compilers for parallel computing (184-198). Online publication date: 7-Oct-2010
https://dl.acm.org/doi/10.5555/1964536.1964549

Margineanu A, Ciocarlie H (2010). Specific problems in programming multicore systems. Proceedings of the 9th WSEAS international conference on computational intelligence, man-machine systems and cybernetics (159-164). Online publication date: 14-Dec-2010
https://dl.acm.org/doi/10.5555/1948759.1948783

Kimura K, Mase M, Mikami H, Miyamoto T, Shirako J, Kasahara H (2009). OSCAR API for real-time low-power multicores and its performance on multicores and SMP servers. Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing (188-202). Online publication date: 8-Oct-2009
https://dl.acm.org/doi/10.1007/978-3-642-13374-9_13

Koide H, Oie Y (2004). A New Task Scheduling Method for Distributed Programs which Require Memory Management in Grids. Proceedings of the 2004 Symposium on Applications and the Internet-Workshops (SAINT 2004 Workshops). Online publication date: 26-Jan-2004
https://dl.acm.org/doi/10.5555/968884.969525

Ishizaka K, Miyamoto T, Shirako J, Obata M, Kimura K, Kasahara H (2004). Performance of OSCAR multigrain parallelizing compiler on SMP servers. Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing (319-331). Online publication date: 22-Sep-2004
https://dl.acm.org/doi/10.1007/11532378_23

Obata M, Shirako J, Kaminaga H, Ishizaka K, Kasahara H (2002). Hierarchical parallelism control for multigrain parallel processing. Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing (31-44). Online publication date: 25-Jul-2002
https://dl.acm.org/doi/10.1007/11596110_3

Ishizaka K, Obata M, Kasahara H (2001). Coarse grain task parallel processing with cache optimization on shared memory multiprocessor. Proceedings of the 14th international conference on Languages and compilers for parallel computing (352-365). Online publication date: 1-Aug-2001
https://dl.acm.org/doi/10.5555/1769331.1769354
