DOI: 10.5555/645678.663944

Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP

Published: 10 August 2000

Abstract

This paper proposes a simple and efficient method for implementing hierarchical coarse grain task parallel processing on an SMP machine. The OSCAR multigrain parallelizing compiler automatically generates parallelized code containing OpenMP directives, and the performance of that code is evaluated on a commercial SMP machine. Coarse grain task parallel processing is important for improving the effective performance of a wide range of multiprocessor systems, from single-chip multiprocessors to high-performance computers, beyond the limits of loop parallelism. The proposed scheme decomposes a Fortran program into coarse grain tasks; analyzes the parallelism among those tasks with "Earliest Executable Condition Analysis", which considers both control and data dependencies; either statically schedules the coarse grain tasks onto threads or generates dynamic task scheduling code that assigns the tasks to threads; and generates OpenMP Fortran source code for an SMP machine. The OpenMP thread-parallel code generated by the OSCAR compiler forks threads only once, at the beginning of the program, and joins them only once, at the end, even though the program is processed in parallel according to the hierarchical coarse grain task parallel processing concept. The performance of the scheme is evaluated on an 8-processor SMP machine, the IBM RS6000 SP 604e High Node, using a newly developed OpenMP backend for the OSCAR multigrain compiler. The evaluation shows that the OSCAR compiler combined with the IBM XL Fortran compiler version 5.1 achieves speedups 1.5 to 3 times larger than those of the native XL Fortran compiler on SPEC 95fp SWIM, TOMCATV, HYDRO2D, and MGRID and on the Perfect Benchmarks program ARC2D.
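
The static-scheduling path and the one-time fork/join described in the abstract can be illustrated with a minimal Python sketch. It is not the OSCAR compiler's algorithm and does not model Earliest Executable Condition Analysis: the task graph `DEPS`, the costs in `COST`, the greedy list scheduler `static_schedule`, and the runner `run` are all hypothetical names and data introduced here, and only data dependences are considered. Python threads stand in for OpenMP threads. Each thread is created once, executes its statically assigned tasks in order while waiting on completion flags of predecessor tasks, and is joined once at the end.

```python
import threading

# Hypothetical macro-task graph: task -> set of tasks it depends on.
# Only data dependences are modeled (an assumption of this sketch).
DEPS = {
    "MT1": set(),
    "MT2": {"MT1"},
    "MT3": {"MT1"},
    "MT4": {"MT2", "MT3"},
}
COST = {"MT1": 2, "MT2": 3, "MT3": 3, "MT4": 1}  # assumed relative costs


def static_schedule(deps, cost, n_threads):
    """Greedy list scheduling: put each ready task on the thread
    where it can start earliest. Returns one task list per thread."""
    finish = {}                    # task -> estimated finish time
    free_at = [0] * n_threads      # thread -> time at which it is free
    plan = [[] for _ in range(n_threads)]
    remaining = dict(deps)
    while remaining:
        # A task is ready once all of its predecessors are scheduled.
        ready = sorted(t for t, d in remaining.items() if d.issubset(finish))
        task = ready[0]
        earliest = max((finish[p] for p in deps[task]), default=0)
        tid = min(range(n_threads), key=lambda i: max(free_at[i], earliest))
        start = max(free_at[tid], earliest)
        finish[task] = start + cost[task]
        free_at[tid] = finish[task]
        plan[tid].append(task)
        del remaining[task]
    return plan


def run(plan, deps, body):
    """Fork one thread per plan entry, run the tasks, join once."""
    done = {t: threading.Event() for t in deps}

    def worker(my_tasks):
        for t in my_tasks:
            for p in deps[t]:
                done[p].wait()     # wait for predecessor tasks to complete
            body(t)
            done[t].set()          # signal successors

    threads = [threading.Thread(target=worker, args=(tasks,)) for tasks in plan]
    for th in threads:             # the only fork
        th.start()
    for th in threads:             # the only join
        th.join()


if __name__ == "__main__":
    plan = static_schedule(DEPS, COST, n_threads=2)
    print(plan)
    run(plan, DEPS, body=lambda t: print("executing", t))
```

Because tasks are appended to the per-thread lists in a global topological order, every task a thread waits on is either earlier in its own list or on another thread that does not wait on it in turn, so the flag-based synchronization cannot deadlock in this sketch.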

Published In

LCPC '00: Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
August 2000
382 pages

Publisher

Springer-Verlag, Berlin, Heidelberg
