DOI: 10.1145/782814.782842
Article

A GSA-based compiler infrastructure to extract parallelism from complex loops

Published: 23 June 2003

Abstract

This paper presents a new approach for the detection of coarse-grain parallelism in loop nests that contain complex computations, including subscripted subscripts as well as conditional statements that introduce complex control flows at run-time. The approach is based on the recognition of the computational kernels calculated in a loop without considering the semantics of the code. The detection is carried out on top of the Gated Single Assignment (GSA) program representation at two different levels. First, the use-def chains between the statements that compose the strongly connected components (SCCs) of the GSA use-def chain graph are analyzed (intra-SCC analysis). As a result, the kernel computed in each SCC is recognized. Second, the use-def chains between statements of different SCCs are examined (inter-SCC analysis). This second abstraction level enables the compiler to detect more complex computational kernels. A prototype was implemented using the infrastructure provided by the Polaris compiler. Experimental results are presented that show the effectiveness of our approach for the detection of coarse-grain parallelism in a suite of real codes.
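The following is a minimal, illustrative Python sketch of the two-level analysis summarized above. It is not the paper's implementation: the statement encoding, the classification rules, and the helper names (statements, use_def, tarjan_sccs, classify_scc) are hypothetical simplifications chosen only to make the idea concrete. The sketch models a small GSA-like loop body as a use-def graph, computes its strongly connected components, classifies the kernel computed inside each SCC (intra-SCC analysis), and then inspects use-def edges between SCCs to recognize a reduction whose subscript is produced by another component as an irregular, subscripted reduction (inter-SCC analysis).

```python
# Minimal sketch (assumed encoding, not the paper's implementation). A loop such as
#   DO i = 1, n
#     j = col(i)                          ! subscripted subscript
#     IF (mask(i)) a(j) = a(j) + x(i)     ! conditionally guarded update
#   END DO
# is modelled as three GSA-like statements, each with an operation kind and the
# set of variables whose definitions it uses.
statements = {
    "i": ("induction", {"i"}),                   # mu-function: i depends on its previous value
    "j": ("subscript", {"i"}),                   # j = col(i)
    "a": ("reduction-update", {"a", "j", "i"}),  # gamma-guarded a(j) = a(j) + x(i)
}

# Use-def edges: statement s -> statement t when s uses a value defined by t.
use_def = {s: {v for v in uses if v in statements} for s, (_, uses) in statements.items()}

def tarjan_sccs(graph):
    """Classic recursive Tarjan algorithm; returns the strongly connected components."""
    index, low, on_stack, stack, sccs, counter = {}, {}, set(), [], [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

def classify_scc(scc):
    """Intra-SCC analysis: name the kernel computed by the statements of one SCC."""
    kinds = {statements[s][0] for s in scc}
    if kinds == {"induction"}:
        return "induction variable"
    if kinds == {"subscript"}:
        return "subscripted-subscript computation"
    if "reduction-update" in kinds:
        return "reduction"
    return "other"

# Inter-SCC analysis: a reduction fed by a component that computes a subscripted
# subscript is promoted to an irregular (subscripted) reduction.
for scc in tarjan_sccs(use_def):
    kind = classify_scc(scc)
    if kind == "reduction":
        feeders = {t for s in scc for t in use_def[s]} - scc
        if any(statements[f][0] == "subscript" for f in feeders):
            kind = "irregular (subscripted) reduction"
    print(sorted(scc), "->", kind)
```

Under these assumptions, running the sketch labels the i component as an induction variable, the j component as a subscripted-subscript computation, and, after the inter-SCC step, the a component as an irregular reduction, the kind of kernel that can then be parallelized with run-time support.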

    Information & Contributors

    Information

    Published In

    ICS '03: Proceedings of the 17th annual international conference on Supercomputing
    June 2003
    380 pages
    ISBN: 1581137338
    DOI: 10.1145/782814
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 June 2003

    Author Tags

    1. GSA
    2. loop-level kernel recognition
    3. parallelizing compilers
    4. strongly connected components

    Qualifiers

    • Article

    Conference

    ICS03: International Conference on Supercomputing 2003
    June 23 - 26, 2003
    San Francisco, CA, USA

    Acceptance Rates

    ICS '03 Paper Acceptance Rate: 36 of 171 submissions (21%)
    Overall Acceptance Rate: 629 of 2,180 submissions (29%)

    Cited By

    • (2015) Experiences in extending Parallware to support OpenACC. Proceedings of the Second Workshop on Accelerator Programming using Directives, pages 1-12. DOI: 10.1145/2832105.2832112. Online publication date: 15-Nov-2015.
    • (2012) HERCULES. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, pages 574-583. DOI: 10.1109/IPDPSW.2012.69. Online publication date: 21-May-2012.
    • (2011) Data locality and parallelism optimization using a constraint-based approach. Journal of Parallel and Distributed Computing, 71(2):280-287. DOI: 10.1016/j.jpdc.2010.08.005. Online publication date: 1-Feb-2011.
    • (2010) Code scheduling for optimizing parallelism and data locality. Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part I, pages 204-216. DOI: 10.5555/1887695.1887718. Online publication date: 31-Aug-2010.
    • (2010) Code Scheduling for Optimizing Parallelism and Data Locality. Euro-Par 2010 - Parallel Processing, pages 204-216. DOI: 10.1007/978-3-642-15277-1_20. Online publication date: 2010.
    • (2008) XARK. ACM Transactions on Programming Languages and Systems, 30(6):1-56. DOI: 10.1145/1391956.1391959. Online publication date: 30-Oct-2008.
    • (2008) Efficiently Building the Gated Single Assignment Form in Codes with Pointers in Modern Optimizing Compilers. Proceedings of the 14th International Euro-Par Conference on Parallel Processing, pages 360-369. DOI: 10.1007/978-3-540-85451-7_39. Online publication date: 26-Aug-2008.
    • (2007) Program behavior characterization through advanced kernel recognition. Proceedings of the 13th International Euro-Par Conference on Parallel Processing, pages 237-247. DOI: 10.5555/2391541.2391572. Online publication date: 28-Aug-2007.
    • (2007) Precise automatable analytical modeling of the cache behavior of codes with indirections. ACM Transactions on Architecture and Code Optimization, 4(3), article 16. DOI: 10.1145/1275937.1275940. Online publication date: 1-Sep-2007.
    • (2007) Program Behavior Characterization Through Advanced Kernel Recognition. Euro-Par 2007 Parallel Processing, pages 237-247. DOI: 10.1007/978-3-540-74466-5_27. Online publication date: 2007.
