Article

Free access

Runtime performance of parallel array assignment: an empirical study

Authors:

Lei Wang,

James M. Stichnoth,

Siddhartha ChatterjeeAuthors Info & Claims

Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing

Pages 4 - es

https://doi.org/10.1145/369028.369036

Published: 17 November 1996 Publication History

PDF eReader

Abstract

Compiling the array assignment statement of High Performance Fortran in the presence of block-cyclic distributions of data arrays is considered difficult, and several algorithms have been published to solve this problem. We present a comprehensive study of the performance of these algorithms. We classify these algorithms into several families and identify several issues of interest in the compilation process, and present experimental performance data for the various algorithms. We demonstrate that block-cyclic distributions can be compiled almost as efficiently as block and cyclic distributions.

References

[1]

C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell. A linear algebra framework for static HPF code distribution. In Workshop on Compilers for Parallel Computers, Delft, The Netherlands, Dec. 1993.]]

Google Scholar

[2]

S. Chatterjee, J. R. Gilbert, F. J. E. Long, R. Schreiber, and S.-H. Teng. Generating local addresses and communication sets for data-parallel programs.Journal of Parallel and Distributed Computing, 26(1):72-84, Apr. 1995.]]

Digital Library

Google Scholar

[3]

J. J. Dongarra and D. W. Walker. Software libraries for linear algebra computations on high performance computers.SIAM Review, 37(2):151-180, June 1995.]]

Digital Library

Google Scholar

[4]

S. K. S. Gupta, S. D. Kaushik, S. Mufti, S. Sharma, C.-H. Huang, and P. Sadayappan. On compiling array expressions for efficient execution on distributed-memory machines. In A. N. Choudhary and P. B. Berra, editors, Proceedings of the 1993 International Conference on Parallel Processing, volume II, pages 301-305. CRC Press, Inc., Aug. 1993.]]

Digital Library

Google Scholar

[5]

High Performance Fortran Forum. High Performance Fortran language specification.Scientific Programming, 2(1-2):1-170, 1993.]]

Crossref

Google Scholar

[6]

K. Kennedy, N. Nedeljković, and A. Sethi. Efficient address generation for block-cyclic distributions. In Proceedings of the 1995 International Conference on Supercomputing, pages 180-184, Barcelona, Spain, July 1995.]]

Digital Library

Google Scholar

[7]

K. Kennedy, N. Nedeljković, and A. Sethi. A linear-time algorithm for computing the memory access sequence in data-parallel programs. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 102-111, Santa Barbara, CA, July 1995.]]

Digital Library

Google Scholar

[8]

K. Kennedy, N. Nedeljković, and A. Sethi. Communication generation for cyclic(k) distributions. In B. Szymanski and B. Sinharoy, editors, Languages, Compilers, and Run-Time Systems for Scalable Computers, pages 185-197. Kluwer Academic Publishers, 1996.]]

Google Scholar

[9]

C. Koelbel. Compile-time generation of regular communication patterns. In Proceedings of Supercomputing'91, pages 101-110, Albuquerque, NM, Nov. 1991.]]

Digital Library

Google Scholar

[10]

C. Koelbel and P. Mehrotra. Compiling global name-space parallel loops for distributed execution.IEEE Transactions on Parallel and Distributed Systems, 2(4):440-451, Oct. 1991.]]

Digital Library

Google Scholar

[11]

T. MacDonald, D. Pase, and A. Meltzer. Addressing in Cray Research's MPP Fortran. In Proceedings of the Third Workshop on Compilers for Parallel Computers, pages 161-172, Vienna, Austria, July 1992. Austrian Center for Parallel Computation.]]

Google Scholar

[12]

S. Midkiff. Local iteration set computation for block-cyclic distributions. In C. Polychronopoulos, editor, Proceedings of the 24th International Conference on Parallel Processing, pages 77-84. CRC Press, Aug. 1995.]]

Google Scholar

[13]

J. M. Stichnoth. Efficient compilation of array statements for private memory multicomputers. Technical Report CMU-CS-93-109, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Feb. 1993.]]

Digital Library

Google Scholar

[14]

A. Thirumalai and J. Ramanujam. Fast address sequence generation for data-parallel programs using integer lattices. In C.-H. Huang, P. Sadayappan, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing (LNCS 1033), pages 191-208. Springer, 1995.]]

Digital Library

Google Scholar

Cited By

View all

Herault THerrmann JMarchal LRobert Y(2014)Determining the Optimal Redistribution for a Given Data PartitionProceedings of the 2014 IEEE 13th International Symposium on Parallel and Distributed Computing10.1109/ISPDC.2014.16(95-102)Online publication date: 24-Jun-2014
https://dl.acm.org/doi/10.1109/ISPDC.2014.16
Midkiff S(2012)Automatic Parallelization: An Overview of Fundamental Compiler TechniquesSynthesis Lectures on Computer Architecture10.2200/S00340ED1V01Y201201CAC0197:1(1-169)Online publication date: 28-Jan-2012
https://doi.org/10.2200/S00340ED1V01Y201201CAC019
Souravlas SRoumeliotis M(2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
https://dl.acm.org/doi/10.1007/s11227-008-0185-1
Show More Cited By

Index Terms

Recommendations

Tools-supported HPF and MPI parallelization of the NAS parallel benchmarks
FRONTIERS '96: Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation

High Performance Fortran (HPF) compilers and communication libraries with the standardized Message Passing Interface (MPI) are becoming widely available, easing the development of portable parallel applications. The Annai tool environment supports ...
Multi-phase array redistribution: modeling and evaluation
IPPS '95: Proceedings of the 9th International Symposium on Parallel Processing

Array redistribution is used in languages such as High Performance Fortran to allow programmers to dynamically change the distribution of arrays across processors. Distributed-memory implementations of several scientific applications require array ...
Correct translation of data parallel assignment onto array processors
Abstract
Two models of computation which are suitable for the description of parallel scientific calculations on large data structures are presented. Data parallel assignment is a construct which may be used to specify the application of a set of order ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing

November 1996

898 pages

ISBN:0897918541

Chairman:
Beverly Clayton

Publisher

IEEE Computer Society

United States

Publication History

Published: 17 November 1996

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

SC '96

Sponsor:

SIGARCH
IEEE-CS

SC '96: International Conference for High Performance Computing, Networking, Storage and Analysis

January 1, 1996

Pennsylvania, Pittsburgh, USA

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

17
Total Citations
View Citations
342
Total Downloads

Downloads (Last 12 months)209
Downloads (Last 6 weeks)42

Reflects downloads up to 13 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

View all

Herault THerrmann JMarchal LRobert Y(2014)Determining the Optimal Redistribution for a Given Data PartitionProceedings of the 2014 IEEE 13th International Symposium on Parallel and Distributed Computing10.1109/ISPDC.2014.16(95-102)Online publication date: 24-Jun-2014
https://dl.acm.org/doi/10.1109/ISPDC.2014.16
Midkiff S(2012)Automatic Parallelization: An Overview of Fundamental Compiler TechniquesSynthesis Lectures on Computer Architecture10.2200/S00340ED1V01Y201201CAC0197:1(1-169)Online publication date: 28-Jan-2012
https://doi.org/10.2200/S00340ED1V01Y201201CAC019
Souravlas SRoumeliotis M(2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
https://dl.acm.org/doi/10.1007/s11227-008-0185-1
Ramanujam JDutta SVenkatachar A(2005)Code generation for complex subscripts in data-parallel programsLanguages and Compilers for Parallel Computing10.1007/BFb0032683(49-63)Online publication date: 9-Jun-2005
https://doi.org/10.1007/BFb0032683
Midkiff S(2005)Optimizing the representation of local iteration sets and access sequences for block-cyclic distributionsLanguages and Compilers for Parallel Computing10.1007/BFb0017267(420-434)Online publication date: 10-Jun-2005
https://doi.org/10.1007/BFb0017267
Souravlas SRoumeliotis M(2004)A pipeline technique for dynamic data transfer on a multiprocessor gridInternational Journal of Parallel Programming10.1023/B:IJPP.0000038068.80639.5232:5(361-388)Online publication date: 1-Oct-2004
https://dl.acm.org/doi/10.1023/B%3AIJPP.0000038068.80639.52
Lee PChen W(2002)Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computersParallel Computing10.1016/S0167-8191(02)00113-828:9(1329-1368)Online publication date: 1-Sep-2002
https://dl.acm.org/doi/10.1016/S0167-8191%2802%2900113-8
Ramanujam J(2001)Integer lattice based methods for local address generation for block-cyclic distributionsCompiler optimizations for scalable parallel systems10.5555/380466.380483(597-645)Online publication date: 1-Jun-2001
https://dl.acm.org/doi/10.5555/380466.380483
Ramanujam J(2001)Integer Lattice Based Methods for Local Address Generation for Block-Cyclic DistributionsCompiler Optimizations for Scalable Parallel Systems10.1007/3-540-45403-9_17(597-645)Online publication date: 18-May-2001
https://doi.org/10.1007/3-540-45403-9_17
Scherer ALu HGross TZwaenepoel W(1999)Transparent adaptive parallelism on NOWs using OpenMPACM SIGPLAN Notices10.1145/329366.30111334:8(96-106)Online publication date: 1-May-1999
https://dl.acm.org/doi/10.1145/329366.301113
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Cited By

Index Terms

Recommendations

Tools-supported HPF and MPI parallelization of the NAS parallel benchmarks

Multi-phase array redistribution: modeling and evaluation

Correct translation of data parallel assignment onto array processors