[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/369028.369036acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
Article
Free access

Runtime performance of parallel array assignment: an empirical study

Published: 17 November 1996 Publication History

Abstract

Compiling the array assignment statement of High Performance Fortran in the presence of block-cyclic distributions of data arrays is considered difficult, and several algorithms have been published to solve this problem. We present a comprehensive study of the performance of these algorithms. We classify these algorithms into several families and identify several issues of interest in the compilation process, and present experimental performance data for the various algorithms. We demonstrate that block-cyclic distributions can be compiled almost as efficiently as block and cyclic distributions.

References

[1]
C. Ancourt, F. Coelho, F. Irigoin, and R. Keryell. A linear algebra framework for static HPF code distribution. In Workshop on Compilers for Parallel Computers, Delft, The Netherlands, Dec. 1993.]]
[2]
S. Chatterjee, J. R. Gilbert, F. J. E. Long, R. Schreiber, and S.-H. Teng. Generating local addresses and communication sets for data-parallel programs.Journal of Parallel and Distributed Computing, 26(1):72-84, Apr. 1995.]]
[3]
J. J. Dongarra and D. W. Walker. Software libraries for linear algebra computations on high performance computers.SIAM Review, 37(2):151-180, June 1995.]]
[4]
S. K. S. Gupta, S. D. Kaushik, S. Mufti, S. Sharma, C.-H. Huang, and P. Sadayappan. On compiling array expressions for efficient execution on distributed-memory machines. In A. N. Choudhary and P. B. Berra, editors, Proceedings of the 1993 International Conference on Parallel Processing, volume II, pages 301-305. CRC Press, Inc., Aug. 1993.]]
[5]
High Performance Fortran Forum. High Performance Fortran language specification.Scientific Programming, 2(1-2):1-170, 1993.]]
[6]
K. Kennedy, N. Nedeljković, and A. Sethi. Efficient address generation for block-cyclic distributions. In Proceedings of the 1995 International Conference on Supercomputing, pages 180-184, Barcelona, Spain, July 1995.]]
[7]
K. Kennedy, N. Nedeljković, and A. Sethi. A linear-time algorithm for computing the memory access sequence in data-parallel programs. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 102-111, Santa Barbara, CA, July 1995.]]
[8]
K. Kennedy, N. Nedeljković, and A. Sethi. Communication generation for cyclic(k) distributions. In B. Szymanski and B. Sinharoy, editors, Languages, Compilers, and Run-Time Systems for Scalable Computers, pages 185-197. Kluwer Academic Publishers, 1996.]]
[9]
C. Koelbel. Compile-time generation of regular communication patterns. In Proceedings of Supercomputing'91, pages 101-110, Albuquerque, NM, Nov. 1991.]]
[10]
C. Koelbel and P. Mehrotra. Compiling global name-space parallel loops for distributed execution.IEEE Transactions on Parallel and Distributed Systems, 2(4):440-451, Oct. 1991.]]
[11]
T. MacDonald, D. Pase, and A. Meltzer. Addressing in Cray Research's MPP Fortran. In Proceedings of the Third Workshop on Compilers for Parallel Computers, pages 161-172, Vienna, Austria, July 1992. Austrian Center for Parallel Computation.]]
[12]
S. Midkiff. Local iteration set computation for block-cyclic distributions. In C. Polychronopoulos, editor, Proceedings of the 24th International Conference on Parallel Processing, pages 77-84. CRC Press, Aug. 1995.]]
[13]
J. M. Stichnoth. Efficient compilation of array statements for private memory multicomputers. Technical Report CMU-CS-93-109, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, Feb. 1993.]]
[14]
A. Thirumalai and J. Ramanujam. Fast address sequence generation for data-parallel programs using integer lattices. In C.-H. Huang, P. Sadayappan, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing (LNCS 1033), pages 191-208. Springer, 1995.]]

Cited By

View all
  • (2014)Determining the Optimal Redistribution for a Given Data PartitionProceedings of the 2014 IEEE 13th International Symposium on Parallel and Distributed Computing10.1109/ISPDC.2014.16(95-102)Online publication date: 24-Jun-2014
  • (2012)Automatic Parallelization: An Overview of Fundamental Compiler TechniquesSynthesis Lectures on Computer Architecture10.2200/S00340ED1V01Y201201CAC0197:1(1-169)Online publication date: 28-Jan-2012
  • (2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
Supercomputing '96: Proceedings of the 1996 ACM/IEEE conference on Supercomputing
November 1996
898 pages
ISBN:0897918541

Sponsors

Publisher

IEEE Computer Society

United States

Publication History

Published: 17 November 1996

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Array assignment
  2. High Performance Fortran
  3. block-cyclic distribution
  4. code generation
  5. performance evaluation

Qualifiers

  • Article

Conference

SC '96
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)209
  • Downloads (Last 6 weeks)42
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2014)Determining the Optimal Redistribution for a Given Data PartitionProceedings of the 2014 IEEE 13th International Symposium on Parallel and Distributed Computing10.1109/ISPDC.2014.16(95-102)Online publication date: 24-Jun-2014
  • (2012)Automatic Parallelization: An Overview of Fundamental Compiler TechniquesSynthesis Lectures on Computer Architecture10.2200/S00340ED1V01Y201201CAC0197:1(1-169)Online publication date: 28-Jan-2012
  • (2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
  • (2005)Code generation for complex subscripts in data-parallel programsLanguages and Compilers for Parallel Computing10.1007/BFb0032683(49-63)Online publication date: 9-Jun-2005
  • (2005)Optimizing the representation of local iteration sets and access sequences for block-cyclic distributionsLanguages and Compilers for Parallel Computing10.1007/BFb0017267(420-434)Online publication date: 10-Jun-2005
  • (2004)A pipeline technique for dynamic data transfer on a multiprocessor gridInternational Journal of Parallel Programming10.1023/B:IJPP.0000038068.80639.5232:5(361-388)Online publication date: 1-Oct-2004
  • (2002)Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computersParallel Computing10.1016/S0167-8191(02)00113-828:9(1329-1368)Online publication date: 1-Sep-2002
  • (2001)Integer lattice based methods for local address generation for block-cyclic distributionsCompiler optimizations for scalable parallel systems10.5555/380466.380483(597-645)Online publication date: 1-Jun-2001
  • (2001)Integer Lattice Based Methods for Local Address Generation for Block-Cyclic DistributionsCompiler Optimizations for Scalable Parallel Systems10.1007/3-540-45403-9_17(597-645)Online publication date: 18-May-2001
  • (1999)Transparent adaptive parallelism on NOWs using OpenMPACM SIGPLAN Notices10.1145/329366.30111334:8(96-106)Online publication date: 1-May-1999
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media