More Web Proxy on the site http://driver.im/

article

Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers

Authors:

Wen-Yao ChenAuthors Info & Claims

Parallel Computing, Volume 28, Issue 9

Pages 1329 - 1368

https://doi.org/10.1016/S0167-8191(02)00113-8

Published: 01 September 2002 Publication History

Abstract

This paper is concerned with the design of efficient algorithms for generating global name-space communication sets based on execution of array assignment statements for arbitrary strides and block sizes on distributed-memory parallel computers. We will present a hybrid approach, which combines the advantages of the set-theoretic method and the integer lattice method for generating communication sets. When block sizes are extremely small or large, a cyclic-based or a row-wise set-theoretic method is used. For other cases when block sizes are medium, we propose a new integer lattice method, in which data in each local block are treated as a unit. The first virtual referenced element in each virtual referenced local block can be generated efficiently by using an integer lattice method, in which the left boundary of index domain in each processing element is extended for this purpose. Then, the physical referenced elements in each physical referenced local block can be determined by the intersection of two closed forms, whose result is also a closed form. Because the cost of generating indices for packing and unpacking messages at the sending and receiving ends may be expensive for certain cases, we also study the conventional communication model and the deposit communication model. As each of the proposed algorithms and the communication models has its special use for certain cases, we thus identify rules of thumb to decide the most suitable algorithm for dealing with general cases.

References

[1]

{1} V. Adve, J. Mellor-Crummey, Using integer sets for data-parallel program analysis and optimization, in: Proceedings of ACM-SIGPLAN PLDI, Montreal, CA, 1998.

[2]

{2} C. Ancourt, F. Coelho, F. Irigoin, R. Keryell, A linear algebra framework for static high performance Fortran code distribution, Scientific Programming 6 (1997) 3-27.

Digital Library

[3]

{3} S. Benkner, Handling block-cyclic distributed arrays in Vienna Fortran 90, in: Proceedings of International Conference on Parallel Architecures and Compilation Techniques (PACT-95), Limassol, Cyprus, 1995.

[4]

{4} S. Benkner, P. Brezany, H. Zima, Processing array statements and procedure interfaces in the PREPARE high performance Fortran compiler, in: Lecture Notes in Computer Science, vol. 786, 1993, pp. 324-338.

[5]

{5} Z. Bozkus, A. Choudhary, G. Fox, T. Haupt, S. Ranka, M.-Y. Wu, Compiling Fortran 90D/HPF for distributed memory MIMD computers, Journal of Parallel and Distributed Computing 21 (1994) 15-26.

Digital Library

[6]

{6} S. Chatterjee, J.R. Gilbert, F.J.E. Long, R. Schreiber, S.H. Teng, Generating local addresses and communication sets for data-parallel programs, Journal of Parallel and Distributed Computing 26 (1995) 72-84.

Digital Library

[7]

{7} Y.-C. Chung, C.-H. Hsu, S.-W. Bai, A basic-cycle calculation technique for efficient dynamic data redistribution, IEEE Transactions on Parallel Distributed Systems 9 (4) (1998) 359-377.

Digital Library

[8]

{8} F. Coelho, C. Germain, J.-L. Pazat, State of the art in compiling HPF, in: U. Banerjee et al. (Eds.), Proceedings of Sixth International Workshop on Languages and Compilers for Parallel Computing, 1997.

[9]

{9} F. Desprez, J. Dongarra, A. Petitet, C. Randriamaro, Y. Robert, Scheduling block-cyclic array redistribution, IEEE Transactions on Parallel Distributed Systems 9 (2) (1998) 192-205.

Digital Library

[10]

{10} M. Le Fur, J.-L. Pazat, F. André, An array partitioning analysis for parallel loop distribution, in: Lecture Notes in Computer Science 966, International Conference EURO-PAR'95 Parallel Processing, Stockholm, Sweden, 1995, pp. 351-364.

[11]

{11} S.K.S. Gupta, S.D. Kaushik, C.H. Huang, P. Sadayappan, Compiling array expressions for efficient execution on distributed-memory machines, Journal of Parallel and Distributed Computing 32 (1996) 155-172.

Digital Library

[12]

{12} S. Hiranandani, K. Kennedy, J. Mellor-Crummey, A. Sethi, Compilation techniques for block-cyclic distributions, in: Proceedings of ACM International Conference on Supercomputing, Manchester, UK, 1994, pp. 392-403.

[13]

{13} S. Hiranandani, K. Kennedy, C.-W. Tseng, Compiling Fortran D for MIMD distributed-memory machines, Communications of the ACM 35 (8) (1992) 66-80.

Digital Library

[14]

{14} E.T. Kalns, L.M. Ni, Processor mapping techniques toward efficient data redistribution, IEEE Transactions on Parallel Distributed Systems 6 (12) (1995) 1234-1247.

Digital Library

[15]

{15} S.D. Kaushik, C.H. Huang, R.W. Johnson, P. Sadayappan, An approach to communication-efficient data redistribution, in: Proceedings of ACM International Conference on Supercomputing, Manchester, UK, 1994, pp. 364-373.

[16]

{16} S.D. Kaushik, C.H. Huang, P. Sadayappan, Efficient index set generation for compiling HPF array statements on distributed-memory machines, Journal of Parallel and Distributed Computing 38 (2) (1996) 237-247.

Digital Library

[17]

{17} W. Kelly, V. Masloy, W. Pugh, W. Rosser, T. Shpeisman, D. Wonnacott, The omega library interface guide, Technical Report, Department of Computer Science, University of Maryland, April 1996.

Digital Library

[18]

{18} K. Kennedy, N. Nedeljkovic, A. Sethi, Communication generation for cyclic(k) distributions, in: Proceedings of Third Workshop on Languages, Compilers, and Runtime Systems for Scalable Computers, Troy, New York, 1995, pp. 185-197.

[19]

{19} K. Kennedy, N. Nedeljkovic, A. Sethi, Efficient address generation for block-cyclic distributions, in: Proceedings of ACM International Conference on Supercomputing, Barcelona, Spain, 1995, pp. 180-184.

[20]

{20} K. Kennedy, N. Nedeljkovic, A. Sethi, A linear-time algorithm for computing the memory access sequence in data-parallel programs, in: Proceedings of ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, Santa Barbara, CA, 1995, pp. 149-158.

Digital Library

[21]

{21} C. Koelbel, Compile-time generation of regular communications patterns, in: Proceedings of Supercomputing'91, 1991, pp. 101-110.

[22]

{22} C. Koelbel, D. Loveman, R. Schreiber, G. Steele Jr., M. Zosel, The High Performance Fortran Handbook, The MIT Press, Cambridge, MA, 1994.

[23]

{23} C. Koelbel, P. Mehrotra, Compiling global name-space parallel loops for distributed execution, IEEE Transactions on Parallel Distributed Systems 2 (4) (1991) 440-451.

Digital Library

[24]

{24} P.-Z. Lee, Techniques for compiling programs on distributed memory multicomputers, Parallel Computing 21 (12) (1995) 1895-1923.

Digital Library

[25]

{25} P.-Z. Lee, Efficient algorithms for data distribution on distributed memory parallel computers, IEEE Transactions on Parallel Distributed Systems 8 (8) (1997) 825-839.

Digital Library

[26]

{26} P.-Z. Lee, W.Y. Chen, Compiler techniques for determining data distribution and generating communication sets on distributed-memory multicomputers, in: Proceedings of 29th Hawaii International Conference on System Sciences, vol. 1, Maui, Hawaii, 1996, pp. 537-546. Also Technical Report TR-95-007, Institute of Information Science, Academia Sinica, Taipei, Taiwan.

[27]

{27} P.-Z. Lee, W.Y. Chen. Generating global name-space communication sets for array assignment statements, Technical Report TR-IIS-97-016, Institute of Information Science, Academia Sinica, Taipei, Taiwan, October 1997, Available via WWW at http://www.iis.sinica.edu.tw/leepe/PAPER/ tr97016.ps.

[28]

{28} S. Midkiff, Local iteration set computation for block-cyclic distributions, in: Proceedings of International Conference on Parallel Processing, 1995, pp. II-77-84.

[29]

{29} S. Ramaswamy, B. Simons, P. Banerjee, Optimizations for efficient array redistribution on distributed memory multicomputers, Journal of Parallel and Distributed Computing 38 (2) (1996) 217-228.

Digital Library

[30]

{30} J.M. Stichnoth, D. O'Hallaron, T.R. Gross, Generating communication for array statements: Design, implementation, and evaluation, Journal of Parallel and Distributed Computing 21 (1994) 150-159.

Digital Library

[31]

{31} R. Thakur, A. Choudhary, G. Fox, Runtime array redistribution in HPF program, in: Proceedings of Scalable High Performance Computing Conference, 1994, pp. 309-316.

[32]

{32} R. Thakur, A. Choudhary, J. Ramanujam, Efficient algorithms for array redistribution, IEEE Transactions on Parallel Distributed Systems 7 (6) (1996) 587-594.

Digital Library

[33]

{33} A. Thirumalai, J. Ramanujam, Efficient computation of address sequences in data parallel programs using closed forms for basis vectors, Journal of Parallel and Distributed Computing 38 (2) (1996) 188-203.

Digital Library

[34]

{34} A. Thirumalai, J. Ramanujam, A. Venkatachar, Communication generation and optimization for HPF, in: B. Szymanski, B. Sinharoy (Eds.), Proceedings of Third Workshop on Languages, Compilers, and Runtime Systems for Scalable Computers, Troy, Kluwer Academic Publishers, New York, 1995, pp. 311-316.

[35]

{35} E.H.Y. Tseng, J.L. Gaudiot, Communication generation for aligned and cyclic(k) distributions using integer lattice, IEEE Transactions on Parallel Distributed Systems 10 (2) (1999) 136-146.

[36]

{36} V. van Dongen, Compiling distributed loops onto SPMD code, Parallel Processing Letters 4 (3) (1994) 301-312.

[37]

{37} K. van Reeuwijk, W. Denissen, H.J. Sips, E.M.R.M. Paalvast, An implementation framework for HPF distributed arrays on message-passing parallel computer systems, IEEE Transactions on Parallel Distributed Systems 7 (9) (1996) 897-914.

Digital Library

[38]

{38} A. Venkatachar, J. Ramanujam, A. Thirumalai, Communication generation for block-cyclic distribution, Parallel Processing Letters 7 (2) (1997) 195-202.

[39]

{39} A. Venkatachar, J. Ramanujam, A. Thirumalai, Generalized overlap regions for communication optimization in data-parallel programs, in: U. Banerjee et al. (Eds.), Proceedings of Tenth International Workshop on Languages and Compilers for Parallel Computing, 1997.

[40]

{40} A. Wakatani, M. Wolfe, Optimization of array redistribution for distributed memory multicomputers, Parallel Computing 21 (1995) 1485-1490.

Digital Library

[41]

{41} L. Wang, J.M. Stichnoth, S. Chatterjee, Runtime performance of parallel array assignment: An empirical study, in: Proceedings of Supercomputing '96, Pittsburgh, PA, November 1996, Available via WWW at http://www.supercomp.org/sc96/proceedings/SC96PROC/CHATTER/INDEX.HTM.

[42]

{42} M. Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, Redwood City, CA, 1996.

Index Terms

Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers

Recommendations

Efficient communication sets generation for block-cyclic distribution on distributed-memory machines

How to generate local memory access sequence and communication sets efficiently is an important issue in compiling a data-parallel language into a single program multiple data (SPMD) code for distributed-memory machines. Many methods have been developed ...
Generating Realignment-Based Communication for HPF Programs
IPPS '96: Proceedings of the 10th International Parallel Processing Symposium

This paper presents methods for generating communication on compiling HPF programs for distributed-memory machines. We introduce the concept of an iteration template corresponding to an iteration space. Our HPF compiler performs the loop iteration ...
Symbolic Communication Set Generation for Irregular Parallel Applications

Communication set generation significantly influences the performance of parallel programs. However, studies seldom give attention to the problem of communication set generation for irregular applications. In this paper, we propose communication ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Parallel Computing

Parallel Computing Volume 28, Issue 9

September 2002

145 pages

ISSN:0167-8191

Issue’s Table of Contents

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 September 2002

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 02 Mar 2025

Other Metrics

View Author Metrics

Citations

View Options

View options

Figures

Tables

Media

View Issue’s Table of Contents