[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article

Generating communication sets of array assignment statements for block-cyclic distribution on distributed memory parallel computers

Published: 01 September 2002 Publication History

Abstract

This paper is concerned with the design of efficient algorithms for generating global name-space communication sets based on execution of array assignment statements for arbitrary strides and block sizes on distributed-memory parallel computers. We will present a hybrid approach, which combines the advantages of the set-theoretic method and the integer lattice method for generating communication sets. When block sizes are extremely small or large, a cyclic-based or a row-wise set-theoretic method is used. For other cases when block sizes are medium, we propose a new integer lattice method, in which data in each local block are treated as a unit. The first virtual referenced element in each virtual referenced local block can be generated efficiently by using an integer lattice method, in which the left boundary of index domain in each processing element is extended for this purpose. Then, the physical referenced elements in each physical referenced local block can be determined by the intersection of two closed forms, whose result is also a closed form. Because the cost of generating indices for packing and unpacking messages at the sending and receiving ends may be expensive for certain cases, we also study the conventional communication model and the deposit communication model. As each of the proposed algorithms and the communication models has its special use for certain cases, we thus identify rules of thumb to decide the most suitable algorithm for dealing with general cases.

References

[1]
{1} V. Adve, J. Mellor-Crummey, Using integer sets for data-parallel program analysis and optimization, in: Proceedings of ACM-SIGPLAN PLDI, Montreal, CA, 1998.
[2]
{2} C. Ancourt, F. Coelho, F. Irigoin, R. Keryell, A linear algebra framework for static high performance Fortran code distribution, Scientific Programming 6 (1997) 3-27.
[3]
{3} S. Benkner, Handling block-cyclic distributed arrays in Vienna Fortran 90, in: Proceedings of International Conference on Parallel Architecures and Compilation Techniques (PACT-95), Limassol, Cyprus, 1995.
[4]
{4} S. Benkner, P. Brezany, H. Zima, Processing array statements and procedure interfaces in the PREPARE high performance Fortran compiler, in: Lecture Notes in Computer Science, vol. 786, 1993, pp. 324-338.
[5]
{5} Z. Bozkus, A. Choudhary, G. Fox, T. Haupt, S. Ranka, M.-Y. Wu, Compiling Fortran 90D/HPF for distributed memory MIMD computers, Journal of Parallel and Distributed Computing 21 (1994) 15-26.
[6]
{6} S. Chatterjee, J.R. Gilbert, F.J.E. Long, R. Schreiber, S.H. Teng, Generating local addresses and communication sets for data-parallel programs, Journal of Parallel and Distributed Computing 26 (1995) 72-84.
[7]
{7} Y.-C. Chung, C.-H. Hsu, S.-W. Bai, A basic-cycle calculation technique for efficient dynamic data redistribution, IEEE Transactions on Parallel Distributed Systems 9 (4) (1998) 359-377.
[8]
{8} F. Coelho, C. Germain, J.-L. Pazat, State of the art in compiling HPF, in: U. Banerjee et al. (Eds.), Proceedings of Sixth International Workshop on Languages and Compilers for Parallel Computing, 1997.
[9]
{9} F. Desprez, J. Dongarra, A. Petitet, C. Randriamaro, Y. Robert, Scheduling block-cyclic array redistribution, IEEE Transactions on Parallel Distributed Systems 9 (2) (1998) 192-205.
[10]
{10} M. Le Fur, J.-L. Pazat, F. André, An array partitioning analysis for parallel loop distribution, in: Lecture Notes in Computer Science 966, International Conference EURO-PAR'95 Parallel Processing, Stockholm, Sweden, 1995, pp. 351-364.
[11]
{11} S.K.S. Gupta, S.D. Kaushik, C.H. Huang, P. Sadayappan, Compiling array expressions for efficient execution on distributed-memory machines, Journal of Parallel and Distributed Computing 32 (1996) 155-172.
[12]
{12} S. Hiranandani, K. Kennedy, J. Mellor-Crummey, A. Sethi, Compilation techniques for block-cyclic distributions, in: Proceedings of ACM International Conference on Supercomputing, Manchester, UK, 1994, pp. 392-403.
[13]
{13} S. Hiranandani, K. Kennedy, C.-W. Tseng, Compiling Fortran D for MIMD distributed-memory machines, Communications of the ACM 35 (8) (1992) 66-80.
[14]
{14} E.T. Kalns, L.M. Ni, Processor mapping techniques toward efficient data redistribution, IEEE Transactions on Parallel Distributed Systems 6 (12) (1995) 1234-1247.
[15]
{15} S.D. Kaushik, C.H. Huang, R.W. Johnson, P. Sadayappan, An approach to communication-efficient data redistribution, in: Proceedings of ACM International Conference on Supercomputing, Manchester, UK, 1994, pp. 364-373.
[16]
{16} S.D. Kaushik, C.H. Huang, P. Sadayappan, Efficient index set generation for compiling HPF array statements on distributed-memory machines, Journal of Parallel and Distributed Computing 38 (2) (1996) 237-247.
[17]
{17} W. Kelly, V. Masloy, W. Pugh, W. Rosser, T. Shpeisman, D. Wonnacott, The omega library interface guide, Technical Report, Department of Computer Science, University of Maryland, April 1996.
[18]
{18} K. Kennedy, N. Nedeljkovic, A. Sethi, Communication generation for cyclic(k) distributions, in: Proceedings of Third Workshop on Languages, Compilers, and Runtime Systems for Scalable Computers, Troy, New York, 1995, pp. 185-197.
[19]
{19} K. Kennedy, N. Nedeljkovic, A. Sethi, Efficient address generation for block-cyclic distributions, in: Proceedings of ACM International Conference on Supercomputing, Barcelona, Spain, 1995, pp. 180-184.
[20]
{20} K. Kennedy, N. Nedeljkovic, A. Sethi, A linear-time algorithm for computing the memory access sequence in data-parallel programs, in: Proceedings of ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, Santa Barbara, CA, 1995, pp. 149-158.
[21]
{21} C. Koelbel, Compile-time generation of regular communications patterns, in: Proceedings of Supercomputing'91, 1991, pp. 101-110.
[22]
{22} C. Koelbel, D. Loveman, R. Schreiber, G. Steele Jr., M. Zosel, The High Performance Fortran Handbook, The MIT Press, Cambridge, MA, 1994.
[23]
{23} C. Koelbel, P. Mehrotra, Compiling global name-space parallel loops for distributed execution, IEEE Transactions on Parallel Distributed Systems 2 (4) (1991) 440-451.
[24]
{24} P.-Z. Lee, Techniques for compiling programs on distributed memory multicomputers, Parallel Computing 21 (12) (1995) 1895-1923.
[25]
{25} P.-Z. Lee, Efficient algorithms for data distribution on distributed memory parallel computers, IEEE Transactions on Parallel Distributed Systems 8 (8) (1997) 825-839.
[26]
{26} P.-Z. Lee, W.Y. Chen, Compiler techniques for determining data distribution and generating communication sets on distributed-memory multicomputers, in: Proceedings of 29th Hawaii International Conference on System Sciences, vol. 1, Maui, Hawaii, 1996, pp. 537-546. Also Technical Report TR-95-007, Institute of Information Science, Academia Sinica, Taipei, Taiwan.
[27]
{27} P.-Z. Lee, W.Y. Chen. Generating global name-space communication sets for array assignment statements, Technical Report TR-IIS-97-016, Institute of Information Science, Academia Sinica, Taipei, Taiwan, October 1997, Available via WWW at http://www.iis.sinica.edu.tw/leepe/PAPER/ tr97016.ps.
[28]
{28} S. Midkiff, Local iteration set computation for block-cyclic distributions, in: Proceedings of International Conference on Parallel Processing, 1995, pp. II-77-84.
[29]
{29} S. Ramaswamy, B. Simons, P. Banerjee, Optimizations for efficient array redistribution on distributed memory multicomputers, Journal of Parallel and Distributed Computing 38 (2) (1996) 217-228.
[30]
{30} J.M. Stichnoth, D. O'Hallaron, T.R. Gross, Generating communication for array statements: Design, implementation, and evaluation, Journal of Parallel and Distributed Computing 21 (1994) 150-159.
[31]
{31} R. Thakur, A. Choudhary, G. Fox, Runtime array redistribution in HPF program, in: Proceedings of Scalable High Performance Computing Conference, 1994, pp. 309-316.
[32]
{32} R. Thakur, A. Choudhary, J. Ramanujam, Efficient algorithms for array redistribution, IEEE Transactions on Parallel Distributed Systems 7 (6) (1996) 587-594.
[33]
{33} A. Thirumalai, J. Ramanujam, Efficient computation of address sequences in data parallel programs using closed forms for basis vectors, Journal of Parallel and Distributed Computing 38 (2) (1996) 188-203.
[34]
{34} A. Thirumalai, J. Ramanujam, A. Venkatachar, Communication generation and optimization for HPF, in: B. Szymanski, B. Sinharoy (Eds.), Proceedings of Third Workshop on Languages, Compilers, and Runtime Systems for Scalable Computers, Troy, Kluwer Academic Publishers, New York, 1995, pp. 311-316.
[35]
{35} E.H.Y. Tseng, J.L. Gaudiot, Communication generation for aligned and cyclic(k) distributions using integer lattice, IEEE Transactions on Parallel Distributed Systems 10 (2) (1999) 136-146.
[36]
{36} V. van Dongen, Compiling distributed loops onto SPMD code, Parallel Processing Letters 4 (3) (1994) 301-312.
[37]
{37} K. van Reeuwijk, W. Denissen, H.J. Sips, E.M.R.M. Paalvast, An implementation framework for HPF distributed arrays on message-passing parallel computer systems, IEEE Transactions on Parallel Distributed Systems 7 (9) (1996) 897-914.
[38]
{38} A. Venkatachar, J. Ramanujam, A. Thirumalai, Communication generation for block-cyclic distribution, Parallel Processing Letters 7 (2) (1997) 195-202.
[39]
{39} A. Venkatachar, J. Ramanujam, A. Thirumalai, Generalized overlap regions for communication optimization in data-parallel programs, in: U. Banerjee et al. (Eds.), Proceedings of Tenth International Workshop on Languages and Compilers for Parallel Computing, 1997.
[40]
{40} A. Wakatani, M. Wolfe, Optimization of array redistribution for distributed memory multicomputers, Parallel Computing 21 (1995) 1485-1490.
[41]
{41} L. Wang, J.M. Stichnoth, S. Chatterjee, Runtime performance of parallel array assignment: An empirical study, in: Proceedings of Supercomputing '96, Pittsburgh, PA, November 1996, Available via WWW at http://www.supercomp.org/sc96/proceedings/SC96PROC/CHATTER/INDEX.HTM.
[42]
{42} M. Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, Redwood City, CA, 1996.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Parallel Computing
Parallel Computing  Volume 28, Issue 9
September 2002
145 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 September 2002

Author Tags

  1. array assignment statements
  2. communication sets
  3. distributed-memory machines
  4. global name space
  5. integer lattice method
  6. parallelizing compilers
  7. set-theoretic method

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

View Options

View options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media