[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets

Published: 01 December 1999 Publication History

Abstract

Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on $P$ processors to cyclic(Kx) on $Q$ processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller. It takes $O(max(P,Q))$ time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time using the state-of-the-art scheme (which is based on the bipartite matching scheme) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than the previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule, and index computation.

References

[1]
L. Blackford J. Choi A. Cleary E. D'Azevedo J. Demmel I. Dhillon J. Dongarra S. Hammarling G. Henry A. Petitet K. Stanley D. Walker and R.C. Whaley, ScaLAPACK Users' Guide. SIAM Publications, 1997.
[2]
J. Bruck C.-H. Ho S. Kipnis and D. Weathersby, “Efficient Algorithms for All-to-All Communications in Multi-Port Message-Passing Systems,” Proc. Sixth Ann. ACM Symp. Parallel Algorithms and Architecture, pp. 298–309, July 1994.
[3]
J. Choi J. Dongarra and D. Walker, “Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers,” Proc. Fourth Symp. Frontiers of Massively Parallel Computation, 1993.
[4]
Y.C. Chung C.H. Hsu and S.W. Bai, “A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 4, Apr. 1998.
[5]
F. Desprez J. Dongarra A. Petitet C. Randriamaro and Y. Robert, “Scheduling Block-Cyclic Array Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 2, Feb. 1998.
[6]
E.A. Dinic, “Algorithm for Solution of Maximum Flow in a Network with Power Estimation,” Soviet Math Doklady, vol. 11, pp. 1,277-1,280, 1970.
[7]
S. Hiranandani K. Kennedy J. Mellor-Crummey and A. Sethi, “Compilation Techniques for Block-Cyclic Distributions,” Proc. Int'l Conf. Supercomputing, pp. 392–403, July 1994.
[8]
E.T. Kalns and L.M. Ni, “Processor Mapping Techniques toward Efficient Data Redistribution,” Proc. Int'l Parallel Processing Symp., Apr. 1994.
[9]
S.D. Kaushik C.-H. Huang R.W. Johnson and P. Sadayappan, “An Approach to Communication Efficient Data Redistribution,” Proc. Int'l Conf. Supercomputing, pp. 364–373, July 1994.
[10]
S.D. Kaushik C.H. Huang J. Ramanujamand and P. Sadayappan, “Multiphase Array Redistribution: Modeling and Evaluation,” Proc. Int'l Parallel Processing Symp., pp. 441–445, 1995.
[11]
C. Koelbel D. Loveman R. Schreiber G. Steele, Jr. and M. Zosel, The High Performance Fortran Handbook. MIT Press, 1994.
[12]
Y.W. Lim P.B. Bhat and V.K. Prasanna, “Efficient Algorithms for Block-Cyclic Redistribution of an Array,” Proc. IEEE Symp. Parallel and Distributed Processing, Oct. 1996.
[13]
Y.W. Lim and V.K. Prasanna, “Scalable Portable Implementations of Space-Time Adaptive Processing,” Proc. 10th Int'l. Conf. High Performance Computing, 1996.
[14]
W. Liu W. Kostis and V.K. Prasanna, “Communication Issues in Heterogeneous Embedded Systems,” Proc. Workshop Parallel and Distributed Real Time Systems, Apr. 1996.
[15]
W. Liu and V.K. Prasanna, “Design of Application Software for Embedded Signal Processing,” IEEE Signal Processing Magazine, Sept. 1998.
[16]
Message Passing Interface Forum, “MPI: A Message-Passing Interface Standard,” Int'l J. Supercomputer Applications and High Performance Computing, vol. 8, nos. 3–4, 1994.
[17]
C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, pp. 225-226. Prentice Hall, 1982.
[18]
L. Prylli and B. Tourancheau, “{Fast Runtime Block Cyclic Data Redistribution on Multiprocessors,” J. Parallel and Distributed Computing, vol. 45, Aug. 1997.
[19]
S. Ramaswamy and P. Banerjee, “Automatic Generation of Efficient Array Redistribution Routines for Distributed Memory Multicomputers,” Proc. Fifth Symp. Frontiers of Massively Parallel Computation, pp. 342–349, Feb. 1995.
[20]
J.C. Setubal, “Sequential and Parallel Experimental Results with Bipartite Matching Algorithms,” Technical Report IC-96-09, Inst. Computing, State Univ. Campinas, Brazil, 1996. http://www.cs. sunysb.edu/~algorith/implement/bipm/implement.shtml.
[21]
J. Suh M. Ung and V.K. Prasanna, “Parallel Implementation of Synthetic Aperture Radar on High Performance Computing Platforms,” Proc. Int'l Conf. Algorithms and Architectures for Parallel Processing, Dec. 1997.
[22]
R. Thakur A. Choudhary and G. Fox, “Runtime Array Redistribution in HPF Programs,” Proc. Scalable High Performance Computing Conf., pp. 309–316, May 1994.
[23]
R. Thakur A. Choudhary and J. Ramanujam, “Efficient Algorithms for Array Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 6, pp. 587-594, June 1996.
[24]
D.W. Walker and S.W. Otto, “Redistribution of Block-Cyclic Data Distributions Using MPI,” Concurrency: Practice and Experience, vol. 8, no. 9, pp. 707-728, Nov. 1996.
[25]
C.-L. Wang P.B. Bhat and V.K. Prasanna, “High-Performance Computing for Vision,” Proc. IEEE, vol. 84, pp. 931–946, 1996.

Cited By

View all
  • (2008)A message combining approach for efficient array redistribution in non-all-to-all communication networksInternational Journal of Computer Mathematics10.1080/0020716070153787385:11(1609-1619)Online publication date: 1-Nov-2008
  • (2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
  • (2008)A flexible processor mapping technique toward data localization for block-cyclic data redistributionThe Journal of Supercomputing10.1007/s11227-007-0166-945:2(151-172)Online publication date: 1-Aug-2008
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 10, Issue 12
December 1999
152 pages
ISSN:1045-9219
Issue’s Table of Contents

Publisher

IEEE Press

Publication History

Published: 01 December 1999

Author Tags

  1. Block-cyclic distribution
  2. interprocessor communication.
  3. redistribution algorithms

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2008)A message combining approach for efficient array redistribution in non-all-to-all communication networksInternational Journal of Computer Mathematics10.1080/0020716070153787385:11(1609-1619)Online publication date: 1-Nov-2008
  • (2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
  • (2008)A flexible processor mapping technique toward data localization for block-cyclic data redistributionThe Journal of Supercomputing10.1007/s11227-007-0166-945:2(151-172)Online publication date: 1-Aug-2008
  • (2007)On the complexity of the max-edge-coloring problem with its variantsProceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies10.5555/2399256.2399288(350-361)Online publication date: 7-Apr-2007
  • (2007)Efficient multidimensional data redistribution for resizable parallel computationsProceedings of the 5th international conference on Parallel and Distributed Processing and Applications10.5555/2395970.2395991(182-194)Online publication date: 29-Aug-2007
  • (2007)ISOProceedings of the 9th international conference on Parallel Computing Technologies10.5555/2392094.2392149(507-515)Online publication date: 3-Sep-2007
  • (2007)Contention-free communication scheduling for group communication in data parallelismProceedings of the 2007 OTM confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part II10.5555/1784707.1784729(1349-1366)Online publication date: 25-Nov-2007
  • (2007)Scheduling contention-free irregular redistributions in parallelizing compilersThe Journal of Supercomputing10.1007/s11227-006-0024-140:3(229-247)Online publication date: 1-Jun-2007
  • (2007)Contention-Free Communication Scheduling for Group Communication in Data ParallelismOn the Move to Meaningful Internet Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS10.1007/978-3-540-76843-2_16(1349-1366)Online publication date: 25-Nov-2007
  • (2006)Scheduling Messages For Data RedistributionInternational Journal of High Performance Computing Applications10.1177/109434200606840720:4(443-454)Online publication date: 1-Nov-2006
  • Show More Cited By

View Options

View options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media