More Web Proxy on the site http://driver.im/

research-article

Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets

Authors:

Viktor K. Prasanna,

Cauligi S. RaghavendraAuthors Info & Claims

IEEE Transactions on Parallel and Distributed Systems, Volume 10, Issue 12

Pages 1217 - 1240

https://doi.org/10.1109/71.819945

Published: 01 December 1999 Publication History

Abstract

Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on $P$ processors to cyclic(Kx) on $Q$ processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller. It takes $O(max(P,Q))$ time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time using the state-of-the-art scheme (which is based on the bipartite matching scheme) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than the previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule, and index computation.

References

[1]

L. Blackford J. Choi A. Cleary E. D'Azevedo J. Demmel I. Dhillon J. Dongarra S. Hammarling G. Henry A. Petitet K. Stanley D. Walker and R.C. Whaley, ScaLAPACK Users' Guide. SIAM Publications, 1997.

Digital Library

[2]

J. Bruck C.-H. Ho S. Kipnis and D. Weathersby, “Efficient Algorithms for All-to-All Communications in Multi-Port Message-Passing Systems,” Proc. Sixth Ann. ACM Symp. Parallel Algorithms and Architecture, pp. 298–309, July 1994.

Digital Library

[3]

J. Choi J. Dongarra and D. Walker, “Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers,” Proc. Fourth Symp. Frontiers of Massively Parallel Computation, 1993.

[4]

Y.C. Chung C.H. Hsu and S.W. Bai, “A Basic-Cycle Calculation Technique for Efficient Dynamic Data Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 4, Apr. 1998.

Digital Library

[5]

F. Desprez J. Dongarra A. Petitet C. Randriamaro and Y. Robert, “Scheduling Block-Cyclic Array Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 2, Feb. 1998.

Digital Library

[6]

E.A. Dinic, “Algorithm for Solution of Maximum Flow in a Network with Power Estimation,” Soviet Math Doklady, vol. 11, pp. 1,277-1,280, 1970.

[7]

S. Hiranandani K. Kennedy J. Mellor-Crummey and A. Sethi, “Compilation Techniques for Block-Cyclic Distributions,” Proc. Int'l Conf. Supercomputing, pp. 392–403, July 1994.

Digital Library

[8]

E.T. Kalns and L.M. Ni, “Processor Mapping Techniques toward Efficient Data Redistribution,” Proc. Int'l Parallel Processing Symp., Apr. 1994.

Digital Library

[9]

S.D. Kaushik C.-H. Huang R.W. Johnson and P. Sadayappan, “An Approach to Communication Efficient Data Redistribution,” Proc. Int'l Conf. Supercomputing, pp. 364–373, July 1994.

Digital Library

[10]

S.D. Kaushik C.H. Huang J. Ramanujamand and P. Sadayappan, “Multiphase Array Redistribution: Modeling and Evaluation,” Proc. Int'l Parallel Processing Symp., pp. 441–445, 1995.

Digital Library

[11]

C. Koelbel D. Loveman R. Schreiber G. Steele, Jr. and M. Zosel, The High Performance Fortran Handbook. MIT Press, 1994.

Digital Library

[12]

Y.W. Lim P.B. Bhat and V.K. Prasanna, “Efficient Algorithms for Block-Cyclic Redistribution of an Array,” Proc. IEEE Symp. Parallel and Distributed Processing, Oct. 1996.

Digital Library

[13]

Y.W. Lim and V.K. Prasanna, “Scalable Portable Implementations of Space-Time Adaptive Processing,” Proc. 10th Int'l. Conf. High Performance Computing, 1996.

[14]

W. Liu W. Kostis and V.K. Prasanna, “Communication Issues in Heterogeneous Embedded Systems,” Proc. Workshop Parallel and Distributed Real Time Systems, Apr. 1996.

Digital Library

[15]

W. Liu and V.K. Prasanna, “Design of Application Software for Embedded Signal Processing,” IEEE Signal Processing Magazine, Sept. 1998.

[16]

Message Passing Interface Forum, “MPI: A Message-Passing Interface Standard,” Int'l J. Supercomputer Applications and High Performance Computing, vol. 8, nos. 3–4, 1994.

[17]

C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, pp. 225-226. Prentice Hall, 1982.

Digital Library

[18]

L. Prylli and B. Tourancheau, “{Fast Runtime Block Cyclic Data Redistribution on Multiprocessors,” J. Parallel and Distributed Computing, vol. 45, Aug. 1997.

Digital Library

[19]

S. Ramaswamy and P. Banerjee, “Automatic Generation of Efficient Array Redistribution Routines for Distributed Memory Multicomputers,” Proc. Fifth Symp. Frontiers of Massively Parallel Computation, pp. 342–349, Feb. 1995.

Digital Library

[20]

J.C. Setubal, “Sequential and Parallel Experimental Results with Bipartite Matching Algorithms,” Technical Report IC-96-09, Inst. Computing, State Univ. Campinas, Brazil, 1996. http://www.cs. sunysb.edu/~algorith/implement/bipm/implement.shtml.

[21]

J. Suh M. Ung and V.K. Prasanna, “Parallel Implementation of Synthetic Aperture Radar on High Performance Computing Platforms,” Proc. Int'l Conf. Algorithms and Architectures for Parallel Processing, Dec. 1997.

[22]

R. Thakur A. Choudhary and G. Fox, “Runtime Array Redistribution in HPF Programs,” Proc. Scalable High Performance Computing Conf., pp. 309–316, May 1994.

[23]

R. Thakur A. Choudhary and J. Ramanujam, “Efficient Algorithms for Array Redistribution,” IEEE Trans. Parallel and Distributed Systems, vol. 7, no. 6, pp. 587-594, June 1996.

Digital Library

[24]

D.W. Walker and S.W. Otto, “Redistribution of Block-Cyclic Data Distributions Using MPI,” Concurrency: Practice and Experience, vol. 8, no. 9, pp. 707-728, Nov. 1996.

[25]

C.-L. Wang P.B. Bhat and V.K. Prasanna, “High-Performance Computing for Vision,” Proc. IEEE, vol. 84, pp. 931–946, 1996.

Cited By

Souravlas SRoumeliotis M(2008)A message combining approach for efficient array redistribution in non-all-to-all communication networksInternational Journal of Computer Mathematics10.1080/0020716070153787385:11(1609-1619)Online publication date: 1-Nov-2008
https://dl.acm.org/doi/10.1080/00207160701537873
Souravlas SRoumeliotis M(2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
https://dl.acm.org/doi/10.1007/s11227-008-0185-1
Huang JChu C(2008)A flexible processor mapping technique toward data localization for block-cyclic data redistributionThe Journal of Supercomputing10.1007/s11227-007-0166-945:2(151-172)Online publication date: 1-Aug-2008
https://dl.acm.org/doi/10.1007/s11227-007-0166-9
Show More Cited By

Index Terms

Efficient Algorithms for Block-Cyclic Array Redistribution Between Processor Sets

Recommendations

Scheduling Block-Cyclic Array Redistribution

This article is devoted to the run-time redistribution of one-dimensional arrays that are distributed in a block-cyclic fashion over a processor grid. While previous studies have concentrated on efficiently generating the communication messages to be ...
Improving communication scheduling for array redistribution

Many scientific applications require array redistribution when the programs run on distributed memory parallel computers. It is essential to use efficient algorithms for redistribution, otherwise the performance of the programs will degrade ...
Efficient algorithms for block-cyclic array redistribution between processor sets
SC '98: Proceedings of the 1998 ACM/IEEE conference on Supercomputing

Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems Volume 10, Issue 12

December 1999

152 pages

ISSN:1045-9219

Editor:
John A. Stankovic
Univ. of Virginia, Charlottesville

Issue’s Table of Contents

Copyright © Copyright © 1999 IEEE. All Rights Reserved.

Publisher

IEEE Press

Publication History

Published: 01 December 1999

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

25
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 02 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Souravlas SRoumeliotis M(2008)A message combining approach for efficient array redistribution in non-all-to-all communication networksInternational Journal of Computer Mathematics10.1080/0020716070153787385:11(1609-1619)Online publication date: 1-Nov-2008
https://dl.acm.org/doi/10.1080/00207160701537873
Souravlas SRoumeliotis M(2008)A message passing strategy for array redistributions in a torus networkThe Journal of Supercomputing10.1007/s11227-008-0185-146:1(40-57)Online publication date: 1-Oct-2008
https://dl.acm.org/doi/10.1007/s11227-008-0185-1
Huang JChu C(2008)A flexible processor mapping technique toward data localization for block-cyclic data redistributionThe Journal of Supercomputing10.1007/s11227-007-0166-945:2(151-172)Online publication date: 1-Aug-2008
https://dl.acm.org/doi/10.1007/s11227-007-0166-9
Yu C(2007)On the complexity of the max-edge-coloring problem with its variantsProceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies10.5555/2399256.2399288(350-361)Online publication date: 7-Apr-2007
https://dl.acm.org/doi/10.5555/2399256.2399288
Sudarsan RRibbens C(2007)Efficient multidimensional data redistribution for resizable parallel computationsProceedings of the 5th international conference on Parallel and Distributed Processing and Applications10.5555/2395970.2395991(182-194)Online publication date: 29-Aug-2007
https://dl.acm.org/doi/10.5555/2395970.2395991
Chen SHsu C(2007)ISOProceedings of the 9th international conference on Parallel Computing Technologies10.5555/2392094.2392149(507-515)Online publication date: 3-Sep-2007
https://dl.acm.org/doi/10.5555/2392094.2392149
Wang JHu CLi J(2007)Contention-free communication scheduling for group communication in data parallelismProceedings of the 2007 OTM confederated international conference on On the move to meaningful internet systems: CoopIS, DOA, ODBASE, GADA, and IS - Volume Part II10.5555/1784707.1784729(1349-1366)Online publication date: 25-Nov-2007
https://dl.acm.org/doi/10.5555/1784707.1784729
Hsu CChen SLan C(2007)Scheduling contention-free irregular redistributions in parallelizing compilersThe Journal of Supercomputing10.1007/s11227-006-0024-140:3(229-247)Online publication date: 1-Jun-2007
https://dl.acm.org/doi/10.1007/s11227-006-0024-1
Wang JHu CLi J(2007)Contention-Free Communication Scheduling for Group Communication in Data ParallelismOn the Move to Meaningful Internet Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS10.1007/978-3-540-76843-2_16(1349-1366)Online publication date: 25-Nov-2007
https://dl.acm.org/doi/10.1007/978-3-540-76843-2_16
Jeannot EWagner F(2006)Scheduling Messages For Data RedistributionInternational Journal of High Performance Computing Applications10.1177/109434200606840720:4(443-454)Online publication date: 1-Nov-2006
https://dl.acm.org/doi/10.1177/1094342006068407
Show More Cited By

View Options

View options

Figures

Tables

Media

View Issue’s Table of Contents