More Web Proxy on the site http://driver.im/

article

Free access

Interprocessor communication speed and performance in distributed-memory parallel processors

Authors:

R. RühlAuthors Info & Claims

ACM SIGARCH Computer Architecture News, Volume 17, Issue 3

Pages 315 - 324

https://doi.org/10.1145/74926.74961

Published: 01 April 1989 Publication History

Abstract

We have simulated several numerical and non-numerical algorithms on five distributed-memory parallel processors (DMPPs). All five DMPPs have the same topology (a torus), and the same number of nodes. The architectures differ only in the communication speed between neighboring nodes, while the computation unit is kept unchanged. The goal of the paper is to quantify the effect that interprocessor communication speed and synchronization overhead have on the performance of the DMPPs. After introducing the rationale for this study and reviewing related work, we present and discuss the results of the simulations.

References

[1]

C. L. Lawson, R. J. Hanson, D. Ft. Kin&d, and F. T. Krogb. Basic kinear algebra subprograms for Fortran usage. ACM Tranaociions on Math. So&, 5(3):308323, September 1979.

Digital Library

[2]

P. Close. The iPSC/2 node architecture. In Concrrrtent Supercompuling, pages 4349, Intel Scientific Computers, 1988.

[3]

S.F. Nugent. The iPSC/2 direct-connect communication technology. In Concurrent Supercomputing, pages 59-67, Intel Scientific Computers, 1988.

[4]

T.H. Dunigan. Performance of a second generation hypercube. Technical Report ORNL TM-10881, Oak Ridge National Laboratory, November 1988.

[5]

M. T. Heath and C.H. Romine. Parallel solution of triangular systems on distributed-memory multiprocessors. SIAM J. Sci. Stat. Cornput., 9(3):558-588, May 1988.

Digital Library

[6]

J.L. Gustafson, G.R. Montry, and R.E. Benner. Develop ment of parallel methods for a 102Cprocessor hypercube. SIAM J. Sci. Stat. Compul., 9(4):609-638, July 1988.

Digital Library

[7]

INMOS Limited. Tranrputer Reference Manual. Prentice Hall, 1988.

Digital Library

[8]

M. Annaratone et al. The Warp Computer: Architeo ture, Implementation, and Performance. IEEE Trans. on Compuiera, December 1987.

Digital Library

[9]

S. Borkar et al. iWarp: an integrated solution to highspeed parallel computation. In Proc. SupercompuGng 88, November 1988.

Digital Library

[10]

V. Cherkassy and R. Smith. Efficient mapping and implementation of matrix algorithms on a hypercube. The Journal of Supercomputing, 2(1):7-27, September 1988.

[11]

P. Sadayappan and F. Ercal. Nearest-neighbor mapping of finite element graphs onto processor meshes. IEEE Transactions on Computers, C-36(12):1408-1424, December 1987.

Digital Library

[12]

J. L. Gust&son, G. R. Montry, and R. E. Benner. Development of parsllel methods for a 1024-processor hypercube. SIAM J. Sci. Slat. Comprt., 9(4):609-638, July 1988.

Digital Library

[13]

T. Priol and K, Bouatouch. Experimenting with a Parallel Ray-Tracing Algorithm on a Hypcrc*bc Machine. PubIication Inteme 405, Institut de Recherche en Mormatique et Systbmes Albatoires, Campus Universitaire de Beaulieu, 35042 - Rennet Cklex, France, Avril 1988.

[14]

S. C. Eisenstat, M. T. Heath, C. S. Henkel, and C. H. Romine. Modified cyclic algorithms for solving triangular systems on distributed-memory multiprocessor. SIAM J. Sci. Stat. Cornput., 9(3):589-600, May 1988.

Digital Library

[15]

D. M. Nicol and F. H. Willard. Problem size, parallel architecture, and optimal speedup. Journal of Parallel and Distributed Computing, 5:404420, 1988.

Digital Library

[16]

S. L. Johnsson. Communication efficient basic linear alge bra computations on hypercube architectures. J. of Parallel and Distributed Processing, 4(2):133-172, April 1987.

Digital Library

[17]

H.T. Kung. Memory requirements for balanced computer architectures. ln Proc. 13th Computer Architecture Symposium, pages 49-54, June 1986. Meiko Inc. The Meiko Computing Surface. 1988.

Digital Library

[18]

L. M. Adams and T. W. Crockett. Modeling algorithm execution time on processor arrays. IEEE Computer, 38- 43, July 1984.

[19]

W. J. Dally. Wire-Eficient VLSI Multiprocessor Communication Networka. VLSI Memo 86345, Massachusetts Institute of Technology, Cambridge, Massachusetts, October 1986.

[20]

P. M. B. VitBnyi. Locality, communication, and interconnect length in multicomputem. SIAM J. Comput., 17(4):659-672, August 1988.

Digital Library

[21]

I.C. Ipaen, Y. Saad, 8ndM.H. Schultz. Complexity of dense linear system solution on a multiprocessor-ring. Linear Algebra and its applications, 77:205-239, May 1986.

[22]

C. L. Lawson, J. D. Croz, S. Hammar ting, and Ft. J. Hanson. A proposal for an extended set of fortran basic linear algebra subprograms. SIGNUM Newsletter, 20(1):2-18, September 1985.

Digital Library

[23]

J.H. Wilkinson and C. Reinsch. Linear Algebra. Springer- Verlag, 1971.

[24]

E. M. M-an and J. D. Cole. Calculation of plane steady iransonic flows. Technical Report lD-820943, Boeing Scientific Research Laboratories, 1970.

[25]

Jameson and Caughney. Numerical calculation of transonic flow past a swept wing. ERDA Report COO-3077-140, New York University, 1977.

[26]

J. Moran. An Zntroduction to Theoretical and Computational Aerodynamics. John Wiley & Sons, 1984.

[27]

M. R. He&mea and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Ree. Nat. Eur. Stand., 49:409-436,1952.

[28]

M. R. Garey and D. S. Johnson. Computers and Zntractability, A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York, 1979.

Digital Library

[29]

D. A. Reed, L. M. Adams, and M. L. Patrick. Stencils and problem partitionings: their inlIuence on the performance of multiple processor systems. IEEE Transactions on Computers, C-36(7):845-858, July 1987.

Digital Library

[30]

L. A. Hageman and D. M. Young. Applied Iterative Methods. Computer Science and Applied Mathematics, Academic Press, 1981.

[31]

C. Aykanat and F. Ozguner. Large grain parallel conjugate gradient algorithms on a hypercube multiprocessor. ln S. K. Sahni, editor, Proceedings of the 1987 International Conference on Parallel Processing, pages 641-644, August 1987.

[32]

O. A. McBryan and E. F. van de Velde. Hypercube algorithms and implementations. SIAM J. Sci. Stat. Comput., 8(2):s227-6287, March 1987.

Digital Library

[33]

C. Aykanat, F. ozgiiner, F. Ercal, and P. Sadayappan. It erative algorithms for solution of large sparse systems of linear equations on hypercubes. IEEE Transactions on Computers, C-37(12):1554-1568, December 1988.

Digital Library

[34]

J. H. Sdtz, V. K. Naik, and D. M. Nicol. Reduction of the effects of the communication delays in scientific dgorithms on message passing MIMD architectures. SIAM J. Sci. Stat. Comput., 8(1):slI8-&34, January 1988.

Digital Library

[35]

P. K. W. Vinsome. ORTHOMIN - an iterative method for solving sparse sets of simultaneous linear equations. In Proc. Fourth SPE Symposium on Reservoir Simulation, pages 14%160, Society of Petroleum Engineers, Lee Angeles, Febrnary 1976. Paper SPE 5739.

[36]

D. M. Young and K. C. Jea. Generalized conjugate-gradient acceleration of nonsymmetric iterative methods. Linear Algebra and its Applications, 34:159-194, 1980.

[37]

P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. Technical Report 8416, Delft University, Dept. of Math., Delft, The Netherlands, 1984.

[38]

P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. Technical Report 8416, Delft University, Dept. of Math., Delft, The Netherlands, 1984.

[39]

R. Fletcher. Conjugate gradient methods for indefinite systems. In G. A. Watson, editor, Proc. of the Dundee Biennal Conference on Numerical Analysis, Springer- Verlag, New York, 1975.

[40]

D. E. Knuth. The Art of Computer Programming - Sorting and Searehing. Addison-Wesley, Reading, MA, 1973.

Digital Library

[41]

D. A. Reed and H. D. Schwetman. Cost-performance bounds for multimicrocomputer networks. IEEE Tranractions on Computers, C-32(1):8%95, January 1983.

[42]

Thinking Machines Corporation. Connection Machine Model CM-2 Technical Summary. Technical Report HA87- 4, Thinking Machines Corporation, April 3987.

[43]

W.J. Dally et d. Architecture of a messagedriven processor. In 14th Ann. Intl. Symp. on Computer Architecture, pages 189-196, IEEE Computer Society, June 1987.

Digital Library

Cited By

van Gemund A(2005)Compile-time performance prediction of parallel systemsQuantitative Evaluation of Computing and Communication Systems10.1007/BFb0024323(299-313)Online publication date: 9-Jun-2005
https://doi.org/10.1007/BFb0024323
Annaratone MDossa D(1995)Ensemble Computing for the Petroleum IndustrySPE Computer Applications10.2118/27559-PA7:01(9-12)Online publication date: 1-Feb-1995
https://doi.org/10.2118/27559-PA
Clémençon CEndo AFritscher JMüller ARühl RWylie B(1994)An Environment for Portable Distributed Memory Parallel ProgrammingProgramming Environments for Massively Parallel Distributed Systems10.1007/978-3-0348-8534-8_16(159-170)Online publication date: 1994
https://doi.org/10.1007/978-3-0348-8534-8_16
Show More Cited By

Index Terms

Recommendations

Interprocessor communication speed and performance in distributed-memory parallel processors
ISCA '89: Proceedings of the 16th annual international symposium on Computer architecture

We have simulated several numerical and non-numerical algorithms on five distributed-memory parallel processors (DMPPs). All five DMPPs have the same topology (a torus), and the same number of nodes. The architectures differ only in the communication ...
Parallel Performance Study of Monte Carlo Photon Transport Code on Shared-, Distributed-, and Distributed-Shared-Memory Architectures
IPDPS '00: Proceedings of the 14th International Symposium on Parallel and Distributed Processing

We have parallelized a Monte Carlo photon transport algorithm. Three different parallel versions of the algorithm were developed. The first version is for the Tera Multi-Threaded Architecture (MTA) and uses Tera specific directives. The second version, ...
Parallelization of FORTRAN code on distributed-memory parallel processors
ICS '90: Proceedings of the 4th international conference on Supercomputing

This paper presents some preliminary results toward the automatic parallelization of uniprocessor FORTRAN code on distributed-memory parallel processors (DMPPs). The paper introduces Oxygen, a compiler for a DMPP under development at the Laboratory. The ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGARCH Computer Architecture News

ACM SIGARCH Computer Architecture News Volume 17, Issue 3

Special Issue: Proceedings of the 16th annual international symposium on Computer Architecture

June 1989

400 pages

ISSN:0163-5964

DOI:10.1145/74926

Editor:
Jean-Claude Syre

Issue’s Table of Contents

ISCA '89: Proceedings of the 16th annual international symposium on Computer architecture
April 1989
426 pages
ISBN:0897913191
DOI:10.1145/74925
Chairman:
Jean-Claude Syre

Copyright © 1989 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1989

Published in SIGARCH Volume 17, Issue 3

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

28
Total Citations
View Citations
572
Total Downloads

Downloads (Last 12 months)126
Downloads (Last 6 weeks)27

Reflects downloads up to 21 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

van Gemund A(2005)Compile-time performance prediction of parallel systemsQuantitative Evaluation of Computing and Communication Systems10.1007/BFb0024323(299-313)Online publication date: 9-Jun-2005
https://doi.org/10.1007/BFb0024323
Annaratone MDossa D(1995)Ensemble Computing for the Petroleum IndustrySPE Computer Applications10.2118/27559-PA7:01(9-12)Online publication date: 1-Feb-1995
https://doi.org/10.2118/27559-PA
Clémençon CEndo AFritscher JMüller ARühl RWylie B(1994)An Environment for Portable Distributed Memory Parallel ProgrammingProgramming Environments for Massively Parallel Distributed Systems10.1007/978-3-0348-8534-8_16(159-170)Online publication date: 1994
https://doi.org/10.1007/978-3-0348-8534-8_16
Long YLiu YRajan H(2016)First-class effect reflection for effect-guided programmingACM SIGPLAN Notices10.1145/3022671.298403751:10(820-837)Online publication date: 19-Oct-2016
https://dl.acm.org/doi/10.1145/3022671.2984037
Blum BGibson G(2016)Stateless model checking with data-race preemption pointsACM SIGPLAN Notices10.1145/3022671.298403651:10(477-493)Online publication date: 19-Oct-2016
https://dl.acm.org/doi/10.1145/3022671.2984036
Dan ALam PHoefler TVechev M(2016)Modeling and analysis of remote memory access programmingACM SIGPLAN Notices10.1145/3022671.298403351:10(129-144)Online publication date: 19-Oct-2016
https://dl.acm.org/doi/10.1145/3022671.2984033
Nunez DGuyer SBerger E(2016)Prioritized garbage collection: explicit GC support for software cachesACM SIGPLAN Notices10.1145/3022671.298402851:10(695-710)Online publication date: 19-Oct-2016
https://dl.acm.org/doi/10.1145/3022671.2984028
Annaratone Mzur Bonsen GFillo MHalbherr MRühl RSteiner PViredaz M(2005)Architecture, implementation, and system software of K2Distributed Memory Computing10.1007/BFb0032963(473-484)Online publication date: 17-Jun-2005
https://doi.org/10.1007/BFb0032963
Ware JKidner DRallings PMakki KPissinou NLaurini R(1998)Parallel distributed viewshed analysisProceedings of the 6th ACM international symposium on Advances in geographic information systems10.1145/288692.288719(151-156)Online publication date: 1-Nov-1998
https://dl.acm.org/doi/10.1145/288692.288719
Chen JEndo YChan KMazieres DDias ASeltzer MSmith M(1995)The measured performance of personal computer operating systemsACM SIGOPS Operating Systems Review10.1145/224057.22407929:5(299-313)Online publication date: 3-Dec-1995
https://dl.acm.org/doi/10.1145/224057.224079
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Issue’s Table of Contents