[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Interprocessor communication speed and performance in distributed-memory parallel processors

Published: 01 April 1989 Publication History

Abstract

We have simulated several numerical and non-numerical algorithms on five distributed-memory parallel processors (DMPPs). All five DMPPs have the same topology (a torus), and the same number of nodes. The architectures differ only in the communication speed between neighboring nodes, while the computation unit is kept unchanged. The goal of the paper is to quantify the effect that interprocessor communication speed and synchronization overhead have on the performance of the DMPPs. After introducing the rationale for this study and reviewing related work, we present and discuss the results of the simulations.

References

[1]
C. L. Lawson, R. J. Hanson, D. Ft. Kin&d, and F. T. Krogb. Basic kinear algebra subprograms for Fortran usage. ACM Tranaociions on Math. So&, 5(3):308323, September 1979.
[2]
P. Close. The iPSC/2 node architecture. In Concrrrtent Supercompuling, pages 4349, Intel Scientific Computers, 1988.
[3]
S.F. Nugent. The iPSC/2 direct-connect communication technology. In Concurrent Supercomputing, pages 59-67, Intel Scientific Computers, 1988.
[4]
T.H. Dunigan. Performance of a second generation hypercube. Technical Report ORNL TM-10881, Oak Ridge National Laboratory, November 1988.
[5]
M. T. Heath and C.H. Romine. Parallel solution of triangular systems on distributed-memory multiprocessors. SIAM J. Sci. Stat. Cornput., 9(3):558-588, May 1988.
[6]
J.L. Gustafson, G.R. Montry, and R.E. Benner. Develop ment of parallel methods for a 102Cprocessor hypercube. SIAM J. Sci. Stat. Compul., 9(4):609-638, July 1988.
[7]
INMOS Limited. Tranrputer Reference Manual. Prentice Hall, 1988.
[8]
M. Annaratone et al. The Warp Computer: Architeo ture, Implementation, and Performance. IEEE Trans. on Compuiera, December 1987.
[9]
S. Borkar et al. iWarp: an integrated solution to highspeed parallel computation. In Proc. SupercompuGng 88, November 1988.
[10]
V. Cherkassy and R. Smith. Efficient mapping and implementation of matrix algorithms on a hypercube. The Journal of Supercomputing, 2(1):7-27, September 1988.
[11]
P. Sadayappan and F. Ercal. Nearest-neighbor mapping of finite element graphs onto processor meshes. IEEE Transactions on Computers, C-36(12):1408-1424, December 1987.
[12]
J. L. Gust&son, G. R. Montry, and R. E. Benner. Development of parsllel methods for a 1024-processor hypercube. SIAM J. Sci. Slat. Comprt., 9(4):609-638, July 1988.
[13]
T. Priol and K, Bouatouch. Experimenting with a Parallel Ray-Tracing Algorithm on a Hypcrc*bc Machine. PubIication Inteme 405, Institut de Recherche en Mormatique et Systbmes Albatoires, Campus Universitaire de Beaulieu, 35042 - Rennet Cklex, France, Avril 1988.
[14]
S. C. Eisenstat, M. T. Heath, C. S. Henkel, and C. H. Romine. Modified cyclic algorithms for solving triangular systems on distributed-memory multiprocessor. SIAM J. Sci. Stat. Cornput., 9(3):589-600, May 1988.
[15]
D. M. Nicol and F. H. Willard. Problem size, parallel architecture, and optimal speedup. Journal of Parallel and Distributed Computing, 5:404420, 1988.
[16]
S. L. Johnsson. Communication efficient basic linear alge bra computations on hypercube architectures. J. of Parallel and Distributed Processing, 4(2):133-172, April 1987.
[17]
H.T. Kung. Memory requirements for balanced computer architectures. ln Proc. 13th Computer Architecture Symposium, pages 49-54, June 1986. Meiko Inc. The Meiko Computing Surface. 1988.
[18]
L. M. Adams and T. W. Crockett. Modeling algorithm execution time on processor arrays. IEEE Computer, 38- 43, July 1984.
[19]
W. J. Dally. Wire-Eficient VLSI Multiprocessor Communication Networka. VLSI Memo 86345, Massachusetts Institute of Technology, Cambridge, Massachusetts, October 1986.
[20]
P. M. B. VitBnyi. Locality, communication, and interconnect length in multicomputem. SIAM J. Comput., 17(4):659-672, August 1988.
[21]
I.C. Ipaen, Y. Saad, 8ndM.H. Schultz. Complexity of dense linear system solution on a multiprocessor-ring. Linear Algebra and its applications, 77:205-239, May 1986.
[22]
C. L. Lawson, J. D. Croz, S. Hammar ting, and Ft. J. Hanson. A proposal for an extended set of fortran basic linear algebra subprograms. SIGNUM Newsletter, 20(1):2-18, September 1985.
[23]
J.H. Wilkinson and C. Reinsch. Linear Algebra. Springer- Verlag, 1971.
[24]
E. M. M-an and J. D. Cole. Calculation of plane steady iransonic flows. Technical Report lD-820943, Boeing Scientific Research Laboratories, 1970.
[25]
Jameson and Caughney. Numerical calculation of transonic flow past a swept wing. ERDA Report COO-3077-140, New York University, 1977.
[26]
J. Moran. An Zntroduction to Theoretical and Computational Aerodynamics. John Wiley & Sons, 1984.
[27]
M. R. He&mea and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Ree. Nat. Eur. Stand., 49:409-436,1952.
[28]
M. R. Garey and D. S. Johnson. Computers and Zntractability, A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, New York, 1979.
[29]
D. A. Reed, L. M. Adams, and M. L. Patrick. Stencils and problem partitionings: their inlIuence on the performance of multiple processor systems. IEEE Transactions on Computers, C-36(7):845-858, July 1987.
[30]
L. A. Hageman and D. M. Young. Applied Iterative Methods. Computer Science and Applied Mathematics, Academic Press, 1981.
[31]
C. Aykanat and F. Ozguner. Large grain parallel conjugate gradient algorithms on a hypercube multiprocessor. ln S. K. Sahni, editor, Proceedings of the 1987 International Conference on Parallel Processing, pages 641-644, August 1987.
[32]
O. A. McBryan and E. F. van de Velde. Hypercube algorithms and implementations. SIAM J. Sci. Stat. Comput., 8(2):s227-6287, March 1987.
[33]
C. Aykanat, F. ozgiiner, F. Ercal, and P. Sadayappan. It erative algorithms for solution of large sparse systems of linear equations on hypercubes. IEEE Transactions on Computers, C-37(12):1554-1568, December 1988.
[34]
J. H. Sdtz, V. K. Naik, and D. M. Nicol. Reduction of the effects of the communication delays in scientific dgorithms on message passing MIMD architectures. SIAM J. Sci. Stat. Comput., 8(1):slI8-&34, January 1988.
[35]
P. K. W. Vinsome. ORTHOMIN - an iterative method for solving sparse sets of simultaneous linear equations. In Proc. Fourth SPE Symposium on Reservoir Simulation, pages 14%160, Society of Petroleum Engineers, Lee Angeles, Febrnary 1976. Paper SPE 5739.
[36]
D. M. Young and K. C. Jea. Generalized conjugate-gradient acceleration of nonsymmetric iterative methods. Linear Algebra and its Applications, 34:159-194, 1980.
[37]
P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. Technical Report 8416, Delft University, Dept. of Math., Delft, The Netherlands, 1984.
[38]
P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. Technical Report 8416, Delft University, Dept. of Math., Delft, The Netherlands, 1984.
[39]
R. Fletcher. Conjugate gradient methods for indefinite systems. In G. A. Watson, editor, Proc. of the Dundee Biennal Conference on Numerical Analysis, Springer- Verlag, New York, 1975.
[40]
D. E. Knuth. The Art of Computer Programming - Sorting and Searehing. Addison-Wesley, Reading, MA, 1973.
[41]
D. A. Reed and H. D. Schwetman. Cost-performance bounds for multimicrocomputer networks. IEEE Tranractions on Computers, C-32(1):8%95, January 1983.
[42]
Thinking Machines Corporation. Connection Machine Model CM-2 Technical Summary. Technical Report HA87- 4, Thinking Machines Corporation, April 3987.
[43]
W.J. Dally et d. Architecture of a messagedriven processor. In 14th Ann. Intl. Symp. on Computer Architecture, pages 189-196, IEEE Computer Society, June 1987.

Cited By

View all
  • (2005)Compile-time performance prediction of parallel systemsQuantitative Evaluation of Computing and Communication Systems10.1007/BFb0024323(299-313)Online publication date: 9-Jun-2005
  • (1995)Ensemble Computing for the Petroleum IndustrySPE Computer Applications10.2118/27559-PA7:01(9-12)Online publication date: 1-Feb-1995
  • (1994)An Environment for Portable Distributed Memory Parallel ProgrammingProgramming Environments for Massively Parallel Distributed Systems10.1007/978-3-0348-8534-8_16(159-170)Online publication date: 1994
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGARCH Computer Architecture News
ACM SIGARCH Computer Architecture News  Volume 17, Issue 3
Special Issue: Proceedings of the 16th annual international symposium on Computer Architecture
June 1989
400 pages
ISSN:0163-5964
DOI:10.1145/74926
Issue’s Table of Contents
  • cover image ACM Conferences
    ISCA '89: Proceedings of the 16th annual international symposium on Computer architecture
    April 1989
    426 pages
    ISBN:0897913191
    DOI:10.1145/74925

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 1989
Published in SIGARCH Volume 17, Issue 3

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)126
  • Downloads (Last 6 weeks)27
Reflects downloads up to 21 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2005)Compile-time performance prediction of parallel systemsQuantitative Evaluation of Computing and Communication Systems10.1007/BFb0024323(299-313)Online publication date: 9-Jun-2005
  • (1995)Ensemble Computing for the Petroleum IndustrySPE Computer Applications10.2118/27559-PA7:01(9-12)Online publication date: 1-Feb-1995
  • (1994)An Environment for Portable Distributed Memory Parallel ProgrammingProgramming Environments for Massively Parallel Distributed Systems10.1007/978-3-0348-8534-8_16(159-170)Online publication date: 1994
  • (2016)First-class effect reflection for effect-guided programmingACM SIGPLAN Notices10.1145/3022671.298403751:10(820-837)Online publication date: 19-Oct-2016
  • (2016)Stateless model checking with data-race preemption pointsACM SIGPLAN Notices10.1145/3022671.298403651:10(477-493)Online publication date: 19-Oct-2016
  • (2016)Modeling and analysis of remote memory access programmingACM SIGPLAN Notices10.1145/3022671.298403351:10(129-144)Online publication date: 19-Oct-2016
  • (2016)Prioritized garbage collection: explicit GC support for software cachesACM SIGPLAN Notices10.1145/3022671.298402851:10(695-710)Online publication date: 19-Oct-2016
  • (2005)Architecture, implementation, and system software of K2Distributed Memory Computing10.1007/BFb0032963(473-484)Online publication date: 17-Jun-2005
  • (1998)Parallel distributed viewshed analysisProceedings of the 6th ACM international symposium on Advances in geographic information systems10.1145/288692.288719(151-156)Online publication date: 1-Nov-1998
  • (1995)The measured performance of personal computer operating systemsACM SIGOPS Operating Systems Review10.1145/224057.22407929:5(299-313)Online publication date: 3-Dec-1995
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media