Abstract
Collective communication performance is critical in many MPI applications, yet relatively few results are available to assess the performance of mainstream MPI implementations. In this paper, we focus on two widely used primitives, broadcast and reduce, and present experimental results for the Cray T3E and the IBM SP2. We compare the performance of the existing MPI primitives with that of our own implementation, which is based on a new algorithm. Our tests show that existing all-software implementations can be improved, and they highlight the advantages of the Cray hardware-assisted implementation.
Copyright information
© 1999 Springer-Verlag
About this paper
Cite this paper
Bernaschi, M., Iannello, G., Lauria, M. (1999). Experimental results about MPI collective communication operations. In: Sloot, P., Bubak, M., Hoekstra, A., Hertzberger, B. (eds) High-Performance Computing and Networking. HPCN-Europe 1999. Lecture Notes in Computer Science, vol 1593. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100638
DOI: https://doi.org/10.1007/BFb0100638
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65821-4
Online ISBN: 978-3-540-48933-7
eBook Packages: Springer Book Archive