
Non-blocking PMI extensions for fast MPI startup

Published: 04 May 2015 (DOI: 10.1109/CCGrid.2015.151)

Abstract

An efficient implementation of the Process Management Interface (PMI) is crucial to enable fast startup of MPI jobs. We propose three extensions to the PMI specification: a blocking allgather collective (PMIX_Allgather), a non-blocking allgather collective (PMIX_Iallgather), and a non-blocking fence (PMIX_KVS_Ifence). We design and evaluate several PMI implementations to demonstrate how such extensions reduce MPI startup cost. In particular, when sufficient work can be overlapped, these extensions allow for a constant initialization cost of MPI jobs at different core counts. At 16,384 cores, the designs lead to a speedup of 2.88 times over the state-of-the-art startup schemes.
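
The reported speedup comes from overlapping the PMI key-value exchange with other initialization work. As a rough illustration of the intended usage pattern, the C sketch below uses the proposed PMIX_KVS_Ifence; the request type, the function signatures, and the PMIX_Wait completion call are assumptions made here for illustration (the paper defines the actual interface), with stub bodies so the example compiles and runs on its own.

    /*
     * Minimal sketch of the overlap pattern enabled by a non-blocking fence
     * such as the proposed PMIX_KVS_Ifence.  This is NOT the paper's
     * implementation: the request type, the signatures, and PMIX_Wait are
     * assumptions for illustration, backed by stubs so the example builds.
     */
    #include <stdio.h>

    typedef struct { int done; } pmix_request_t;    /* hypothetical handle */

    static int PMIX_KVS_Ifence(pmix_request_t *req) /* assumed signature */
    {
        req->done = 0;  /* stub: a real runtime would start exchanging the
                           locally committed key-value pairs here and
                           return immediately */
        return 0;
    }

    static int PMIX_Wait(pmix_request_t *req)       /* assumed signature */
    {
        req->done = 1;  /* stub: a real runtime would block until every
                           process's key-value pairs are visible */
        return 0;
    }

    int main(void)
    {
        pmix_request_t req;

        /* 1. After committing local endpoint information to the KVS
              (omitted), start the global exchange without blocking. */
        PMIX_KVS_Ifence(&req);

        /* 2. Overlap: perform startup work that needs no remote keys,
              e.g. allocating buffers or setting up shared memory. */
        printf("local initialization proceeds while the fence progresses\n");

        /* 3. Complete the fence only when remote keys are actually needed. */
        PMIX_Wait(&req);
        printf("all key-value pairs are now globally visible\n");
        return 0;
    }

If no independent initialization work is available, completing the fence immediately after starting it degenerates to the blocking case, which is why the speedups quoted in the abstract are conditioned on sufficient work being available to overlap.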


Cited By

  • (2019) A performance analysis and optimization of PMIx-based HPC software stacks. Proceedings of the 26th European MPI Users' Group Meeting, pp. 1-10. DOI: 10.1145/3343211.3343220
  • (2016) SHMEMPMI. Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 60-69. DOI: 10.1109/CCGrid.2016.99

Published In

CCGRID '15: Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
May 2015, 1277 pages
ISBN: 9781479980062

Publisher

IEEE Press


Author Tags

  1. InfiniBand
  2. job launch
  3. non-blocking
  4. process management interface

Qualifiers

  • Research-article
