
Non-blocking PMI extensions for fast MPI startup

Published: 04 May 2015 (DOI: 10.1109/CCGrid.2015.151)

Abstract

An efficient implementation of the Process Management Interface (PMI) is crucial to enable fast startup of MPI jobs. We propose three extensions to the PMI specification: a blocking allgather collective (PMIX_Allgather), a non-blocking allgather collective (PMIX_Iallgather), and a non-blocking fence (PMIX_KVS_Ifence). We design and evaluate several PMI implementations to demonstrate how such extensions reduce MPI startup cost. In particular, when sufficient work can be overlapped, these extensions allow for a constant initialization cost of MPI jobs at different core counts. At 16,384 cores, the designs lead to a speedup of 2.88 times over the state-of-the-art startup schemes.
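
The reported speedup comes from overlapping the PMI key-value exchange with other initialization work. As a rough illustration of the intended usage pattern, the C sketch below uses the proposed PMIX_KVS_Ifence; the request type, the function signatures, and the PMIX_Wait completion call are assumptions made here for illustration (the paper defines the actual interface), with stub bodies so the example compiles and runs on its own.

    /*
     * Minimal sketch of the overlap pattern enabled by a non-blocking fence
     * such as the proposed PMIX_KVS_Ifence.  This is NOT the paper's
     * implementation: the request type, the signatures, and PMIX_Wait are
     * assumptions for illustration, backed by stubs so the example builds.
     */
    #include <stdio.h>

    typedef struct { int done; } pmix_request_t;    /* hypothetical handle */

    static int PMIX_KVS_Ifence(pmix_request_t *req) /* assumed signature */
    {
        req->done = 0;  /* stub: a real runtime would start exchanging the
                           locally committed key-value pairs here and
                           return immediately */
        return 0;
    }

    static int PMIX_Wait(pmix_request_t *req)       /* assumed signature */
    {
        req->done = 1;  /* stub: a real runtime would block until every
                           process's key-value pairs are visible */
        return 0;
    }

    int main(void)
    {
        pmix_request_t req;

        /* 1. After committing local endpoint information to the KVS
              (omitted), start the global exchange without blocking. */
        PMIX_KVS_Ifence(&req);

        /* 2. Overlap: perform startup work that needs no remote keys,
              e.g. allocating buffers or setting up shared memory. */
        printf("local initialization proceeds while the fence progresses\n");

        /* 3. Complete the fence only when remote keys are actually needed. */
        PMIX_Wait(&req);
        printf("all key-value pairs are now globally visible\n");
        return 0;
    }

If no independent initialization work is available, completing the fence immediately after starting it degenerates to the blocking case, which is why the speedups quoted in the abstract are conditioned on sufficient work being available to overlap.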


Cited By

  • (2019) A performance analysis and optimization of PMIx-based HPC software stacks. Proceedings of the 26th European MPI Users' Group Meeting, pp. 1-10. DOI: 10.1145/3343211.3343220
  • (2016) SHMEMPMI. Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, pp. 60-69. DOI: 10.1109/CCGrid.2016.99

Published In

CCGRID '15: Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
May 2015, 1277 pages
ISBN: 9781479980062

Publisher

IEEE Press


Author Tags

  1. InfiniBand
  2. job launch
  3. non-blocking
  4. process management interface

Qualifiers

  • Research-article
