DOI: 10.1145/3343211.3343220

A performance analysis and optimization of PMIx-based HPC software stacks

Published: 11 September 2019

Abstract

Process management libraries and runtime environments serve an important role in the HPC application lifecycle. This work provides a roadmap for implementing a high-performance PMIx-based software stack, targeting four performance-critical areas and presenting novel co-designed solutions that significantly improve application performance during initialization and wire-up at scale.
First, new locking and thread-safety schemes for PMIx on-host communication are designed, demonstrating up to a 66x reduction in PMIx_Get latency.
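To make the read-mostly pattern behind such a scheme concrete, the sketch below (in C, and not the paper's actual design) guards a small key-value cache with a POSIX reader-writer lock, so that concurrent PMIx_Get-style lookups take only the shared lock and never serialize against one another, while rare updates take the exclusive lock. The kv_* names are hypothetical and are not part of the PMIx API.

    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical read-mostly cache; kv_* names are illustrative,
     * not part of the PMIx API. */
    #define KV_MAX 64
    struct kv_entry { char key[32]; char val[32]; };

    static struct kv_entry kv_cache[KV_MAX];
    static int kv_count = 0;
    static pthread_rwlock_t kv_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Read path: concurrent PMIx_Get-style lookups share the lock. */
    static const char *kv_lookup(const char *key, char *out, size_t len)
    {
        const char *res = NULL;
        pthread_rwlock_rdlock(&kv_lock);
        for (int i = 0; i < kv_count; i++) {
            if (strcmp(kv_cache[i].key, key) == 0) {
                snprintf(out, len, "%s", kv_cache[i].val);
                res = out;
                break;
            }
        }
        pthread_rwlock_unlock(&kv_lock);
        return res;
    }

    /* Write path: rare metadata updates take the exclusive lock. */
    static void kv_store(const char *key, const char *val)
    {
        pthread_rwlock_wrlock(&kv_lock);
        if (kv_count < KV_MAX) {
            snprintf(kv_cache[kv_count].key, 32, "%s", key);
            snprintf(kv_cache[kv_count].val, 32, "%s", val);
            kv_count++;
        }
        pthread_rwlock_unlock(&kv_lock);
    }

    int main(void)
    {
        char buf[32];
        kv_store("pmix.lrank", "3");                   /* server-side publish */
        if (kv_lookup("pmix.lrank", buf, sizeof buf))  /* client-side read    */
            printf("pmix.lrank = %s\n", buf);
        return 0;
    }

Because lookups vastly outnumber updates during job start, taking them off the exclusive-lock path is the kind of change that yields latency reductions of the reported magnitude.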
Second, optimizations of the protocols involved in the wire-up procedure are proposed. Specific improvements to the UCX endpoint address representation, the layout of PMIx metadata, and the use of Little-Endian Base 128 (LEB128) encoding decreased the volume of inter-node data exchanged by up to 8.6x.
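Unsigned LEB128 is a standard variable-length integer encoding (also used by the DWARF debugging format): each output byte carries seven value bits and uses the high bit as a continuation flag, so the small integers that dominate PMIx metadata, such as ranks and sizes, shrink from eight bytes to one or two on the wire. A minimal sketch of the standard encoder and decoder in C follows; it is illustrative, not the paper's implementation.

    #include <stdint.h>
    #include <stdio.h>

    /* Encode value as unsigned LEB128; returns bytes written (<= 10
     * for a uint64_t). Each byte holds 7 value bits; the high bit
     * means "more bytes follow". */
    static size_t uleb128_encode(uint64_t value, uint8_t *out)
    {
        size_t n = 0;
        do {
            uint8_t byte = value & 0x7f;
            value >>= 7;
            if (value != 0)
                byte |= 0x80;          /* continuation flag */
            out[n++] = byte;
        } while (value != 0);
        return n;
    }

    /* Decode an unsigned LEB128 value; returns bytes consumed. */
    static size_t uleb128_decode(const uint8_t *in, uint64_t *value)
    {
        uint64_t result = 0;
        unsigned shift = 0;
        size_t n = 0;
        uint8_t byte;
        do {
            byte = in[n++];
            result |= (uint64_t)(byte & 0x7f) << shift;
            shift += 7;
        } while (byte & 0x80);
        *value = result;
        return n;
    }

    int main(void)
    {
        uint8_t buf[10];
        uint64_t v;
        size_t n = uleb128_encode(300, buf);   /* 300 -> 0xAC 0x02 */
        uleb128_decode(buf, &v);
        printf("300 -> %zu bytes -> %llu\n", n, (unsigned long long)v);
        return 0;
    }

Together with the compact endpoint-address representation and metadata layout, applying such an encoding to the remaining integer fields contributes to the up-to-8.6x reduction quoted above.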
Third, a modification of the Bruck concatenation algorithm is presented that scales better than the ring- and tree-based implementations currently used in resource managers for PMIx data exchange.
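For orientation, the sketch below implements the classic equal-block Bruck allgather in MPI: it completes in ceil(log2 P) rounds, doubling the payload carried each round, and finishes with a local rotation. The paper's contribution is a modified concatenation (allgatherv-style) variant for variable-size contributions, which this simplified version does not reproduce.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Classic Bruck allgather for equal-size blocks: ceil(log2 P)
     * rounds, doubling the payload each round, plus a final local
     * rotation. The paper's variant handles variable-size blocks;
     * this sketch does not. */
    static void bruck_allgather(const void *sendblk, int blocksz,
                                void *recvbuf, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        char *tmp = malloc((size_t)size * blocksz);
        memcpy(tmp, sendblk, blocksz);           /* own block first */

        for (int dist = 1; dist < size; dist <<= 1) {
            /* min(dist, size - dist) blocks move in this round */
            int count = (dist <= size - dist) ? dist : size - dist;
            int sendto = (rank - dist + size) % size;
            int recvfrom = (rank + dist) % size;
            MPI_Sendrecv(tmp, count * blocksz, MPI_BYTE, sendto, 0,
                         tmp + (size_t)dist * blocksz, count * blocksz,
                         MPI_BYTE, recvfrom, 0, comm, MPI_STATUS_IGNORE);
        }

        /* Slot i of tmp holds the block of rank (rank + i) % size. */
        for (int i = 0; i < size; i++)
            memcpy((char *)recvbuf + (size_t)((rank + i) % size) * blocksz,
                   tmp + (size_t)i * blocksz, blocksz);
        free(tmp);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int mine = 10 * rank;
        int *all = malloc(size * sizeof(int));
        bruck_allgather(&mine, sizeof(int), all, MPI_COMM_WORLD);
        if (rank == 0)
            for (int i = 0; i < size; i++)
                printf("rank %d contributed %d\n", i, all[i]);
        free(all);
        MPI_Finalize();
        return 0;
    }

The logarithmic round count is the key advantage over the linear number of rounds required by a ring exchange as the job size grows.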
Lastly, an out-of-band channel leveraging the high-performance fabric is evaluated, demonstrating an orders-of-magnitude performance improvement over the existing implementation.

Information

Published In

EuroMPI '19: Proceedings of the 26th European MPI Users' Group Meeting
September 2019
134 pages
ISBN: 9781450371759
DOI: 10.1145/3343211
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HPC middleware
  2. PMIx
  3. allgatherv
  4. collective communication
  5. process management interface
  6. resource management

Qualifiers

  • Research-article

Conference

EuroMPI 2019: 26th European MPI Users' Group Meeting
September 11 - 13, 2019
Zürich, Switzerland

Acceptance Rates

EuroMPI '19 paper acceptance rate: 13 of 26 submissions, 50%.
Overall acceptance rate: 66 of 139 submissions, 47%.

Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 5
Reflects downloads up to 06 Jan 2025

Cited By

  • (2024) Faster and Scalable MPI Applications Launching. IEEE Transactions on Parallel and Distributed Systems 35(2), 264-279. DOI: 10.1109/TPDS.2022.3218077. Online publication date: Feb-2024.
  • (2024) Shared Memory Access Optimization Analysis System for PMIx Standard Implementation. 2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), 457-461. DOI: 10.1109/SIBIRCON63777.2024.10758451. Online publication date: 30-Sep-2024.
  • (2023) Shared Memory Access Optimization Analysis System for PMIx Standard Implementation. The Herald of the Siberian State University of Telecommunications and Information Science 18(1), 29-39. DOI: 10.55648/1998-6920-2024-18-1-29-39. Online publication date: 17-Dec-2023.
  • (2023) Scheduling of Elastic Message Passing Applications on HPC Systems. Job Scheduling Strategies for Parallel Processing, 172-191. DOI: 10.1007/978-3-031-22698-4_9. Online publication date: 12-Jan-2023.
  • (2022) Algorithms for Optimizing the Execution of Parallel Programs on High-Performance Systems When Solving Problems of Modeling Physical Processes. Optoelectronics, Instrumentation and Data Processing 57(5), 552-560. DOI: 10.3103/S8756699021050113. Online publication date: 18-Mar-2022.
  • (2022) Enabling Global MPI Process Addressing in MPI Applications. Proceedings of the 29th European MPI Users' Group Meeting, 27-36. DOI: 10.1145/3555819.3555829. Online publication date: 14-Sep-2022.
  • (2022) The Fast and Scalable MPI Application Launch of the Tianhe HPC system. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 1305-1315. DOI: 10.1109/IPDPS53621.2022.00129. Online publication date: May-2022.
  • (2021) Key-Value Database Access Optimization For PMIx Standard Implementation. 2021 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), 0362-0366. DOI: 10.1109/USBEREIT51232.2021.9455075. Online publication date: 13-May-2021.
  • (2020) Benchmark informed software upgrades on Quest, Northwestern's HPC cluster. Practice and Experience in Advanced Research Computing 2020: Catch the Wave, 526-529. DOI: 10.1145/3311790.3399615. Online publication date: 26-Jul-2020.
  • (2019) Scalable, Fault-Tolerant Job Step Management for High Performance Systems. IBM Journal of Research and Development, 1-1. DOI: 10.1147/JRD.2019.2958909. Online publication date: 2019.
