Abstract
The MPI multithreading model has historically been difficult to optimize; the interface it provides to threads was designed as a process-level interface. This design has led to implementations that treat function calls as critical regions and protect them with locks to avoid race conditions. We hypothesize that an interface designed specifically for threads can deliver better performance than current approaches and can even outperform single-threaded MPI.
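For contrast, the following is a minimal sketch (not code from the paper) of the conventional pattern this describes: under MPI_THREAD_MULTIPLE, each thread posts its own send, so every call enters the library's internal critical regions. The thread count, chunk size, and tagging scheme are assumptions for illustration.

```c
/* Conventional multithreaded two-sided pattern under
 * MPI_THREAD_MULTIPLE: every thread posts its own MPI_Isend, so the
 * library must guard each call against races (typically with locks).
 * NTHREADS, CHUNK, and the tag scheme are illustrative assumptions. */
#include <mpi.h>
#include <omp.h>

#define NTHREADS 8
#define CHUNK    1024  /* elements sent by each thread */

void threaded_send(const double *buf, int dest, MPI_Comm comm)
{
    MPI_Request reqs[NTHREADS];

    #pragma omp parallel num_threads(NTHREADS)
    {
        int t = omp_get_thread_num();
        /* One message per thread; distinct tags keep them matchable
         * on the receiver side. */
        MPI_Isend(buf + (size_t)t * CHUNK, CHUNK, MPI_DOUBLE,
                  dest, /* tag = */ t, comm, &reqs[t]);
    }
    /* All sends have been posted; wait for every one to complete. */
    MPI_Waitall(NTHREADS, reqs, MPI_STATUSES_IGNORE);
}
```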
In this paper, we describe a design for partitioned communication in MPI that we call finepoints. We first assess the existing models for MPI two-sided communication and then introduce finepoints as a hybrid that combines the best features of each. In addition, partitioned communication created with finepoints leverages new network hardware features that cannot be exploited with current MPI point-to-point semantics, making this approach useful both now and in the future.
To test our hypothesis, we implement a finepoints library and show improvements over a state-of-the-art multithreading-optimized Open MPI implementation on a Cray XC40 with an Aries network. Our experiments demonstrate up to a 12× reduction in the time spent waiting for send operations to complete. We also demonstrate the new model on a nuclear-reactor-physics neutron-transport proxy application, where it provides up to a 26.1% improvement in communication time and up to a 4.8% improvement in runtime over the best-performing MPI communication mode, single-threaded MPI.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc. for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
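To make the partitioned model concrete, below is a minimal sketch of a finepoints-style send. The interface shown is the partitioned-communication API later standardized in MPI 4.0, which grew out of this proposal; the prototype library evaluated in the paper predates the standard, so its exact function names may differ, and the one-partition-per-thread layout and sizes are illustrative assumptions.

```c
/* Partitioned send sketch using the MPI 4.0 partitioned-communication
 * API (the standardized descendant of the finepoints proposal; the
 * paper's prototype library may use different names). One matched
 * send covers the whole buffer; each thread marks only its own
 * partition ready, so there is no per-thread message to match and no
 * per-call lock on shared matching state. */
#include <mpi.h>
#include <omp.h>

#define NPARTS 8      /* one partition per thread (assumed layout) */
#define PCOUNT 1024   /* elements per partition */

void partitioned_send(const double *buf, int dest, MPI_Comm comm)
{
    MPI_Request req;

    /* Set up a single partitioned send for the entire buffer. */
    MPI_Psend_init(buf, NPARTS, PCOUNT, MPI_DOUBLE, dest, /* tag = */ 0,
                   comm, MPI_INFO_NULL, &req);
    MPI_Start(&req);

    #pragma omp parallel num_threads(NPARTS)
    {
        int t = omp_get_thread_num();
        /* ... thread t fills buf[t*PCOUNT .. (t+1)*PCOUNT - 1] ... */
        MPI_Pready(t, req);  /* partition t may go on the wire now */
    }

    MPI_Wait(&req, MPI_STATUS_IGNORE);  /* whole message delivered */
    MPI_Request_free(&req);
}
```

Because only one message is matched, the library touches shared matching state once per transfer; each MPI_Pready is a lightweight per-partition notification that a NIC supporting partitioned transfers can act on directly, which is the hardware opportunity the abstract refers to.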
Cite this paper
Grant, R.E., Dosanjh, M.G.F., Levenhagen, M.J., Brightwell, R., Skjellum, A.: Finepoints: Partitioned Multithreaded MPI Communication. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds.) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol. 11501. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20656-7_17