research-article

Efficient and truly passive MPI-3 RMA using InfiniBand atomics

Authors:

M. Li,

S. Potluri,

K. Hamidouche,

J. Jose,

D. K. Panda

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting

Pages 91-96

https://doi.org/10.1145/2488551.2488573

Published: 15 September 2013

Abstract

Multi-/many-core architectures offer high compute density on modern supercomputing clusters. To achieve peak performance, applications must minimize communication and synchronization overheads, and MPI's one-sided communication semantics are aimed at enabling this. In this paper, we propose a novel design for implementing truly passive shared and exclusive MPI_Win_lock/MPI_Win_unlock using InfiniBand atomics, addressing limitations of previously published designs. We also present the impact of our design on MPI_Win_lock_all, introduced in MPI-3. We demonstrate superior overlap compared to existing implementations based on two-sided communication. Using the SPLASH LU kernel, the proposed design delivers up to 49% performance improvement over existing designs.
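The abstract does not describe the lock protocol itself, so the following is only an illustrative sketch of how an origin process could grant itself shared and exclusive locks using nothing but the two 64-bit InfiniBand atomics, compare-and-swap and fetch-and-add, without any involvement of the target's CPU. Everything named here is an assumption, not the paper's design: the lock-word encoding (top bit for an exclusive holder, low bits counting shared holders), the class RemoteLockWord (a single-process stand-in for a word in registered target memory, with no real RDMA), and the helper functions, including the hypothetical try_lock_all, which approximates MPI_Win_lock_all by taking a shared lock on every target.

```python
MASK64 = (1 << 64) - 1  # InfiniBand atomics operate on 64-bit words
EXCL = 1 << 63          # hypothetical encoding: top bit marks an exclusive holder


class RemoteLockWord:
    """Single-process stand-in for a 64-bit lock word in a target's memory.

    Each method call stands in for one atomic NIC operation, mimicking the
    semantics of InfiniBand compare-and-swap and fetch-and-add: it updates
    the whole word and returns the value it held *before* the operation.
    No real RDMA or thread safety is involved.
    """

    def __init__(self) -> None:
        self.value = 0

    def compare_and_swap(self, compare: int, swap: int) -> int:
        old = self.value
        if old == compare:
            self.value = swap & MASK64
        return old

    def fetch_and_add(self, add: int) -> int:
        old = self.value
        self.value = (old + add) & MASK64  # unsigned 64-bit add, wraps modulo 2**64
        return old


def try_lock_exclusive(word: RemoteLockWord) -> bool:
    # Succeeds only if nobody holds the lock: 0 -> EXCL in a single CAS.
    return word.compare_and_swap(0, EXCL) == 0


def unlock_exclusive(word: RemoteLockWord) -> None:
    # Clear the exclusive bit; pending optimistic reader increments stay intact.
    word.fetch_and_add(-EXCL)


def try_lock_shared(word: RemoteLockWord) -> bool:
    # Optimistically bump the reader count, then inspect the old value.
    old = word.fetch_and_add(1)
    if old & EXCL:
        word.fetch_and_add(-1)  # a writer holds the lock: undo and report failure
        return False
    return True


def unlock_shared(word: RemoteLockWord) -> None:
    word.fetch_and_add(-1)


def try_lock_all(words) -> bool:
    # Hypothetical lock_all: shared lock on every target, rolled back on failure.
    acquired = []
    for w in words:
        if not try_lock_shared(w):
            for a in acquired:
                unlock_shared(a)
            return False
        acquired.append(w)
    return True


def unlock_all(words) -> None:
    for w in words:
        unlock_shared(w)
```

Wrapping the try_ calls in a retry loop with backoff would give blocking lock semantics. In this encoding an uncontended shared lock costs a single fetch-and-add, while a steady stream of readers can starve a writer; the paper's actual encoding, fairness policy, and lock_all strategy may differ.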

Information

Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting

September 2013

289 pages

ISBN: 9781450319034

DOI: 10.1145/2488551

General Chair: Jack Dongarra, University of Tennessee

Program Chairs: Javier Garcia Blas, University Carlos III, Spain; Jesus Carretero, University Carlos III, Spain

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGHPC: ACM Special Interest Group on High Performance Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

EuroMPI '13

Sponsor:

ARCOS

EuroMPI '13: 20th European MPI Users' Group Meeting

September 15 - 18, 2013

Madrid, Spain

Acceptance Rates

EuroMPI '13 paper acceptance rate: 22 of 47 submissions (47%)

Overall acceptance rate: 66 of 139 submissions (47%)

Article Metrics

Total Citations: 6
Total Downloads: 218
Downloads (last 12 months): 5
Downloads (last 6 weeks): 0

Reflects downloads up to 10 Dec 2024.
