DOI: 10.1145/2488551.2488573
research-article

Efficient and truly passive MPI-3 RMA using InfiniBand atomics

Published: 15 September 2013

Abstract

Multi-core and many-core architectures offer high compute density on modern supercomputing clusters. To achieve peak performance, applications must minimize communication and synchronization overheads. MPI offers one-sided communication semantics aimed at enabling this. In this paper, we propose a novel design for implementing truly passive shared and exclusive MPI_Win_lock/unlock using InfiniBand atomics. We address limitations of previously published designs. We also present the impact of our design on MPI_Win_lock_all, introduced in MPI-3. We demonstrate superior overlap compared to existing two-sided implementations. Using the Splash LU kernel, the proposed design delivers up to 49% performance improvement over existing designs.
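
As a rough illustration of how shared and exclusive locks can be built from nothing but atomic operations on a single 64-bit word, the sketch below uses the two atomics InfiniBand provides, compare-and-swap and fetch-and-add. It is not the design proposed in the paper, which the abstract does not describe. C11 atomics on local memory stand in for the remote atomics, and every name here (`rma_lock_t`, `remote_cas`, `remote_faa`, the `try_lock_*` helpers, the bit layout) is a hypothetical choice made for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low 32 bits count shared holders,
 * bit 32 and above mark an exclusive holder. */
#define EXCL_UNIT   (UINT64_C(1) << 32)
#define SHARED_MASK (EXCL_UNIT - 1)

/* Hypothetical lock word that would live in the target's registered memory. */
typedef struct { _Atomic uint64_t word; } rma_lock_t;

/* Stand-in for a remote compare-and-swap: returns the value seen before the operation. */
static uint64_t remote_cas(rma_lock_t *l, uint64_t expected, uint64_t desired) {
    atomic_compare_exchange_strong(&l->word, &expected, desired);
    return expected; /* unchanged on success, current value on failure */
}

/* Stand-in for a remote fetch-and-add: returns the value seen before the operation. */
static uint64_t remote_faa(rma_lock_t *l, uint64_t add) {
    return atomic_fetch_add(&l->word, add);
}

/* Exclusive: succeeds only if there are no holders of either kind. */
static bool try_lock_exclusive(rma_lock_t *l) {
    return remote_cas(l, 0, EXCL_UNIT) == 0;
}

static void unlock_exclusive(rma_lock_t *l) {
    /* Subtract rather than store 0, so transient shared increments survive. */
    uint64_t old = remote_faa(l, (uint64_t)0 - EXCL_UNIT);
    assert((old & ~SHARED_MASK) != 0);
    (void)old;
}

/* Shared: optimistically register, then back out if an exclusive holder exists. */
static bool try_lock_shared(rma_lock_t *l) {
    uint64_t old = remote_faa(l, 1);
    if ((old & ~SHARED_MASK) == 0)
        return true;
    remote_faa(l, UINT64_MAX); /* adds -1 modulo 2^64 */
    return false;
}

static void unlock_shared(rma_lock_t *l) {
    uint64_t old = remote_faa(l, UINT64_MAX);
    assert((old & SHARED_MASK) != 0);
    (void)old;
}
```

In this sketch an origin process would retry a failed `try_lock_*` call until it succeeds. The target process never participates, which is the sense in which such a lock is passive. A real implementation would also need fairness and backoff, which are left out here.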



Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
September 2013
289 pages
ISBN:9781450319034
DOI:10.1145/2488551
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. InfiniBand
  2. MPI-3
  3. RDMA
  4. RMA
  5. latency hiding
  6. one-sided

Qualifiers

  • Research-article

Conference

EuroMPI '13
Sponsor:
  • ARCOS
EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain

Acceptance Rates

EuroMPI '13 Paper Acceptance Rate 22 of 47 submissions, 47%;
Overall Acceptance Rate 66 of 139 submissions, 47%

Cited By

  • (2019) Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Transactions on Parallel and Distributed Systems 30:3 (575-588). DOI: 10.1109/TPDS.2018.2867222. Online publication date: 1-Mar-2019
  • (2019) Efficient implementation of MPI-3 RMA over openFabrics interfaces. Parallel Computing 87:C (1-10). DOI: 10.1016/j.parco.2019.04.008. Online publication date: 1-Sep-2019
  • (2019) Efficient Cache Simulation for Affine Computations. Languages and Compilers for Parallel Computing (65-85). DOI: 10.1007/978-3-030-35225-7_6. Online publication date: 15-Nov-2019
  • (2017) WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code. The Astrophysical Journal Supplement Series 228:2 (23). DOI: 10.3847/1538-4365/aa5b9c. Online publication date: 23-Feb-2017
  • (2017) Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand. 2017 IEEE 24th International Conference on High Performance Computing (HiPC) (62-71). DOI: 10.1109/HiPC.2017.00017. Online publication date: Dec-2017
  • (2016) Asynchronous Progress Design for a MPI-Based PGAS One-Sided Communication System. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) (999-1006). DOI: 10.1109/ICPADS.2016.0133. Online publication date: Dec-2016
