DOI: 10.1109/CCGRID.2017.29
tutorial

Offloading communication control logic in GPU accelerated applications

Published: 14 May 2017

Abstract

NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition that allows direct synchronization between a GPU and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger, and poll for completion of, communication operations queued to an InfiniBand Connect-IB network adapter, removing CPU involvement from the critical path in GPU-accelerated applications. In this paper, we present the building blocks of GPUDirect Async and explain the supported usage models of this new technology. We also present a performance evaluation using a micro-benchmark and a synthetic stencil benchmark. Finally, we demonstrate the use of Async in a few multi-GPU MPI applications: HPGMG-FV (geometric multi-grid), achieving up to 25% improvement in total execution time; CoMD-CUDA (classical molecular dynamics), reducing communication times by up to 30%; and LULESH2-CUDA, achieving an average performance improvement of 13% in total execution time.
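
The control flow described in the abstract can be illustrated with a toy simulation. The sketch below is not the GPUDirect Async API and touches no GPU or network hardware. It only assumes that a stream executes its operations in order, and that the adapter exposes something like a doorbell (to trigger a queued send) and a completion flag (to poll). The names `SimulatedNic`, `SimulatedStream`, and `run_demo` are hypothetical and exist only for this illustration.

```python
import queue
import threading


class SimulatedNic:
    """Toy stand-in for a network adapter (hypothetical, not a real driver)."""

    def __init__(self):
        self.doorbell = threading.Event()    # set by the "GPU" to trigger the send
        self.completion = threading.Event()  # set by the "NIC" when the send is done
        self.pending_buffer = None
        self.wire = []                       # what was actually "transmitted"

    def post_send(self, buffer):
        # CPU side: queue the operation ahead of time without triggering it.
        self.pending_buffer = buffer

    def serve_one(self):
        # NIC side: wait for the doorbell, transmit, then signal completion.
        self.doorbell.wait()
        self.wire.append(list(self.pending_buffer))
        self.completion.set()


class SimulatedStream:
    """Toy in-order work queue standing in for a GPU stream (hypothetical)."""

    def __init__(self):
        self.ops = queue.Queue()
        self.log = []
        self.worker = threading.Thread(target=self._drain)
        self.worker.start()

    def enqueue(self, name, fn):
        self.ops.put((name, fn))

    def _drain(self):
        while True:
            name, fn = self.ops.get()
            if fn is None:          # sentinel: no more work
                return
            fn()
            self.log.append(name)

    def synchronize(self):
        self.ops.put(("stop", None))
        self.worker.join()


def run_demo():
    nic = SimulatedNic()
    nic_thread = threading.Thread(target=nic.serve_one)
    nic_thread.start()

    buffer = [0, 0, 0, 0]
    stream = SimulatedStream()

    def kernel():
        # Stand-in for a compute kernel that fills the send buffer.
        for i in range(len(buffer)):
            buffer[i] = i * i

    # The CPU prepares everything up front and then steps aside.
    nic.post_send(buffer)
    stream.enqueue("compute", kernel)
    stream.enqueue("trigger_send", nic.doorbell.set)        # stream rings the doorbell
    stream.enqueue("wait_completion", nic.completion.wait)  # stream polls for completion

    stream.synchronize()
    nic_thread.join()
    return stream.log, nic.wire


if __name__ == "__main__":
    log, wire = run_demo()
    print(log)
    print(wire)
```

In this sketch the main thread plays the CPU: it queues the send and the stream operations once, and the ordering between compute, trigger, and completion is then enforced entirely by the simulated stream. This is the sense in which the CPU is removed from the critical path.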

Information & Contributors

Information

Published In

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
May 2017
1167 pages
ISBN: 9781509066100

Publisher

IEEE Press

Author Tags

  1. CUDA 8.0
  2. GPUDirect Async
  3. InfiniBand
  4. asynchronous communications

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Cited By

  • (2024) Snoopie: A Multi-GPU Communication Profiler and Visualizer. Proceedings of the 38th ACM International Conference on Supercomputing, pp. 525-536. DOI: 10.1145/3650200.3656597. Online publication date: 30-May-2024
  • (2023) Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge. Proceedings of the 37th International Conference on Supercomputing, pp. 192-202. DOI: 10.1145/3577193.3593713. Online publication date: 21-Jun-2023
  • (2020) FVM. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pp. 955-971. DOI: 10.5555/3488766.3488820. Online publication date: 4-Nov-2020
  • (2019) FIDR. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 239-252. DOI: 10.1145/3352460.3358303. Online publication date: 12-Oct-2019
  • (2018) ComP-net. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-13. DOI: 10.1145/3243176.3243179. Online publication date: 1-Nov-2018
  • (2018) DCS-ctrl. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 491-504. DOI: 10.1109/ISCA.2018.00048. Online publication date: 2-Jun-2018
  • (2017) GPU triggered networking for intra-kernel communications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/3126908.3126950. Online publication date: 12-Nov-2017
