poster

Effective communication and computation overlap with hybrid MPI/SMPSs

Published: 09 January 2010

Abstract

Communication overhead is one of the dominant factors affecting performance in high-performance computing systems. To reduce the negative impact of communication, programmers overlap communication with computation by using asynchronous communication primitives. Doing so increases code complexity, which demands more development effort and makes programs less readable. This paper presents the hybrid use of MPI and SMPSs (SMP superscalar, a task-based shared-memory programming model), which lets the programmer easily introduce the asynchrony needed to overlap communication and computation. We demonstrate the hybrid use of MPI/SMPSs with the High-Performance LINPACK benchmark (HPL) and compare it to the pure MPI implementation, which uses the look-ahead technique to overlap communication and computation. The hybrid MPI/SMPSs version significantly outperforms the pure MPI version: it comes close to the asymptotic performance at medium problem sizes and still delivers significant benefits at small and large problem sizes.
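
The overlap idea described in the abstract can be illustrated with a small, self-contained sketch. It is not MPI or SMPSs code and it is not the authors' implementation: a background thread stands in for an asynchronous transfer, `time.sleep` stands in for network and CPU work, and the names `communicate`, `compute`, `run_blocking` and `run_overlapped` are hypothetical. The overlapped variant starts fetching the next panel before it computes on the current one, which is roughly the shape of a look-ahead scheme.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def communicate(step, delay=0.01):
    """Hypothetical stand-in for a blocking transfer of one panel."""
    time.sleep(delay)  # pretend the network is busy
    return f"panel-{step}"


def compute(panel, delay=0.01):
    """Hypothetical stand-in for the local update that uses a panel."""
    time.sleep(delay)  # pretend the CPU is busy
    return f"updated({panel})"


def run_blocking(steps):
    """Transfer, then compute, one step at a time (no overlap)."""
    return [compute(communicate(s)) for s in range(steps)]


def run_overlapped(steps):
    """Fetch panel s+1 in the background while computing on panel s."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(communicate, 0)  # start the first transfer
        for s in range(steps):
            panel = pending.result()  # wait only for the current panel
            if s + 1 < steps:
                # Look ahead: the next transfer runs during compute() below.
                pending = pool.submit(communicate, s + 1)
            results.append(compute(panel))
    return results
```

Both functions return the same list of results; only the schedule differs. The extra bookkeeping in `run_overlapped` (the `pending` handle and the explicit wait) is a small-scale picture of the added code complexity the abstract attributes to hand-written asynchrony.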

Published In

ACM SIGPLAN Notices, Volume 45, Issue 5
PPoPP '10
May 2010
346 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1837853

PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
January 2010
372 pages
ISBN: 9781605588773
DOI: 10.1145/1693453

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 January 2010
Published in SIGPLAN Volume 45, Issue 5

Author Tags

  1. hybrid MPI/SMPSs
  2. LINPACK
  3. MPI
  4. parallel programming model

Qualifiers

  • Poster

Cited By

  • (2021) Communication-Aware Task Scheduling Strategy in Hybrid MPI+OpenMP Applications. OpenMP: Enabling Massive Node-Level Parallelism (197-210). DOI: 10.1007/978-3-030-85262-7_14. Online publication date: 8-Sep-2021
  • (2016) The AXIOM software layers. Microprocessors & Microsystems 47:PB (262-277). DOI: 10.1016/j.micpro.2016.07.002. Online publication date: 1-Nov-2016
  • (2016) A Microcoded Kernel Recursive Least Squares Processor Using FPGA Technology. ACM Transactions on Reconfigurable Technology and Systems 10:1 (1-22). DOI: 10.1145/2950061. Online publication date: 24-Sep-2016
  • (2015) RadFlow. Proceedings of the 2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) (115-119). DOI: 10.1109/SBAC-PADW.2015.26. Online publication date: 18-Oct-2015
  • (2013) Polyhedral parallel code generation for CUDA. ACM Transactions on Architecture and Code Optimization 9:4 (1-23). DOI: 10.1145/2400682.2400713. Online publication date: 20-Jan-2013
  • (2013) OpenStream. ACM Transactions on Architecture and Code Optimization 9:4 (1-25). DOI: 10.1145/2400682.2400712. Online publication date: 20-Jan-2013
  • (2012) Productive Programming of GPU Clusters with OmpSs. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium (557-568). DOI: 10.1109/IPDPS.2012.58. Online publication date: 21-May-2012
  • (2012) A high-productivity task-based programming model for clusters. Concurrency and Computation: Practice & Experience 24:18 (2421-2448). DOI: 10.1002/cpe.2831. Online publication date: 1-Dec-2012
  • (2011) Quantifying the Potential Task-Based Dataflow Parallelism in MPI Applications. Euro-Par 2011 Parallel Processing (39-51). DOI: 10.1007/978-3-642-23400-2_5. Online publication date: 2011
  • (2011) Productive cluster programming with OmpSs. Proceedings of the 17th international conference on Parallel processing - Volume Part I (555-566). DOI: 10.5555/2033345.2033405. Online publication date: 29-Aug-2011
