DOI: 10.1145/2488551.2488569
research-article

Understanding the formation of wait states in applications with one-sided communication

Published: 15 September 2013

Abstract

To better understand the formation of wait states in MPI programs and to help users find optimization targets in the presence of load imbalance, a major source of wait states, we added two new trace-analysis techniques to Scalasca, a performance-analysis tool designed for large-scale applications, in earlier work. In this paper, we show how these two techniques, originally restricted to two-sided and collective MPI communication, are extended to also cover one-sided communication. We report our experiences with benchmark programs and a mini-application representing the core of the POP ocean model.
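The mechanism the abstract describes can be illustrated with a toy model (a hypothetical sketch for intuition, not Scalasca's actual trace analysis; the function name `wait_states` is invented for this example): when processes with imbalanced workloads meet at a fence-style synchronization point, every process must wait for the slowest one, so the excess work of the bottleneck process reappears as wait time on all the others.

```python
# Toy model of wait-state formation at a collective synchronization
# point (e.g., an RMA fence). Not the paper's algorithm -- just an
# illustration of how load imbalance turns into waiting time.
def wait_states(work_times):
    """Given each process's compute time before the synchronization
    point, return each process's wait time at that point: everyone
    blocks until the slowest process arrives."""
    slowest = max(work_times)
    return [slowest - t for t in work_times]

# Four processes with imbalanced work; process 2 is the bottleneck.
# It waits 0 itself, while its 4-6 time units of excess work show up
# as wait states on the other three processes.
print(wait_states([3.0, 4.0, 9.0, 5.0]))  # [6.0, 5.0, 0.0, 4.0]
```

Root-cause analysis in the spirit of the paper then works backwards from these wait times to the imbalance (here, process 2's longer compute phase) that caused them, rather than merely reporting where the waiting occurred.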

References

[1] C. A. Alexander, D. S. Reese, and J. C. Harden. Near-critical path analysis of program activity graphs. In Proc. of the 2nd Intl. Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '94), pages 308--317, Jan. 1994.
[2] D. Böhme, B. R. de Supinski, M. Geimer, M. Schulz, and F. Wolf. Scalable critical-path based performance analysis. In Proc. of the 26th IEEE Intl. Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.
[3] D. Böhme, M. Geimer, F. Wolf, and L. Arnold. Identifying the root causes of wait states in large-scale parallel applications. In Proc. of the 39th Intl. Conference on Parallel Processing (ICPP), San Diego, CA, USA, pages 90--100, Sept. 2010.
[4] M. Geimer, F. Wolf, B. J. N. Wylie, E. Abraham, D. Becker, and B. Mohr. The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6):702--719, Apr. 2010.
[5] M. T. Heath, A. D. Malony, and D. T. Rover. The visual display of parallel performance data. IEEE Computer, 28(11):21--28, Nov. 1995.
[6] M.-A. Hermanns, M. Geimer, B. Mohr, and F. Wolf. Scalable detection of MPI-2 remote memory access inefficiency patterns. Intl. Journal of High Performance Computing Applications, 26(3):227--236, Aug. 2012.
[7] J. K. Hollingsworth. An online computation of critical path profiling. In Proc. of the SIGMETRICS Symposium on Parallel and Distributed Tools, 1996.
[8] L. Kalé, S. Kumar, G. Zheng, and C. Lee. Scaling molecular dynamics to 3000 processors with Projections: A performance analysis case study. In Computational Science --- ICCS 2003, volume 2660 of LNCS, pages 23--32, 2003.
[9] W. Meira Jr., T. J. LeBlanc, and V. A. F. Almeida. Using cause-effect analysis to understand the performance of distributed programs. In Proc. of the SIGMETRICS Symposium on Parallel and Distributed Tools (SPDT '98), pages 101--111, 1998.
[10] W. Meira Jr., T. J. LeBlanc, and A. Poulos. Waiting time analysis and performance visualization in Carnival. In Proc. of the SIGMETRICS Symposium on Parallel and Distributed Tools (SPDT '96), pages 1--10, 1996.
[11] B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. The Paradyn parallel performance measurement tool. Computer, 28:37--46, Nov. 1995.
[12] M. S. Müller, A. Knüpfer, M. Jurenz, M. Lieber, H. Brunst, H. Mix, and W. E. Nagel. Developing scalable applications with Vampir, VampirServer and VampirTrace. In PARCO, pages 637--644, 2007.
[13] W. B. Sawyer and A. A. Mirin. The implementation of the finite-volume dynamical core in the Community Atmosphere Model. Journal of Computational and Applied Mathematics, 203(2):387--396, 2007.
[14] M. Schulz. Extracting critical path graphs from MPI applications. In Proc. of the 7th IEEE Intl. Conference on Cluster Computing, Sept. 2005.
[15] S. S. Shende and A. D. Malony. The TAU parallel performance system. Intl. Journal of High Performance Computing Applications, 20(2):287--311, 2006.
[16] C. Siebert and J. L. Träff. Efficient MPI implementation of a parallel, stable merge algorithm. In Recent Advances in the Message Passing Interface, volume 7490 of LNCS, pages 204--213, Sept. 2012.
[17] A. Stone, J. Dennis, and M. M. Strout. The CGPOP miniapp, version 1.0. Technical Report CS-11-103, Colorado State University, July 2011.
[18] H.-H. Su, M. Billingsley, and A. D. George. Parallel Performance Wizard: A performance system for the analysis of partitioned global-address-space applications. Intl. Journal of High Performance Computing Applications, 24:485--510, Nov. 2010.
[19] H.-H. Su, D. Bonachea, A. Leko, H. Sherburne, M. Billingsley III, and A. D. George. GASP! A standardized performance analysis tool interface for global address space programming models. In Proc. of the 8th Intl. Conference on Applied Parallel Computing (PARA '06), pages 450--459, 2007.
[20] N. R. Tallent, L. Adhianto, and J. Mellor-Crummey. Scalable identification of load imbalance in parallel executions using call path profiles. In Supercomputing 2010, Nov. 2010.


        Published In

        EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
        September 2013
        289 pages
        ISBN:9781450319034
        DOI:10.1145/2488551
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid


Publisher

Association for Computing Machinery, New York, NY, United States

        Author Tags

        1. critical path
        2. one-sided communication
        3. performance analysis
        4. performance optimization
        5. root cause

        Qualifiers

        • Research-article

        Conference

EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain


        Cited By

• (2023) Finding Bottlenecks in Message Passing Interface Programs by Scalable Critical Path Analysis. Algorithms, 16(11):505. DOI: 10.3390/a16110505. Online publication date: 31-Oct-2023.
• (2022) Static Local Concurrency Errors Detection in MPI-RMA Programs. 2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness), pages 18--26. DOI: 10.1109/Correctness56720.2022.00008. Online publication date: Nov-2022.
• (2018) MC-CChecker. Proceedings of the 25th European MPI Users' Group Meeting, pages 1--11. DOI: 10.1145/3236367.3236369. Online publication date: 23-Sep-2018.
• (2017) Optimizing One-Sided Communication of Parallel Applications Using Critical Path Methods. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 567--576. DOI: 10.1109/IPDPSW.2017.64. Online publication date: May-2017.
• (2017) Debugging Latent Synchronization Errors in MPI-3 One-Sided Communication. Tools for High Performance Computing 2016, pages 83--96. DOI: 10.1007/978-3-319-56702-0_5. Online publication date: 9-May-2017.
• (2016) Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. ACM Transactions on Parallel Computing, 3(2):1--24. DOI: 10.1145/2934661. Online publication date: 20-Jul-2016.
• (2016) Nasty-MPI. Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, pages 51--62. DOI: 10.1007/978-3-319-43659-3_4. Online publication date: 24-Aug-2016.
• (2014) MC-checker. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 499--510. DOI: 10.1109/SC.2014.46. Online publication date: 16-Nov-2014.
• (2014) CASITA. Proceedings of the 2014 43rd International Conference on Parallel Processing Workshops, pages 186--195. DOI: 10.1109/ICPPW.2014.35. Online publication date: 9-Sep-2014.
