DOI: 10.1145/2488551.2488569
research-article

Understanding the formation of wait states in applications with one-sided communication

Published: 15 September 2013

Abstract

To better understand the formation of wait states in MPI programs and to help users find optimization targets in the presence of load imbalance, a major source of wait states, we added two new trace-analysis techniques to Scalasca, a performance-analysis tool designed for large-scale applications, in earlier work. In this paper, we show how these two techniques, originally restricted to two-sided and collective MPI communication, are extended to also cover one-sided communication. We report our experiences with benchmark programs and a mini-application representing the core of the POP ocean model.
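The mechanism the abstract describes can be illustrated with a toy model (a hypothetical sketch for intuition, not Scalasca's actual trace analysis; the function name `wait_states` is invented for this example): when processes with imbalanced workloads meet at a fence-style synchronization point, every process must wait for the slowest one, so the excess work of the bottleneck process reappears as wait time on all the others.

```python
# Toy model of wait-state formation at a collective synchronization
# point (e.g., an RMA fence). Not the paper's algorithm -- just an
# illustration of how load imbalance turns into waiting time.
def wait_states(work_times):
    """Given each process's compute time before the synchronization
    point, return each process's wait time at that point: everyone
    blocks until the slowest process arrives."""
    slowest = max(work_times)
    return [slowest - t for t in work_times]

# Four processes with imbalanced work; process 2 is the bottleneck.
# It waits 0 itself, while its 4-6 time units of excess work show up
# as wait states on the other three processes.
print(wait_states([3.0, 4.0, 9.0, 5.0]))  # [6.0, 5.0, 0.0, 4.0]
```

Root-cause analysis in the spirit of the paper then works backwards from these wait times to the imbalance (here, process 2's longer compute phase) that caused them, rather than merely reporting where the waiting occurred.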

References

[1] C. A. Alexander, D. S. Reese, and J. C. Harden. Near-critical path analysis of program activity graphs. In Proc. of the 2nd Intl. Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '94), pages 308--317, Jan. 1994.
[2] D. Böhme, B. R. de Supinski, M. Geimer, M. Schulz, and F. Wolf. Scalable critical-path based performance analysis. In Proc. of the 26th IEEE Intl. Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.
[3] D. Böhme, M. Geimer, F. Wolf, and L. Arnold. Identifying the root causes of wait states in large-scale parallel applications. In Proc. of the 39th Intl. Conference on Parallel Processing (ICPP), San Diego, CA, USA, pages 90--100, Sept. 2010.
[4] M. Geimer, F. Wolf, B. J. N. Wylie, E. Abraham, D. Becker, and B. Mohr. The Scalasca performance toolset architecture. Concurrency and Computation: Practice and Experience, 22(6):702--719, Apr. 2010.
[5] M. T. Heath, A. D. Malony, and D. T. Rover. The visual display of parallel performance data. IEEE Computer, 28(11):21--28, Nov. 1995.
[6] M.-A. Hermanns, M. Geimer, B. Mohr, and F. Wolf. Scalable detection of MPI-2 remote memory access inefficiency patterns. Intl. Journal of High Performance Computing Applications, 26(3):227--236, Aug. 2012.
[7] J. K. Hollingsworth. An online computation of critical path profiling. In Proc. of the SIGMETRICS Symposium on Parallel and Distributed Tools, 1996.
[8] L. Kalé, S. Kumar, G. Zheng, and C. Lee. Scaling molecular dynamics to 3000 processors with Projections: A performance analysis case study. In Computational Science --- ICCS 2003, volume 2660 of LNCS, pages 23--32, 2003.
[9] W. Meira Jr., T. J. LeBlanc, and V. A. F. Almeida. Using cause-effect analysis to understand the performance of distributed programs. In Proc. of the SIGMETRICS Symposium on Parallel and Distributed Tools (SPDT '98), pages 101--111, 1998.
[10] W. Meira Jr., T. J. LeBlanc, and A. Poulos. Waiting time analysis and performance visualization in Carnival. In Proc. of the SIGMETRICS Symposium on Parallel and Distributed Tools (SPDT '96), pages 1--10, 1996.
[11] B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. The Paradyn parallel performance measurement tool. Computer, 28:37--46, Nov. 1995.
[12] M. S. Müller, A. Knüpfer, M. Jurenz, M. Lieber, H. Brunst, H. Mix, and W. E. Nagel. Developing scalable applications with Vampir, VampirServer and VampirTrace. In PARCO, pages 637--644, 2007.
[13] W. B. Sawyer and A. A. Mirin. The implementation of the finite-volume dynamical core in the Community Atmosphere Model. Journal of Computational and Applied Mathematics, 203(2):387--396, 2007.
[14] M. Schulz. Extracting critical path graphs from MPI applications. In Proc. of the 7th IEEE Intl. Conference on Cluster Computing, Sept. 2005.
[15] S. S. Shende and A. D. Malony. The TAU parallel performance system. Intl. Journal of High Performance Computing Applications, 20(2):287--311, 2006.
[16] C. Siebert and J. L. Träff. Efficient MPI implementation of a parallel, stable merge algorithm. In Recent Advances in the Message Passing Interface, volume 7490 of LNCS, pages 204--213, Sept. 2012.
[17] A. Stone, J. Dennis, and M. M. Strout. The CGPOP miniapp, version 1.0. Technical Report CS-11-103, Colorado State University, July 2011.
[18] H.-H. Su, M. Billingsley, and A. D. George. Parallel Performance Wizard: A performance system for the analysis of partitioned global-address-space applications. Intl. Journal of High Performance Computing Applications, 24:485--510, Nov. 2010.
[19] H.-H. Su, D. Bonachea, A. Leko, H. Sherburne, M. Billingsley III, and A. D. George. GASP! A standardized performance analysis tool interface for global address space programming models. In Proc. of the 8th Intl. Conference on Applied Parallel Computing (PARA '06), pages 450--459, 2007.
[20] N. R. Tallent, L. Adhianto, and J. Mellor-Crummey. Scalable identification of load imbalance in parallel executions using call path profiles. In Supercomputing 2010, Nov. 2010.


        Published In

        EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
        September 2013
        289 pages
        ISBN:9781450319034
        DOI:10.1145/2488551
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Sponsors

        • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid


Publisher

Association for Computing Machinery, New York, NY, United States

        Author Tags

        1. critical path
        2. one-sided communication
        3. performance analysis
        4. performance optimization
        5. root cause

        Qualifiers

        • Research-article

        Conference

EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain


        Cited By

• (2023) Finding Bottlenecks in Message Passing Interface Programs by Scalable Critical Path Analysis. Algorithms, 16(11):505. DOI: 10.3390/a16110505. Online publication date: 31-Oct-2023.
• (2022) Static Local Concurrency Errors Detection in MPI-RMA Programs. 2022 IEEE/ACM Sixth International Workshop on Software Correctness for HPC Applications (Correctness), pages 18--26. DOI: 10.1109/Correctness56720.2022.00008. Online publication date: Nov-2022.
• (2018) MC-CChecker. Proceedings of the 25th European MPI Users' Group Meeting, pages 1--11. DOI: 10.1145/3236367.3236369. Online publication date: 23-Sep-2018.
• (2017) Optimizing One-Sided Communication of Parallel Applications Using Critical Path Methods. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 567--576. DOI: 10.1109/IPDPSW.2017.64. Online publication date: May-2017.
• (2017) Debugging Latent Synchronization Errors in MPI-3 One-Sided Communication. Tools for High Performance Computing 2016, pages 83--96. DOI: 10.1007/978-3-319-56702-0_5. Online publication date: 9-May-2017.
• (2016) Identifying the Root Causes of Wait States in Large-Scale Parallel Applications. ACM Transactions on Parallel Computing, 3(2):1--24. DOI: 10.1145/2934661. Online publication date: 20-Jul-2016.
• (2016) Nasty-MPI. Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, pages 51--62. DOI: 10.1007/978-3-319-43659-3_4. Online publication date: 24-Aug-2016.
• (2014) MC-checker. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 499--510. DOI: 10.1109/SC.2014.46. Online publication date: 16-Nov-2014.
• (2014) CASITA. Proceedings of the 2014 43rd International Conference on Parallel Processing Workshops, pages 186--195. DOI: 10.1109/ICPPW.2014.35. Online publication date: 9-Sep-2014.
