DOI: 10.1145/3624062.3624143
research-article
Open access

PEAK: a Light-Weight Profiler for HPC Systems

Published: 12 November 2023

Abstract

As High-Performance Computing (HPC) applications scale from petascale to exascale, performance optimization has become a major challenge in software development, and the growing complexity of parallel architectures and systems makes it harder still. We introduce PEAK (Performance Evaluation and Analysis Kit), a light-weight profiling tool designed for large-scale HPC applications. Using Dynamic Binary Instrumentation, PEAK profiles large-scale multi-threaded, multi-process applications with low overhead and high accuracy. We analyzed PEAK's overhead and accuracy using synthetic benchmarks and real applications and compared it against other widely used HPC profiling tools. Our results show that PEAK achieves overhead and accuracy comparable to those tools while remaining simple.
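The abstract does not describe how PEAK intercepts calls internally, so the following is only a rough Python-level analogy of interception-based profiling, not PEAK's implementation: a target function is swapped for a timing wrapper at run time, and call counts and wall time are accumulated per thread. All names here (`InterceptProfiler`, `saxpy`, `kernels`) are hypothetical and chosen for illustration. A binary instrumentation tool would perform the equivalent rewrite on machine code rather than on Python attributes.

```python
import threading
import time
import types
from collections import defaultdict


class InterceptProfiler:
    """Toy profiler: swaps a function for a timing wrapper at run time."""

    def __init__(self):
        self._lock = threading.Lock()
        # (function name, thread id) -> [call count, total seconds]
        self.stats = defaultdict(lambda: [0, 0.0])
        self._originals = []

    def attach(self, owner, name):
        """Replace owner.<name> with a wrapper that times each call."""
        original = getattr(owner, name)

        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return original(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                key = (name, threading.get_ident())
                with self._lock:
                    entry = self.stats[key]
                    entry[0] += 1
                    entry[1] += elapsed

        self._originals.append((owner, name, original))
        setattr(owner, name, wrapper)

    def detach_all(self):
        """Restore every intercepted function to its original."""
        for owner, name, original in reversed(self._originals):
            setattr(owner, name, original)
        self._originals.clear()

    def summary(self):
        """Aggregate per-thread entries into per-function totals."""
        totals = defaultdict(lambda: [0, 0.0])
        with self._lock:
            for (name, _tid), (count, seconds) in self.stats.items():
                totals[name][0] += count
                totals[name][1] += seconds
        return {name: (count, seconds) for name, (count, seconds) in totals.items()}


def saxpy(a, x, y):
    """Hypothetical workload: element-wise a * x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]


# Hypothetical "library" whose entry point will be intercepted.
kernels = types.SimpleNamespace(saxpy=saxpy)
```

To use the sketch, create a profiler, call `attach(kernels, "saxpy")`, run the workload (from one or several threads) through `kernels.saxpy`, and read `summary()` for per-function call counts and cumulative time. Calling `detach_all()` restores the original function, mirroring how an instrumentation tool can remove its hooks.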

Supplemental Material

MP4 File
Recording of "PEAK: a Light-Weight Profiler for HPC Systems" presentation at HUST-23.

Published In

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. application performance
  2. profiling
  3. system tools

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC-W 2023
