DOI: 10.5555/370049.370424

A scalable cross-platform infrastructure for application performance tuning using hardware counters

Published: 01 November 2000

Abstract

The purpose of the PAPI project is to specify a standard API for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count “events”, which are occurrences of specific signals and states related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis and tuning. The PAPI project has proposed a standard set of hardware events and a standard cross-platform library interface to the underlying counter hardware. The PAPI library has been or is in the process of being implemented on all major HPC platforms. The PAPI project is developing end-user tools for dynamically selecting and displaying hardware counter performance data. PAPI support is also being incorporated into a number of third-party tools.
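The idea of a standard event set layered over platform-specific counter hardware can be illustrated with a toy model. The sketch below is not the PAPI library or its API: the class `CounterSet`, the platform names, the event names, the native codes, and the raw counter values are all hypothetical, chosen only to show how portable event names might be resolved to native codes and how a limited number of counter registers constrains what can be measured at once.

```python
# Toy model only: every name, code, and value here is hypothetical.
# Maps portable event names to made-up native event codes per platform.
NATIVE_CODES = {
    "platform_a": {
        "TOTAL_CYCLES": 0x01,
        "TOTAL_INSTRUCTIONS": 0x02,
        "L1_DATA_MISSES": 0x03,
    },
    "platform_b": {
        "TOTAL_CYCLES": 0x79,
        "TOTAL_INSTRUCTIONS": 0xC0,
    },
}


class CounterSet:
    """Portable view over a small bank of hardware counter registers."""

    def __init__(self, platform, num_registers=2):
        self.native = NATIVE_CODES[platform]
        self.num_registers = num_registers
        self.events = []

    def add_event(self, name):
        # An event must exist on this platform and fit in a free register.
        if name not in self.native:
            raise KeyError(f"{name} is not available on this platform")
        if len(self.events) >= self.num_registers:
            raise RuntimeError("no free counter register")
        self.events.append(name)

    def read(self, raw):
        # Translate raw values keyed by native code back to portable names.
        return {name: raw[self.native[name]] for name in self.events}


def instructions_per_cycle(counts):
    """Derived metric relating executed instructions to elapsed cycles."""
    return counts["TOTAL_INSTRUCTIONS"] / counts["TOTAL_CYCLES"]


# Usage with made-up raw readings keyed by native code.
cs = CounterSet("platform_a")
cs.add_event("TOTAL_CYCLES")
cs.add_event("TOTAL_INSTRUCTIONS")
raw = {0x01: 2000, 0x02: 3000, 0x03: 40}
counts = cs.read(raw)
ipc = instructions_per_cycle(counts)
print(f"IPC = {ipc:.2f}")  # prints "IPC = 1.50"
```

The same calling code would work against `platform_b` without change, which is the portability argument in miniature; an event missing on a platform, or a request for more events than there are registers, surfaces as an explicit error rather than a silent miscount.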

Published In

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing
November 2000
889 pages
ISBN: 0780398025

In-Cooperation

  • SIAM: Society for Industrial and Applied Mathematics

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

SC '00

Acceptance Rates

SC '00 Paper Acceptance Rate 62 of 179 submissions, 35%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


