DOI: 10.1145/1065944.1065977
Article

System-wide performance monitors and their application to the optimization of coherent memory accesses

Published: 15 June 2005

Abstract

Inspired by recent advances in microprocessor performance monitors, this paper shows how a shared-memory multiprocessor chipset and interconnect can be equipped with performance monitors that associate performance events with the program counters (PCs) of the individual instructions causing these events. Such monitors greatly simplify performance debugging of shared-memory programs; for example, they make it straightforward to find pairs of instructions involved in false sharing. These monitors also enable precise feedback-directed compiler optimizations and, as a second contribution, we show how they can guide the code generator to use the version of the load instruction that makes the best use of the coherence protocol. Experiments show a reduction in coherence traffic of up to almost 10% on SPLASH2 applications.
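
To illustrate why PC-tagged events make false sharing easy to spot, here is a minimal sketch. It is not the paper's tool or data format: the `Event` record, the 64-byte `LINE_SIZE`, and the rule that two different addresses never overlap are all assumptions made for this example. The idea is simply to group sampled accesses by cache line and report instruction pairs from different processors that touch different addresses in the same line, where at least one access is a write.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple

LINE_SIZE = 64  # assumed cache-line size in bytes (hypothetical)


class Event(NamedTuple):
    """Hypothetical monitor sample: one memory access tagged with its PC."""
    cpu: int        # processor that issued the access
    pc: int         # program counter of the instruction
    addr: int       # byte address accessed
    is_write: bool  # True for a store, False for a load


def false_sharing_pairs(events):
    """Return the set of (pc, pc) pairs that look like false sharing.

    Two accesses are paired when they come from different CPUs, fall in
    the same cache line, use different addresses, and at least one of
    them is a write. Accesses to the same address are treated as true
    sharing and skipped.
    """
    by_line = defaultdict(set)
    for e in events:
        by_line[e.addr // LINE_SIZE].add(e)

    pairs = set()
    for accesses in by_line.values():
        for a, b in combinations(accesses, 2):
            if a.cpu != b.cpu and a.addr != b.addr and (a.is_write or b.is_write):
                pairs.add(tuple(sorted((a.pc, b.pc))))
    return pairs


sample = [
    Event(cpu=0, pc=0x400, addr=0x1000, is_write=True),
    Event(cpu=1, pc=0x480, addr=0x1008, is_write=False),  # same line, other word
    Event(cpu=1, pc=0x4C0, addr=0x1000, is_write=False),  # same word: true sharing
    Event(cpu=0, pc=0x500, addr=0x2000, is_write=True),   # unrelated line
]
print(false_sharing_pairs(sample))  # → {(1024, 1152)}
```

Without the PC in each sample, the same analysis would only identify the contended cache line, leaving the programmer to work out which instructions touch it.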

Cited By

  • (2007) Performance monitor unit design for an AXI-based multi-core SoC platform. Proceedings of the 2007 ACM symposium on Applied computing, pages 1565-1572. DOI: 10.1145/1244002.1244336. Online publication date: 11-Mar-2007.
  • (2007) COBRA. Proceedings of the 2007 International Conference on Parallel Processing. DOI: 10.1109/ICPP.2007.23. Online publication date: 10-Sep-2007.

    Published In

    PPoPP '05: Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
    June 2005
    310 pages
    ISBN:1595930809
    DOI:10.1145/1065944

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. coherence traffic
    2. performance monitors

    Qualifiers

    • Article

    Conference

    PPoPP05

    Acceptance Rates

    Overall Acceptance Rate 230 of 1,014 submissions, 23%
