DOI: 10.1145/2259016.2259040
research-article

Phase guided profiling for fast cache modeling

Published: 31 March 2012

Abstract

Statistical cache models are powerful tools for understanding application behavior as a function of cache allocation. However, previous techniques have modeled only the average application behavior, which hides the effect of program variations over time. Without detailed time-based information, transient behavior, such as exceeding bandwidth or cache capacity, may be missed. Yet these events, while short, often play a disproportionate role and are critical to understanding program behavior.
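
The quantity such models estimate, an application's miss ratio as a function of cache allocation (its miss ratio curve), can be computed exactly from a full address trace with Mattson et al.'s LRU stack algorithm. The sketch below assumes a fully associative LRU cache sized in cache lines and a trace of line addresses. Statistical models such as StatCache and StatStack approximate the same curve from sparse reuse-distance samples rather than a full trace; this simple O(N*M) reference version does not attempt that.

```python
from typing import Hashable, Iterable, Optional


def stack_distances(trace: Iterable[Hashable]) -> list[Optional[int]]:
    """LRU stack distance of each access: the number of distinct lines touched
    since the previous access to the same line (None marks a cold access).

    Simple list-based version of Mattson's stack algorithm, for illustration.
    """
    stack: list = []  # LRU stack; most recently used line at the end
    dists: list[Optional[int]] = []
    for line in trace:
        if line in stack:
            pos = stack.index(line)
            dists.append(len(stack) - 1 - pos)  # distinct lines used since last touch
            stack.pop(pos)
        else:
            dists.append(None)  # first touch: cold miss
        stack.append(line)
    return dists


def miss_ratio_curve(trace, cache_sizes):
    """Miss ratio of a fully associative LRU cache for each size (in lines).

    An access hits in a cache of C lines iff its stack distance is < C.
    """
    dists = stack_distances(list(trace))
    n = len(dists)
    return {c: sum(1 for d in dists if d is None or d >= c) / n for c in cache_sizes}


mrc = miss_ratio_curve("ABACBA", [1, 2, 3])
print(mrc)  # {1: 1.0, 2: 0.8333333333333334, 3: 0.5}
```

Because one pass yields the stack distance of every access, the miss ratio for every cache size falls out at once, which is what makes a miss-ratio-versus-allocation model cheaper than simulating each cache size separately.
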
In this work we extend earlier techniques to incorporate program phase information when collecting runtime profiling data. This allows us to model an application's cache miss ratio as a function of its cache allocation over time. To reduce overhead and improve accuracy, we use online phase detection and phase-guided profiling. The phase-guided profiling reduces overhead by more intelligently selecting portions of the application to sample, while accuracy is improved by combining samples from different instances of the same phase.
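
Online phase detection and phase-guided sampling can be sketched together. This is a hypothetical illustration, not the paper's implementation. It assumes each fixed-length execution interval is summarized by a signature vector (for example, counts of executed code regions) and classifies intervals online with a simple leader-follower clustering rule: join the nearest phase if the Manhattan distance between normalized signatures is within `threshold`, otherwise open a new phase. A per-phase sample budget means later instances of an already well-sampled phase are not profiled, and reuse-distance samples from every instance of a phase are pooled into one set. The names `PhaseGuidedProfiler`, `threshold`, and `samples_per_phase` are illustrative.

```python
from dataclasses import dataclass, field


def normalize(vec):
    """Scale a signature so its entries sum to 1 (interval length cancels out)."""
    total = sum(vec)
    return [x / total for x in vec] if total else list(vec)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


@dataclass
class Phase:
    centroid: list
    members: int = 1  # intervals classified into this phase so far
    samples: list = field(default_factory=list)  # pooled reuse-distance samples


class PhaseGuidedProfiler:
    """Hypothetical sketch: online leader-follower phase classification
    plus a per-phase sampling budget."""

    def __init__(self, threshold=0.5, samples_per_phase=1000):
        self.threshold = threshold
        self.samples_per_phase = samples_per_phase
        self.phases = []    # list of Phase, indexed by phase id
        self.timeline = []  # phase id of every interval, in execution order

    def classify(self, signature):
        """Assign an interval to the nearest phase, or open a new one."""
        sig = normalize(signature)
        best, best_dist = None, float("inf")
        for pid, ph in enumerate(self.phases):
            d = manhattan(sig, ph.centroid)
            if d < best_dist:
                best, best_dist = pid, d
        if best is None or best_dist > self.threshold:
            self.phases.append(Phase(centroid=sig))
            best = len(self.phases) - 1
        else:
            ph = self.phases[best]
            ph.members += 1
            # Running mean keeps the centroid representative of all instances.
            ph.centroid = [c + (s - c) / ph.members for c, s in zip(ph.centroid, sig)]
        self.timeline.append(best)
        return best

    def wants_samples(self, pid):
        """Only profile a phase until its pooled sample budget is full."""
        return len(self.phases[pid].samples) < self.samples_per_phase

    def add_samples(self, pid, reuse_distances):
        """Pool samples from this instance with earlier instances of the phase."""
        room = self.samples_per_phase - len(self.phases[pid].samples)
        self.phases[pid].samples.extend(reuse_distances[:max(room, 0)])


prof = PhaseGuidedProfiler(threshold=0.5, samples_per_phase=4)
intervals = [([10, 0, 0], [1, 2, 3]), ([9, 1, 0], [4, 5]), ([0, 0, 10], [7]), ([10, 0, 0], [9])]
for signature, samples in intervals:
    pid = prof.classify(signature)
    if prof.wants_samples(pid):  # the fourth interval is skipped: phase 0 is full
        prof.add_samples(pid, samples)
print(prof.timeline, [p.samples for p in prof.phases])  # [0, 0, 1, 0] [[1, 2, 3, 4], [7]]
```

The budget is where the overhead saving comes from in this sketch: the fourth interval recurs in phase 0 and costs only a classification, not a sampling pass. Pooling is where the accuracy gain comes from, since phase 0's estimate rests on samples from two separate instances.
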
The result is a new technique that accurately models the time-varying behavior of an application's miss ratio as a function of its cache allocation on modern hardware. By leveraging phase-guided profiling, this work both improves on the accuracy of previous techniques and reduces the overhead.

Published In

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
March 2012
285 pages
ISBN: 9781450312066
DOI: 10.1145/2259016
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CGO '12

Acceptance Rates

CGO '12 paper acceptance rate: 26 of 90 submissions, 29%
Overall acceptance rate: 312 of 1,061 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 13 Dec 2024.

Cited By

  • (2024) Beyond Time-Quantum: A Basic-Block FDA Approach for Accurate System Computing Performance Estimation. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 698-703. DOI: 10.1109/ASP-DAC58780.2024.10473915. Online publication date: 22-Jan-2024.
  • (2023) FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance. Proceedings of the 37th International Conference on Supercomputing, pp. 25-36. DOI: 10.1145/3577193.3593740. Online publication date: 21-Jun-2023.
  • (2021) A Reusable Characterization of the Memory System Behavior of SPEC2017 and SPEC2006. ACM Transactions on Architecture and Code Optimization 18:2, pp. 1-20. DOI: 10.1145/3446200. Online publication date: 8-Mar-2021.
  • (2019) DynaSprint. Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 426-439. DOI: 10.1145/3352460.3358301. Online publication date: 12-Oct-2019.
  • (2019) Safer Program Behavior Sharing Through Trace Wringing. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1059-1072. DOI: 10.1145/3297858.3304074. Online publication date: 4-Apr-2019.
  • (2019) Analysis of cache behaviour and software optimizations for faster on-chip network simulations. International Journal of System Assurance Engineering and Management. DOI: 10.1007/s13198-019-00799-5. Online publication date: 14-May-2019.
  • (2018) KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores. 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 104-117. DOI: 10.1109/HPCA.2018.00019. Online publication date: Feb-2018.
  • (2016) On Detecting and Using Memory Phases in Multimedia Systems. Proceedings of the 14th ACM/IEEE Symposium on Embedded Systems for Real-Time Multimedia, pp. 57-66. DOI: 10.1145/2993452.2993566. Online publication date: 1-Oct-2016.
  • (2016) Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics. IEEE Transactions on Computers 65:12, pp. 3537-3551. DOI: 10.1109/TC.2016.2547387. Online publication date: 1-Dec-2016.
  • (2016) CoolSim: Statistical techniques to replace cache warming with efficient, virtualized profiling. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), pp. 106-115. DOI: 10.1109/SAMOS.2016.7818337. Online publication date: Jul-2016.
