research-article

Calipers: a criticality-aware framework for modeling processor performance

Authors:

Hossein Golestani,

Gagan Gupta

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

Article No.: 2, Pages 1 - 14

https://doi.org/10.1145/3524059.3532390

Published: 28 June 2022

Abstract

The computer architecture design space is vast and complex. Tools are needed to explore new ideas and gain insights quickly, at low effort and at the desired accuracy. Cycle-accurate simulators (CAS), commonly used to explore computer designs, can be slow and cumbersome to develop. We propose a complementary approach, Calipers, a criticality-based framework that models key abstractions of complex architectures and a program's execution using dynamic event-dependence graphs. By applying graph algorithms, Calipers can track instruction and event dependencies, compute critical paths, and analyze architecture bottlenecks. By manipulating the graph, Calipers enables investigation of a wide range of Instruction Set Architecture (ISA) and microarchitecture design choices and "what-if" scenarios during both early- and late-stage design space exploration, without recompiling and rerunning the program. Calipers can model in-order and out-of-order microarchitectures, structural hazards, and different types of ISAs, and can evaluate multiple ideas in a single run. Related modeling algorithms are described in detail. We apply Calipers to explore and gain insights into a variety of complex microarchitectural and ISA ideas for RISC and EDGE processors, at lower effort than CAS and with comparable accuracy. For example, experiments show that targeting only a fraction of critical loads can help realize most benefits of value prediction.
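To make the idea of a critical path over an event-dependence graph concrete, the following is a minimal sketch and not the Calipers implementation. It assumes a simplified three-node-per-instruction model (dispatch, execute, commit), loosely in the spirit of earlier dependence-graph work; the node numbering, the edge latencies, the helper names `critical_path` and `build_edges`, and the three-instruction example are all hypothetical choices made for illustration. The "what-if" step is approximated by rebuilding the edge list with a different load latency and recomputing the longest path.

```python
from collections import defaultdict


def critical_path(num_nodes, edges):
    """Longest path in a DAG whose node ids are already in topological order.

    edges: list of (src, dst, latency) tuples with src < dst.
    Returns (length, path), where path is the list of node ids on the path.
    """
    incoming = defaultdict(list)
    for src, dst, latency in edges:
        incoming[dst].append((src, latency))

    dist = [0] * num_nodes     # longest distance from any source to each node
    pred = [None] * num_nodes  # predecessor on that longest path
    for node in range(num_nodes):
        for src, latency in incoming[node]:
            if dist[src] + latency > dist[node]:
                dist[node] = dist[src] + latency
                pred[node] = src

    # Walk back from the node with the largest distance.
    node = max(range(num_nodes), key=lambda n: dist[n])
    length = dist[node]
    path = []
    while node is not None:
        path.append(node)
        node = pred[node]
    return length, path[::-1]


def build_edges(load_latency):
    """Hypothetical 3-instruction trace: a load, a dependent add, an independent add.

    Node id = 3 * instruction_index + stage, with stage 0 = dispatch (D),
    1 = execute (E), 2 = commit (C).
    """
    return [
        (0, 1, 1), (1, 2, load_latency),  # load: D0 -> E0 -> C0
        (3, 4, 1), (4, 5, 1),             # dependent add: D1 -> E1 -> C1
        (6, 7, 1), (7, 8, 1),             # independent add: D2 -> E2 -> C2
        (0, 3, 1), (3, 6, 1),             # in-order dispatch: D0 -> D1 -> D2
        (2, 5, 1), (5, 8, 1),             # in-order commit: C0 -> C1 -> C2
        (1, 4, load_latency),             # data dependence: load result feeds the add
    ]


baseline_length, baseline_path = critical_path(9, build_edges(load_latency=10))
what_if_length, _ = critical_path(9, build_edges(load_latency=3))
print(baseline_length, baseline_path)  # 13 [0, 1, 4, 5, 8]
print(what_if_length)                  # 6
```

In this toy trace the critical path runs through the load's execute node and the dependent add, so shortening the load latency shortens the whole path, while speeding up the independent add would change nothing. The same graph is reused for both scenarios; only the edge weights differ, which is the general flavor of exploring a design change without rerunning the program.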

Cited By

Zheng Y, Han C, Zhang T, Zhang F, Wang J. (2024). A dependence graph pattern mining method for processor performance analysis. Performance Evaluation 164:C. DOI: 10.1016/j.peva.2024.102409. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1016/j.peva.2024.102409
Bai C, Sun Q, Zhai J, Ma Y, Yu B, Wong M. (2023). BOOM-Explorer: RISC-V BOOM Microarchitecture Design Space Exploration. ACM Transactions on Design Automation of Electronic Systems 29:1 (1-23). DOI: 10.1145/3630013. Online publication date: 18-Dec-2023.
https://dl.acm.org/doi/10.1145/3630013

Index Terms

Calipers: a criticality-aware framework for modeling processor performance
1. Computer systems organization
  1. Architectures
2. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis

Information & Contributors

Information

Published In

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

June 2022

514 pages

ISBN:9781450392815

DOI:10.1145/3524059

General Chairs: Lawrence Rauchwerger (University of Illinois at Urbana-Champaign), Kirk Cameron (Virginia Tech)
Program Chairs: Dimitrios S. Nikolopoulos (Virginia Tech), Dionisios Pnevmatikatos (National Technical University of Athens)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS '22

Sponsor:

SIGARCH

ICS '22: 2022 International Conference on Supercomputing

June 28 - 30, 2022

Virtual Event

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total Citations: 2
Total Downloads: 221
Downloads (Last 12 months): 87
Downloads (Last 6 weeks): 8

Reflects downloads up to 13 Dec 2024
