DOI: 10.1145/3524059.3532390
research-article

Calipers: a criticality-aware framework for modeling processor performance

Published: 28 June 2022

Abstract

The computer architecture design space is vast and complex. Architects need tools that let them explore new ideas and gain insights quickly, with low effort and the desired accuracy. Cycle-accurate simulators (CAS), which are commonly used to explore computer designs, can be slow to run and cumbersome to develop. We propose a complementary approach, Calipers, a criticality-based framework that models key abstractions of complex architectures, together with a program's execution, using dynamic event-dependence graphs. By applying graph algorithms, Calipers can track instruction and event dependencies, compute critical paths, and analyze architecture bottlenecks. By manipulating the graph, Calipers enables investigation of a wide range of Instruction Set Architecture (ISA) and microarchitecture design choices ("what-if" scenarios) during both early- and late-stage design space exploration, without recompiling and rerunning the program. Calipers can model in-order and out-of-order microarchitectures, structural hazards, and different types of ISAs, and it can evaluate multiple ideas in a single run. We describe the related modeling algorithms in detail. We apply Calipers to explore, and gain insights into, a variety of complex microarchitectural and ISA ideas for RISC and EDGE processors, with lower effort than CAS and comparable accuracy. For example, our experiments show that targeting only a fraction of critical loads can help realize most of the benefits of value prediction.
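
To make the graph-based approach more concrete, here is a minimal Python sketch of the general technique: build a dynamic dependence graph for a short instruction trace, find its critical path with a longest-path pass in topological order, and then evaluate a "what-if" by editing edges rather than re-simulating. This is not the Calipers implementation. The three nodes per instruction (dispatch D, execute E, commit C), the edge weights, the `width` and `rob` parameters, the toy trace, and the names `Inst`, `build_graph`, and `critical_path` are all illustrative assumptions. Value prediction is modeled, hypothetically, as perfect prediction that removes a load's outgoing data-dependence edges.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+

Node = tuple[str, int]  # ("D" | "E" | "C", dynamic instruction index)


@dataclass
class Inst:
    op: str                                        # "load" or "alu" in this toy trace
    latency: int                                   # execution latency in cycles
    srcs: list[int] = field(default_factory=list)  # indices of producer instructions


def build_graph(trace, width=2, rob=8, predicted=frozenset()):
    """Toy dispatch/execute/commit graph as {node: [(predecessor, edge_weight), ...]}.

    Loads whose indices are in `predicted` are treated as perfectly value-predicted:
    their outgoing data-dependence edges are dropped (a hypothetical what-if).
    """
    preds = {}
    for i, inst in enumerate(trace):
        d, e, c = ("D", i), ("E", i), ("C", i)
        preds[d] = []
        if i >= 1:
            preds[d].append((("D", i - 1), 0))      # in-order dispatch
        if i >= width:
            preds[d].append((("D", i - width), 1))  # at most `width` dispatches per cycle
        if i >= rob:
            preds[d].append((("C", i - rob), 1))    # structural hazard: ROB is full
        preds[e] = [(d, inst.latency)]              # execute after dispatch
        for j in inst.srcs:
            if j not in predicted:                  # true data dependence on producer j
                preds[e].append((("E", j), inst.latency))
        preds[c] = [(e, 1)]                         # commit one cycle after completion
        if i >= 1:
            preds[c].append((("C", i - 1), 0))      # in-order commit
    return preds


def critical_path(preds, sink):
    """Longest path to `sink`, computed in topological order: (cycles, nodes on path)."""
    topo = TopologicalSorter({n: [p for p, _ in ps] for n, ps in preds.items()})
    time, via = {}, {}
    for n in topo.static_order():
        time[n], via[n] = 0, None
        for p, w in preds[n]:
            if via[n] is None or time[p] + w > time[n]:
                time[n], via[n] = time[p] + w, p    # remember the binding (latest) edge
    path = [sink]
    while via[path[-1]] is not None:
        path.append(via[path[-1]])
    return time[sink], path[::-1]


trace = [
    Inst("load", 5),      # 0: long-latency load
    Inst("load", 2),      # 1: short-latency load
    Inst("alu", 1, [0]),  # 2: consumes load 0
    Inst("alu", 1, [1]),  # 3: consumes load 1
    Inst("alu", 1, [2]),  # 4
    Inst("alu", 1, [4]),  # 5
]
sink = ("C", len(trace) - 1)  # commit of the last instruction ends the run
base, path = critical_path(build_graph(trace), sink)
critical_loads = {i for kind, i in path if kind == "E" and trace[i].op == "load"}
print(base, critical_loads)  # 9 {0}
for chosen in (critical_loads, {1}, {0, 1}):
    cycles, _ = critical_path(build_graph(trace, predicted=chosen), sink)
    print(sorted(chosen), cycles)  # [0] 6, then [1] 9, then [0, 1] 6
```

On this toy trace only load 0 lies on the critical path. Predicting it alone shortens the modeled run from 9 to 6 cycles, predicting only load 1 leaves it at 9, and predicting both gains nothing beyond predicting load 0. This echoes, in miniature, the observation about targeting critical loads; the numbers come from the toy model, not from the paper's experiments.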

Cited By

  • (2024) A dependence graph pattern mining method for processor performance analysis. Performance Evaluation 164:C. DOI: 10.1016/j.peva.2024.102409. Online publication date: 1 May 2024.
  • (2023) BOOM-Explorer: RISC-V BOOM Microarchitecture Design Space Exploration. ACM Transactions on Design Automation of Electronic Systems 29:1, 1-23. DOI: 10.1145/3630013. Online publication date: 18 Dec 2023.

      Published In

      ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing
      June 2022
      514 pages
      ISBN:9781450392815
      DOI:10.1145/3524059

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. bottleneck analysis
      2. instruction set architecture
      3. microarchitecture
      4. performance modeling

      Qualifiers

      • Research-article

      Conference

      ICS '22

      Acceptance Rates

      Overall Acceptance Rate 629 of 2,180 submissions, 29%

      Article Metrics

      • Downloads (last 12 months): 87
      • Downloads (last 6 weeks): 8
      Reflects downloads up to 13 Dec 2024
