DOI: 10.1145/1854273.1854308
research-article

Criticality-driven superscalar design space exploration

Published: 11 September 2010

Abstract

It has become increasingly difficult to perform design space exploration (DSE) of computer systems with a short turnaround time because of exploding design spaces, increasing design complexity, and long-running workloads. Researchers have used classical search/optimization techniques, such as simulated annealing and genetic algorithms, to accelerate the DSE. While these techniques are better than an exhaustive search, a substantial amount of time must still be dedicated to DSE, which is a serious bottleneck in reducing research/development time. These techniques do not perform the DSE quickly enough primarily because they treat the computer system as a "black box": they do not leverage any insight into how the different design parameters of a computer system interact to increase or degrade performance at a design point.
We propose using criticality analysis to guide the classical search/optimization techniques. We perform criticality analysis to find the design parameter which is most detrimental to the performance at a given design point. Criticality analysis at a given design point provides a localized view of the region around the design point without performing simulations at the neighboring points. On the other hand, a classical search/optimization technique has a global view of the design space and avoids getting stuck at a local maximum. We use this synergistic behavior between the criticality analysis (good locally) and the classical search/optimization techniques (good globally) to accelerate the DSE.
For the DSE of superscalar processors on SPEC 2000 benchmarks, on average, a criticality-driven walk achieves a 3.8x speedup over a random walk, and criticality-driven simulated annealing achieves a 2.3x speedup over simulated annealing.
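
The interplay described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the parameter names, the toy `simulate` cost model, the `bias` knob, and the cooling schedule are all assumptions made for the example. Criticality analysis is modeled here as picking the parameter with the largest stall contribution at the current design point, and the annealer uses that as a biased move proposal while still allowing random and uphill moves.

```python
import math
import random

# Hypothetical design space: every parameter takes a discrete level 0..LEVELS-1.
PARAMS = ["issue_width", "rob_size", "l1_cache", "branch_predictor"]
LEVELS = 8

# Assumed stall weights for a toy stand-in of a real simulator.
WEIGHTS = {"issue_width": 40.0, "rob_size": 90.0,
           "l1_cache": 60.0, "branch_predictor": 20.0}


def simulate(point):
    """Toy model: return (execution_time, per-parameter stall breakdown)."""
    # Stalls shrink as a resource grows ...
    stalls = {p: WEIGHTS[p] / (1 + point[p]) for p in PARAMS}
    # ... but every upsizing step slows the clock, so "max everything" is not optimal.
    clock_penalty = 1.0 + 0.03 * sum(point.values())
    time = (100.0 + sum(stalls.values())) * clock_penalty
    return time, stalls


def most_critical(stalls):
    """Criticality analysis (toy version): the parameter causing the most stalls."""
    return max(stalls, key=stalls.get)


def neighbor(point, stalls, rng, bias):
    """Propose a move: criticality-guided with probability `bias`, else random."""
    new = dict(point)
    if rng.random() < bias:
        param, step = most_critical(stalls), 1      # relieve the bottleneck
    else:
        param, step = rng.choice(PARAMS), rng.choice([-1, 1])
    new[param] = min(LEVELS - 1, max(0, new[param] + step))
    return new


def anneal(bias, steps=200, seed=0, t0=20.0, cooling=0.97):
    """Simulated annealing; bias=0.0 gives plain SA, bias>0 adds criticality guidance."""
    rng = random.Random(seed)
    point = {p: 0 for p in PARAMS}
    time, stalls = simulate(point)
    best_point, best_time = dict(point), time
    temp = t0
    for _ in range(steps):
        cand = neighbor(point, stalls, rng, bias)
        cand_time, cand_stalls = simulate(cand)
        delta = cand_time - time
        # Always accept improvements; accept regressions with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            point, time, stalls = cand, cand_time, cand_stalls
            if time < best_time:
                best_point, best_time = dict(point), time
        temp *= cooling
    return best_point, best_time
```

Calling `anneal(bias=0.0)` and `anneal(bias=0.7)` with the same seed gives a plain and a criticality-guided run to compare on this toy model. In a real study, `simulate` would be a detailed processor simulation and the stall breakdown would come from a criticality model, so any speedup observed here says nothing about the paper's reported results.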

    Published In

    PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
    September 2010
    596 pages
    ISBN: 9781450301787
    DOI: 10.1145/1854273
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bottleneck analysis
    2. criticality model
    3. design space exploration
    4. simulated annealing
    5. superscalar processors

    Qualifiers

    • Research-article

    Conference

    PACT '10
    Sponsor:
    • IFIP WG 10.3
    • IEEE CS TCPP
    • SIGARCH
    • IEEE CS TCAA

    Acceptance Rates

    Overall Acceptance Rate 121 of 471 submissions, 26%

    Cited By

    • (2020) CSMO-DSE. ACM Journal on Emerging Technologies in Computing Systems 16:2 (1-22). DOI: 10.1145/3371406. Online publication date: 30-Jan-2020
    • (2018) A Parallel Algorithm for Instruction Dependence Graph Analysis Based on Multithreading. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (716-721). DOI: 10.1109/BDCloud.2018.00108. Online publication date: Dec-2018
    • (2016) Improving performance per Watt of non-monotonic Multicore Processors via bottleneck-based online program phase classification. 2016 IEEE 34th International Conference on Computer Design (ICCD) (528-535). DOI: 10.1109/ICCD.2016.7753337. Online publication date: Oct-2016
    • (2015) HMCPA: Heuristic Method Utilizing Critical Path Analysis for Design Space Exploration of Superscalar Microprocessors. Computer Engineering and Technology (20-35). DOI: 10.1007/978-3-662-45815-0_3. Online publication date: 2015
    • (2014) Design-effort alloy: Boosting a highly tuned primary core with untuned alternate cores. 2014 IEEE 32nd International Conference on Computer Design (ICCD) (408-415). DOI: 10.1109/ICCD.2014.6974713. Online publication date: Oct-2014
    • (2013) A unified view of non-monotonic core selection and application steering in heterogeneous chip multiprocessors. Proceedings of the 22nd international conference on Parallel architectures and compilation techniques (133-144). DOI: 10.5555/2523721.2523743. Online publication date: 7-Oct-2013
    • (2013) Jigsaw: scalable software-defined caches. Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (213-224). DOI: 10.1109/PACT.2013.6618811. Online publication date: Oct-2013
