
FLASH vs. (simulated) FLASH: closing the simulation loop

Published: 01 November 2000

Abstract

Simulation is the primary method for evaluating computer systems during all phases of the design process. One significant problem with simulation is that it rarely models the system exactly, and quantifying the resulting simulator error can be difficult. More importantly, architects often assume without proof that although their simulator may make inaccurate absolute performance predictions, it will still accurately predict architectural trends.

This paper studies the source and magnitude of error in a range of architectural simulators by comparing the simulated execution time of several applications and microbenchmarks to their execution time on the actual hardware being modeled. The existence of a hardware gold standard allows us to find, quantify, and fix simulator inaccuracies. We then use the simulators to predict architectural trends and analyze the sensitivity of the results to the simulator configuration. We find that most of our simulators predict trends accurately, as long as they model all of the important performance effects for the application in question. Unfortunately, it is difficult to know what these effects are without having a hardware reference, as they can be quite subtle. This calls into question the value, for architectural studies, of highly detailed simulators whose characteristics are not carefully validated against a real hardware design.
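The distinction the abstract draws between absolute error and trend accuracy can be illustrated with a minimal sketch. The execution times below are hypothetical placeholders chosen for illustration, not measurements from the paper, and the helper names `relative_error` and `speedups` are assumptions of this sketch rather than anything the authors define.

```python
def relative_error(simulated, measured):
    """Signed relative error of a simulated time against the hardware time."""
    return (simulated - measured) / measured


def speedups(times):
    """Speedup of each configuration relative to the first one in the list."""
    base = times[0]
    return [base / t for t in times]


# Hypothetical execution times in seconds for 1, 2, and 4 processors.
# These are illustrative values only, not results from the paper.
hardware = [10.0, 5.0, 2.5]
simulator = [8.0, 4.0, 2.0]  # assumed to run consistently 20% too fast

# Absolute accuracy: how far each simulated time is from the hardware time.
errors = [relative_error(s, h) for s, h in zip(simulator, hardware)]

# Trend accuracy: whether both predict the same scaling behavior.
hw_trend = speedups(hardware)
sim_trend = speedups(simulator)

print("relative errors:", errors)
print("hardware speedups:", hw_trend)
print("simulator speedups:", sim_trend)
```

In this contrived case the simulator is wrong in absolute terms at every point, yet its speedup curve matches the hardware's. The abstract's caution is that this match only holds when the simulator models every performance effect that matters for the application.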

Cited By

  • (2011) High Latency and Contention on Shared L2-Cache for Many-Core Architectures. Parallel Processing Letters 21:01 (85-106). DOI: 10.1142/S0129626411000096. Online publication date: Mar-2011
  • (2009) Performance Modeling and Measurement Techniques. Performance Evaluation and Benchmarking (5-23). DOI: 10.1201/9781420037425.ch2. Online publication date: 9-Nov-2009

Published In

ACM SIGPLAN Notices, Volume 35, Issue 11
Nov. 2000
269 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/356989

Publisher

Association for Computing Machinery
New York, NY, United States
