research-article

FaCSim: a fast and cycle-accurate architecture simulator for embedded systems

Published: 12 June 2008

Abstract

There have been strong demands for fast and cycle-accurate virtual platforms in the embedded systems area, where developers can do meaningful software development, including performance debugging, in the context of the entire platform. In this paper, we describe the design and implementation of a fast and cycle-accurate architecture simulator called FaCSim as a first step towards such a virtual platform. FaCSim accurately models the ARM9E-S processor core and the ARM926EJ-S processor's memory subsystem. It accurately simulates exceptions and interrupts to enable whole-system simulation including the OS. Since it is implemented in a modular manner in C++, it can be easily extended with other system components by subclassing or adding new classes. FaCSim is based on an interpretive simulation technique to provide flexibility, yet it achieves high speed. It enables fast cycle-accurate architecture simulation by means of three mechanisms. First, it computes the elapsed cycles in each pipeline stage as a chunk and incrementally adds them up to advance the core clock instead of performing cycle-by-cycle simulation. Second, it uses a basic-block cache that caches decoded instructions at the basic-block level. Finally, it is parallelized to exploit the multicore systems that are now widely available. Using 21 applications from the EEMBC benchmark suite, FaCSim's accuracy is validated against the ARM926EJ-S development board from ARM, and it is accurate within a ±7% error margin. Due to basic-block level caching and parallelization, FaCSim is, on average, more than three times faster than ARMulator and more than six times faster than SimpleScalar.
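The first two mechanisms can be illustrated with a minimal C++ sketch. It is not FaCSim's code: the toy opcodes, the fixed per-opcode cycle costs, and every name in it (`ToySimulator`, `runBlock`, `toyCost`, and so on) are assumptions made for illustration, and a single cost per instruction stands in for the per-pipeline-stage accounting the abstract describes. The sketch decodes a basic block once, caches it by start address, and advances the clock by one summed chunk per block rather than ticking cycle by cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy opcodes stored in the top byte of each word. These are illustrative
// assumptions, not the ARM encoding.
constexpr std::uint8_t kAlu = 0x01;
constexpr std::uint8_t kLoad = 0x02;
constexpr std::uint8_t kBranch = 0xFF;

// Assumed fixed cycle cost per opcode in this toy model.
inline std::uint32_t toyCost(std::uint8_t opcode) {
    switch (opcode) {
        case kAlu:    return 1;
        case kLoad:   return 3;
        case kBranch: return 3;
        default:      return 1;  // unrecognized words are treated like ALU ops
    }
}

// Hypothetical decoded form of one instruction.
struct DecodedInst {
    std::uint8_t opcode;
    std::uint32_t cycles;
};

using BasicBlock = std::vector<DecodedInst>;

class ToySimulator {
public:
    explicit ToySimulator(std::vector<std::uint32_t> program)
        : program_(std::move(program)) {}

    // Runs the basic block starting at word index pc and returns the index
    // of the word that follows the block.
    std::size_t runBlock(std::size_t pc) {
        assert(pc < program_.size());
        auto it = cache_.find(pc);
        if (it == cache_.end()) {
            // Cache miss: decode the whole block once and remember it.
            it = cache_.emplace(pc, decodeBlock(pc)).first;
        }
        // Sum the block's cycles into one chunk, then advance the clock once.
        std::uint64_t chunk = 0;
        for (const DecodedInst& inst : it->second) {
            chunk += inst.cycles;
        }
        clock_ += chunk;
        return pc + it->second.size();
    }

    std::uint64_t clock() const { return clock_; }
    std::size_t decodedInsts() const { return decodedInsts_; }

private:
    // Decodes instructions until a branch or the end of the program.
    BasicBlock decodeBlock(std::size_t pc) {
        BasicBlock block;
        for (std::size_t i = pc; i < program_.size(); ++i) {
            const auto opcode = static_cast<std::uint8_t>(program_[i] >> 24);
            block.push_back({opcode, toyCost(opcode)});
            ++decodedInsts_;
            if (opcode == kBranch) {
                break;  // a branch ends the basic block
            }
        }
        return block;
    }

    std::vector<std::uint32_t> program_;
    std::unordered_map<std::size_t, BasicBlock> cache_;
    std::uint64_t clock_ = 0;
    std::size_t decodedInsts_ = 0;
};
```

Calling `runBlock` twice on the same start address decodes the block only on the first call; the second call reuses the cached entry and pays only for the cycle accumulation. A cycle-accurate model would also have to add dynamic effects such as cache misses and interlocks to each chunk, and the third mechanism, parallelization, is not shown here.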

Cited By

  • (2014) Performance and power profiling for emulated Android systems. ACM Transactions on Design Automation of Electronic Systems (TODAES), 19:2 (1-25). DOI: 10.1145/2566660. Online publication date: 28-Mar-2014
  • (2013) PPSim: A Cycle-Accurate Simulator for PowerPC Instruction Set. Applied Mechanics and Materials, 325-326 (1766-1769). DOI: 10.4028/www.scientific.net/AMM.325-326.1766. Online publication date: Jun-2013
  • (2012) A cycle-approximate, mixed-ISA simulator for the KAHRISMA architecture. Proceedings of the Conference on Design, Automation and Test in Europe (21-26). DOI: 10.5555/2492708.2492716. Online publication date: 12-Mar-2012

    Information & Contributors

    Information

    Published In

    ACM SIGPLAN Notices  Volume 43, Issue 7
    LCTES '08
    July 2008
    167 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1379023
    • LCTES '08: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
      June 2008
      180 pages
      ISBN:9781605581040
      DOI:10.1145/1375657
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. architecture simulator
    2. cycle-accurate simulation
    3. full-system simulation
    4. simulator parallelization
    5. virtual prototyping

    Qualifiers

    • Research-article

    Cited By

    • (2018) High Speed Cycle-Approximate Simulation of Embedded Cache-Incoherent and Coherent Chip-Multiprocessors. International Journal of Parallel Programming, 46:6 (1247-1282). DOI: 10.1007/s10766-018-0566-x. Online publication date: 1-Dec-2018
    • (2017) OPTIMAS: Overwrite Purging Through In-Execution Memory Address Snooping to Improve Lifetime of NVM-Based Scratchpad Memories. IEEE Transactions on Device and Materials Reliability, 17:3 (481-489). DOI: 10.1109/TDMR.2017.2710089. Online publication date: Sep-2017
    • (2016) A distributed OpenCL framework using redundant computation and data replication. ACM SIGPLAN Notices, 51:6 (553-569). DOI: 10.1145/2980983.2908094. Online publication date: 2-Jun-2016
    • (2016) A distributed OpenCL framework using redundant computation and data replication. Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (553-569). DOI: 10.1145/2908080.2908094. Online publication date: 2-Jun-2016
    • (2016) A Cache-Assisted Scratchpad Memory for Multiple-Bit-Error Correction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24:11 (3296-3309). DOI: 10.1109/TVLSI.2016.2544811. Online publication date: 1-Nov-2016
    • (2015) A2CM2: aging-aware cache memory management technique. 2015 CSI Symposium on Real-Time and Embedded Systems and Technologies (RTEST) (1-8). DOI: 10.1109/RTEST.2015.7369845. Online publication date: Oct-2015
    • (2014) A data recomputation approach for reliability improvement of scratchpad memory in embedded systems. 2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT) (228-233). DOI: 10.1109/DFT.2014.6962091. Online publication date: Oct-2014
