[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1941553.1941583acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

COREMU: a scalable and portable parallel full-system emulator

Published: 12 February 2011 Publication History

Abstract

This paper presents the open-source COREMU, a scalable and portable parallel emulation framework that decouples the complexity of parallelizing full-system emulators from building a mature sequential one. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely-coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle the inter-core and device communication and synchronization, to maintain a consistent view of system resources. COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation and adaptive signal control to provide scalable performance. To make COREMU useful in practice, we also provide some preliminary tools and APIs that can help programmers to diagnose performance problems and (concurrency) bugs. A working prototype, which reuses the widely-used QEMU as the sequential emulator, is with only 2500 lines of code (LOCs) changes to QEMU. It currently supports x64 and ARM platforms, and can emulates up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. A set of performance evaluation against QEMU indicates that, COREMU has negligible uniprocessor emulation overhead, performs and scales significantly better than QEMU. We also show how COREMU could be used to diagnose performance problems and concurrency bugs of both OS kernel and parallel applications.

References

[1]
http://davmac.org/davpage/linux/rtsignals.html.
[2]
Kvm/qemu. http://wiki.qemu.org/KVM.
[3]
R. Bedichek. SimNow: Fast Platform Simulation Purely in Software. In 16th Hot Chips Symp, 2004.
[4]
C. Bienia, S. Kumar, J. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Proc. PACT, pages 72--81, 2008.
[5]
Bochs. http://bochs.sourceforge.net/.
[6]
P. Bohrer, J. Peterson, M. Elnozahy, R. Rajamony, A. Gheith, R. Rockhold, C. Lefurgy, H. Shafi, T. Nakra, R. Simpson, et al. Mambo: a full system simulator for the PowerPC architecture. ACM SIGMETRICS Performance Evaluation Review, 31(4):8--12, 2004.
[7]
H. Cain, K. Lepak, B. Schwartz, and M. Lipasti. Precise and accurate processor simulation. In Workshop on Computer Architecture Evaluation using Commercial Workload, 2002.
[8]
E. Chung, E. Nurvitadhi, J. Hoe, B. Falsafi, and K. Mai. A complexity-effective architecture for accelerating full-system multiprocessor simulations using FPGAs. In Proc. FPGA, pages 77--86, 2008.
[9]
E. Chung, M. Papamichael, E. Nurvitadhi, J. Hoe, K. Mai, and B. Falsafi. ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs. ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2009.
[10]
J. Chung, M. Dalton, H. Kannan, and C. Kozyrakis. Thread-safe dynamic binary translation using transactional memory. In IEEE HPCA, 2008.
[11]
G. W. Dunlap, D. G. Lucchetti, M. A. Fetterman, and P. M. Chen. Execution replay of multiprocessor virtual machines. In VEE'08, pages 121--130, 2008.
[12]
T. L. Harris, K. Fraser, and I. A. Pratt. A practical multi-word compare-and-swap operation. In Proc. DISC, pages 265--279, 2002.
[13]
J. Henning. SPEC CPU2000: Measuring CPU performance in the new millennium. Computer, 33(7):28--35, 2000.
[14]
R. Lantz. Parallel SimOS - Performance and Scalability for Large System. PhD thesis, Computer Systems Laboratory, Stanford University, 2007.
[15]
P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hllberg, J. Hgberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A full system simulation platform. Computer, pages 50--58, 2002.
[16]
P. Magnusson and B. Werner. Efficient memory simulation in SimICS. In Proc. Annual Simulation Symposium, pages 62--73, 1995.
[17]
M. Martin, D. Sorin, B. Beckmann, M. Marty, M. Xu, A. Alameldeen, K. Moore, M. Hill, and D. Wood. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. ACM SIGARCH Computer Architecture News, 33(4):99, 2005.
[18]
M. Michael and M. L. Scott. Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In Proc. PODC, pages 267--275, 1996.
[19]
J. Miller, H. Kasture, G. Kurian, C. Gruenwald III, N. Beckmann, C. Celio, J. Eastep, and A. Agarwal. Graphite: A Distributed Parallel Simulator for Multicores. In Proc. HPCA, 2010.
[20]
A. Over, B. Clarke, and P. E. Strazdins. A comparison of two approaches to parallel simulation of multiprocessors. In Proc. ISPASS, pages 12--22, 2007.
[21]
QEMU. http://qemu.org/.
[22]
C. Ranger, R. Raghuraman, A. Penmetsa, G. R. Bradski, and C. Kozyrakis. Evaluating mapreduce for multi-core and multiprocessor systems. In HPCA, pages 13--24, 2007.
[23]
M. Rosenblum, S. Herrod, E. Witchel, and A. Gupta. Complete computer system simulation: The SimOS approach. IEEE Parallel & Distributed Technology: Systems & Applications, 3(4):34--43, 1995.
[24]
A. Tridgell. Dbench filesystem benchmark. http://dbench.samba.org/.
[25]
V. Uhlig, J. LeVasseur, E. Skoglund, and U. Dannowski. Towards scalable multiprocessor virtual machines. In Proceedings of the 3rd conference on Virtual Machine Research And Technology. USENIX Association, 2004.
[26]
K. Wang, Y. Zhang, H. Wang, and X. Shen. Parallelization of IBM mambo system simulator in functional modes. SIGOPS Operating System Review, 2008.
[27]
S. Wee, J. Casper, N. Njoroge, Y. Tesylar, D. Ge, C. Kozyrakis, and K. Olukotun. A practical FPGA-based framework for novel CMP research. In Proc. FPGA, pages 116--125, 2007.
[28]
T. F. Wenisch, R. E. Wunderlich, M. Ferdman, A. Ailamaki, B. Falsafi, and J. C. Hoe. Simflex: Statistical sampling of computer system simulation. IEEE Micro, 26(4):18--31, 2006.
[29]
E. Witchel and M. Rosenblum. Embra: Fast and flexible machine simulation. ACM SIGMETRICS Performance Evaluation Review, 24(1):68--79, 1996.
[30]
R. E. Wunderlich, T. F. Wenisch, B. Falsafi, and J. C. Hoe. Statistical sampling of microarchitecture simulation. ACM Trans. Model. Comput. Simul., 16(3):197--224, 2006.
[31]
D. Yeh, L.-S. Peh, S. Borkar, J. A. Darringer, A. Agarwal, and W. mei Hwu. Thousand-core chips {roundtable}. IEEE Design & Test of Computers, 25(3):272--278, 2008.
[32]
M. Yourst. PTLsim: A cycle accurate full system x86-64 microarchitectural simulator. In Proc. ISPASS, pages 23--34, 2007.
[33]
G. Zheng, G. Kakulapati, and L. V. Kalé. Bigsim: A parallel simulator for performance prediction of extremely large parallel machines. In IPDPS. IEEE Computer Society, 2004.

Cited By

View all
  • (2024)CrossMappingProceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference10.5555/3691992.3692054(1013-1028)Online publication date: 10-Jul-2024
  • (2024)A System-Level Dynamic Binary Translator using Automatically-Learned Translation RulesProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444850(423-434)Online publication date: 2-Mar-2024
  • (2024)Challenges, Novel Approaches and Next Generation Computing Architecture for Hyper-Distributed Platforms Towards Real Computing ContinuumAdvanced Information Networking and Applications10.1007/978-3-031-57870-0_40(449-459)Online publication date: 10-Apr-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN:9781450301190
DOI:10.1145/1941553
  • General Chair:
  • Calin Cascaval,
  • Program Chair:
  • Pen-Chung Yew
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 46, Issue 8
    PPoPP '11
    August 2011
    300 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2038037
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 February 2011

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. full-system emulator
  2. multicore
  3. parallel emulator

Qualifiers

  • Research-article

Conference

PPoPP '11
Sponsor:

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)32
  • Downloads (Last 6 weeks)5
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)CrossMappingProceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference10.5555/3691992.3692054(1013-1028)Online publication date: 10-Jul-2024
  • (2024)A System-Level Dynamic Binary Translator using Automatically-Learned Translation RulesProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444850(423-434)Online publication date: 2-Mar-2024
  • (2024)Challenges, Novel Approaches and Next Generation Computing Architecture for Hyper-Distributed Platforms Towards Real Computing ContinuumAdvanced Information Networking and Applications10.1007/978-3-031-57870-0_40(449-459)Online publication date: 10-Apr-2024
  • (2023)AtoMig: Automatically Migrating Millions Lines of Code from TSO to WMMProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3579849(61-73)Online publication date: 27-Jan-2023
  • (2023)Risotto: A Dynamic Binary Translator for Weak Memory Model ArchitecturesProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3567955.3567962(107-122)Online publication date: 25-Mar-2023
  • (2022)Eliminate the overhead of interrupt checking in full-system dynamic binary translatorProceedings of the 15th ACM International Conference on Systems and Storage10.1145/3534056.3534939(1-12)Online publication date: 6-Jun-2022
  • (2022)Lasagne: a static binary translator for weak memory model architecturesProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523719(888-902)Online publication date: 9-Jun-2022
  • (2021)Enhancing Dynamic Binary Translation in Mobile Computing by Leveraging Polyhedral OptimizationWireless Communications & Mobile Computing10.1155/2021/66118672021Online publication date: 1-Jan-2021
  • (2021)To Pin or Not to Pin: Asserting the Scalability of QEMU Parallel Implementation2021 24th Euromicro Conference on Digital System Design (DSD)10.1109/DSD53832.2021.00045(238-245)Online publication date: Sep-2021
  • (2020)DQEMU: A Scalable Emulator with Retargetable DBT on Distributed PlatformsProceedings of the 49th International Conference on Parallel Processing10.1145/3404397.3404403(1-11)Online publication date: 17-Aug-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media