research-article

COREMU: a scalable and portable parallel full-system emulator

Authors:

Binyu Zang

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

Pages 213 - 222

https://doi.org/10.1145/1941553.1941583

Published: 12 February 2011

Abstract

This paper presents the open-source COREMU, a scalable and portable parallel emulation framework that decouples the complexity of parallelizing full-system emulators from that of building a mature sequential one. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle inter-core and device communication and synchronization so as to maintain a consistent view of system resources. COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation, and adaptive signal control to provide scalable performance. To make COREMU useful in practice, we also provide some preliminary tools and APIs that help programmers diagnose performance problems and concurrency bugs. A working prototype, which reuses the widely used QEMU as the sequential emulator, requires only 2,500 lines of code (LOC) of changes to QEMU. It currently supports x64 and ARM platforms and can emulate up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. A set of performance evaluations against QEMU indicates that COREMU has negligible uniprocessor emulation overhead and performs and scales significantly better than QEMU. We also show how COREMU can be used to diagnose performance problems and concurrency bugs in both OS kernels and parallel applications.
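As a rough illustration of the structure the abstract describes (one emulator instance per core, with a thin layer carrying inter-core communication), the following Python sketch may help. It is not COREMU's code or API: the names `Core`, `run_ring`, `send_ipi`, and the interrupt vector `0x20` are all hypothetical, and a per-core queue stands in for whatever mechanism the real library layer uses. Each emulated core runs on its own host thread, sends one inter-processor interrupt to its neighbor, and then handles the one it receives.

```python
import queue
import threading


class Core:
    """Stand-in for one sequential emulator instance (hypothetical)."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.mailbox = queue.Queue()  # pending inter-core interrupts
        self.handled = []             # (sender, vector) pairs seen so far


def run_ring(num_cores, vector=0x20):
    """Run num_cores emulated cores, each on its own host thread."""
    cores = [Core(i) for i in range(num_cores)]

    def send_ipi(sender, target):
        # The "thin layer": the only path by which cores interact.
        # queue.Queue is thread-safe, so no extra locking is needed here.
        cores[target].mailbox.put((sender, vector))

    def core_main(core):
        # Notify the next core in the ring, then block until this
        # core's own interrupt arrives and record it.
        send_ipi(core.core_id, (core.core_id + 1) % num_cores)
        core.handled.append(core.mailbox.get())

    threads = [threading.Thread(target=core_main, args=(c,)) for c in cores]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [c.handled for c in cores]


if __name__ == "__main__":
    for core_id, handled in enumerate(run_ring(4)):
        print(core_id, handled)
```

Because every core sends before it waits, the ring cannot deadlock, and each core ends up with exactly one interrupt from its predecessor. The sketch leaves out everything that makes the real problem hard, such as shared guest memory, atomic instruction emulation, and device models.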

Index Terms

COREMU: a scalable and portable parallel full-system emulator
1. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Information & Contributors

Information

Published In

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

February 2011

326 pages

ISBN:9781450301190

DOI:10.1145/1941553

General Chair: Calin Cascaval (Qualcomm Research, USA)
Program Chair: Pen-Chung Yew (Academia Sinica, Taiwan and University of Minnesota at Twin Cities, USA)

ACM SIGPLAN Notices Volume 46, Issue 8
PPoPP '11
August 2011
300 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2038037

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PPoPP '11

PPoPP '11: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 12 - 16, 2011

San Antonio, TX, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 57
Total Downloads: 781
Downloads (last 12 months): 32
Downloads (last 6 weeks): 5

Reflects downloads up to 06 Jan 2025

Citations

Cited By

Gao C, Meng X, Li W, Lai J, Zhang Y, Ren F, Bagchi S, Zhang Y (2024). CrossMapping. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference, pages 1013-1028. DOI: 10.5555/3691992.3692054. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.5555/3691992.3692054
Jiang J, Liang C, Dong R, Yang Z, Zhou Z, Wang W, Yew P, Zhang W, Grosser T, Dubach C, Steuwer M, Xue J, Ottoni G, Quintão Pereira F (2024). A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, pages 423-434. DOI: 10.1109/CGO57630.2024.10444850. Online publication date: 2-Mar-2024.
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444850
Lubrano F, Caragnano G, Scionti A, Terzo O (2024). Challenges, Novel Approaches and Next Generation Computing Architecture for Hyper-Distributed Platforms Towards Real Computing Continuum. Advanced Information Networking and Applications, pages 449-459. DOI: 10.1007/978-3-031-57870-0_40. Online publication date: 10-Apr-2024.
https://doi.org/10.1007/978-3-031-57870-0_40
Beck M, Bhat K, Stričević L, Chen G, Behrens D, Fu M, Vafeiadis V, Chen H, Härtig H, Aamodt T, Jerger N, Swift M (2023). AtoMig: Automatically Migrating Millions Lines of Code from TSO to WMM. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pages 61-73. DOI: 10.1145/3575693.3579849. Online publication date: 27-Jan-2023.
https://dl.acm.org/doi/10.1145/3575693.3579849
Gouicem R, Sprokholt D, Ruehl J, Rocha R, Spink T, Chakraborty S, Bhatotia P, Aamodt T, Jerger N, Swift M (2023). Risotto: A Dynamic Binary Translator for Weak Memory Model Architectures. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 107-122. DOI: 10.1145/3567955.3567962. Online publication date: 25-Mar-2023.
https://dl.acm.org/doi/10.1145/3567955.3567962
Niu G, Zhang F, Li X, Malka M, Kolodner H, Bellosa F, Gabel M (2022). Eliminate the overhead of interrupt checking in full-system dynamic binary translator. Proceedings of the 15th ACM International Conference on Systems and Storage, pages 1-12. DOI: 10.1145/3534056.3534939. Online publication date: 6-Jun-2022.
https://dl.acm.org/doi/10.1145/3534056.3534939
Rocha R, Sprokholt D, Fink M, Gouicem R, Spink T, Chakraborty S, Bhatotia P, Jhala R, Dillig I (2022). Lasagne: a static binary translator for weak memory model architectures. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pages 888-902. DOI: 10.1145/3519939.3523719. Online publication date: 9-Jun-2022.
https://dl.acm.org/doi/10.1145/3519939.3523719
Li M, Pang J, Yue F, Liu F, Wang J, Tan J (2021). Enhancing Dynamic Binary Translation in Mobile Computing by Leveraging Polyhedral Optimization. Wireless Communications & Mobile Computing 2021. DOI: 10.1155/2021/6611867. Online publication date: 1-Jan-2021.
https://dl.acm.org/doi/10.1155/2021/6611867
Badaroux M, Miroddi S, Petrot F (2021). To Pin or Not to Pin: Asserting the Scalability of QEMU Parallel Implementation. 2021 24th Euromicro Conference on Digital System Design (DSD), pages 238-245. DOI: 10.1109/DSD53832.2021.00045. Online publication date: Sep-2021.
https://doi.org/10.1109/DSD53832.2021.00045
Zhao Z, Jiang Z, Liu X, Gong X, Wang W, Yew P (2020). DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms. Proceedings of the 49th International Conference on Parallel Processing, pages 1-11. DOI: 10.1145/3404397.3404403. Online publication date: 17-Aug-2020.
https://dl.acm.org/doi/10.1145/3404397.3404403
