Fast and live hypervisor replacement

Published: 14 April 2019

Abstract

Hypervisors are increasingly complex and must often be updated to apply security patches, bug fixes, and feature upgrades. However, in a virtualized cloud infrastructure, updates to an operational hypervisor can be highly disruptive. Before a hypervisor is updated, the virtual machines (VMs) running on it must be either migrated away or shut down, resulting in downtime, performance loss, and network overhead. We present a new technique, called HyperFresh, to transparently replace a hypervisor with a new updated instance without disrupting any running VMs. A thin shim layer, called the hyperplexor, performs live hypervisor replacement by remapping guest memory to a new updated hypervisor on the same machine. The hyperplexor leverages nested virtualization for hypervisor replacement while minimizing nesting overheads during normal execution. We present a prototype implementation of the hyperplexor on the KVM/QEMU platform that can perform live hypervisor replacement within 10 ms. We also demonstrate how a hyperplexor-based approach can be used for sub-second relocation of containers for live OS replacement.
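
The core idea in the abstract, relocating a VM by remapping its memory rather than copying it, can be illustrated with a toy model. The sketch below is not the HyperFresh implementation and does not touch KVM/QEMU; the class names `Hypervisor` and `Hyperplexor` and all of their methods are hypothetical, chosen only to show that a handover can move mapping entries while the page contents stay where they are.

```python
from dataclasses import dataclass, field


@dataclass
class Hypervisor:
    """Hypothetical stand-in for one hypervisor instance."""
    version: str
    # VM name -> {guest page number -> host frame number}
    guest_maps: dict = field(default_factory=dict)


class Hyperplexor:
    """Toy shim layer: owns the host frames and controls who maps them."""

    def __init__(self):
        # Host frame number -> page contents (assumed to stay on one machine).
        self.frames = {}

    def boot_vm(self, hv, vm, pages):
        """Place each guest page in a fresh host frame and map it under `hv`."""
        mapping = {}
        for gpn, data in enumerate(pages):
            hfn = len(self.frames)
            self.frames[hfn] = data
            mapping[gpn] = hfn
        hv.guest_maps[vm] = mapping

    def replace(self, old, new):
        """Hand every VM from `old` to `new` by moving mappings only.

        No page contents are copied: the frames stay in place and only
        the guest-page-to-frame tables change owner.
        """
        for vm in list(old.guest_maps):
            new.guest_maps[vm] = old.guest_maps.pop(vm)

    def read(self, hv, vm, gpn):
        """Resolve a guest page through the mapping held by `hv`."""
        return self.frames[hv.guest_maps[vm][gpn]]


if __name__ == "__main__":
    plexor = Hyperplexor()
    old_hv, new_hv = Hypervisor("old"), Hypervisor("new")
    plexor.boot_vm(old_hv, "vm0", [bytearray(b"page0"), bytearray(b"page1")])
    plexor.replace(old_hv, new_hv)
    print(plexor.read(new_hv, "vm0", 1))
```

In this model the cost of a replacement grows with the number of mapping entries, not with the amount of guest memory, which is the intuition behind a replacement that is much faster than migrating the VM's memory over a network.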

Information

Published In

VEE 2019: Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
April 2019
206 pages
ISBN:9781450360203
DOI:10.1145/3313808
  • General Chair: Jennifer Sartor
  • Program Chairs: Mayur Naik, Chris Rossbach

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Container
  2. Hypervisor
  3. Live Migration
  4. Virtualization

Qualifiers

  • Research-article

Conference

VEE '19

Acceptance Rates

Overall Acceptance Rate 80 of 235 submissions, 34%
