Lightweight Live Migration for High Availability Cluster Service

Bo Jiang⁵,
Binoy Ravindran⁵ &
Changsoo Kim⁶

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6366)

Included in the following conference series:

Symposium on Self-Stabilizing Systems

Abstract

High availability is a critical feature for service clusters and cloud computing, and is often considered more valuable than performance. One commonly used technique for enhancing availability is live migration, which replicates services by means of virtualization technology. However, continuous live migration with checkpointing introduces significant overhead. In this paper, we present a lightweight live migration (LLM) mechanism that integrates whole-system migration with input replay, aiming to reduce this overhead while providing comparable availability. LLM migrates service requests from network clients at high frequency during the intervals between checkpoints of system updates. If the primary machine fails, the backup machine continues the service based on the virtual machine image and the network inputs from their respective last migration rounds. We implemented LLM on Xen and compared it with Remus, a state-of-the-art approach that enhances availability by checkpointing system status updates. Our experimental evaluations show that LLM clearly outperforms Remus in terms of network delay and overhead. For certain types of applications, LLM may also be a better alternative to Remus in terms of downtime. In addition, LLM achieves transaction-level consistency, like Remus.
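
The recovery idea described in the abstract (an infrequent whole-system checkpoint combined with frequent migration of client inputs, replayed on failover) can be illustrated with a toy model. The sketch below is not the authors' Xen-based implementation: the `Service`, `Backup`, and `run_primary` names are hypothetical, the "virtual machine image" is reduced to a single integer, and the service is assumed to be deterministic so that replaying inputs reproduces the primary's state.

```python
import copy
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Service:
    """Toy deterministic service: its whole state is a running total."""
    total: int = 0

    def handle(self, request: int) -> int:
        self.total += request
        return self.total


@dataclass
class Backup:
    """Holds the last checkpointed image plus the inputs received since then."""
    image: Service = field(default_factory=Service)
    pending_inputs: List[int] = field(default_factory=list)

    def receive_checkpoint(self, image: Service) -> None:
        # A new image already reflects every input logged before it,
        # so those inputs no longer need to be replayed.
        self.image = copy.deepcopy(image)
        self.pending_inputs.clear()

    def receive_input(self, request: int) -> None:
        self.pending_inputs.append(request)

    def fail_over(self) -> Service:
        # Restore the last image, then replay the inputs logged after it.
        restored = copy.deepcopy(self.image)
        for request in self.pending_inputs:
            restored.handle(request)
        return restored


def run_primary(requests: Iterable[int], backup: Backup,
                checkpoint_every: int) -> Service:
    """Serve requests, migrating every input but checkpointing only rarely."""
    primary = Service()
    for i, request in enumerate(requests, start=1):
        backup.receive_input(request)      # frequent, cheap input migration
        primary.handle(request)
        if i % checkpoint_every == 0:      # infrequent, expensive checkpoint
            backup.receive_checkpoint(primary)
    return primary
```

If the primary fails after serving five requests with a checkpoint every two, the backup restores the image taken after the fourth request and replays only the fifth, arriving at the same state the primary had. The trade-off this toy model exposes is that a longer checkpoint interval lowers checkpoint traffic but lengthens the replay at failover.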

This work was supported by the IT R&D program of MKE/KEIT, South Korea [2007S01602, Development of Cost Effective and Large Scale Global Internet Service Solution].

Author information

Authors and Affiliations

ECE Dept., Virginia Tech, USA
Bo Jiang & Binoy Ravindran
ETRI, Daejeon, South Korea
Changsoo Kim

Editor information

Editors and Affiliations

Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Shlomi Dolev
Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083-0688, USA
Jorge Cobb
Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
Michael Fischer
Department of Computer Science, Columbia University, New York, NY 10027, USA
Moti Yung

About this paper

Cite this paper

Jiang, B., Ravindran, B., Kim, C. (2010). Lightweight Live Migration for High Availability Cluster Service. In: Dolev, S., Cobb, J., Fischer, M., Yung, M. (eds) Stabilization, Safety, and Security of Distributed Systems. SSS 2010. Lecture Notes in Computer Science, vol 6366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16023-3_34

DOI: https://doi.org/10.1007/978-3-642-16023-3_34
Published: 20 September 2010
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16022-6
Online ISBN: 978-3-642-16023-3
eBook Packages: Computer Science, Computer Science (R0)
