research-article
Open access

Software-based Live Migration for Containerized RDMA

Published: 03 August 2024

Abstract

Container live migration is critical for keeping services uninterrupted during host maintenance in data centers. Meanwhile, RDMA containerization has attracted attention from both academia and industry for years. However, live migration for containerized RDMA is not supported in today’s data centers. Although modifying RDMA NICs (RNICs) to be aware of live migration was proposed years ago, there is no sign that commodity RNICs will support it. This paper proposes MigrRDMA, a software-based RDMA live migration approach for containers that does not rely on any extra hardware support. MigrRDMA provides a minimal virtualization layer inside the RDMA library loaded by applications, which lets applications switch transparently to new RDMA communications. Unlike previous RDMA virtualization, which provides sharing and isolation, MigrRDMA’s virtualization layer focuses on keeping the RDMA states on the migration source and destination identical from the perspective of applications. Our evaluation shows that MigrRDMA adds only 0.7 to 12.1 ms of downtime when migrating a container with live RDMA connections running at line rate. In addition, the MigrRDMA virtualization layer adds only 3% to 9% overhead to data path operations.
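As a rough illustration of what "keeping the RDMA states the same from the perspective of applications" could involve, the sketch below uses an ID indirection table: the application holds a stable virtual ID while the device-assigned ID behind it is swapped during migration. This is an assumption made for illustration only, not MigrRDMA's actual design; `VirtualIdTable`, its methods, and the numeric IDs are all hypothetical names and values.

```python
class VirtualIdTable:
    """Hypothetical table mapping stable, application-visible IDs
    to the IDs assigned by whichever NIC currently backs them."""

    def __init__(self):
        self._virt_to_phys = {}
        self._next_virt = 1

    def register(self, phys_id):
        """Record a newly created resource and return its stable virtual ID."""
        virt_id = self._next_virt
        self._next_virt += 1
        self._virt_to_phys[virt_id] = phys_id
        return virt_id

    def translate(self, virt_id):
        """Resolve a virtual ID to the current device-assigned ID."""
        return self._virt_to_phys[virt_id]

    def rebind(self, virt_id, new_phys_id):
        """Point an existing virtual ID at a resource recreated elsewhere."""
        if virt_id not in self._virt_to_phys:
            raise KeyError(virt_id)
        self._virt_to_phys[virt_id] = new_phys_id


table = VirtualIdTable()
qp = table.register(phys_id=0x1A2B)   # illustrative ID from the source host's NIC
before = table.translate(qp)
table.rebind(qp, new_phys_id=0x77F0)  # illustrative ID after recreation on the destination
after = table.translate(qp)
```

The application keeps using `qp` unchanged before and after `rebind`; only the translation result differs. A real layer would also have to carry over connection and memory-registration state, which this sketch leaves out.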

Cited By

  • (2024) SmartNIC-Enabled Live Migration for Storage-Optimized VMs. Proceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems, 45–52. https://doi.org/10.1145/3678015.3680487. Online publication date: 4-Sep-2024

Information & Contributors

Information

Published In

APNet '24: Proceedings of the 8th Asia-Pacific Workshop on Networking
August 2024
230 pages
ISBN: 9798400717581
DOI: 10.1145/3663408
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Containers
  2. Live Migration
  3. RDMA
  4. Virtualization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APNet 2024

Acceptance Rates

APNet '24 Paper Acceptance Rate: 50 of 118 submissions, 42%
Overall Acceptance Rate: 50 of 118 submissions, 42%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 456
  • Downloads (Last 6 weeks): 103
Reflects downloads up to 05 Jan 2025
