Research article · DOI: 10.1145/3698038.3698510

On-demand and Parallel Checkpoint/Restore for GPU Applications

Published: 20 November 2024 Publication History

Abstract

Leveraging serverless computing for cloud-based machine learning services is on the rise, promising the cost-efficiency and flexibility that are crucial for ML applications relying on high-performance GPUs and substantial memory. However, although modern serverless platforms handle diverse devices such as GPUs seamlessly on a pay-as-you-go basis, a longstanding challenge remains: startup latency, an issue well studied when serverless was CPU-centric. For example, initializing GPU applications with small models such as MobileNet takes several seconds. For more intricate models such as GPT-2, startup latency can escalate to around 10 seconds, vastly overshadowing the short computation time of GPU-based inference. Prior solutions tailored to CPU serverless setups, such as fork() and Checkpoint/Restore, cannot be directly and effectively applied due to differences between CPUs and GPUs.
This paper presents gCROP (GPU Checkpoint/Restore made On-demand and Parallel), the first GPU runtime that achieves sub-100ms startup latency for GPU applications with up to 774 million parameters (the 3.1GB GPT-2-Large model). The key insight behind gCROP is to selectively restore essential state on demand and in parallel during boot from a prepared checkpoint image. To this end, gCROP first introduces a global service, the GPU Restore Server, which breaks the existing barrier between restore stages and achieves parallel restore. In addition, gCROP leverages both CPU and GPU page faults to restore CPU and GPU data on demand, in a profile-guided order that mitigates the cost of fault handling. Moreover, gCROP designs a multi-checkpoint mechanism that increases the content shared among checkpoint images and uses deduplication to reduce storage costs. An implementation and evaluation on AMD GPUs show significant improvements in startup latency: 6.4x-24.7x over booting from scratch and 3.9x-23.5x over the state-of-the-art method (CRIU).
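The multi-checkpoint deduplication described above can be pictured as a content-addressed chunk store, where checkpoint images that share contents also share stored chunks. The sketch below is illustrative only: the chunk size, the hash function, and the in-memory layout are assumptions for the example, not gCROP's actual design.

```python
import hashlib

# Dedup granularity is an illustrative assumption; the abstract does not
# specify gCROP's actual chunk size.
CHUNK_SIZE = 64 * 1024


class ChunkStore:
    """Content-addressed store: a chunk shared by several checkpoint
    images is stored only once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk payload

    def add_image(self, image: bytes):
        """Split an image into fixed-size chunks and return the ordered
        list of digests (the 'recipe') needed to reconstruct it."""
        recipe = []
        for off in range(0, len(image), CHUNK_SIZE):
            chunk = image[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store payload only once
            recipe.append(digest)
        return recipe

    def restore(self, recipe):
        """Reassemble a full image from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)
```

With this layout, two checkpoint images that differ only in a small tail share all their common chunks, so total storage grows with the unique content rather than with the number of images.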



Published In

SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing
November 2024, 1062 pages
ISBN: 9798400712869
DOI: 10.1145/3698038

Publisher

Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. Checkpoint and Restore
    2. Cloud Computing
    3. GPUs
    4. Startup Latency

Conference

SoCC '24: ACM Symposium on Cloud Computing
November 20-22, 2024, Redmond, WA, USA
Overall Acceptance Rate: 169 of 722 submissions, 23%
