DOI: 10.1145/3542929.3563470
research-article

Owl: performance-aware scheduling for resource-efficient function-as-a-service cloud

Published: 07 November 2022

Abstract

This work documents our experience of improving the scheduler in Alibaba Function Compute, a public FaaS platform. It begins with our observation that memory and CPU are under-utilized in most FaaS sandboxes. A natural solution is to overcommit VM resources when allocating sandboxes, but the ensuing contention may cause performance degradation and compromise user experience. To complicate matters, degradation in FaaS can also arise from external factors, such as failed dependencies of user functions.
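To make the overcommitment trade-off concrete, here is a minimal sketch (not Owl's actual algorithm) that compares reservation-based placement with usage-based overcommitment on a single VM. The `Sandbox` fields, the `pack` function, and the `headroom` safety factor are hypothetical names chosen for illustration; only the idea of packing sandboxes by what they actually use rather than what they were allocated comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sandbox:
    name: str
    allocated_mb: int  # memory size the user configured for the function
    peak_used_mb: int  # peak memory observed in practice (hypothetical metric)


def pack(sandboxes: List[Sandbox], vm_capacity_mb: int,
         overcommit: bool, headroom: float = 1.2) -> List[str]:
    """Greedily place sandboxes on one VM and return the names that fit.

    Without overcommitment, each sandbox reserves its full allocation.
    With overcommitment, each sandbox is charged its observed peak usage
    scaled by a safety headroom (a hypothetical knob, not from the paper).
    """
    placed: List[str] = []
    used = 0.0
    for sb in sandboxes:
        demand = sb.peak_used_mb * headroom if overcommit else sb.allocated_mb
        if used + demand <= vm_capacity_mb:
            placed.append(sb.name)
            used += demand
    return placed


if __name__ == "__main__":
    # Eight 1 GiB sandboxes that each use only about 300 MB, on a 4 GiB VM.
    fleet = [Sandbox(f"fn{i}", allocated_mb=1024, peak_used_mb=300) for i in range(8)]
    print(len(pack(fleet, 4096, overcommit=False)))  # 4
    print(len(pack(fleet, 4096, overcommit=True)))   # 8
```

Overcommitment doubles density in this toy case only because the sketch trusts the observed peak; if usage spikes past it, the co-located sandboxes contend, which is the degradation risk described above.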
We design Owl to achieve both high utilization and performance stability. It introduces a customizable rule system that lets users specify how much degradation they tolerate, and it overcommits resources with a dual approach. (1) For less-invoked functions, it allocates resources to sandboxes with a usage-based heuristic, continuously monitors their performance, and remedies any detected degradation. To tell whether a degraded sandbox is affected by external factors, Owl sets aside a contention-free environment and migrates the affected sandbox into it as a comparison baseline. (2) For frequently-invoked functions, Owl profiles the interference patterns among collocated sandboxes and places the sandboxes under the guidance of these profiles. The collocation profiling is designed to cope with the constraint that profiling has to be conducted in production. Owl further consolidates idle sandboxes to reduce resource waste. We prototype Owl in our production system and implement a representative benchmark suite to evaluate it. The results demonstrate that the prototype can reduce VM cost by 43.80% and effectively mitigate latency degradation while incurring negligible overhead.
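The degradation check in approach (1) can be illustrated with a minimal sketch. It assumes a tolerance rule of the form "at most `max_slowdown` times a reference latency" and compares median latencies; the rule shape, the use of medians, and all names (`ToleranceRule`, `Verdict`, `diagnose`, `measure_isolated`) are hypothetical stand-ins, since the abstract does not specify Owl's rule syntax or detection logic. What the sketch mirrors from the text is the diagnosis idea: a sandbox that stays slow after migration into a contention-free environment is degraded by external factors rather than by collocation.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import median
from typing import Callable, Sequence


class Verdict(Enum):
    OK = "within tolerance"
    CONTENTION = "degraded by collocated sandboxes"
    EXTERNAL = "degraded by external factors"


@dataclass
class ToleranceRule:
    # Hypothetical rule shape: tolerate up to max_slowdown x the reference latency.
    max_slowdown: float


def diagnose(reference_ms: Sequence[float],
             collocated_ms: Sequence[float],
             measure_isolated: Callable[[], Sequence[float]],
             rule: ToleranceRule) -> Verdict:
    """Classify a sandbox's latency against a user tolerance rule.

    reference_ms: latencies under normal conditions (hypothetical baseline)
    collocated_ms: recent latencies on the shared, overcommitted VM
    measure_isolated: migrates the sandbox to a contention-free environment
        and returns its latencies there; only called if degradation is seen
    """
    limit = rule.max_slowdown * median(reference_ms)
    if median(collocated_ms) <= limit:
        return Verdict.OK
    # Still slow without any neighbors: contention is not the cause,
    # e.g. a failed downstream dependency of the user function.
    if median(measure_isolated()) > limit:
        return Verdict.EXTERNAL
    # Fast again in isolation: the collocated sandboxes were to blame.
    return Verdict.CONTENTION


if __name__ == "__main__":
    rule = ToleranceRule(max_slowdown=1.5)
    ref = [100, 100, 100]
    print(diagnose(ref, [120, 130, 110], lambda: [], rule))               # Verdict.OK
    print(diagnose(ref, [300, 310, 290], lambda: [105, 98, 102], rule))   # Verdict.CONTENTION
    print(diagnose(ref, [300, 310, 290], lambda: [280, 300, 295], rule))  # Verdict.EXTERNAL
```

Passing the isolated measurement as a callable keeps the costly migration lazy: it only happens once degradation has already been observed. What Owl does after each verdict is outside the scope of this sketch.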

References

[1] 2022. Alibaba Cloud Function Compute. https://www.alibabacloud.com/product/function-compute.
[2] 2022. AWS Lambda. https://aws.amazon.com/lambda/.
[3] 2022. Azure Functions. https://azure.microsoft.com/en-us/services/functions/.
[4] 2022. Google Cloud Functions. https://cloud.google.com/functions.
[5] Nabeel Akhtar, Ali Raza, Vatche Ishakian, and Ibrahim Matta. 2020. COSE: configuring serverless functions using statistical learning. In IEEE Conference on Computer Communications (INFOCOM).
[6] Anonymized. 2021. Private Communication. Interview.
[7] Noman Bashir, Nan Deng, Krzysztof Rzadca, David Irwin, Sree Kodak, and Rohit Jnagal. 2021. Take it to the limit: peak prediction-driven resource overcommitment in datacenters. In EuroSys.
[8] Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy. 2016. Site reliability engineering: How Google runs production systems. O'Reilly Media, Inc.
[9] Vivek M Bhasi, Jashwant Raj Gunasekaran, Prashanth Thinakaran, Cyan Subhra Mishra, Mahmut Taylan Kandemir, and Chita Das. 2021. Kraken: Adaptive container provisioning for deploying dynamic DAGs in serverless platforms. In SoCC.
[10] Laurent Bindschaedler, Jasmina Malicevic, Nicolas Schiper, Ashvin Goel, and Willy Zwaenepoel. 2018. Rock you like a hurricane: taming skew in large scale analytics. In EuroSys.
[11] James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, and Jonathan Appavoo. 2020. SEUSS: skip redundant paths to make serverless fast. In EuroSys.
[12] Mosharaf Chowdhury, Zhenhua Liu, Ali Ghodsi, and Ion Stoica. 2016. HUG: Multi-Resource Fairness for Correlated and Elastic Demands. In NSDI.
[13] Alibaba Function Compute. 2022. Manage functions. https://www.alibabacloud.com/help/en/function-compute/latest/manage-functions.
[14] Alibaba Function Compute. 2022. Metrics (See "CPU utilization - FunctionCPUQuotaPercent"). https://www.alibabacloud.com/help/en/function-compute/latest/metrics.
[15] Nilanjan Daw, Umesh Bellur, and Purushottam Kulkarni. 2020. Xanadu: Mitigating cascading cold starts in serverless function chain deployments. In Middleware.
[16] Christina Delimitrou and Christos Kozyrakis. 2014. Quasar: resource-efficient and QoS-aware cluster management. In ASPLOS.
[17] Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. 2015. Tarcil: Reconciling scheduling speed and quality in large shared clusters. In SoCC.
[18] Simon Eismann, Long Bui, Johannes Grohmann, Cristina Abad, Nikolas Herbst, and Samuel Kounev. 2021. Sizeless: Predicting the optimal size of serverless functions. In Middleware.
[19] Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. 2011. Dominant resource fairness: Fair allocation of multiple resource types. In NSDI.
[20] Google. 2018. gVisor: Container runtime sandbox. https://github.com/google/gvisor.
[21] Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, and Aditya Akella. 2014. Multi-resource packing for cluster schedulers. In SIGCOMM.
[22] Robert Grandl, Mosharaf Chowdhury, Aditya Akella, and Ganesh Ananthanarayanan. 2016. Altruistic Scheduling in Multi-Resource Clusters. In OSDI.
[23] Qi Huang, Ken Birman, Robbert van Renesse, Wyatt Lloyd, Sanjeev Kumar, and Harry C. Li. 2013. An analysis of Facebook photo caching. In SOSP.
[24] Tatiana Jin, Zhenkun Cai, Boyang Li, Chengguang Zheng, Guanxian Jiang, and James Cheng. 2020. Improving resource utilization by timely fine-grained scheduling. In EuroSys.
[25] Kostis Kaffes, Neeraja J Yadwadkar, and Christos Kozyrakis. 2022. Principled and Practical Scheduling for Real-World Serverless Computing. In SoCC.
[26] Jeongchul Kim and Kyungyong Lee. 2019. FunctionBench: A Suite of Workloads for Serverless Cloud Function Service. In CLOUD.
[27] Ashraf Mahgoub, Karthick Shankar, Subrata Mitra, Ana Klimovic, Somali Chaterji, and Saurabh Bagchi. 2021. SONIC: Application-aware Data Passing for Chained Serverless Applications. In ATC.
[28] Ashraf Mahgoub, Edgardo Barsallo Yi, Karthick Shankar, Sameh Elnikety, Somali Chaterji, and Saurabh Bagchi. 2022. ORION and the Three Rights: Sizing, Bundling, and Prewarming for Serverless DAGs. In OSDI.
[29] Djob Mvondo, Mathieu Bacou, Kevin Nguetchouang, Lucien Ngale, Stéphane Pouget, Josiane Kouam, Renaud Lachaize, Jinho Hwang, Tim Wood, Daniel Hagimont, et al. 2021. OFC: an opportunistic caching system for FaaS platforms. In EuroSys.
[30] Mohammad Shahrad, Rodrigo Fonseca, Inigo Goiri, et al. 2020. Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider. In ATC.
[31] Arjun Singhvi, Arjun Balasubramanian, Kevin Houck, Mohammed Danish Shaikh, Shivaram Venkataraman, and Aditya Akella. 2021. Atoll: A scalable low-latency serverless platform. In SoCC.
[32] Amoghvarsha Suresh and Anshul Gandhi. 2019. FnSched: An Efficient Scheduler for Serverless Functions. In Proceedings of the 5th International Workshop on Serverless Computing.
[33] A. Suresh, G. Somashekar, A. Varadarajan, V. R. Kakarla, H. Upadhyay, and A. Gandhi. 2020. ENSURE: Efficient Scheduling and Autonomous Resource Management in Serverless Environments. In 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS).
[34] Ali Tariq, Austin Pahl, Sharat Nimmagadda, Eric Rozner, and Siddharth Lanka. 2020. Sequoia: Enabling quality-of-service in serverless computing. In SoCC.
[35] Huangshi Tian, Suyi Li, Ao Wang, Wei Wang, Tianlong Wu, and Haoran Yang. 2022. Owl: Performance-Aware Scheduling for Resource-Efficient Function-as-a-Service Cloud. https://www.cse.ust.hk/~weiwa/papers/owl-techreport.pdf.
[36] Ao Wang, Shuai Chang, Huangshi Tian, Hongqi Wang, Haoran Yang, Huiba Li, Rui Du, and Yue Cheng. 2021. FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute. In ATC.
[37] Liang Wang, Mengyuan Li, Yinqian Zhang, Thomas Ristenpart, and Michael Swift. 2018. Peeking Behind the Curtains of Serverless Platforms. In ATC.
[38] AWS Whitepaper. 2017. Choosing the Optimal Memory Size. https://amzn.to/3k9bVMj.
[39] Minchen Yu, Tingjia Cao, Wei Wang, and Ruichuan Chen. 2023. Following the Data, Not the Function: Rethinking Function Orchestration in Serverless Computing. In NSDI.
[40] Minchen Yu, Yinghao Yu, Yunchuan Zheng, Baichen Yang, and Wei Wang. 2020. RepBun: Load-Balanced, Shuffle-Free Cluster Caching for Structured Data. In IEEE INFOCOM.
[41] Tianyi Yu, Qingyuan Liu, Dong Du, Yubin Xia, et al. 2020. Characterizing Serverless Platforms with ServerlessBench. In SoCC.
[42] Yinghao Yu, Renfei Huang, Wei Wang, Jun Zhang, and Khaled B. Letaief. 2018. SP-Cache: Load-balanced, Redundancy-free Cluster Caching with Selective Partition. In IEEE/ACM SC.
[43] Hong Zhang, Yupeng Tang, Anurag Khandelwal, Jingrong Chen, and Ion Stoica. 2021. Caerus: NIMBLE Task Scheduling for Serverless Analytics. In NSDI.

Cited By

  • (2024) Harmonizing efficiency and practicability. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference. DOI: 10.5555/3691992.3691993, pp. 1-17. Online publication date: 10-Jul-2024.
  • (2024) Is It Time To Put Cold Starts In The Deep Freeze? Proceedings of the 2024 ACM Symposium on Cloud Computing. DOI: 10.1145/3698038.3698527, pp. 259-268. Online publication date: 20-Nov-2024.
  • (2024) FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. DOI: 10.1145/3691620.3695477, pp. 957-969. Online publication date: 27-Oct-2024.

    Information

    Published In

    SoCC '22: Proceedings of the 13th Symposium on Cloud Computing
    November 2022
    574 pages
    ISBN: 9781450394147
    DOI: 10.1145/3542929
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 07 November 2022

    Author Tags

    1. overcommitment
    2. resource-management
    3. scheduling
    4. serverless

    Qualifiers

    • Research-article

    Funding Sources

    • University Grants Committee of Hong Kong

    Conference

    SoCC '22
    SoCC '22: ACM Symposium on Cloud Computing
    November 7-11, 2022
    San Francisco, California

    Acceptance Rates

    Overall Acceptance Rate 169 of 722 submissions, 23%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 225
    • Downloads (last 6 weeks): 27
    Reflects downloads up to 07 Jan 2025

    Cited By

    • (2024) In Serverless, OS Scheduler Choice Costs Money: A Hybrid Scheduling Approach for Cheaper FaaS. Proceedings of the 25th International Middleware Conference. DOI: 10.1145/3652892.3700757, pp. 172-184. Online publication date: 2-Dec-2024.
    • (2024) Do Predictors for Resource Overcommitment Even Predict? Proceedings of the 4th Workshop on Machine Learning and Systems. DOI: 10.1145/3642970.3655838, pp. 153-160. Online publication date: 22-Apr-2024.
    • (2024) ComboFunc: Joint Resource Combination and Container Placement for Serverless Function Scaling with Heterogeneous Container. IEEE Transactions on Parallel and Distributed Systems. DOI: 10.1109/TPDS.2024.3454071, pp. 1-17. Online publication date: 2024.
    • (2024) Graft: Efficient Inference Serving for Hybrid Deep Learning With SLO Guarantees via DNN Re-Alignment. IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 2. DOI: 10.1109/TPDS.2023.3340518, pp. 280-296. Online publication date: 1-Feb-2024.
    • (2024) XFBench: A Cross-Cloud Benchmark Suite for Evaluating FaaS Workflow Platforms. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid). DOI: 10.1109/CCGrid59990.2024.00067, pp. 543-556. Online publication date: 6-May-2024.
    • (2024) Self-Provisioning Infrastructures for the Next Generation Serverless Computing. SN Computer Science, vol. 5, no. 6. DOI: 10.1007/s42979-024-03022-w. Online publication date: 26-Jun-2024.
    • (2023) Design Requirements for Smart Vehicles Efficient Carbon Emission Management Framework. SSRN Electronic Journal. DOI: 10.2139/ssrn.4535483. Online publication date: 2023.