Daily Papers

The project automatically fetches the latest papers from arXiv based on keywords.

The subheadings in the README file represent the search keywords.

Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.

You can click the 'Watch' button to receive daily email notifications.
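For readers curious about the mechanics, a keyword query like the ones driving this README can be issued against the public arXiv Atom API. The sketch below is a minimal illustration of that pattern; it is not this repository's actual code, though the keyword and the 100-result cap mirror the behaviour described above:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_latest(keyword: str, max_results: int = 100):
    """Query the public arXiv Atom API for the newest papers matching
    a keyword. Illustrative sketch, not this repository's actual code."""
    params = urllib.parse.urlencode({
        "search_query": f'all:"{keyword}"',
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    })
    with urllib.request.urlopen(f"http://export.arxiv.org/api/query?{params}") as resp:
        feed = ET.fromstring(resp.read())
    for entry in feed.findall(f"{ATOM}entry"):
        yield {
            "title": " ".join(entry.findtext(f"{ATOM}title").split()),
            "date": entry.findtext(f"{ATOM}published")[:10],
            "abstract": entry.findtext(f"{ATOM}summary").strip(),
        }

for paper in fetch_latest("Confidential Computing"):
    print(paper["date"], paper["title"])
```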

Last update: 2025-05-31

Confidential Computing

Each entry below lists the paper's title, date, abstract, and comment (when available).
Performance of Confidential Computing GPUs 2025-05-22

This work examines latency, throughput, and other metrics when performing inference on confidential GPUs. We explore different traffic patterns and scheduling strategies using a single Virtual Machine with one NVIDIA H100 GPU to perform relaxed batch inference on multiple Large Language Models (LLMs), operating under the constraint of swapping models in and out of memory, which necessitates efficient control. The experiments simulate diverse real-world scenarios by varying parameters such as traffic load, traffic distribution patterns, scheduling strategies, and Service Level Agreement (SLA) requirements. The findings provide insights into the differences between confidential and non-confidential settings when performing inference in scenarios requiring active model swapping. Results indicate that in No-CC mode, the latency of relaxed batch inference with model swapping is 20-30% lower than in confidential mode. Additionally, SLA attainment is 15-20% higher in No-CC settings. Throughput in No-CC scenarios surpasses that of confidential mode by 45-70%, and GPU utilization is approximately 50% higher in No-CC environments. Overall, performance in the confidential setting is inferior to that in the No-CC scenario, primarily due to the additional encryption and decryption overhead required for loading models onto the GPU in confidential environments.


6 pages, 7 tables. Accepted at IEEE ICDCS 2025

ACE: Confidential Computing for Embedded RISC-V Systems 2025-05-19

Confidential computing plays an important role in isolating sensitive applications from the vast amount of untrusted code commonly found in the modern cloud. We argue that it can also be leveraged to build safer and more secure mission-critical embedded systems. In this paper, we introduce Assured Confidential Execution (ACE), an open-source and royalty-free confidential computing technology targeted at embedded RISC-V systems. We present a set of principles and a methodology that we used to build ACE and that might be applied to developing other embedded systems that require formal verification. An evaluation of our prototype on the first available RISC-V hardware supporting virtualization indicates that ACE is a viable candidate for our target systems.

A Survey on Privacy Risks and Protection in Large Language Models 2025-05-04

Although Large Language Models (LLMs) have become increasingly integral to diverse applications, their capabilities raise significant privacy concerns. This survey offers a comprehensive overview of privacy risks associated with LLMs and examines current solutions to mitigate these challenges. First, we analyze privacy leakage and attacks in LLMs, focusing on how these models unintentionally expose sensitive information through techniques such as model inversion, training data extraction, and membership inference. We investigate the mechanisms of privacy leakage, including the unauthorized extraction of training data and the potential exploitation of these vulnerabilities by malicious actors. Next, we review existing privacy protection mechanisms, such as inference detection, federated learning, backdoor mitigation, and confidential computing, and assess their effectiveness in preventing privacy leakage. Furthermore, we highlight key practical challenges and propose future research directions to develop secure and privacy-preserving LLMs, emphasizing privacy risk assessment, secure knowledge transfer between models, and interdisciplinary frameworks for privacy governance. Ultimately, this survey aims to establish a roadmap for addressing escalating privacy challenges in the LLMs domain.

Confidential Serverless Computing 2025-05-01

Although serverless computing offers compelling cost and deployment simplicity advantages, a significant challenge remains in securely managing sensitive data as it flows through the network of ephemeral function executions in serverless computing environments within untrusted clouds. While Confidential Virtual Machines (CVMs) offer a promising secure execution environment, their integration with serverless architectures currently faces fundamental limitations in key areas: security, performance, and resource efficiency. We present Hacher, a confidential computing system for secure serverless deployments to overcome these limitations. By employing nested confidential execution and a decoupled guest OS within CVMs, Hacher runs each function in a minimal "trustlet", significantly improving security through a reduced Trusted Computing Base (TCB). Furthermore, by leveraging a data-centric I/O architecture built upon a lightweight LibOS, Hacher optimizes network communication to address performance and resource efficiency challenges. Our evaluation shows that compared to CVM-based deployments, Hacher has a 4.3x smaller TCB, improves end-to-end latency (15-93%), achieves higher function density (up to 907x), and reduces inter-function communication (up to 27x) and function chaining latency (16.7-30.2x); thus, Hacher offers a practical system for confidential serverless computing.

Trusted Compute Units: A Framework for Chained Verifiable Computations 2025-04-28

Blockchain and distributed ledger technologies (DLTs) facilitate decentralized computations across trust boundaries. However, ensuring complex computations with low gas fees and confidentiality remains challenging. Recent advances in Confidential Computing -- leveraging hardware-based Trusted Execution Environments (TEEs) -- and Proof-carrying Data -- employing cryptographic Zero-Knowledge Virtual Machines (zkVMs) -- hold promise for secure, privacy-preserving off-chain and layer-2 computations. On the other hand, a homogeneous reliance on a single technology, such as TEEs or zkVMs, is impractical for decentralized environments with heterogeneous computational requirements. This paper introduces the Trusted Compute Unit (TCU), a unifying framework that enables composable and interoperable verifiable computations across heterogeneous technologies. Our approach allows decentralized applications (dApps) to flexibly offload complex computations to TCUs, obtaining proof of correctness. These proofs can be anchored on-chain for automated dApp interactions, while ensuring confidentiality of input data and integrity of output data. We demonstrate how TCUs can support a prominent blockchain use case, such as federated learning. By enabling secure off-chain interactions without incurring on-chain confirmation delays or gas fees, TCUs significantly improve system performance and scalability. Experimental insights and performance evaluations confirm the feasibility and practicality of this unified approach, advancing the state of the art in verifiable off-chain services for the blockchain ecosystem.


To be published in 2025 IEEE International Conference on Blockchain and Cryptocurrency (ICBC'25). 9 pages. 4 figures

An Early Experience with Confidential Computing Architecture for On-Device Model Protection 2025-04-11

Deploying machine learning (ML) models on user devices can improve privacy (by keeping data local) and reduce inference latency. Trusted Execution Environments (TEEs) are a practical solution for protecting proprietary models, yet existing TEE solutions have architectural constraints that hinder on-device model deployment. Arm Confidential Computing Architecture (CCA), a new Arm extension, addresses several of these limitations and shows promise as a secure platform for on-device ML. In this paper, we evaluate the performance-privacy trade-offs of deploying models within CCA, highlighting its potential to enable confidential and efficient ML applications. Our evaluations show that CCA can achieve an overhead of, at most, 22% in running models of different sizes and applications, including image classification, voice recognition, and chat assistants. This performance overhead comes with privacy benefits; for example, our framework successfully protects the model against membership inference attacks, reducing the adversary's success rate by 8.3%. To support further research and early adoption, we make our code and methodology publicly available.


Accepted to the 8th Workshop on System Software for Trusted Execution (SysTEX 2025)

Characterization of GPU TEE Overheads in Distributed Data Parallel ML Training 2025-03-27

Confidential computing (CC) or trusted execution enclaves (TEEs) is now the most common approach to enable secure computing in the cloud. The recent introduction of GPU TEEs by NVIDIA enables machine learning (ML) models to be trained without leaking model weights or data to the cloud provider. However, the potential performance implications of using GPU TEEs for ML training are not well characterized. In this work, we present an in-depth characterization study on the performance overhead associated with running distributed data parallel (DDP) ML training with GPU Trusted Execution Environments (TEEs). Our study reveals the performance challenges in DDP training within GPU TEEs. DDP uses ring-all-reduce, a well-known approach, to aggregate gradients from multiple devices. Ring all-reduce consists of multiple scatter-reduce and all-gather operations. In GPU TEEs, only the GPU package (GPU and HBM memory) is trusted, so any data communicated outside the GPU package must be encrypted and authenticated for confidentiality and integrity verification. Hence, each phase of the ring-all-reduce requires encryption and message authentication code (MAC) generation from the sender, and decryption and MAC authentication on the receiver. As the number of GPUs participating in DDP increases, the overhead of secure inter-GPU communication during ring-all-reduce grows proportionally. Additionally, larger models lead to more asynchronous all-reduce operations, exacerbating the communication cost. Our results show that with four GPU TEEs, depending on the model being trained, the runtime per training iteration increases by an average of 8x and up to a maximum of 41.6x compared to DDP training without TEE.
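A back-of-the-envelope model shows why this overhead grows with GPU count: in ring all-reduce, each of the N GPUs sends 2(N-1) chunks of size S/N (S being the gradient size), and in a GPU TEE every chunk is additionally encrypted and MAC'ed on send and decrypted and verified on receive. A toy cost model along those lines; the bandwidth figures are illustrative assumptions, not measurements from the paper:

```python
def ring_allreduce_crypto_cost(num_gpus, grad_bytes,
                               link_bw_gbps=100, crypto_bw_gbps=25):
    """Toy per-GPU cost model for one ring all-reduce: time on the link
    vs. time spent encrypting (send side) and decrypting (receive side).
    Bandwidth numbers are illustrative assumptions, not measurements."""
    chunk = grad_bytes / num_gpus
    steps = 2 * (num_gpus - 1)          # scatter-reduce + all-gather phases
    sent = steps * chunk                # bytes each GPU sends (and receives)
    transfer_s = sent * 8 / (link_bw_gbps * 1e9)
    crypto_s = 2 * sent * 8 / (crypto_bw_gbps * 1e9)  # encrypt + decrypt
    return transfer_s, crypto_s

for n in (2, 4, 8):
    t, c = ring_allreduce_crypto_cost(n, grad_bytes=2e9)  # ~2 GB of gradients
    print(f"{n} GPUs: link {t:.3f}s, crypto {c:.3f}s")
```

The total bytes moved per GPU approach 2S as N grows, so the fixed-rate crypto work rises with GPU count while compute per GPU stays flat, which is consistent with the scaling trend the abstract describes.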

Transparent Attested DNS for Confidential Computing Services 2025-03-18

Confidential services running in hardware-protected Trusted Execution Environments (TEEs) can provide higher security assurance, but this requires custom clients and protocols to distribute, update, and verify their attestation evidence. Compared with classic Internet security, built upon universal abstractions such as domain names, origins, and certificates, this puts a significant burden on service users and providers. In particular, Web browsers and other legacy clients do not get the same security guarantees as custom clients. We present a new approach for users to establish trust in confidential services. We propose attested DNS (aDNS): a name service that securely binds the attested implementation of confidential services to their domain names. aDNS enforces policies for all names in its zone of authority: any TEE that runs a service must present hardware attestation that complies with the domain-specific policy before registering keys and obtaining certificates for any name in this domain. aDNS provides protocols for zone delegation, TEE registration, and certificate issuance. aDNS builds on standards such as DNSSEC, DANE, ACME and Certificate Transparency. aDNS provides DNS transparency by keeping all records, policies, and attestations in a public append-only log, thereby enabling auditing and preventing targeted attacks. We implement aDNS as a confidential service using a fault-tolerant network of TEEs. We evaluate it using sample confidential services that illustrate various TEE platforms. On the client side, we provide a generic browser extension that queries and verifies attestation records before opening TLS connections, with negligible performance overhead, and we show that, with aDNS, even legacy Web clients benefit from confidential computing as long as some enlightened clients verify attestations to deter or blame malicious actors.
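For a sense of the building blocks aDNS extends: DANE already lets a client fetch certificate bindings from the DNS as TLSA records before opening TLS. The sketch below shows that existing lookup using dnspython; the attestation records aDNS adds would be fetched analogously, but their format and verification are the paper's contribution and are not modeled here:

```python
import dns.resolver  # pip install dnspython

def dane_records(host: str, port: int = 443):
    """Fetch DANE TLSA records binding certificates to a name.
    aDNS-style attestation records would be retrieved analogously;
    this sketch only shows the pre-existing DANE building block."""
    name = f"_{port}._tcp.{host}"
    try:
        return [r.to_text() for r in dns.resolver.resolve(name, "TLSA")]
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        return []

print(dane_records("example.com"))
```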

TFHE-Coder: Evaluating LLM-agentic Fully Homomorphic Encryption Code Generation 2025-03-15

Fully Homomorphic Encryption over the torus (TFHE) enables computation on encrypted data without decryption, making it a cornerstone of secure and confidential computing. Despite its potential in privacy-preserving machine learning, secure multi-party computation, private blockchain transactions, and secure medical diagnostics, its adoption remains limited due to cryptographic complexity and usability challenges. While various TFHE libraries and compilers exist, practical code generation remains a hurdle. We propose a compiler-integrated framework to evaluate LLM inference and agentic optimization for TFHE code generation, focusing on logic gates and ReLU activation. Our methodology assesses error rates, compilability, and structural similarity across open- and closed-source LLMs. Results highlight significant limitations in off-the-shelf models, while agentic optimizations such as retrieval-augmented generation (RAG) and few-shot prompting reduce errors and enhance code fidelity. This work establishes the first benchmark for TFHE code generation, demonstrating how LLMs, when augmented with domain-specific feedback, can bridge the expertise gap in FHE code generation.

8 pages, 7 figures
SoK: A cloudy view on trust relationships of CVMs -- How Confidential Virtual Machines are falling short in Public Cloud 2025-03-11

Confidential computing in the public cloud intends to safeguard workload privacy while outsourcing infrastructure management to a cloud provider. This is achieved by executing customer workloads within so-called Trusted Execution Environments (TEEs), such as Confidential Virtual Machines (CVMs), which protect them from unauthorized access by cloud administrators and privileged system software. At the core of confidential computing lies remote attestation -- a mechanism that enables workload owners to verify the initial state of their workload and furthermore authenticate the underlying hardware. While this represents a significant advancement in cloud security, this SoK critically examines the confidential computing offerings of market-leading cloud providers to assess whether they genuinely adhere to its core principles. We develop a taxonomy based on carefully selected criteria to systematically evaluate these offerings, enabling us to analyse the components responsible for remote attestation, the evidence provided at each stage, the extent of cloud provider influence and whether this undermines the threat model of confidential computing. Specifically, we investigate how CVMs are deployed in public cloud infrastructures, the extent to which customers can request and verify attestation evidence, and their ability to define and enforce configuration and attestation requirements. This analysis provides insight into whether confidential computing guarantees -- namely confidentiality and integrity -- are genuinely upheld. Our findings reveal that all major cloud providers retain control over critical parts of the trusted software stack and, in some cases, intervene in the standard remote attestation process. This directly contradicts their claims of delivering confidential computing, as the model fundamentally excludes the cloud provider from the set of trusted entities.

Training Differentially Private Models with Secure Multiparty Computation 2025-03-11

We address the problem of learning a machine learning model from training data that originates at multiple data owners while providing formal privacy guarantees regarding the protection of each owner's data. Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy. Solutions based on Secure Multiparty Computation (MPC) do not incur such accuracy loss but leak information when the trained model is made publicly available. We propose an MPC solution for training DP models. Our solution relies on an MPC protocol for model training, and an MPC protocol for perturbing the trained model coefficients with Laplace noise in a privacy-preserving manner. The resulting MPC+DP approach achieves higher accuracy than a pure DP approach while providing the same formal privacy guarantees. Our work obtained first place in the iDASH2021 Track III competition on confidential computing for secure genome analysis.
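The DP half of such a pipeline is the classical Laplace output-perturbation mechanism: each released coefficient gets noise with scale sensitivity/epsilon. The plaintext sketch below illustrates only that calibration step; in the paper it runs inside MPC so no party sees the unperturbed coefficients, and the sensitivity and epsilon values here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_perturb(coefficients, sensitivity, epsilon):
    """Output perturbation for epsilon-DP: add Laplace(0, sensitivity/epsilon)
    noise to each released model coefficient. Plaintext illustration only;
    the paper performs this step inside an MPC protocol."""
    scale = sensitivity / epsilon
    return coefficients + rng.laplace(loc=0.0, scale=scale,
                                      size=np.shape(coefficients))

weights = np.array([0.42, -1.3, 0.07])  # hypothetical trained coefficients
released = laplace_perturb(weights, sensitivity=0.1, epsilon=1.0)
print(released)
```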

Locking Machine Learning Models into Hardware 2025-03-08

Modern machine learning (ML) models are expensive IP, and business competitiveness often depends on keeping this IP confidential. This in turn restricts how these models are deployed; for example, it is unclear how to deploy a model on-device without inevitably leaking the underlying model. At the same time, confidential computing technologies such as multi-party computation or homomorphic encryption remain impractical for wide adoption. In this paper, we take a different approach and investigate the feasibility of ML-specific mechanisms that deter unauthorized model use by restricting the model to only be usable on specific hardware, making adoption on unauthorized hardware inconvenient. That way, even if IP is compromised, it cannot be trivially used without specialised hardware or major model adjustment. In a sense, we seek to enable cheap locking of machine learning models into specific hardware. We demonstrate that locking mechanisms are feasible by either targeting efficiency of model representations, making such models incompatible with quantization, or tying the model's operation to specific characteristics of hardware, such as the number of clock cycles for arithmetic operations. We demonstrate that locking comes with negligible overheads, while significantly restricting usability of the resultant model on unauthorized hardware.


10 pages, 6 figures of main text; 9 pages, 12 figures of appendices

Laminator: Verifiable ML Property Cards using Hardware-assisted Attestations 2025-03-05

Regulations increasingly call for various assurances from machine learning (ML) model providers about their training data, training process, and model behavior. For better transparency, industry (e.g., Huggingface and Google) has adopted model cards and datasheets to describe various properties of training datasets and models. In the same vein, we introduce the notion of inference cards to describe the properties of a given inference (e.g., binding of the output to the model and its corresponding input). We coin the term ML property cards to collectively refer to these various types of cards. To prevent a malicious model provider from including false information in ML property cards, they need to be verifiable. We show how to construct verifiable ML property cards using property attestation, technical mechanisms by which a prover (e.g., a model provider) can attest to various ML properties to a verifier (e.g., an auditor). Since prior attestation mechanisms based purely on cryptography are often narrowly focused (lacking versatility) and inefficient, we need an efficient mechanism to attest different types of properties across the entire ML model pipeline. Emerging widespread support for confidential computing has made it possible to run and even train models inside hardware-assisted trusted execution environments (TEEs), which provide highly efficient attestation mechanisms. We propose Laminator, which uses TEEs to provide the first framework for verifiable ML property cards via hardware-assisted ML property attestations. Laminator is efficient in terms of overhead, scalable to large numbers of verifiers, and versatile with respect to the properties it can prove during training or inference.


ACM Conference on Data and Application Security and Privacy (CODASPY), 2025

Confidential Prompting: Protecting User Prompts from Cloud LLM Providers 2025-03-04

Our work tackles the challenge of securing user inputs in cloud-hosted large language model (LLM) serving while ensuring model confidentiality, output invariance, and compute efficiency. We introduce Secure Partitioned Decoding (SPD), which uses confidential computing to confine user prompts to a trusted execution environment (TEE), namely a confidential virtual machine (CVM), while allowing service providers to generate tokens efficiently. We also introduce a novel cryptographic method, Prompt Obfuscation (PO), to ensure robustness against reconstruction attacks on SPD. We demonstrate our approach preserves both prompt confidentiality and LLM serving efficiency. Our solution enables privacy-preserving cloud LLM serving that handles sensitive prompts, such as clinical records, financial data, and personal information.

Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment 2025-02-17

The increasing adoption of Large Language Models (LLMs) in cloud environments raises critical security concerns, particularly regarding model confidentiality and data privacy. Confidential computing, enabled by Trusted Execution Environments (TEEs), offers a promising solution to mitigate these risks. However, existing TEE implementations, primarily CPU-based, struggle to efficiently support the resource-intensive nature of LLM inference and training. In this work, we present the first evaluation of the DeepSeek model within a TEE-enabled confidential computing environment, specifically utilizing Intel Trust Domain Extensions (TDX). Our study benchmarks DeepSeek's performance across CPU-only, CPU-GPU hybrid, and TEE-based implementations. For smaller parameter sets, such as DeepSeek-R1-1.5B, the TDX implementation outperforms the CPU version in executing computations within a secure environment. It highlights the potential for efficiently deploying LLMs on resource-constrained systems while ensuring security. The overall GPU-to-CPU performance ratio averages 12 across different model sizes, with smaller models exhibiting a lower ratio. Additionally, we provide foundational insights and guidance on optimizing CPU-GPU confidential computing solutions for scalable and secure AI deployments. Our findings contribute to the advancement of privacy-preserving AI, paving the way for efficient and secure LLM inference in confidential computing environments.

Automatic ISA analysis for Secure Context Switching 2025-02-10

Instruction set architectures are complex, with hundreds of registers and instructions that can modify dozens of them during execution, variably on each instance. Prose-style ISA specifications struggle to capture these intricacies of the ISAs, where often the important details about a single register are spread out across hundreds of pages of documentation. Ensuring that all ISA-state is swapped in context switch implementations of privileged software requires meticulous examination of these pages. This manual process is tedious and error-prone. We propose a tool called Sailor that leverages machine-readable ISA specifications written in Sail to automate this task. Sailor determines the ISA-state necessary to swap during the context switch using the data collected from Sail and a novel algorithm to classify ISA-state as security-sensitive. Using Sailor's output, we identify three different classes of mishandled ISA-state across four open-source confidential computing systems. We further reveal five distinct security vulnerabilities that can be exploited using the mishandled ISA-state. This research exposes an often overlooked attack surface that stems from mishandled ISA-state, enabling unprivileged adversaries to exploit system vulnerabilities.


15 pages, 6 figures, 2 tables, 4 listings

Individual Confidential Computing of Polynomials over Non-Uniform Information 2025-01-26

In this paper, we address the problem of secure distributed computation in scenarios where user data is not uniformly distributed, extending existing frameworks that assume uniformity, an assumption that is challenging to enforce in data for computation. Motivated by the pervasive reliance on single service providers for data storage and computation, we propose a privacy-preserving scheme that achieves information-theoretic security guarantees for computing polynomials over non-uniform data distributions. Our framework builds upon the concept of perfect subset privacy and employs linear hashing techniques to transform non-uniform data into approximately uniform distributions, enabling robust and secure computation. We derive leakage bounds and demonstrate that information leakage of any subset of user data to untrusted service providers, i.e., not only to colluding workers but also (and more importantly) to the admin, remains negligible under the proposed scheme.


Parts of this work were submitted to ISIT 2025
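The hashing step can be pictured concretely: multiply the bit-decomposition of each input by a random binary matrix over GF(2), compressing a skewed input distribution toward a near-uniform one on fewer output bits. The toy numpy sketch below illustrates the idea; the bit widths and the input distribution are assumptions, and the paper's actual scheme and leakage bounds are far more careful than this:

```python
import numpy as np

rng = np.random.default_rng(42)
IN_BITS, OUT_BITS = 16, 8
A = rng.integers(0, 2, size=(OUT_BITS, IN_BITS))  # random GF(2) hash matrix

def linear_hash(x: int) -> int:
    """Hash x by multiplying its bit vector with A over GF(2)."""
    bits = (x >> np.arange(IN_BITS)) & 1
    out = (A @ bits) % 2
    return int(out @ (1 << np.arange(OUT_BITS)))

# Heavily skewed input distribution...
samples = rng.geometric(0.01, size=100_000) % (1 << IN_BITS)
hashed = np.array([linear_hash(int(s)) for s in samples])
# ...whose hashed values spread far more evenly over the 2^8 outputs.
counts = np.bincount(hashed, minlength=1 << OUT_BITS)
print(counts.min(), counts.max())
```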

A performance analysis of VM-based Trusted Execution Environments for Confidential Federated Learning 2025-01-20

Federated Learning (FL) is a distributed machine learning approach that has emerged as an effective way to address recent privacy concerns. However, FL introduces the need for additional security measures as FL alone is still subject to vulnerabilities such as model and data poisoning and inference attacks. Confidential Computing (CC) is a paradigm that, by leveraging hardware-based trusted execution environments (TEEs), protects the confidentiality and integrity of ML models and data, thus resulting in a powerful ally of FL applications. Typical TEEs offer an application-isolation level but suffer many drawbacks, such as limited available memory and debugging and coding difficulties. The new generation of TEEs offers a virtual machine (VM)-based isolation level, thus reducing the porting effort for existing applications. In this work, we compare the performance of VM-based and application-isolation level TEEs for confidential FL (CFL) applications. In particular, we evaluate the impact of TEEs and additional security mechanisms such as TLS (for securing the communication channel). The results, obtained across three datasets and two deep learning models, demonstrate that the new VM-based TEEs introduce a limited overhead (at most 1.5x), thus paving the way to leverage public and untrusted computing environments, such as HPC facilities or public cloud, without detriment to performance.

ObfuscaTune: Obfuscated Offsite Fine-tuning and Inference of Proprietary LLMs on Private Datasets 2025-01-12

This work addresses the timely yet underexplored problem of performing inference and finetuning of a proprietary LLM owned by a model provider entity on the confidential/private data of another data owner entity, in a way that ensures the confidentiality of both the model and the data. Hereby, the finetuning is conducted offsite, i.e., on the computation infrastructure of a third-party cloud provider. We tackle this problem by proposing ObfuscaTune, a novel, efficient and fully utility-preserving approach that combines a simple yet effective obfuscation technique with an efficient usage of confidential computing (only 5% of the model parameters are placed on TEE). We empirically demonstrate the effectiveness of ObfuscaTune by validating it on GPT-2 models with different sizes on four NLP benchmark datasets. Finally, we compare to a naïve version of our approach to highlight the necessity of using random matrices with low condition numbers in our approach to reduce errors induced by the obfuscation.


Accepted at AAAI 2025 (PPAI Workshop)
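The closing remark about condition numbers is easy to reproduce: obfuscating a weight matrix W as W·R and undoing it with the inverse of R amplifies floating-point error by roughly cond(R). A toy numpy illustration of that effect (not the paper's code; the matrix sizes and conditioning are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64)).astype(np.float32)

def roundtrip_error(R):
    """Obfuscate W as W @ R, deobfuscate with R^{-1}, measure relative error."""
    W_back = (W @ R) @ np.linalg.inv(R)
    return np.linalg.norm(W_back - W) / np.linalg.norm(W)

R_good = np.linalg.qr(rng.standard_normal((64, 64)))[0].astype(np.float32)  # cond ~ 1
R_bad = R_good @ np.diag(np.logspace(0, 6, 64).astype(np.float32))          # cond ~ 1e6

print("well-conditioned:", roundtrip_error(R_good))  # tiny error
print("ill-conditioned: ", roundtrip_error(R_bad))   # error inflated by ~cond(R)
```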

Formal Security Analysis of the AMD SEV-SNP Software Interface 2025-01-10

AMD Secure Encrypted Virtualization technologies enable confidential computing by protecting virtual machines from highly privileged software such as hypervisors. In this work, we develop the first, comprehensive symbolic model of the software interface of the latest SEV iteration called SEV Secure Nested Paging (SEV-SNP). Our model covers remote attestation, key derivation, page swap and live migration. We analyze the security of the software interface of SEV-SNP and formally prove that most critical secrecy, authentication, attestation and freshness properties do indeed hold in the model. Furthermore, we find that the platform-agnostic nature of messages exchanged between SNP guests and the AMD Secure Processor firmware presents a potential weakness in the design. We show how this weakness leads to formal attacks on multiple security properties, including the partial compromise of attestation report integrity, and discuss possible impacts and mitigations.


This work has been submitted to the IEEE for possible publication

Splicer+: Secure Hub Placement and Deadlock-Free Routing for Payment Channel Network Scalability 2025-01-08

Payment channel hub (PCH) is a promising approach for payment channel networks (PCNs) to improve efficiency by deploying robust hubs to steadily process off-chain transactions. However, existing PCHs, often preplaced without considering payment request distribution across PCNs, can lead to load imbalance. PCNs' reliance on source routing, which makes decisions based solely on individual sender requests, can degrade performance by overlooking other requests, thus further impairing scalability. In this paper, we introduce Splicer+, a highly scalable multi-PCH solution based on the trusted execution environment (TEE). We study tradeoffs in communication overhead between participants, transform the original NP-hard PCH placement problem by mixed-integer linear programming, and propose optimal/approximate solutions with load balancing for different PCN scales using supermodular techniques. Considering global PCN states and local directly connected sender requests, we design a deadlock-free routing protocol for PCHs. It dynamically adjusts the payment processing rate across multiple channels and, combined with TEE, ensures high-performance routing with confidential computation. We provide a formal security proof for the Splicer+ protocol in the UC framework. Extensive evaluations demonstrate the effectiveness of Splicer+, with transaction success ratio (up 51.1%), throughput (up 181.5%), and latency outperforming state-of-the-art PCNs.


Extended version of ICDCS 2023 (arXiv:2305.19182)

C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System 2024-12-18

Organizations seeking to utilize Large Language Models (LLMs) for knowledge querying and analysis often encounter challenges in maintaining an LLM fine-tuned on targeted, up-to-date information that keeps answers relevant and grounded. Retrieval Augmented Generation (RAG) has quickly become a feasible solution for organizations looking to overcome the challenges of maintaining proprietary models and to help reduce LLM hallucinations in their query responses. However, RAG comes with its own issues regarding scaling data pipelines across tiered-access and disparate data sources. In many scenarios, it is necessary to query beyond a single data silo to provide richer and more relevant context for an LLM. Analyzing data sources within and across organizational trust boundaries is often limited by complex data-sharing policies that prohibit centralized data storage and therefore inhibit the fast and effective setup and scaling of RAG solutions. In this paper, we introduce Confidential Computing (CC) techniques as a solution for secure Federated Retrieval Augmented Generation (FedRAG). Our proposed Confidential FedRAG system (C-FedRAG) enables secure connection and scaling of RAG workflows across a decentralized network of data providers by ensuring context confidentiality. We also demonstrate how to implement a C-FedRAG system using the NVIDIA FLARE SDK and assess its performance using the MedRAG toolkit and MIRAGE benchmarking dataset.

CCxTrust: Confidential Computing Platform Based on TEE and TPM Collaborative Trust 2024-12-12

Confidential Computing has emerged to address data security challenges in cloud-centric deployments by protecting data in use through hardware-level isolation. However, reliance on a single hardware root of trust (RoT) limits user confidence in cloud platforms, especially for high-performance AI services, where end-to-end protection of sensitive models and data is critical. Furthermore, the lack of interoperability and a unified trust model in multi-cloud environments prevents the establishment of a cross-platform, cross-cloud chain of trust, creating a significant trust gap for users with high privacy requirements. To address the challenges mentioned above, this paper proposes CCxTrust (Confidential Computing with Trust), a confidential computing platform leveraging collaborative roots of trust from TEE and TPM. CCxTrust combines the black-box RoT embedded in the CPU-TEE with the flexible white-box RoT of TPM to establish a collaborative trust framework. The platform implements independent Roots of Trust for Measurement (RTM) for TEE and TPM, and a collaborative Root of Trust for Report (RTR) for composite attestation. The Root of Trust for Storage (RTS) is solely supported by TPM. We also present the design and implementation of a confidential TPM supporting multiple modes for secure use within confidential virtual machines. Additionally, we propose a composite attestation protocol integrating TEE and TPM to enhance security and attestation efficiency, which is proven secure under the PCL protocol security model. We implemented a prototype of CCxTrust on a confidential computing server with AMD SEV-SNP and TPM chips, requiring minimal modifications to the TPM and guest Linux kernel. The composite attestation efficiency improved by 24% without significant overhead, while Confidential TPM performance showed a 16.47% reduction compared to standard TPM.

23 pages, 14 figures
A Confidential Computing Transparency Framework for a Comprehensive Trust Chain 2024-12-05

Confidential Computing enhances privacy of data in-use through hardware-based Trusted Execution Environments (TEEs) that use attestation to verify their integrity, authenticity, and certain runtime properties, along with those of the binaries they execute. However, TEEs require user trust, as attestation alone cannot guarantee the absence of vulnerabilities or backdoors. Enhanced transparency can mitigate the reliance on naive trust. Some organisations currently employ various transparency measures, including open-source firmware, publishing technical documentation, or undergoing external audits, but these require investments with unclear returns. This may discourage the adoption of transparency, leaving users with limited visibility into system privacy measures. Additionally, the lack of standardisation complicates meaningful comparisons between implementations. To address these challenges, we propose a three-level conceptual framework providing organisations with a practical pathway to incrementally improve Confidential Computing transparency. To evaluate whether our transparency framework contributes to an increase in end-user trust, we conducted an empirical study with over 800 non-expert participants. The results indicate that greater transparency improves user comfort, with participants willing to share various types of personal data across different levels of transparency. The study also reveals misconceptions about transparency, highlighting the need for clear communication and user education.

Blindfold: Confidential Memory Management by Untrusted Operating System 2024-12-05

Confidential Computing (CC) has received increasing attention in recent years as a mechanism to protect user data from untrusted operating systems (OSes). Existing CC solutions hide confidential memory from the OS and/or encrypt it to achieve confidentiality. In doing so, they render OS memory optimization unusable or complicate the trusted computing base (TCB) required for optimization. This paper presents our results toward overcoming these limitations, synthesized in a CC design named Blindfold. Like many other CC solutions, Blindfold relies on a small trusted software component running at a higher privilege level than the kernel, called Guardian. It features three techniques that can enhance existing CC solutions. First, instead of nesting page tables, Guardian mediates how the OS accesses memory and handles exceptions by switching page and interrupt tables. Second, Blindfold employs a lightweight capability system to regulate the kernel's semantic access to user memory, unifying case-by-case approaches in previous work. Finally, Blindfold provides a carefully designed secure ABI for confidential memory management without encryption. We report an implementation of Blindfold that works on ARMv8-A/Linux. Using the Blindfold prototype, we are able to evaluate the cost of enabling confidential memory management by the untrusted Linux kernel. We show Blindfold has a smaller runtime TCB than related systems and enjoys competitive performance. More importantly, we show that the Linux kernel, including all of its memory optimizations except memory compression, can function properly for confidential memory. This requires only about 400 lines of kernel modifications.

The Forking Way: When TEEs Meet Consensus 2024-12-01

An increasing number of distributed platforms combine Trusted Execution Environments (TEEs) with blockchains. Indeed, many hail the combination of TEEs and blockchains as a good "marriage": TEEs bring confidential computing to the blockchain while the consensus layer could help defend TEEs from forking attacks. In this paper, we systematize how current blockchain solutions integrate TEEs and to what extent they are secure against forking attacks. To do so, we thoroughly analyze 29 proposals for TEE-based blockchains, ranging from academic proposals to production-ready platforms. We uncover a lack of consensus in the community on how to combine TEEs and blockchains. In particular, we identify four broad means to interconnect TEEs with consensus, analyze their limitations, and discuss possible remedies. Our analysis also reveals previously undocumented forking attacks on three production-ready TEE-based blockchains: Ten, Phala, and the Secret Network. We leverage our analysis to propose effective countermeasures against those vulnerabilities; we responsibly disclosed our findings to the developers of each affected platform.


18 pages, 14 figures, 1 table

RISecure-PUF: Multipurpose PUF-Driven Security Extensions with Lookaside Buffer in RISC-V 2024-11-21

RISC-V's limited security features hinder its use in confidential computing and heterogeneous platforms. This paper introduces RISecure-PUF, a security extension utilizing existing Physical Unclonable Functions for key generation and secure protocol purposes. A one-way hash function is integrated to ensure provable security against modeling attacks, while a lookaside buffer accelerates batch sampling and minimizes reliance on error correction codes. Implemented on the Genesys 2 FPGA, RISecure-PUF improves performance by at least 2.72x in batch scenarios with negligible hardware overhead and a maximum performance reduction of 10.7%, enabled by reusing the hash function module in integrated environments such as cryptographic engines.

Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study 2024-11-05

This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs for large language model (LLM) inference tasks. We benchmark the overhead introduced by TEE mode across various LLMs and token lengths, with a particular focus on the bottleneck caused by CPU-GPU data transfers via PCIe. Our results indicate that while there is minimal computational overhead within the GPU, the overall performance penalty is primarily attributable to data transfer. For the majority of typical LLM queries, the overhead remains below 7%, with larger models and longer sequences experiencing nearly zero overhead.
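The kind of breakdown this report relies on can be approximated with CUDA events, timing host-to-device transfer and on-GPU compute separately so the PCIe-bound share, the part that TEE-mode bounce-buffer encryption inflates, stands out on its own. A generic PyTorch sketch of that measurement pattern (not the report's actual harness; the tensor sizes are arbitrary):

```python
import torch

assert torch.cuda.is_available()

def timed(fn):
    """Return fn's GPU wall time in milliseconds using CUDA events."""
    start, end = (torch.cuda.Event(enable_timing=True) for _ in range(2))
    start.record()
    fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end)

x_host = torch.randn(4096, 4096, pin_memory=True)   # pinned for async H2D copy
w = torch.randn(4096, 4096, device="cuda")

transfer_ms = timed(lambda: x_host.to("cuda", non_blocking=True))
x_dev = x_host.to("cuda")
compute_ms = timed(lambda: x_dev @ w)

print(f"H2D transfer: {transfer_ms:.2f} ms, matmul: {compute_ms:.2f} ms")
```

Comparing the two components with TEE mode on and off would show which one absorbs the overhead, mirroring the report's conclusion that transfers, not GPU compute, dominate.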

PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption 2024-11-04

Confidential computing on GPUs, like NVIDIA H100, mitigates the security risks of outsourced Large Language Models (LLMs) by implementing strong isolation and data encryption. Nonetheless, this encryption incurs a significant performance overhead, reaching up to 52.8 percent and 88.2 percent throughput drop when serving OPT-30B and OPT-66B, respectively. To address this challenge, we introduce PipeLLM, a user-transparent runtime system. PipeLLM removes the overhead by overlapping the encryption and GPU computation through pipelining - an idea inspired by the CPU instruction pipelining - thereby effectively concealing the latency increase caused by encryption. The primary technical challenge is that, unlike CPUs, the encryption module lacks prior knowledge of the specific data needing encryption until it is requested by the GPUs. To this end, we propose speculative pipelined encryption to predict the data requiring encryption by analyzing the serving patterns of LLMs. Further, we have developed an efficient, low-cost pipeline relinquishing approach for instances of incorrect predictions. Our experiments on NVIDIA H100 GPU show that compared with vanilla systems without confidential computing (e.g., vLLM, PEFT, and FlexGen), PipeLLM incurs modest overhead (less than 19.6 percent in throughput) across various LLM sizes, from 13B to 175B.


To appear in ASPLOS 2025
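The underlying pipelining idea, encrypting chunk i+1 while the GPU works on chunk i so encryption latency hides behind compute, can be sketched as a two-stage producer/consumer pipeline. The snippet below is a conceptual illustration only; `encrypt` and `compute` are hypothetical stand-ins, and PipeLLM's speculative prediction and pipeline-relinquishing logic are not modeled:

```python
import queue
import threading

def pipeline(chunks, encrypt, compute, depth=2):
    """Two-stage pipeline: one thread encrypts upcoming chunks while the
    main thread processes already-encrypted ones, hiding encryption
    latency behind compute. Conceptual sketch only."""
    q = queue.Queue(maxsize=depth)

    def encryptor():
        for chunk in chunks:
            q.put(encrypt(chunk))  # runs ahead, up to `depth` chunks
        q.put(None)                # end-of-stream sentinel

    t = threading.Thread(target=encryptor)
    t.start()
    results = []
    while (item := q.get()) is not None:
        results.append(compute(item))
    t.join()
    return results

# Hypothetical stand-ins for the real encryption and GPU steps.
out = pipeline(range(8), encrypt=lambda c: c, compute=lambda c: c * 2)
```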

RA-WEBs: Remote Attestation for WEB services 2024-11-02

Data theft and leakage, caused by external adversaries and insiders, demonstrate the need for protecting user data. Trusted Execution Environments (TEEs) offer a promising solution by creating secure environments that protect data and code from such threats. The rise of confidential computing on cloud platforms facilitates the deployment of TEE-enabled server applications, which are expected to be widely adopted in web services such as privacy-preserving LLM inference and secure data logging. One key feature is Remote Attestation (RA), which enables integrity verification of a TEE. However, compatibility issues with RA verification arise as no browsers natively support this feature, making prior solutions cumbersome and risky. To address these challenges, we propose RA-WEBs (Remote Attestation for WEB services), a novel RA protocol designed for high compatibility with the current web ecosystem. RA-WEBs leverages established web mechanisms for immediate deployability, enabling RA verification on existing browsers. We conduct a comprehensive security analysis, demonstrating RA-WEBs's resilience against various threats. Our contributions include the RA-WEBs proposal, a proof-of-concept implementation, an in-depth security analysis, and publicly available code for reproducible research.

Privacy-Preserving Decentralized AI with Confidential Computing 2024-10-18

This paper addresses privacy protection in decentralized Artificial Intelligence (AI) using Confidential Computing (CC) within the Atoma Network, a decentralized AI platform designed for the Web3 domain. Decentralized AI distributes AI services among multiple entities without centralized oversight, fostering transparency and robustness. However, this structure introduces significant privacy challenges, as sensitive assets such as proprietary models and personal data may be exposed to untrusted participants. Cryptography-based privacy protection techniques such as zero-knowledge machine learning (zkML) suffer from prohibitive computational overhead. To address this limitation, we propose leveraging Confidential Computing (CC). Confidential Computing leverages hardware-based Trusted Execution Environments (TEEs) to provide isolation for processing sensitive data, ensuring that both model parameters and user data remain secure, even in decentralized, potentially untrusted environments. While TEEs face a few limitations, we believe they can bridge the privacy gap in decentralized AI. We explore how we can integrate TEEs into Atoma's decentralized framework.

Ditto: Elastic Confidential VMs with Secure and Dynamic CPU Scaling 2024-09-23

Confidential Virtual Machines (CVMs) are a type of VM-based Trusted Execution Environment (TEE) designed to enhance the security of cloud-based VMs, safeguarding them even from malicious hypervisors. Although CVMs have been widely adopted by major cloud service providers, current CVM designs face significant challenges in runtime resource management due to their fixed capacities and lack of transparency. These limitations hamper efficient cloud resource management, leading to increased operational costs and reduced agility in responding to fluctuating workloads. This paper introduces a dynamic CPU resource management approach, featuring the novel concept of "Elastic CVM". This approach allows for hypervisor-assisted runtime adjustment of CPU resources using a specialized vCPU type, termed Worker vCPU. This new approach enhances CPU resource adaptability and operational efficiency without compromising security. Additionally, we introduce a Worker vCPU Abstraction Layer to simplify Worker vCPU deployment and management. To demonstrate the effectiveness of our approach, we have designed and implemented a serverless computing prototype platform, called Ditto. We show that Ditto significantly improves performance and efficiency through finer-grained resource management. The concept of "Elastic CVM" and the Worker vCPU design not only optimize cloud resource utilization but also pave the way for more flexible and cost-effective confidential computing environments.

VECA: Reliable and Confidential Resource Clustering for Volunteer Edge-Cloud Computing 2024-09-04

Volunteer Edge-Cloud (VEC) computing has a significant potential to support scientific workflows in user communities contributing volunteer edge nodes. However, managing heterogeneous and intermittent resources to support machine/deep learning (ML/DL) based workflows poses challenges in resource governance for reliability, and confidentiality for model/data privacy protection. There is a need for approaches to handle the volatility of volunteer edge node availability, and also to scale the confidential data-intensive workflow execution across a large number of VEC nodes. In this paper, we present VECA, a reliable and confidential VEC resource clustering solution featuring three-fold methods tailored for executing ML/DL-based scientific workflows on VEC resources. Firstly, a capacity-based clustering approach enhances system reliability and minimizes VEC node search latency. Secondly, a novel two-phase, globally distributed scheduling scheme optimizes job allocation based on node attributes and using time-series-based Recurrent Neural Networks. Lastly, the integration of confidential computing ensures privacy preservation of the scientific workflows, where model and data information are not shared with VEC resources providers. We evaluate VECA in a Function-as-a-Service (FaaS) cloud testbed that features OpenFaaS and MicroK8S to support two ML/DL-based scientific workflows viz., G2P-Deep (bioinformatics) and PAS-ML (health informatics). Results from tested experiments demonstrate that our proposed VECA approach outperforms state-of-the-art methods; especially VECA exhibits a two-fold reduction in VEC node search latency and over 20% improvement in productivity rates following execution failures compared to the next best method.


Accepted - IC2E 2024 Conference

Confidential Computing on Heterogeneous CPU-GPU Systems: Survey and Future Directions 2024-09-03

In recent years, the widespread informatization and rapid data explosion have increased the demand for high-performance heterogeneous systems that integrate multiple computing cores such as CPUs, Graphics Processing Units (GPUs), Application Specific Integrated Circuits (ASICs), and Field Programmable Gate Arrays (FPGAs). The combination of CPU and GPU is particularly popular due to its versatility. However, these heterogeneous systems face significant security and privacy risks. Advances in privacy-preserving techniques, especially hardware-based Trusted Execution Environments (TEEs), offer effective protection for GPU applications. Nonetheless, the potential security risks involved in extending TEEs to GPUs in heterogeneous systems remain uncertain and need further investigation. To investigate these risks in depth, we study the existing popular GPU TEE designs and summarize and compare their key implications. Additionally, we review existing powerful attacks on GPUs and traditional TEEs deployed on CPUs, along with the efforts to mitigate these threats. We identify potential attack surfaces introduced by GPU TEEs and provide insights into key considerations for designing secure GPU TEEs. This survey is timely as new TEEs for heterogeneous systems, particularly GPUs, are being developed, highlighting the need to understand potential security threats and build both efficient and secure systems.

35 pages, 7 figures
Privacy-Preserving Multimedia Mobile Cloud Computing Using Protective Perturbation 2024-09-03

Mobile cloud computing has been adopted in many multimedia applications, where the resource-constrained mobile device sends multimedia data (e.g., images) to remote cloud servers to request computation-intensive multimedia services (e.g., image recognition). While significantly improving the performance of the mobile applications, the cloud-based mechanism often causes privacy concerns as the multimedia data and services are offloaded from the trusted user device to untrusted cloud servers. Several recent studies have proposed perturbation-based privacy preserving mechanisms, which obfuscate the offloaded multimedia data to eliminate privacy exposures without affecting the functionality of the remote multimedia services. However, the existing privacy protection approaches require the deployment of computation-intensive perturbation generation on the resource-constrained mobile devices. Also, the obfuscated images are typically not compliant with the standard image compression algorithms and suffer from significant bandwidth consumption. In this paper, we develop a novel privacy-preserving multimedia mobile cloud computing framework, namely PMC², to address the resource and bandwidth challenges. PMC² employs secure confidential computing in the cloud to deploy the perturbation generator, which addresses the resource challenge while maintaining the privacy. Furthermore, we develop a neural compressor specifically trained to compress the perturbed images in order to address the bandwidth challenge. We implement PMC² in an end-to-end mobile cloud computing system, based on which our evaluations demonstrate superior latency, power efficiency, and bandwidth consumption achieved by PMC² while maintaining high accuracy in the target multimedia service.

Devlore: Extending Arm CCA to Integrated Devices - A Journey Beyond Memory to Interrupt Isolation 2024-08-11

Arm Confidential Computing Architecture (CCA) executes sensitive computation in an abstraction called realm VMs and protects it from the hypervisor, host OS, and other co-resident VMs. However, CCA does not allow integrated devices on the platform to access realm VMs; doing so requires intrusive changes to software and is simply not possible to achieve securely for some devices. In this paper, we present Devlore, which allows realm VMs to directly access integrated peripherals. Devlore's memory isolation re-purposes CCA hardware primitives (granule protection and stage-two page tables), while its interrupt isolation adapts a delegate-but-check strategy. Our choice of offloading interrupt management to the hypervisor but adding correctness checks in the trusted software allows Devlore to preserve compatibility and performance. We evaluate Devlore on Arm FVP to demonstrate five diverse peripherals attached to realm VMs.

Aster: Fixing the Android TEE Ecosystem with Arm CCA 2024-07-23

The Android ecosystem relies on either TrustZone (e.g., OP-TEE, QTEE, Trusty) or trusted hypervisors (pKVM, Gunyah) to isolate security-sensitive services from malicious apps and Android bugs. TrustZone allows any secure world code to access the normal world that runs Android. Similarly, a trusted hypervisor has full access to Android running in one VM and security services in other VMs. In this paper, we motivate the need for mutual isolation, wherein Android, hypervisors, and the secure world are isolated from each other. Then, we propose a sandboxed service abstraction, such that a sandboxed execution cannot access any other sandbox, Android, hypervisor, or secure world memory. We present Aster which achieves these goals while ensuring that sandboxed execution can still communicate with Android to get inputs and provide outputs securely. Our main insight is to leverage the hardware isolation offered by Arm Confidential Computing Architecture (CCA). However, since CCA does not satisfy our sandboxing and mutual isolation requirements, Aster repurposes its hardware enforcement to meet its goals while addressing challenges such as secure interfaces, virtio, and protection against interrupts. We implement Aster to demonstrate its feasibility and assess its compatibility. We take three case studies, including one currently deployed on Android phones and insufficiently secured using a trusted hypervisor, to demonstrate that they can be protected by Aster.

Honest Computing: Achieving demonstrable data lineage and provenance for driving data and process-sensitive policies 2024-07-19

Data is the foundation of any scientific, industrial or commercial process. Its journey typically flows from collection to transport, storage, management and processing. While best practices and regulations guide data management and protection, recent events have underscored its vulnerability. Academic research and commercial data handling have been marred by scandals, revealing the brittleness of data management. Data, despite its importance, is susceptible to undue disclosures, leaks, losses, manipulation, or fabrication. These incidents often occur without visibility or accountability, necessitating a systematic structure for safe, honest, and auditable data management. In this paper, we introduce the concept of Honest Computing as the practice and approach that emphasizes transparency, integrity, and ethical behaviour within the realm of computing and technology. It ensures that computer systems and software operate honestly and reliably without hidden agendas, biases, or unethical practices. It enables privacy and confidentiality of data and code by design and by default. We also introduce a reference framework to achieve demonstrable data lineage and provenance, contrasting it with Secure Computing, a related but differently-orientated form of computing. At its core, Honest Computing leverages Trustless Computing, Confidential Computing, Distributed Computing, Cryptography and AAA security concepts. Honest Computing opens new ways of creating technology-based processes and workflows which permit the migration of regulatory frameworks for data protection from principle-based approaches to rule-based ones. Addressing use cases in many fields, from AI model protection and ethical layering to digital currency formation for finance and banking, trading, and healthcare, this foundational layer approach can help define new standards for appropriate data custody and processing.


Accepted for publication in the Data & Policy journal

Cabin: Confining Untrusted Programs within Confidential VMs 2024-07-18
Show

Confidential computing safeguards sensitive computations from untrusted clouds, with Confidential Virtual Machines (CVMs) providing a secure environment for the guest OS. However, CVMs often come with large and vulnerable operating system kernels, making them susceptible to attacks that exploit kernel weaknesses. Imprecise control over read/write access in the page table has allowed attackers to exploit vulnerabilities, and the lack of a security hierarchy leads to insufficient separation between untrusted applications and the guest OS, exposing the kernel to direct threats from untrusted programs. This study proposes Cabin, an isolated execution framework within the guest VM utilizing the latest AMD SEV-SNP technology. Cabin confines untrusted processes to the user space of a lower virtual machine privilege level (VMPL) by introducing a proxy-kernel between the confined processes and the guest OS. Furthermore, we propose execution protection mechanisms based on fine-grained control of VMPL privilege for vulnerable programs and the proxy-kernel to minimize the attack surface. We introduce an asynchronous forwarding mechanism and anonymous memory management to reduce the performance impact. The evaluation results show that the Cabin framework incurs a modest overhead (5% on average) on Nbench and WolfSSL benchmarks.

ICICS 2024
Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads 2024-07-16
Show

Generative AI based on large language models (LLMs) has come to dominate cloud workloads. Specialized hardware accelerators, such as GPUs, NPUs, and TPUs, play a key role in AI adoption due to their superior performance over general-purpose CPUs. The AI models and the data are often highly sensitive and come from mutually distrusting parties. Existing CPU-based TEEs such as Intel SGX or AMD SEV do not provide sufficient protection. Device-centric TEEs like Nvidia-CC only address tightly coupled CPU-GPU systems with a proprietary solution requiring a TEE on the host CPU side. On the other hand, existing academic proposals are tailored toward specific CPU-TEE platforms. To address this gap, we propose Ascend-CC, a confidential computing architecture based on discrete NPU devices that requires no trust in the host system. Ascend-CC provides strong security by ensuring data and model encryption that protects not only the data but also the model parameters and operator binaries. Ascend-CC uses delegation-based memory semantics to ensure isolation from the host software stack, and task attestation provides strong model integrity guarantees. Our Ascend-CC implementation and evaluation with state-of-the-art LLMs such as Llama2 and Llama3 show that Ascend-CC introduces minimal overhead with no changes to the AI software stack.

Enabling Performant and Secure EDA as a Service in Public Clouds Using Confidential Containers 2024-07-08
Show

Increasingly, business opportunities available to fabless design teams in the semiconductor industry far exceed those addressable with on-prem compute resources. An attractive option for capturing these electronic design automation (EDA) design opportunities is public cloud bursting. However, cloud bursting raises security concerns around protecting process design kits, third-party intellectual property, and new design data for semiconductor devices and chips. One way to address these concerns is to leverage confidential containers for EDA workloads. Confidential containers add zero-trust computing elements to significantly reduce the probability of intellectual property escapes. A key question that often follows security discussions is whether EDA workload performance will suffer under confidential computing. In this work, we demonstrate a full set of EDA confidential containers, describe their deployment, and characterize the performance impacts of confidential elements of the flow, including storage and networking. A complete end-to-end confidential container-based EDA workload exhibits 7.13% and 2.05% performance overheads over bare-metal container and VM based solutions, respectively.

TRUCE: Private Benchmarking to Prevent Contamination and Improve Comparative Evaluation of LLMs 2024-06-24
Show

Benchmarking is the de-facto standard for evaluating LLMs, due to its speed, replicability and low cost. However, recent work has pointed out that the majority of the open source benchmarks available today have been contaminated or leaked into LLMs, meaning that LLMs have access to test data during pretraining and/or fine-tuning. This raises serious concerns about the validity of benchmarking studies conducted so far and the future of evaluation using benchmarks. To solve this problem, we propose Private Benchmarking, a solution where test datasets are kept private and models are evaluated without revealing the test data to the model. We describe various scenarios (depending on the trust placed on model owners or dataset owners), and present solutions to avoid data contamination using private benchmarking. For scenarios where the model weights need to be kept private, we describe solutions from confidential computing and cryptography that can aid in private benchmarking. We build an end-to-end system, TRUCE, that enables such private benchmarking and show that the overheads introduced to protect models and benchmarks are negligible (in the case of confidential computing) and tractable (when cryptographic security is required). Finally, we also discuss solutions to the problem of benchmark dataset auditing, to ensure that private benchmarks are of sufficiently high quality.
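
As a rough illustration of the private-benchmarking idea, the sketch below keeps the test set on the evaluator's side (inside a TEE in a real deployment) and returns only an aggregate score plus a hash commitment to the dataset. The function name, the toy data, and the commitment scheme are illustrative stand-ins, not TRUCE's actual API.

```python
import hashlib

def private_benchmark(predict, test_inputs, test_labels):
    # Runs on the dataset owner's side (inside a TEE in a real deployment):
    # the model owner receives only the aggregate score and a commitment
    # to the test set, never the individual test examples.
    correct = sum(predict(x) == y for x, y in zip(test_inputs, test_labels))
    commitment = hashlib.sha256(
        repr(list(zip(test_inputs, test_labels))).encode()
    ).hexdigest()
    return {"accuracy": correct / len(test_labels),
            "dataset_commitment": commitment}

# Toy usage: a "model" that labels even numbers as 1.
score = private_benchmark(lambda x: int(x % 2 == 0), [2, 3, 5, 8], [1, 0, 0, 1])
print(score)  # accuracy 1.0 plus a hash an auditor can later check
```

The commitment lets a later audit confirm that the same benchmark was used across models without ever publishing its contents.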

Verifying components of Arm(R) Confidential Computing Architecture with ESBMC 2024-06-05
Show

Realm Management Monitor (RMM) is an essential firmware component within the recent Arm Confidential Computing Architecture (Arm CCA). Previous work applies formal techniques to verify the specification and prototype reference implementation of RMM. However, relying solely on a single verification tool may lead to the oversight of certain bugs or vulnerabilities. This paper discusses the application of ESBMC, a state-of-the-art Satisfiability Modulo Theories (SMT)-based software model checker, to further enhance RMM verification. We demonstrate ESBMC's ability to precisely parse the source code and identify specification failures within a reasonable time frame. Moreover, we propose potential improvements for ESBMC to enhance its efficiency for industry engineers. This work contributes to exploring the capabilities of formal verification techniques in real-world scenarios and suggests avenues for further improvements to better meet industrial verification needs.

Machine Learning with Confidential Computing: A Systematization of Knowledge 2024-06-03
Show

Privacy and security challenges in Machine Learning (ML) have become increasingly severe, along with ML's pervasive development and the recent demonstration of large attack surfaces. As a mature system-oriented approach, Confidential Computing has been utilized in both academia and industry to mitigate privacy and security issues in various ML scenarios. In this paper, the conjunction between ML and Confidential Computing is investigated. We systematize the prior work on Confidential Computing-assisted ML techniques that provide i) confidentiality guarantees and ii) integrity assurances, and discuss their advanced features and drawbacks. Key challenges are further identified, and we provide dedicated analyses of the limitations in existing Trusted Execution Environment (TEE) systems for ML use cases. Finally, prospective works are discussed, including grounded privacy definitions for closed-loop protection, partitioned executions of efficient ML, dedicated TEE-assisted designs for ML, TEE-aware ML, and ML full pipeline guarantees. By providing these potential solutions in our systematization of knowledge, we aim to build a bridge toward much stronger TEE-enabled ML privacy guarantees without introducing computation and system costs.

Survey paper, 37 pages, accepted to ACM Computing Surveys

DuckDB-SGX2: The Good, The Bad and The Ugly within Confidential Analytical Query Processing 2024-05-30
Show

We provide an evaluation of an analytical workload in a confidential computing environment, combining DuckDB with two technologies: modular columnar encryption in Parquet files (data at rest) and the newest version of the Intel SGX Trusted Execution Environment (TEE), providing a hardware enclave where data in flight can be (more) securely decrypted and processed. One finding is that the "performance tax" for such confidential analytical processing is acceptable compared to not using these technologies. We eventually manage to run TPC-H SF30 with under 2x overhead compared to non-encrypted, non-enclave execution; we show that, specifically, columnar compression and encryption are a good combination. Our second finding consists of dos and don'ts to tune DuckDB to work effectively in this environment. There are various performance hazards: potentially 5x higher cache miss costs due to memory encryption inside the enclave, NUMA penalties, and highly elevated cost of swapping pages in and out of the enclave -- which is also triggered indirectly by using a non-SGX-aware malloc library.
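
For readers unfamiliar with the "data at rest" half of this setup, here is a minimal sketch of querying encrypted Parquet from DuckDB's Python API. It assumes a DuckDB version that ships the `add_parquet_key` PRAGMA for Parquet modular encryption, and it hard-codes a demo key, whereas a confidential deployment would provision the key inside the enclave.

```python
import duckdb

con = duckdb.connect()
# Register a named 256-bit key (32 bytes). In a confidential setting this
# key would be provisioned into the enclave, not hard-coded.
con.execute("PRAGMA add_parquet_key('k1', '0123456789012345678901234567890A')")

con.execute("""
    CREATE TABLE lineitem AS
    SELECT range AS l_orderkey, (range % 7) / 100.0 AS l_tax
    FROM range(1000)
""")
# Write Parquet with footer-key encryption ...
con.execute("COPY lineitem TO 'lineitem.parquet' (ENCRYPTION_CONFIG {footer_key: 'k1'})")
# ... and read it back, decrypting inside the (enclaved) process.
n = con.execute("""
    SELECT count(*)
    FROM read_parquet('lineitem.parquet', encryption_config = {footer_key: 'k1'})
""").fetchone()[0]
print(n)  # 1000
```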

GuaranTEE: Towards Attestable and Private ML with CCA 2024-03-29
Show

Machine-learning (ML) models are increasingly being deployed on edge devices to provide a variety of services. However, their deployment is accompanied by challenges in model privacy and auditability. Model providers want to (i) ensure that their proprietary models are not exposed to third parties, and (ii) obtain attestations that their genuine models are operating on edge devices in accordance with the service agreement with the user. Existing measures to address these challenges have been hindered by issues such as high overheads and limited capability (processing/secure memory) on edge devices. In this work, we propose GuaranTEE, a framework to provide attestable private machine learning on the edge. GuaranTEE uses the Confidential Computing Architecture (CCA), Arm's latest architectural extension that allows for the creation and deployment of dynamic Trusted Execution Environments (TEEs) within which models can be executed. We evaluate CCA's feasibility to deploy ML models by developing, evaluating, and openly releasing a prototype. We also suggest improvements to CCA to facilitate its use in protecting the entire ML deployment pipeline on edge devices.

Accepted at the 4th Workshop on Machine Learning and Systems (EuroMLSys '24)

vSPACE: Voting in a Scalable, Privacy-Aware and Confidential Election 2024-03-08
Show

The vSPACE experimental proof-of-concept (PoC) on the TrueElect[Anon][Creds] protocol presents a novel approach to secure, private, and scalable elections, extending the TrueElect and ElectAnon protocols with the integration of AnonCreds SSI (Self-Sovereign Identity). Such a protocol PoC is situated within a Zero-Trust Architecture (ZTA) and leverages confidential computing, continuous authentication, multi-party computation (MPC), and well-architected framework (WAF) principles to address the challenges of cybersecurity, privacy, and trust over IP (ToIP) protection. Employing a Kubernetes confidential cluster within an Enterprise-Scale Landing Zone (ESLZ), vSPACE integrates Distributed Ledger Technology (DLT) for immutable and certifiable audit trails. The Infrastructure as Code (IaC) model ensures rapid deployment, consistent management, and adherence to security standards, making vSPACE a future-proof solution for digital voting systems.

Trustworthy confidential virtual machines for the masses 2024-02-23
Show

Confidential computing alleviates the concerns of distrustful customers by removing the cloud provider from their trusted computing base and resolves their disincentive to migrate their workloads to the cloud. This is facilitated by new hardware extensions, like AMD's SEV Secure Nested Paging (SEV-SNP), which can run a whole virtual machine with confidentiality and integrity protection against a potentially malicious hypervisor owned by an untrusted cloud provider. However, the assurance of such protection to either the service providers deploying sensitive workloads or the end-users passing sensitive data to services requires sending proof to the interested parties. Service providers can retrieve such proof by performing remote attestation, while end-users typically have no means to acquire this proof or validate its correctness and therefore have to rely on the trustworthiness of the service providers. In this paper, we present Revelio, an approach that features two main contributions: i) it allows confidential virtual machine (VM)-based workloads to be designed and deployed in a way that disallows any tampering, even by the service providers, and ii) it empowers users to easily validate their integrity. In particular, we focus on web-facing workloads, protect them leveraging SEV-SNP, and enable end-users to remotely attest them seamlessly each time a new web session is established. To highlight the benefits of Revelio, we discuss how a standalone stateful VM that hosts an open-source collaboration office suite can be secured and present a replicated protocol proxy that enables commodity users to securely access the Internet Computer, a decentralized blockchain infrastructure.

Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry 2024-02-22
Show

The report demonstrates the benefits (in terms of improved claims loss modeling) of harnessing the value of Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. The application of FL addresses two of the most pressing concerns: limited data volume and data variety, which are caused by privacy concerns, the rarity of claim events, the lack of informative rating factors, etc. During each round of FL, collaborators compute improvements on the model using their local private data, and these insights are combined to update a global model. Such aggregation of insights increases the effectiveness of forecasting claims losses compared to models trained individually at each collaborator. Critically, this approach enables machine learning collaboration without the need for raw data to leave the compute infrastructure of each respective data owner. Additionally, the open-source framework, OpenFL, that is used in our experiments is designed so that it can be run using confidential computing as well as with additional algorithmic protections against leakage of information via the shared model updates. In this way, FL is implemented as a privacy-enhancing collaborative learning technique that addresses the challenges posed by the sensitivity and privacy of data in traditional machine learning solutions. This paper's application of FL can also be expanded to other areas, including fraud detection, catastrophe modeling, etc., that have a similar need to incorporate data privacy into machine learning collaborations. Our framework and empirical results provide a foundation for future collaborations among insurers, regulators, academic researchers, and InsurTech experts.
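
The aggregation step described here is, at its core, federated averaging. The sketch below shows a weighted average of locally trained parameter vectors, which is the kind of update an FL aggregator (such as OpenFL's) computes; the parameter vectors and sample counts are made up for illustration, and attestation and leakage protections are left out.

```python
import numpy as np

def fed_avg(updates, weights):
    # Weighted average of client model updates (FedAvg): clients with more
    # local training samples contribute proportionally more to the result.
    total = sum(weights)
    return sum(w / total * u for u, w in zip(updates, weights))

# Three insurers contribute locally trained coefficients without sharing
# raw claims data; only the parameter vectors are exchanged.
client_updates = [np.array([0.9, 1.2]), np.array([1.1, 0.8]), np.array([1.0, 1.0])]
sample_counts = [5000, 2000, 3000]
global_model = fed_avg(client_updates, sample_counts)
print(global_model)
```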

virtCCA: Virtualized Arm Confidential Compute Architecture with TrustZone 2024-02-18
Show

ARM recently introduced the Confidential Compute Architecture (CCA) as part of the upcoming ARMv9-A architecture. CCA enables the support of confidential virtual machines (cVMs) within a separate world called the Realm world, providing protection from the untrusted normal world. While CCA offers a promising future for confidential computing, the widespread availability of CCA hardware is not expected in the near future, according to ARM's roadmap. To address this gap, we present virtCCA, an architecture that facilitates virtualized CCA using TrustZone, a mature hardware feature available on existing ARM platforms. Notably, virtCCA can be implemented on platforms equipped with the Secure EL2 (S-EL2) extension available from ARMv8.4 onwards, as well as on earlier platforms that lack S-EL2 support. virtCCA is fully compatible with the CCA specifications at the API level. We have developed the entire CCA software and firmware stack on top of virtCCA, including the enhancements to the normal world's KVM to support cVMs, and the TrustZone Management Monitor (TMM) that enforces isolation among cVMs and provides cVM life-cycle management. We have implemented virtCCA on real ARM servers, with and without S-EL2 support. Our evaluation, conducted on micro-benchmarks and macro-benchmarks, demonstrates that the overhead of running cVMs is acceptable compared to running normal-world VMs. Specifically, in a set of real-world workloads, the overhead of virtCCA-SEL2 is less than 29.5% for I/O intensive workloads, while virtCCA-EL3 outperforms the baseline in most cases.

HasTEE+ : Confidential Cloud Computing and Analytics with Haskell 2024-01-17
Show

Confidential computing is a security paradigm that enables the protection of confidential code and data in a co-tenanted cloud deployment using specialized hardware isolation units called Trusted Execution Environments (TEEs). By integrating TEEs with a Remote Attestation protocol, confidential computing allows a third party to establish the integrity of an enclave hosted within an untrusted cloud. However, TEE solutions, such as Intel SGX and ARM TrustZone, offer low-level C/C++-based toolchains that are susceptible to inherent memory safety vulnerabilities and lack language constructs to monitor explicit and implicit information-flow leaks. Moreover, the toolchains involve complex multi-project hierarchies and the deployment of hand-written attestation protocols for verifying enclave integrity. We address the above with HasTEE+, a domain-specific language (DSL) embedded in Haskell that enables programming TEEs in a high-level language with strong type-safety. HasTEE+ assists in multi-tier cloud application development by (1) introducing a tierless programming model for expressing distributed client-server interactions as a single program, (2) integrating a general remote-attestation architecture that removes the necessity to write application-specific cross-cutting attestation code, and (3) employing a dynamic information flow control mechanism to prevent explicit as well as implicit data leaks. We demonstrate the practicality of HasTEE+ through a case study on confidential data analytics, presenting a data-sharing pattern applicable to mutually distrustful participants and providing overall performance metrics.

High-quality pdf at https://abhiroop.github.io/pubs/HasTEE_ESORICS_Sarkar_Russo.pdf

Ontologising Trustworthy in the Telecommunications Domain 2023-11-27
Show

Based upon trusted and confidential computing platforms, telecommunications systems must provide guaranteed security for the processes and data running atop them. This in turn requires us to provide trustworthy systems. The term trustworthy is poorly defined, with corresponding misunderstanding and misapplication. We present a definition of this term, as well as of related terms, demonstrate its application to certain telecommunications use cases, and address how the learnings from ontologising these structures contribute to standardisation and the necessity for FAIR ontologies across telecommunications standards and hosting organisations.

ACAI: Protecting Accelerator Execution with Arm Confidential Computing Architecture 2023-10-25
Show

Trusted execution environments in several existing and upcoming CPUs demonstrate the success of confidential computing, with the caveat that tenants cannot securely use accelerators such as GPUs and FPGAs. In this paper, we reconsider the Arm Confidential Computing Architecture (CCA) design, an upcoming TEE feature in Armv9-A, to address this gap. We observe that CCA offers the right abstraction and mechanisms to allow confidential VMs to use accelerators as a first-class abstraction. We build ACAI, a CCA-based solution, with a principled approach of extending CCA security invariants to device-side access to address several critical security gaps. Our experimental results on GPU and FPGA demonstrate the feasibility of ACAI while maintaining security guarantees.

Exten...

Extended version of the Usenix Security 2024 paper

Towards a Formally Verified Security Monitor for VM-based Confidential Computing 2023-10-01
Show

Confidential computing is a key technology for isolating high-assurance applications from the large amounts of untrusted code typical in modern systems. Existing confidential computing systems cannot be certified for use in critical applications, like systems controlling critical infrastructure, hardware security modules, or aircraft, as they lack formal verification. This paper presents an approach to formally modeling and proving a security monitor. It introduces a canonical architecture for virtual machine (VM)-based confidential computing systems. It abstracts processor-specific components and identifies a minimal set of hardware primitives required by a trusted security monitor to enforce security guarantees. We demonstrate our methodology and proposed approach with an example from our Rust implementation of the security monitor for RISC-V.

Making Your Program Oblivious: a Comparative Study for Side-channel-safe Confidential Computing 2023-08-12
Show

Trusted Execution Environments (TEEs) are gradually being adopted by major cloud providers, offering a practical option of confidential computing for users who don't fully trust public clouds. TEEs use CPU-enabled hardware features to eliminate direct breaches from compromised operating systems or hypervisors. However, recent studies have shown that side-channel attacks are still effective on TEEs. An appealing solution is to convert applications to be data oblivious to deter many side-channel attacks. While a few research prototypes on TEEs have adopted specific data oblivious operations, the general conversion approaches have never been thoroughly compared and tested on benchmark TEE applications. These limitations make it difficult for researchers and practitioners to choose and adopt a suitable data oblivious approach for their applications. To address these issues, we conduct a comprehensive analysis of several representative conversion approaches and implement benchmark TEE applications with them. We also perform an extensive empirical study to provide insights into their performance and ease of use.
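
As a flavor of what "converting a program to be data oblivious" means, here is a minimal branch-free selection and array lookup sketch. Real conversions target compiled code; Python itself gives no constant-time guarantees, so this only illustrates the structure of the transformation.

```python
def oblivious_select(cond, a, b):
    # Branch-free select on integers: returns a if cond else b, executing
    # the same instructions regardless of the secret condition.
    mask = -int(cond)              # True -> -1 (all ones), False -> 0
    return (a & mask) | (b & ~mask)

def oblivious_read(arr, idx):
    # Touch every element so the memory access pattern does not reveal
    # which (secret) index was actually read.
    out = 0
    for i, v in enumerate(arr):
        out |= oblivious_select(i == idx, v, 0)
    return out

assert oblivious_select(True, 7, 9) == 7
assert oblivious_read([10, 20, 30, 40], 2) == 30
```

The price is visible even in the sketch: the lookup does linear work per access, which is exactly the performance trade-off the paper's empirical study quantifies.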

Confidential Computing across Edge-to-Cloud for Machine Learning: A Survey Study 2023-07-31
Show

Confidential computing has gained prominence due to the escalating volume of data-driven applications (e.g., machine learning and big data) and the acute desire for secure processing of sensitive data, particularly, across distributed environments, such as edge-to-cloud continuum. Provided that the works accomplished in this emerging area are scattered across various research fields, this paper aims at surveying the fundamental concepts, and cutting-edge software and hardware solutions developed for confidential computing using trusted execution environments, homomorphic encryption, and secure enclaves. We underscore the significance of building trust in both hardware and software levels and delve into their applications particularly for machine learning (ML) applications. While substantial progress has been made, there are some barely-explored areas that need extra attention from the researchers and practitioners in the community to improve confidentiality aspects, develop more robust attestation mechanisms, and to address vulnerabilities of the existing trusted execution environments. Providing a comprehensive taxonomy of the confidential computing landscape, this survey enables researchers to advance this field to ultimately ensure the secure processing of users' sensitive data across a multitude of applications and computing tiers.

Remote attestation of SEV-SNP confidential VMs using e-vTPMs 2023-06-25
Show

Trying to address the security challenges of a cloud-centric software deployment paradigm, silicon and cloud vendors are introducing confidential computing - an umbrella term aimed at providing hardware and software mechanisms for protecting cloud workloads from the cloud provider and its software stack. Today, Intel SGX, AMD SEV, Intel TDX, etc., provide a way to shield cloud applications from the cloud provider through encryption of the application's memory below the hardware boundary of the CPU, hence requiring trust only in the CPU vendor. Unfortunately, existing hardware mechanisms do not automatically guarantee that a protected system was not tampered with during configuration and boot time. Such a guarantee relies on a hardware root of trust (RoT), i.e., an integrity-protected location that can store measurements in a trustworthy manner, extend them, and authenticate the measurement logs to the user. In this work, we design and implement a virtual TPM that virtualizes the hardware RoT without requiring trust in the cloud provider. To ensure the security of a vTPM in a provider-controlled environment, we leverage unique isolation properties of the SEV-SNP hardware that allow us to execute secure services as part of the enclave environment protected from the cloud provider. We further develop a novel approach to vTPM state management where the vTPM state is not preserved across reboots. Specifically, we develop a stateless ephemeral vTPM that supports remote attestation without any persistent state on the host. This allows us to pair each confidential VM with a private instance of a vTPM completely isolated from the provider-controlled environment and other VMs. We built our prototype entirely on open-source components. Though our work is AMD-specific, a similar approach could be used to build remote attestation protocols on other trusted execution environments.

12 pages, 4 figures
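
The measurement chain a (v)TPM maintains is a simple hash chain. The sketch below mimics a PCR extend over a boot sequence (the stage names are illustrative); this is what lets a verifier replay an event log against an attested PCR value.

```python
import hashlib

def pcr_extend(pcr, measurement):
    # TPM-style extend: new PCR = H(old PCR || H(event)).
    # Order matters, so the chain encodes the whole boot sequence.
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

pcr = bytes(32)  # PCRs start as all zeros
for stage in [b"firmware", b"bootloader", b"kernel", b"initrd"]:
    pcr = pcr_extend(pcr, stage)
print(pcr.hex())

# A verifier replays the same event log and accepts the quote only if
# the recomputed value matches the attested PCR.
```
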
Confidential Computing in Edge-Cloud Hierarchy 2023-06-19
Show

The paper introduces confidential computing approaches focused on protecting hierarchical data within an edge-cloud network. An edge-cloud network splits and shares data between the main cloud and a range of networks near the endpoint devices. The proposed solutions allow data in this two-level hierarchy to be protected via traditional encryption at rest and in transit, while leaving the remaining security issues, such as sensitive data and operations in use, to the scope of a trusted execution environment. Hierarchical data for each network device are linked and identified through distinct paths between the edge and the main cloud using individual blockchains. Methods for splitting data and cryptographic keys between the edge and the main cloud are based on strong authentication techniques ensuring the shared data's confidentiality, integrity and availability.
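
A minimal sketch of the key-splitting idea: a 2-of-2 XOR secret split, so that neither the edge nor the main cloud alone learns the key. The paper's actual scheme layers authentication and blockchain-linked paths on top of this basic primitive.

```python
import os

def split_key(key):
    # 2-of-2 XOR split: each share alone is uniformly random and
    # reveals nothing about the key.
    share_edge = os.urandom(len(key))
    share_cloud = bytes(a ^ b for a, b in zip(key, share_edge))
    return share_edge, share_cloud

def combine(share_edge, share_cloud):
    # XOR the shares back together to recover the original key.
    return bytes(a ^ b for a, b in zip(share_edge, share_cloud))

key = os.urandom(32)
edge_share, cloud_share = split_key(key)
assert combine(edge_share, cloud_share) == key
```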

Towards Confidential Computing: A Secure Cloud Architecture for Big Data Analytics and AI 2023-05-28
Show

Cloud computing provisions computer resources in a cost-effective, on-demand way. It has therefore become a viable solution for big data analytics and artificial intelligence, which have been widely adopted in various scientific domains. Data security in certain fields such as biomedical research remains a major concern when moving workflows to the cloud, because cloud environments are generally outsourced and thus more exposed to risks. We present a secure cloud architecture and describe how it enables workflow packaging and scheduling while keeping its data, logic and computation secure in transit, in use and at rest.

2023 IEEE 16th International Conference on Cloud Computing (IEEE CLOUD), Chicago, Illinois, USA, July 2-8, 2023

HAAC: A Hardware-Software Co-Design to Accelerate Garbled Circuits 2023-04-25
Show

Privacy and security have rapidly emerged as priorities in system design. One powerful solution for providing both is privacy-preserving computation (PPC), where functions are computed directly on encrypted data and control can be provided over how data is used. Garbled circuits (GCs) are a PPC technology that provides both confidential computing and control over how data is used. The challenge is that they incur significant performance overheads compared to plaintext. This paper proposes a novel garbled circuits accelerator and compiler, named HAAC, to mitigate performance overheads and make privacy-preserving computation more practical. HAAC is a hardware-software co-design. GCs are exemplars of co-design as programs are completely known at compile time, i.e., all dependence, memory accesses, and control flow are fixed. The design philosophy of HAAC is to keep hardware simple and efficient, maximizing area devoted to our proposed custom execution units and other circuits essential for high performance (e.g., on-chip storage). The compiler can leverage its program understanding to realize hardware's performance potential by generating effective instruction schedules, data layouts, and orchestrating off-chip events. In taking this approach we can achieve ASIC performance/efficiency without sacrificing generality. Insights of our approach include how co-design enables expressing arbitrary GC programs as streams, which simplifies hardware and enables complete memory-compute decoupling, and the development of a scratchpad that captures data reuse by tracking program execution, eliminating the need for costly hardware-managed caches and tagging logic. We evaluate HAAC with VIP-Bench and achieve an average speedup of 589× with DDR4 (2,627× with HBM2) in 4.3 mm² of area.

Accepted to the 50th Annual International Symposium on Computer Architecture (ISCA)
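
For readers new to garbled circuits, the sketch below garbles a single AND gate: each wire gets two random labels, and each truth-table row encrypts the output label under a hash of the matching pair of input labels, so the evaluator learns exactly one output label and nothing else. Real schemes use point-and-permute and stronger primitives rather than this try-every-row tag check; the construction here is a teaching sketch only.

```python
import os, random, hashlib

LABEL = 16                 # wire-label length in bytes
TAG = b"\x00" * 8          # validity tag appended before encryption

def H(*parts):
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and_gate():
    # Two random labels per wire: index 0 encodes bit 0, index 1 encodes bit 1.
    labels = {w: (os.urandom(LABEL), os.urandom(LABEL)) for w in "abo"}
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["o"][bit_a & bit_b]
            pad = H(labels["a"][bit_a], labels["b"][bit_b])
            table.append(xor(pad[:LABEL + len(TAG)], out_label + TAG))
    random.shuffle(table)  # hide which row is which input combination
    return labels, table

def evaluate(table, label_a, label_b):
    # The evaluator holds one label per input wire and can decrypt exactly
    # one row: the one whose tag comes out as all zeros.
    pad = H(label_a, label_b)
    for row in table:
        plain = xor(pad[:len(row)], row)
        if plain.endswith(TAG):
            return plain[:LABEL]
    raise ValueError("no row decrypted")

labels, table = garble_and_gate()
out = evaluate(table, labels["a"][1], labels["b"][1])  # inputs 1 AND 1
assert out == labels["o"][1]
```

Every gate of a circuit is handled this way, which is why GC evaluation is dominated by streams of hashes and table lookups, precisely the memory-compute pattern HAAC's accelerator targets.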

CRC: Fully General Model of Confidential Remote Computing 2023-04-17
Show

Digital services have been offered through remote systems for decades. The questions of how these systems can be built in a trustworthy manner and how their security properties can be understood are given fresh impetus by recent hardware developments, allowing a fuller, more general, exploration of the possibilities than has previously been seen in the literature. Drawing on and consolidating the disparate strains of research, technologies and methods employed throughout the adaptation of confidential computing, we present a novel, dedicated Confidential Remote Computing (CRC) model. CRC proposes a compact solution for next-generation applications to be built on strong hardware-based security primitives, control of secure software products' trusted computing base, and a way to make correct use of proofs and evidence reports generated by the attestation mechanisms. The CRC model illustrates the trade-offs between decentralisation, task size and transparency overhead. We conclude the paper with six lessons learned from our approach, and suggest two future research directions.

36 pa...

36 pages, 7 figures, 6 tables

CoVE: Towards Confidential Computing on RISC-V Platforms 2023-04-12
Show

Multi-tenant computing platforms typically comprise several software and hardware components, including platform firmware, the host operating system kernel, a virtualization monitor, and the actual tenant payloads that run on them (typically in a virtual machine, container, or application). This model is well established in large scale commercial deployment, but the downside is that all platform components and operators are in the Trusted Computing Base (TCB) of the tenant. This aspect is ill-suited for privacy-oriented workloads that aim to minimize the TCB footprint. Confidential computing presents a good stepping-stone towards providing a quantifiable TCB for computing. Confidential computing [1] requires the use of HW-attested Trusted Execution Environments for data-in-use protection. The RISC-V architecture presents a strong foundation for meeting the requirements for Confidential Computing and other security paradigms in a clean slate manner. This paper describes a reference architecture and discusses ISA, non-ISA and system-on-chip (SoC) requirements for confidential computing on RISC-V platforms. It discusses proposed ISA and non-ISA extensions for Confidential Virtual Machines on RISC-V platforms, referred to as CoVE.

A visão da BBChain sobre o contexto tecnológico subjacente à adoção do Real Digital 2023-04-10
Show

We explore confidential computing in the context of CBDCs using Microsoft's CCF framework as an example. By developing an experiment and comparing different approaches and performance and security metrics, we seek to evaluate the effectiveness of confidential computing to improve the privacy, security, and performance of CBDCs. Preliminary results suggest that confidential computing could be a promising solution to the technological challenges faced by CBDCs. Furthermore, by implementing confidential computing in DLTs such as Hyperledger Besu and utilizing frameworks such as CCF, we increase transaction confidentiality and privacy while maintaining the scalability and interoperability required for a global digital financial system. In conclusion, confidential computing can significantly bolster CBDC development, fostering a secure, private, and efficient financial future.

11 pages, 8 figures, in (Brazilian) Portuguese

Intel TDX Demystified: A Top-Down Approach 2023-03-27
Show

Intel Trust Domain Extensions (TDX) is a new architectural extension in the 4th Generation Intel Xeon Scalable Processor that supports confidential computing. TDX allows the deployment of virtual machines in the Secure-Arbitration Mode (SEAM) with encrypted CPU state and memory, integrity protection, and remote attestation. TDX aims to enforce hardware-assisted isolation for virtual machines and minimize the attack surface exposed to host platforms, which are considered to be untrustworthy or adversarial in the confidential computing's new threat model. TDX can be leveraged by regulated industries or sensitive data holders to outsource their computations and data with end-to-end protection in public cloud infrastructure. This paper aims to provide a comprehensive understanding of TDX to potential adopters, domain experts, and security researchers looking to leverage the technology for their own purposes. We adopt a top-down approach, starting with high-level security principles and moving to low-level technical details of TDX. Our analysis is based on publicly available documentation and source code, offering insights from security researchers outside of Intel.

AI-Driven Confidential Computing across Edge-to-Cloud Continuum 2023-01-03
Show

With the meteoric growth of technology, individuals and organizations are widely adopting cloud services to mitigate the burdens of maintenance. Despite its scalability and ease of use, many users who own sensitive data refrain from fully utilizing cloud services due to confidentiality concerns. Maintaining data confidentiality for data at rest and in transit has been widely explored, but data remains vulnerable in the cloud while it is in use. This vulnerability is further elevated once the scope of computing spans the edge-to-cloud continuum. Accordingly, the goal of this dissertation is to enable data confidentiality by adopting confidential computing across the continuum. Towards this goal, one approach we explore is to separate the intelligence aspect of data processing from the pattern-matching aspect. We present our approach to confidential data clustering on the cloud, and then develop a confidential search service across edge-to-cloud for unstructured text data. Our proposed clustering solution, ClusPr, performs topic-based clustering for static and dynamic datasets and improves cluster coherency by 30% to 60% when compared with other encryption-based clustering techniques. Our trusted enterprise search service, SAED, provides context-aware and personalized semantic search over confidential data across the continuum. We realized that enabling confidential computing across edge-to-cloud requires a major contribution from the edge tiers, particularly to run multiple Deep Learning (DL) services concurrently, which raises memory contention on the edge tier. To resolve this, we develop the Edge-MultiAI framework to manage Neural Network (NN) models of DL applications such that it can meet the latency constraints of the DL applications without compromising inference accuracy.

PhD Dissertation advised by Dr. Mohsen Amini Salehi

Confidential High-Performance Computing in the Public Cloud 2022-12-05
Show

High-Performance Computing (HPC) in the public cloud democratizes supercomputing power that most users cannot afford to purchase and maintain. Researchers have studied its viability, performance, and usability. However, HPC in the cloud has a unique feature -- users have to export data and computation to somewhat untrusted cloud platforms. Users must either fully trust cloud providers to protect against all kinds of attacks or keep sensitive assets in-house instead. With the recent deployment of the Trusted Execution Environment (TEE) in the cloud, confidential computing for HPC in the cloud is becoming practical for addressing users' privacy concerns. This paper discusses the threat models, unique challenges, possible solutions, and significant gaps, focusing on TEE-based confidential HPC. We hope this discussion will improve the understanding of this new topic for HPC in the cloud and promote new research directions.

to appear in IEEE Internet Computing

Empowering Data Centers for Next Generation Trusted Computing 2022-11-01
Show

Modern data centers have grown beyond CPU nodes to provide domain-specific accelerators such as GPUs and FPGAs to their customers. From a security standpoint, cloud customers want to protect their data. They are willing to pay additional costs for trusted execution environments such as enclaves provided by Intel SGX and AMD SEV. Unfortunately, the customers have to make a critical choice -- either use domain-specific accelerators for speed or use CPU-based confidential computing solutions. To bridge this gap, we aim to enable data-center scale confidential computing that expands across CPUs and accelerators. We argue that having wide-scale TEE support for accelerators presents a technically easier solution, but is far away from being a reality. Instead, our hybrid design provides enclaved execution guarantees for computation distributed over multiple CPU nodes and devices with/without TEE support. Our solution scales gracefully in two dimensions -- it can handle a large number of heterogeneous nodes and it can accommodate TEE-enabled devices as and when they are available in the future. We observe marginal overheads of 0.42-8% on real-world AI data center workloads that are independent of the number of nodes in the data center. We add custom TEE support to two accelerators (AI and storage) and integrate it into our solution, thus demonstrating that it can cater to future TEE devices.

23 pages, 12 figures
Partially Trusting the Service Mesh Control Plane 2022-10-23
Show

Zero Trust is a novel cybersecurity model that focuses on continually evaluating trust to prevent the initiation and horizontal spreading of attacks. A cloud-native Service Mesh is an example of Zero Trust Architecture that can filter out external threats. However, the Service Mesh does not shield the Application Owner from internal threats, such as a rogue administrator of the cluster where their application is deployed. In this work, we enhance the Service Mesh to allow the definition and enforcement of a Verifiable Configuration that is defined and signed off by the Application Owner. Backed by automated digital signing solutions and confidential computing technologies, the Verifiable Configuration allows changing the trust model of the Service Mesh, from the data plane fully trusting the control plane to partially trusting it. This lets the application benefit from all the functions provided by the Service Mesh (resource discovery, traffic management, mutual authentication, access control, observability), while ensuring that the Cluster Administrator cannot change the state of the application in a way that was not intended by the Application Owner.

CTR: Checkpoint, Transfer, and Restore for Secure Enclaves 2022-05-30
Show

Hardware-based Trusted Execution Environments (TEEs) are becoming increasingly prevalent in cloud computing, forming the basis for confidential computing. However, the security goals of TEEs sometimes conflict with existing cloud functionality, such as VM or process migration, because TEE memory cannot be read by the hypervisor, OS, or other software on the platform. Whilst some newer TEE architectures support migration of entire protected VMs, there is currently no practical solution for migrating individual processes containing in-process TEEs. The inability to migrate such processes leads to operational inefficiencies or even data loss if the host platform must be urgently restarted. We present CTR, a software-only design to retrofit migration functionality into existing TEE architectures, whilst maintaining their expected security guarantees. Our design allows TEEs to be interrupted and migrated at arbitrary points in their execution, thus maintaining compatibility with existing VM and process migration techniques. By cooperatively involving the TEE in the migration process, our design also allows application developers to specify stateful migration-related policies, such as limiting the number of times a particular TEE may be migrated. Our prototype implementation for Intel SGX demonstrates that migration latency increases linearly with the size of the TEE memory and is dominated by TEE system operations.

Confidential Machine Learning within Graphcore IPUs 2022-05-20
Show

We present IPU Trusted Extensions (ITX), a set of experimental hardware extensions that enable trusted execution environments in Graphcore's AI accelerators. ITX enables the execution of AI workloads with strong confidentiality and integrity guarantees at low performance overheads. ITX isolates workloads from untrusted hosts, and ensures their data and models remain encrypted at all times except within the IPU. ITX includes a hardware root-of-trust that provides attestation capabilities and orchestrates trusted execution, and on-chip programmable cryptographic engines for authenticated encryption of code and data at PCIe bandwidth. We also present software for ITX in the form of compiler and runtime extensions that support multi-party training without requiring a CPU-based TEE. Experimental support for ITX is included in Graphcore's GC200 IPU taped out at TSMC's 7nm technology node. Its evaluation on a development board using standard DNN training workloads suggests that ITX adds less than 5% performance overhead, and delivers up to 17x better performance compared to CPU-based confidential computing systems relying on AMD SEV-SNP.

Trusted Container Extensions for Container-based Confidential Computing 2022-05-11
Show

Cloud computing has emerged as a cornerstone of today's computing landscape. More and more customers who outsource their infrastructure benefit from the manageability, scalability and cost savings that come with cloud computing. Those benefits are amplified by the trend towards microservices. Instead of renting and maintaining full VMs, customers increasingly leverage container technologies, which come with a much more lightweight resource footprint while also removing the need to emulate complete systems and their devices. However, privacy concerns hamper many customers from moving to the cloud and leveraging its benefits. Furthermore, regulatory requirements prevent the adoption of cloud computing in many industries, such as health care or finance. Standard software isolation mechanisms have been proven to be insufficient if the host system is not fully trusted, e.g., when the cloud infrastructure gets compromised by malicious third-party actors. Consequently, confidential computing is gaining increasing relevance in the cloud computing field. We present Trusted Container Extensions (TCX), a novel container security architecture, which combines the manageability and agility of standard containers with the strong protection guarantees of hardware-enforced Trusted Execution Environments (TEEs) to enable confidential computing for container workloads. TCX provides significant performance advantages compared to existing approaches while protecting container workloads and the data processed by them. Our implementation, based on AMD Secure Encrypted Virtualization (SEV), ensures integrity and confidentiality of data and services during deployment, and allows secure interaction between protected containers as well as with external entities. Our evaluation shows that our implementation induces a low performance overhead of 5.77% on the standard SPEC2017 benchmark suite.

Private delegated computations using strong isolation 2022-05-06
Show

Sensitive computations are now routinely delegated to third-parties. In response, Confidential Computing technologies are being introduced to microprocessors, offering a protected processing environment, which we generically call an isolate, providing confidentiality and integrity guarantees to code and data hosted within -- even in the face of a privileged attacker. Isolates, with an attestation protocol, permit remote third-parties to establish a trusted "beachhead" containing known code and data on an otherwise untrusted machine. Yet, the rise of these technologies introduces many new problems, including: how to ease provisioning of computations safely into isolates; how to develop distributed systems spanning multiple classes of isolate; and what to do about the billions of "legacy" devices without support for Confidential Computing? Tackling the problems above, we introduce Veracruz, a framework that eases the design and implementation of complex privacy-preserving, collaborative, delegated computations among a group of mutually mistrusting principals. Veracruz supports multiple isolation technologies and provides a common programming model and attestation protocol across all of them, smoothing deployment of delegated computations over supported technologies. We demonstrate Veracruz in operation, on private in-cloud object detection on encrypted video streaming from a video camera. In addition to supporting hardware-backed isolates -- like AWS Nitro Enclaves and Arm Confidential Computing Architecture Realms -- Veracruz also provides pragmatic "software isolates" on Armv8-A devices without hardware Confidential Computing capability, using the high-assurance seL4 microkernel and our IceCap framework.

CROSSLINE: Breaking "Security-by-Crash" based Memory Isolation in AMD SEV 2022-03-31
Show

AMD's Secure Encrypted Virtualization (SEV) is an emerging security feature on AMD processors that allows virtual machines to run on encrypted memory and perform confidential computing even with an untrusted hypervisor. This paper first demystifies SEV's improper use of address space identifiers (ASIDs) for controlling accesses of a VM to encrypted memory pages, cache lines, and TLB entries. We then present the CROSSLINE attacks, a novel class of attacks against SEV that allow the adversary to launch an attacker VM and change its ASID to that of the victim VM to impersonate the victim. We present two variants of CROSSLINE attacks: CROSSLINE V1 decrypts the victim's page tables or memory blocks following the format of a page table entry; CROSSLINE V2 constructs encryption and decryption oracles by executing instructions of the victim VM. We have successfully performed CROSSLINE attacks on SEV and SEV-ES processors.

14 pages, 5 figures, security

Speeding up enclave transitions for IO-intensive applications 2021-12-14
Show

Process-based confidential computing enclaves such as Intel SGX can be used to protect the confidentiality and integrity of workloads, without the overhead of virtualisation. However, they introduce a notable performance overhead, especially when it comes to transitions in and out of the enclave context. Such overhead makes the use of enclaves impractical for running IO-intensive applications, such as network packet processing or biological sequence analysis. We build on earlier approaches to improve the IO performance of workloads in Intel SGX enclaves and propose the SGX-Bundler library, which helps reduce both the cost of individual enclave transitions and the total number of enclave transitions in trusted applications running in Intel SGX enclaves. We describe the implementation of the SGX-Bundler library, evaluate its performance and demonstrate its practicality using the case study of Open vSwitch, a widely used software switch implementation.
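
Conceptually, bundling amortizes the fixed transition cost by queueing outside-world calls and flushing them in one enclave exit. The Python sketch below models that idea only; the class and function names are invented here for illustration and are not the SGX-Bundler API.

```python
class CallBundler:
    # Queue outside-world calls and flush them in a single batch, so one
    # (expensive) enclave transition covers many logical calls.
    def __init__(self, transition, batch_size=32):
        self.transition = transition     # models one enclave exit
        self.batch_size = batch_size
        self.pending = []

    def call(self, fn, *args):
        self.pending.append((fn, args))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            batch, self.pending = self.pending, []
            self.transition(batch)       # single exit executes all queued calls

def untrusted_dispatch(batch):
    # Runs outside the enclave: executes the whole batch in one go.
    for fn, args in batch:
        fn(*args)

bundler = CallBundler(untrusted_dispatch, batch_size=3)
for i in range(7):
    bundler.call(print, "packet", i)     # 7 logical calls, only 3 "transitions"
bundler.flush()                          # drain the final partial batch
```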

PIM-Enclave: Bringing Confidential Computation Inside Memory 2021-11-05
Show

Demand for data-intensive workloads and confidential computing are the prominent research directions shaping the future of cloud computing. Computer architectures are evolving to better accommodate computing over large data. Protecting the computation of sensitive data is also an imperative yet challenging objective; processor-supported secure enclaves serve as the key element in confidential computing in the cloud, although side-channel attacks are threatening their security boundaries. Current processor architectures consume a considerable portion of their cycles in moving data. Near-data computation is a promising approach that minimizes redundant data movement by placing computation inside storage. In this paper, we present a novel design for Processing-In-Memory (PIM) as a data-intensive workload accelerator for confidential computing. Based on our observation that moving computation closer to memory can achieve efficiency of computation and confidentiality of the processed information simultaneously, we study the advantages of confidential computing inside memory. We then explain our security model and programming model developed for PIM-based computation offloading. We construct our findings into a software-hardware co-design, which we call PIM-Enclave. Our design illustrates the advantages of PIM-based confidential computing acceleration. Our evaluation shows PIM-Enclave can provide side-channel-resistant secure computation offloading and run data-intensive applications with negligible performance overhead compared to the baseline PIM model.

VIA: Analyzing Device Interfaces of Protected Virtual Machines 2021-09-22
Show

Both AMD and Intel have presented technologies for confidential computing in cloud environments. The proposed solutions - AMD SEV (-ES, -SNP) and Intel TDX - protect Virtual Machines (VMs) against attacks from higher privileged layers through memory encryption and integrity protection. This model of computation draws a new trust boundary between virtual devices and the VM, which so far lacks thorough examination. In this paper, we therefore present an analysis of the virtual device interface and discuss several attack vectors against a protected VM. Further, we develop and evaluate VIA, an automated analysis tool to detect cases of improper sanitization of input received via the virtual device interface. VIA improves upon existing approaches for the automated analysis of device interfaces in the following aspects: (i) support for virtualization relevant buses, (ii) efficient Direct Memory Access (DMA) support and (iii) performance. VIA builds upon the Linux Kernel Library and clang's libfuzzer to fuzz the communication between the driver and the device via MMIO, PIO, and DMA. An evaluation of VIA shows that it performs 570 executions per second on average and improves performance compared to existing approaches by an average factor of 2706. Using VIA, we analyzed 22 drivers in Linux 5.10.0-rc6, thereby uncovering 50 bugs and initiating multiple patches to the virtual device driver interface of Linux. To prove our findings' criticality under the threat model of AMD SEV and Intel TDX, we showcase three exemplary attacks based on the bugs found. The attacks enable a malicious hypervisor to corrupt the memory and gain code execution in protected VMs with SEV-ES and are theoretically applicable to SEV-SNP and TDX.

Encrypted Data Processing 2021-09-20
Show

In this paper, we present a comprehensive architecture for confidential computing, which we show to be general purpose and quite efficient. It executes the application as is, without any added burden or discipline requirements from the application developers. Furthermore, it does not require trust in the system software at the computing server and does not impose any added burden on the communication subsystem. The proposed Encrypted Data Processing (EDAP) architecture accomplishes confidentiality, authenticity, and freshness of key-based cryptographic data protection by adopting data encryption with a multi-level key protection scheme. It guarantees that user data is visible only in non-privileged mode to a designated program trusted by the data owner on designated hardware, thus protecting the data from untrusted hardware, hypervisors, OSes, or other users' applications. The cryptographic keys and protocols used for achieving these confidential computing requirements are described in a use case example. Encrypting and decrypting data in an EDAP-enabled processor can lead to performance degradation as it adds cycle time to the overall execution. However, our simulation results show that the slowdown is only 6% on average across a collection of commercial workloads when the data encryption engine is placed between the L1 and L2 cache. We demonstrate that the EDAP architecture is valuable and practicable in the modern cloud environment for confidential computing. EDAP delivers a zero trust model of computing where the user software does not trust system software and vice versa.

16 pages, 12 figures, manuscript submitted to ACM Transactions on Privacy and Security
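
The "multi-level key protection" idea resembles classic envelope encryption: a per-object data key encrypts the data, and the owner's master key wraps the data key. Below is a minimal software sketch of that pattern using the `cryptography` package's Fernet; the key names and flow are illustrative, not EDAP's hardware scheme, which enforces this below the architectural level.

```python
from cryptography.fernet import Fernet

# Envelope encryption: the host only ever handles ciphertext at both levels.
master_key = Fernet.generate_key()       # held by the data owner / hardware
data_key = Fernet.generate_key()         # generated per object

ciphertext = Fernet(data_key).encrypt(b"sensitive record")
wrapped_data_key = Fernet(master_key).encrypt(data_key)

# Only a party holding the master key can unwrap the data key and decrypt.
recovered_key = Fernet(master_key).decrypt(wrapped_data_key)
plaintext = Fernet(recovered_key).decrypt(ciphertext)
assert plaintext == b"sensitive record"
```
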

Understanding TEE Containers, Easy to Use? Hard to Trust 2021-09-04
Show

As an emerging technique for confidential computing, the trusted execution environment (TEE) has received a lot of attention. To better develop, deploy, and run secure applications on TEE platforms such as Intel's SGX, both academic and industrial teams have devoted much effort to developing reliable and convenient TEE containers. In this paper, we studied the isolation strategies of 15 existing TEE containers to protect secure applications from potentially malicious operating systems (OS) or untrusted applications, using a semi-automatic approach combining a feedback-guided analyzer with manual code review. Our analysis reveals the isolation protection each of these TEE containers enforces, and their security weaknesses. We observe that none of the existing TEE containers can fulfill the goals they set, due to various pitfalls in their design and implementation. We report the lessons learnt from our study for guiding the development of more secure containers, and further discuss the trend of TEE container designs. We also release our analyzer, which helps evaluate the container middleware both from the enclave and from the kernel.

Confidential Computing for Privacy-Preserving Contact Tracing 2020-06-25
Show

Contact tracing is paramount to fighting the pandemic, but it comes with legitimate privacy concerns. This paper proposes a system enabling both contact tracing and data privacy. We propose the use of the Intel SGX trusted execution environment to build a privacy-preserving contact tracing backend. While the concept of a confidential computing backend proposed in this paper can be combined with any existing contact tracing smartphone application, we describe a full contact tracing system for demonstration purposes. A prototype of a privacy-preserving contact tracing system based on SGX has been implemented by the authors in a hackathon.

An Agent-Based Intelligent HCI Information System in Mixed Reality 2019-11-07
Show

This paper presents a design for an agent-based intelligent HCI (iHCI) system that uses collaborative information in mixed reality (MR) to improve user experience and information security based on context-aware computing. To implement the target-awareness system, we propose the use of non-parametric stochastic adaptive learning and a kernel learning strategy to improve the adaptivity of recognition. The proposed design uses a context-aware computing strategy to recognize patterns, simulating human awareness and stereo pattern analysis. It provides a flexible customization method for scene creation and manipulation. It also enables several types of awareness related to the interactive target, user experience, system performance, confidentiality, and agent identification by applying several strategies, such as context pattern analysis, scalable learning, and data-aware confidential computing.

Serverless

Title Date Abstract Comment
Joint$\lambda$: Orchestrating Serverless Workflows on Jointcloud FaaS Systems 2025-05-28
Show

Existing serverless workflow orchestration systems are predominantly designed for a single-cloud FaaS system, leading to vendor lock-in. This restricts performance optimization, cost reduction, and availability of applications. However, orchestrating serverless workflows on Jointcloud FaaS systems faces two main challenges: 1) additional overhead caused by centralized cross-cloud orchestration; and 2) a lack of reliable failover and fault-tolerant mechanisms for cross-cloud serverless workflows. To address these challenges, we propose Joint$\lambda$, a distributed runtime system designed to orchestrate serverless workflows on multiple FaaS systems without relying on a centralized orchestrator. Joint$\lambda$ introduces a compatibility layer, Backend-Shim, leveraging inter-cloud heterogeneity to optimize makespan and reduce costs with on-demand billing. By using function-side orchestration instead of centralized nodes, it enables independent function invocations and data transfers, reducing cross-cloud communication overhead. For high availability, it ensures exactly-once execution via datastores and failover mechanisms for serverless workflows on Jointcloud FaaS systems. We validate Joint$\lambda$ on two heterogeneous FaaS systems, AWS and Aliyun, with four workflows. Compared to the most advanced commercial orchestration services for single-cloud serverless workflows, Joint$\lambda$ reduces latency by up to 3.3$\times$ and saves up to 65% in cost. Joint$\lambda$ is also up to 4.0$\times$ faster than state-of-the-art orchestrators for cross-cloud serverless workflows, reducing cost by up to 4.5$\times$ while providing strong execution guarantees.

Multi-Event Triggers for Serverless Computing 2025-05-27
Show

Function-as-a-Service (FaaS) is an event-driven serverless cloud computing model in which small, stateless functions are invoked in response to events, such as HTTP requests, new database entries, or messages. Current FaaS platforms assume that each function invocation corresponds to a single event. However, from an application perspective, it is desirable to invoke functions in response to a collection of events of different types or only with every $n$-th event. To implement this today, a function would need additional state management, e.g., in a database, and custom logic to determine whether its trigger condition is fulfilled and the actual application code should run. In such an implementation, most function invocations would be rendered essentially useless, leading to unnecessarily high resource usage, latency, and cost for applications. In this paper, we introduce multi-event triggers, through which complex conditions for function invocations can be specified. Specifically, we introduce abstractions for invoking functions based on a set of $n$ events and joins of multiple events of different types. This enables application developers to define intricate conditions for function invocations, workflow steps, and complex event processing. Our evaluation with a proof-of-concept prototype shows that this reduces event-to-invocation latency by 62.5% in an incident detection use case and that our system can handle more than 300,000 requests per second on limited hardware, which is sufficient load for implementation in large FaaS platforms.
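
The join abstraction the abstract describes, fire a function only once every required event type has arrived, can be pictured as a small buffering state machine. The following is a minimal sketch under our own assumptions (the event names and the `invoke` callback are illustrative), not the paper's implementation:

```python
from collections import defaultdict
from typing import Callable

class MultiEventTrigger:
    """Invoke a function only once every required event type has arrived.

    A minimal join-trigger sketch: the platform buffers events per type
    and fires one invocation when each required type is present, instead
    of invoking the function once per raw event.
    """

    def __init__(self, required_types: set[str], invoke: Callable[[dict], None]):
        self.required_types = required_types
        self.invoke = invoke
        self.buffer: dict[str, list] = defaultdict(list)

    def on_event(self, event_type: str, payload) -> None:
        if event_type not in self.required_types:
            return  # not part of this trigger's join condition
        self.buffer[event_type].append(payload)
        if all(self.buffer[t] for t in self.required_types):
            # Consume one event of each type and fire a single invocation.
            joined = {t: self.buffer[t].pop(0) for t in self.required_types}
            self.invoke(joined)

# Example: only invoke when both an order and a payment event exist.
trigger = MultiEventTrigger({"order", "payment"}, invoke=print)
trigger.on_event("order", {"id": 1})
trigger.on_event("payment", {"order_id": 1})  # join complete -> one invocation
```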

Universal Workers: A Vision for Eliminating Cold Starts in Serverless Computing 2025-05-26
Show

Serverless computing enables developers to deploy code without managing infrastructure, but suffers from cold start overhead when initializing new function instances. Existing solutions such as "keep-alive" or "pre-warming" are costly and unreliable under bursty workloads. We propose universal workers, which are computational units capable of executing any function with minimal initialization overhead. Based on an analysis of production workload traces, our key insight is that requests in Function-as-a-Service (FaaS) platforms show a highly skewed distribution, with most requests invoking a small subset of functions. We exploit this observation to approximate universal workers through locality groups and three-tier caching (handler, install, import). With this work, we aim to enable more efficient and scalable FaaS platforms capable of handling diverse workloads with minimal initialization overhead.

Accepted for publication in 2025 IEEE 18th International Conference on Cloud Computing (CLOUD)
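
The three-tier caching idea above (handler, install, import) can be pictured as successively cheaper levels of reuse on a worker. A rough sketch follows; the tier names mirror the abstract, but the loader logic, the `time.sleep` stand-in for package installation, and the use of `json` as a "dependency" are all illustrative assumptions:

```python
import importlib
import time

# Hypothetical worker-side caches for the three tiers named in the
# abstract: warm handlers, installed packages, and imported modules.
handler_cache: dict[str, object] = {}   # tier 1: ready-to-run handlers
install_cache: set[str] = set()         # tier 2: packages on local disk
import_cache: dict[str, object] = {}    # tier 3: modules already imported

def get_handler(function_id: str, module_name: str, entrypoint: str):
    # Tier 1 hit: handler fully initialized -- near-zero overhead.
    if function_id in handler_cache:
        return handler_cache[function_id]
    # Tier 2: skip installation if done for a prior, similar function.
    if module_name not in install_cache:
        time.sleep(0.01)  # placeholder for 'pip install' cost
        install_cache.add(module_name)
    # Tier 3: reuse imports across functions in the same locality group.
    if module_name not in import_cache:
        import_cache[module_name] = importlib.import_module(module_name)
    handler = getattr(import_cache[module_name], entrypoint)
    handler_cache[function_id] = handler
    return handler

print(get_handler("fn-a", "json", "dumps"))  # cold path, fills all tiers
print(get_handler("fn-a", "json", "dumps"))  # tier-1 hit
```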

ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs 2025-05-20
Show

Serverless computing has grown rapidly for serving Large Language Model (LLM) inference due to its pay-as-you-go pricing, fine-grained GPU usage, and rapid scaling. However, our analysis reveals that current serverless platforms can effectively serve general LLMs but fail with Low-Rank Adaptation (LoRA) inference due to three key limitations: 1) massive parameter redundancy among functions where 99% of weights are unnecessarily duplicated, 2) costly artifact loading latency beyond LLM loading, and 3) magnified resource contention when serving multiple LoRA LLMs. These inefficiencies lead to massive GPU wastage, increased Time-To-First-Token (TTFT), and high monetary costs. We propose ServerlessLoRA, a novel serverless inference system designed for faster and cheaper LoRA LLM serving. ServerlessLoRA enables secure backbone LLM sharing across isolated LoRA functions to reduce redundancy. We design a pre-loading method that pre-loads comprehensive LoRA artifacts to minimize cold-start latency. Furthermore, ServerlessLoRA employs contention-aware batching and offloading to mitigate GPU resource conflicts during bursty workloads. Experiments on industrial workloads demonstrate that ServerlessLoRA reduces TTFT by up to 86% and cuts monetary costs by up to 89% compared to state-of-the-art LLM inference solutions.

Palladium: A DPU-enabled Multi-Tenant Serverless Cloud over Zero-copy Multi-node RDMA Fabrics 2025-05-16
Show

Serverless computing promises enhanced resource efficiency and lower user costs, yet is burdened by a heavyweight, CPU-bound data plane. Prior efforts exploiting shared memory reduce overhead locally but fall short when scaling across nodes. Furthermore, serverless environments can have unpredictable and large-scale multi-tenancy, leading to contention for shared network resources. We present Palladium, a DPU-centric serverless data plane that reduces the CPU burden and enables efficient, zero-copy communication in multi-tenant serverless clouds. Despite the limited general-purpose processing capability of the DPU cores, Palladium strategically exploits the DPU's potential by (1) offloading data transmission to high-performance NIC cores via RDMA, combined with intra-node shared memory to eliminate data copies across nodes, and (2) enabling cross-processor (CPU-DPU) shared memory to eliminate redundant data movement, which overwhelms wimpy DPU cores. At the core of Palladium is the DPU-enabled network engine (DNE) -- a lightweight reverse proxy that isolates RDMA resources from tenant functions, orchestrates inter-node RDMA flows, and enforces fairness under contention. To further reduce CPU involvement, Palladium performs early HTTP/TCP-to-RDMA transport conversion at the cloud ingress, bridging the protocol mismatch before client traffic enters the RDMA fabric, thus avoiding costly protocol translation along the critical path. We show that careful selection of RDMA primitives (i.e., two-sided instead of one-sided) significantly affects the zero-copy data plane. Our preliminary experimental results show that enabling DPU offloading in Palladium improves RPS by 20.9x. The latency is reduced by a factor of 21x in the best case, all the while saving up to 7 CPU cores, and only consuming two wimpy DPU cores.

ABase: the Multi-Tenant NoSQL Serverless Database for Diverse and Dynamic Workloads in Large-scale Cloud Environments 2025-05-12
Show

Multi-tenant architectures enhance the elasticity and resource utilization of NoSQL databases by allowing multiple tenants to co-locate and share resources. However, in large-scale cloud environments, the diverse and dynamic nature of workloads poses significant challenges for multi-tenant NoSQL databases. Based on our practical observations, we have identified three crucial challenges: (1) the impact of caching on performance isolation, as cache hits alter request execution and resource consumption, leading to inaccurate traffic control; (2) the dynamic changes in traffic, with changes in tenant traffic trends causing throttling or resource wastage, and changes in access distribution causing hot key pressure or cache hit ratio drops; and (3) the imbalanced layout of data nodes due to tenants' diverse resource requirements, leading to low resource utilization. To address these challenges, we introduce ABase, a multi-tenant NoSQL serverless database developed at ByteDance. ABase introduces a two-layer caching mechanism with a cache-aware isolation mechanism to ensure accurate resource consumption estimates. Furthermore, ABase employs a predictive autoscaling policy to dynamically adjust resources in response to tenant traffic changes and a multi-resource rescheduling algorithm to balance resource utilization across data nodes. With these innovations, ABase has successfully served ByteDance's large-scale cloud environment, supporting a total workload that has achieved a peak QPS of over 13 billion and total storage exceeding 1 EB.

SIGMOD 2025 accepted

HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences 2025-05-04
Show

Serverless Computing (FaaS) has become a popular paradigm for deep learning inference due to the ease of deployment and pay-per-use benefits. However, current serverless inference platforms encounter coarse-grained and static GPU resource allocation problems during scaling, which leads to high costs and Service Level Objective (SLO) violations under fluctuating workloads. Meanwhile, current platforms only support horizontal scaling for GPU inferences, so the cold start problem further exacerbates these issues. In this paper, we propose HAS-GPU, an efficient Hybrid Auto-scaling Serverless architecture with fine-grained GPU allocation for deep learning inferences. HAS-GPU proposes an agile scheduler capable of allocating GPU Streaming Multiprocessor (SM) partitions and time quotas with arbitrary granularity and enables significant vertical quota scalability at runtime. To resolve performance uncertainty introduced by massive fine-grained resource configuration spaces, we propose the Resource-aware Performance Predictor (RaPP). Furthermore, we present an adaptive hybrid auto-scaling algorithm with both horizontal and vertical scaling to ensure inference SLOs and minimize GPU costs. The experiments demonstrate that compared to the mainstream serverless inference platform, HAS-GPU reduces function costs by an average of 10.8x with better SLO guarantees. Compared to a state-of-the-art spatio-temporal GPU sharing serverless framework, HAS-GPU reduces function SLO violations by 4.8x and cost by 1.72x on average.

The paper has been accepted by Euro-Par 2025

Confidential Serverless Computing 2025-05-01
Show

Although serverless computing offers compelling cost and deployment simplicity advantages, a significant challenge remains in securely managing sensitive data as it flows through the network of ephemeral function executions in serverless computing environments within untrusted clouds. While Confidential Virtual Machines (CVMs) offer a promising secure execution environment, their integration with serverless architectures currently faces fundamental limitations in key areas: security, performance, and resource efficiency. We present Hacher, a confidential computing system for secure serverless deployments that overcomes these limitations. By employing nested confidential execution and a decoupled guest OS within CVMs, Hacher runs each function in a minimal "trustlet", significantly improving security through a reduced Trusted Computing Base (TCB). Furthermore, by leveraging a data-centric I/O architecture built upon a lightweight LibOS, Hacher optimizes network communication to address performance and resource efficiency challenges. Our evaluation shows that compared to CVM-based deployments, Hacher has a 4.3x smaller TCB, improves end-to-end latency (15-93%), achieves higher function density (up to 907x), and reduces inter-function communication (up to 27x) and function chaining latency (16.7-30.2x); thus, Hacher offers a practical system for confidential serverless computing.

Cosmos: A Cost Model for Serverless Workflows in the 3D Compute Continuum 2025-04-30
Show

Due to its high scalability, infrastructure management, and pay-per-use pricing model, serverless computing has been adopted in a wide range of applications such as real-time data processing, IoT, and AI-related workflows. However, deploying serverless functions across dynamic and heterogeneous environments such as the 3D (Edge-Cloud-Space) Continuum introduces additional complexity. Each layer of the 3D Continuum shows different performance capabilities and costs according to workload characteristics. Cloud services alone often show significant differences in performance and pricing for similar functions, further complicating cost management. Additionally, serverless workflows consist of functions with diverse characteristics, requiring a granular understanding of performance and cost trade-offs across different infrastructure layers so they can be addressed individually. In this paper, we present Cosmos, a cost and performance-cost trade-off model for serverless workflows that identifies the key factors affecting cost across different workloads and cloud providers. We present a case study analyzing the main drivers that influence the costs of serverless workflows and demonstrate how to classify these costs on the leading cloud providers AWS and GCP. Our results show that for data-intensive functions, data transfer and state management costs contribute up to 75% of the costs in AWS and 52% in GCP. For compute-intensive functions such as AI inference, the cost results show that BaaS services are the largest cost driver, reaching up to 83% in AWS and 97% in GCP.

CWASI: A WebAssembly Runtime Shim for Inter-function Communication in the Serverless Edge-Cloud Continuum 2025-04-30
Show

Serverless Computing brings advantages to the Edge-Cloud continuum, like simplified programming and infrastructure management. In composed workflows, where serverless functions need to exchange data constantly, serverless platforms rely on remote services such as object storage and key-value stores as a common approach to exchange data. In WebAssembly, functions leverage WebAssembly System Interface to connect to the network and exchange data via remote services. As a consequence, co-located serverless functions need remote services to exchange data, increasing latency and adding network overhead. To mitigate this problem, in this paper, we introduce CWASI: a WebAssembly OCI-compliant runtime shim that determines the best inter-function data exchange approach based on the serverless function locality. CWASI introduces a three-mode communication model for the Serverless Edge-Cloud continuum. This communication model enables CWASI Shim to optimize inter-function communication for co-located functions by leveraging the function host mechanisms. Experimental results show that CWASI reduces the communication latency between the co-located serverless functions by up to 95% and increases the communication throughput by up to 30x.

Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing

Efficient Serverless Cold Start: Reducing Library Loading Overhead by Profile-guided Optimization 2025-04-27
Show

Serverless computing abstracts away server management, enabling automatic scaling, efficient resource utilization, and cost-effective pricing models. However, despite these advantages, it faces the significant challenge of cold-start latency, adversely impacting end-to-end performance. Our study shows that many serverless functions initialize libraries that are rarely or never used under typical workloads, thus introducing unnecessary overhead. Although existing static analysis techniques can identify unreachable libraries, they fail to address workload-dependent inefficiencies, resulting in limited performance improvements. To overcome these limitations, we present SLIMSTART, a profile-guided optimization tool designed to identify and mitigate inefficient library usage patterns in serverless applications. By leveraging statistical sampling and call-path profiling, SLIMSTART collects runtime library usage data, generates detailed optimization reports, and applies automated code transformations to reduce cold-start overhead. Furthermore, SLIMSTART integrates seamlessly into CI/CD pipelines, enabling adaptive monitoring and continuous optimizations tailored to evolving workloads. Through extensive evaluation across three benchmark suites and four real-world serverless applications, SLIMSTART achieves up to a 2.30X speedup in initialization latency, a 2.26X improvement in end-to-end latency, and a 1.51X reduction in memory usage, demonstrating its effectiveness in addressing cold-start inefficiencies and optimizing resource utilization.

Accepted for publication at the 45th IEEE International Conference on Distributed Computing Systems (ICDCS 2025)
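
The optimization SLIMSTART automates, deferring initialization of libraries that profiling shows are rarely exercised, can be approximated in plain Python with a lazy import wrapper. A minimal sketch of the idea, not SLIMSTART's actual code transformation (here the choice of what to defer is made by hand, and `json` stands in for a heavyweight dependency):

```python
import importlib

class LazyModule:
    """Defer an expensive import until the first attribute access.

    Profile-guided tools like SLIMSTART decide automatically which
    libraries are safe to defer; this sketch hard-codes the decision.
    """

    def __init__(self, name: str):
        self._name = name
        self._module = None

    def __getattr__(self, attr: str):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# 'json' stands in for a heavyweight, rarely used dependency.
json = LazyModule("json")

def handler(event):
    # The cold start no longer pays the import cost unless this runs.
    if event.get("debug"):
        return json.dumps(event)
    return "ok"

print(handler({"debug": False}))  # import never happens
print(handler({"debug": True}))   # import happens on first use
```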

Leveraging Core and Uncore Frequency Scaling for Power-Efficient Serverless Workflows 2025-04-21
Show

Serverless workflows have emerged in Function-as-a-Service (FaaS) platforms to represent the operational structure of traditional applications. With latency propagation effects becoming increasingly prominent, step-wise resource tuning is required to address Service-Level-Objectives (SLOs). Modern processors' allowance for fine-grained Dynamic Voltage and Frequency Scaling (DVFS), coupled with serverless workflows' intermittent nature, presents a unique opportunity to reduce power while meeting SLOs. We introduce $\Omega$kypous, an SLO-driven DVFS framework for serverless workflows. $\Omega$kypous employs a grey-box model that predicts functions' execution latency and power under different Core and Uncore frequency combinations. Based on these predictions and the timing slacks between workflow functions, $\Omega$kypous uses a closed-loop control mechanism to dynamically adjust Core and Uncore frequencies, thus minimizing power consumption without compromising predefined end-to-end latency constraints. Our evaluation on real-world traces from Azure, against state-of-the-art power management frameworks, demonstrates an average power consumption reduction of 16%, while consistently maintaining low SLO violation rates (1.8%), when operating under power caps.

Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing 2025-04-11
Show

Orbital edge computing reduces the data transmission needs of Earth observation satellites by processing sensor data on-board, allowing near-real-time insights while minimizing downlink costs. However, current orbital edge computing architectures are inflexible, requiring custom mission planning and high upfront development costs. In this paper, we propose a novel approach: shared Earth observation satellites that are operated by a central provider but used by multiple tenants. Each tenant can execute their own logic on-board the satellite to filter, prioritize, and analyze sensor data. We introduce Trabant, a serverless architecture for shared satellite platforms, leveraging the Function-as-a-Service (FaaS) paradigm and time-shifted computing. This architecture abstracts operational complexities, enabling dynamic scheduling under satellite resource constraints, reducing deployment overhead, and aligning event-driven satellite observations with intermittent computation. We present the design of Trabant, demonstrate its capabilities with a proof-of-concept prototype, and evaluate it using real satellite computing telemetry data. Our findings suggest that Trabant can significantly reduce mission planning overheads, offering a scalable and efficient platform for diverse Earth observation missions.

ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity 2025-04-09
Show

Serverless computing, with its operational simplicity and on-demand scalability, has become a preferred paradigm for deploying workflow applications. However, resource allocation for workflows, particularly those with branching structures, is complicated by cold starts and network delays between dependent functions, significantly degrading execution efficiency and response times. In this paper, we propose the Invocation Concurrency Prediction-Based Scaling (ICPS) algorithm to address these challenges. ICPS employs Long Short-Term Memory (LSTM) networks to predict function concurrency, dynamically pre-warming function instances, and an affinity-based deployment strategy to co-locate dependent functions on the same worker node, minimizing network latency. The experimental results demonstrate that ICPS consistently outperforms existing approaches in diverse scenarios. The results confirm ICPS as a robust and scalable solution for optimizing serverless workflow execution.

Serverless Approach to Running Resource-Intensive STAR Aligner 2025-04-07
Show

The application of serverless computing for alignment of RNA sequences can improve many existing bioinformatics workflows by reducing operational costs and execution times. This work analyzes the applicability of serverless services for running the STAR aligner, which is known for its accuracy and large memory requirement. This presents a challenge, as serverless services were designed for light and short tasks. Nevertheless, we successfully deploy a STAR-based pipeline on the AWS ECS service, propose multiple optimizations, and perform experiments with 17 TB of data. Results are compared against a standard virtual machine (VM) based solution, showing that serverless is a valid alternative for small-scale batch processing. However, at large scale, where efficiency matters most, VMs are still recommended.

Accepted at CCGrid2025 conference in the poster format

GeoNimbus: A serverless framework to build earth observation and environmental services 2025-03-26
Show

Cloud computing has become a popular solution for organizations implementing Earth Observation Systems (EOS). However, this produces a dependency on provider resources. Moreover, managing and executing tasks and data in these environments are challenges that commonly arise when building an EOS. This paper presents GeoNimbus, a serverless framework for composing and deploying spatio-temporal EOS on multiple infrastructures, e.g., on-premise resources and public or private clouds. This framework organizes EOS tasks as functions and automatically manages their deployment, invocation, scalability, and monitoring in the cloud. GeoNimbus framework enables organizations to reuse and share available functions to compose multiple EOS. We use this framework to implement EOS as a service for conducting a case study focused on measuring water resource changes in a lake in the south of Mexico. The experimental evaluation revealed the feasibility and efficiency of using GeoNimbus to build different earth observation studies.

12 pages, 10 images. Presented at the 1st Workshop on High-Performance e-Science at the Euro-Par 2024 conference

SeBS-Flow: Benchmarking Serverless Cloud Function Workflows 2025-03-25
Show

Serverless computing has emerged as a prominent paradigm, with a significant adoption rate among cloud customers. While this model offers advantages such as abstraction from the deployment and resource scheduling, it also poses limitations in handling complex use cases due to the restricted nature of individual functions. Serverless workflows address this limitation by orchestrating multiple functions into a cohesive application. However, existing serverless workflow platforms exhibit significant differences in their programming models and infrastructure, making fair and consistent performance evaluations difficult in practice. To address this gap, we propose the first serverless workflow benchmarking suite SeBS-Flow, providing a platform-agnostic workflow model that enables consistent benchmarking across various platforms. SeBS-Flow includes six real-world application benchmarks and four microbenchmarks representing different computational patterns. We conduct comprehensive evaluations on three major cloud platforms, assessing performance, cost, scalability, and runtime deviations. We make our benchmark suite open-source, enabling rigorous and comparable evaluations of serverless workflows over time.

PipeBoost: Resilient Pipelined Architecture for Fast Serverless LLM Scaling 2025-03-22
Show

This paper presents PipeBoost, a low-latency LLM serving system for multi-GPU (serverless) clusters, which can rapidly launch inference services in response to bursty requests without preemptively over-provisioning GPUs. Many LLM inference tasks rely on the same base model (e.g., LoRA). To leverage this, PipeBoost introduces fault-tolerant pipeline parallelism across both model loading and inference stages. This approach maximizes aggregate PCIe bandwidth and parallel computation across GPUs, enabling faster generation of the first token. PipeBoost also introduces recovery techniques that enable uninterrupted inference services by utilizing the shared advantages of multiple GPUs. Experimental results show that, compared to state-of-the-art low-latency LLM serving systems, PipeBoost reduces inference latency by 31% to 49.8%. For certain models (e.g., OPT-1.3B), PipeBoost achieves cold-start latencies in the range of a few hundred microseconds.

Acurast: Decentralized Serverless Cloud 2025-03-19
Show

Centralized trust is ubiquitous in today's interconnected world, from computational resources to data storage and its underlying infrastructure. The monopolization of cloud computing resembles a feudalistic system, causing a loss of privacy and data ownership. Cloud computing and the Internet in general face widely recognized challenges, such as (1) the centralization of trust in auxiliary systems (e.g., centralized cloud providers), (2) the seamless and permissionless interoperability of fragmented ecosystems, and (3) the effectiveness, verifiability, and confidentiality of computation. Acurast is a decentralized serverless cloud that addresses all these shortcomings, following the call for a global-scale cloud founded on the principles of the open-source movement. In Acurast, a purpose-built orchestrator, a reputation engine, and an attestation service are enshrined in the consensus layer. Developers can off-load their computations and verify executions cryptographically. Furthermore, Acurast offers a modular execution layer that takes advantage of secure hardware and trusted execution environments, removing the trust required in third parties and reducing it to cryptographic hardness assumptions. With this modular architecture, Acurast serves as a decentralized and serverless cloud, allowing confidential and verifiable compute backed by the secure hardware and performance of mobile devices.

v. 0.2, March 17, 2025, White Paper

FaaSMT: Lightweight Serverless Framework for Intrusion Detection Using Merkle Tree and Task Inlining 2025-03-09
Show

Serverless platforms aim to facilitate the straightforward deployment, scaling, and management of cloud applications. Unfortunately, the distributed nature of serverless computing makes it difficult to port traditional security tools directly. Existing serverless solutions primarily identify potential threats or performance bottlenecks through post-analysis of modified operating system audit logs, detection of encrypted traffic offloading, or the collection of runtime metrics. However, these methods often prove inadequate for comprehensively detecting communication violations across functions. This limitation restricts real-time log monitoring and validation capabilities in distributed environments while impeding the maintenance of minimal communication overhead. This paper therefore presents FaaSMT, which aims to fill this gap by addressing research questions related to security checks and the optimization of performance and costs in serverless applications. The framework employs parallel processing for the collection of distributed data logs, incorporating Merkle Tree algorithms and heuristic optimization methods to achieve adaptive inline security task execution. The results of experimental trials demonstrate that FaaSMT is capable of effectively identifying major attack types (e.g., Denial of Wallet (DoW) and Business Logic attacks), thereby providing comprehensive monitoring and validation of function executions while significantly reducing performance overhead.
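
FaaSMT's integrity checking rests on a standard building block: hashing log entries into a Merkle tree so that any tampering changes the root and only roots need to be compared. A generic sketch of that primitive (not FaaSMT's code; the log entries are made up):

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over log entries. Any modified entry
    changes the root, which makes distributed log validation cheap:
    nodes exchange and compare roots rather than whole logs."""
    if not leaves:
        return _h(b"")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate last node on odd levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

logs = [b"fn-a invoked fn-b", b"fn-b read object-store key", b"fn-b returned 200"]
root = merkle_root(logs)
assert merkle_root(logs) == root                      # reproducible
assert merkle_root([b"tampered"] + logs[1:]) != root  # tamper-evident
print(root.hex())
```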

Dilu: Enabling GPU Resourcing-on-Demand for Serverless DL Serving via Introspective Elasticity 2025-03-07
Show

Serverless computing, with its ease of management, auto-scaling, and cost-effectiveness, is widely adopted by deep learning (DL) applications. DL workloads, especially with large language models, require substantial GPU resources to ensure QoS. However, serverless DL systems are prone to producing GPU fragmentation (e.g., 15%-94%) due to the dynamicity of workloads and coarse-grained static GPU allocation mechanisms, gradually eroding the profits offered by serverless elasticity. Different from classical serverless systems that only scale horizontally, we present introspective elasticity (IE), a fine-grained and adaptive two-dimensional co-scaling mechanism to support GPU resourcing-on-demand for serverless DL tasks. Based on this insight, we build Dilu, a cross-layer and GPU-based serverless DL system with IE support. First, Dilu provides multi-factor profiling for DL tasks with efficient pruning search methods. Second, Dilu adheres to resourcing-complementary principles in scheduling to improve GPU utilization with QoS guarantees. Third, Dilu adopts an adaptive 2D co-scaling method to enhance the elasticity of GPU provisioning in real time. Evaluations show that it can dynamically adjust the resourcing of various DL functions with low GPU fragmentation (10%-46% GPU defragmentation), high throughput (up to 1.8$\times$ inference and 1.1$\times$ training throughput increment) and QoS guarantees (11%-71% violation rate reduction), compared to the SOTA baselines.

Uncoordinated Access to Serverless Computing in MEC Systems for IoT 2025-03-01
Show

Edge computing is a promising solution to enable low-latency IoT applications by shifting computation from remote data centers to local devices that are less powerful but closer to the end user devices. However, this creates the challenge of how best to assign clients to edge nodes offering compute capabilities. So far, two antithetical architectures have been proposed: centralized resource orchestration or a distributed overlay. In this work we explore a third way, called uncoordinated access, which consists of letting every device explore multiple opportunities, opportunistically embracing the heterogeneity of network and load conditions across diverse edge nodes. In particular, our contribution is intended for emerging serverless IoT applications, which do not keep state on the edge nodes executing tasks. We model the proposed system as a set of M/M/1 queues and show that it achieves a smaller delay jitter than single edge node allocation. Furthermore, we compare uncoordinated access with state-of-the-art centralized and distributed alternatives in testbed experiments under more realistic conditions. Based on the results, our proposed approach, which requires a tiny fraction of the complexity of the alternatives in both the device and network components, is very effective in using the network resources, while incurring only a small penalty in terms of increased compute load and high percentiles of delay.
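
The jitter claim can be made concrete with the standard M/M/1 formulas the abstract alludes to. For arrival rate $\lambda$ and service rate $\mu$ (with $\rho = \lambda/\mu < 1$), the sojourn time $T$ of an M/M/1 queue is exponentially distributed with rate $\mu - \lambda$, so

$$
\mathbb{E}[T] = \frac{1}{\mu - \lambda}, \qquad \operatorname{Var}(T) = \frac{1}{(\mu - \lambda)^{2}}.
$$

Both the mean and the standard deviation shrink as the gap $\mu - \lambda$ grows, so a device that opportunistically reaches a less-loaded node (a larger effective $\mu - \lambda$) sees proportionally less jitter. This is a back-of-the-envelope reading of the model, not the paper's full derivation.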

Cicada: A Pipeline-Efficient Approach to Serverless Inference with Decoupled Management 2025-02-28
Show

Serverless computing has emerged as a pivotal paradigm for deploying Deep Learning (DL) models, offering automatic scaling and cost efficiency. However, the inherent cold start problem in serverless ML inference systems, particularly the time-consuming model loading process, remains a significant bottleneck. Utilizing pipelined model loading improves efficiency but still suffers from pipeline stalls due to sequential layer construction and monolithic weight loading. In this paper, we propose Cicada, a novel pipeline optimization framework that coordinates computational, storage, and scheduling resources through three key mechanisms: (1) MiniLoader, which reduces layer construction overhead by opportunistically optimizing parameter initialization; (2) WeightDecoupler, which decouples weight file processing from layer construction, enabling asynchronous weight retrieval and out-of-order weight application; and (3) a Priority-Aware Scheduler, which dynamically allocates resources to ensure high-priority inference tasks are executed promptly. Our experimental results demonstrate that Cicada achieves significant performance improvements over the state-of-the-art PISeL framework. Specifically, Cicada reduces end-to-end inference latency by an average of 61.59%, with the MiniLoader component contributing the majority of this optimization (53.41%), and the WeightDecoupler achieving up to 26.17% improvement. Additionally, Cicada achieves up to 2.52x speedup in inference pipeline utilization compared to PISeL.

13 pages, 14 figures

AARC: Automated Affinity-aware Resource Configuration for Serverless Workflows 2025-02-28
Show

Serverless computing is increasingly adopted for its ability to manage complex, event-driven workloads without the need for infrastructure provisioning. However, traditional resource allocation in serverless platforms couples CPU and memory, which may not be optimal for all functions. Existing decoupling approaches, while offering some flexibility, are not designed to handle the vast configuration space and complexity of serverless workflows. In this paper, we propose AARC, an innovative, automated framework that decouples CPU and memory resources to provide more flexible and efficient provisioning for serverless workloads. AARC is composed of two key components: Graph-Centric Scheduler, which identifies critical paths in workflows, and Priority Configurator, which applies priority scheduling techniques to optimize resource allocation. Our experimental evaluation demonstrates that AARC achieves substantial improvements over state-of-the-art methods, with total search time reductions of 85.8% and 89.6%, and cost savings of 49.6% and 61.7%, respectively, while maintaining SLO compliance.

Accepted by the 62nd Design Automation Conference (DAC 2025)

Hiku: Pull-Based Scheduling for Serverless Computing 2025-02-21
Show

Serverless computing promises convenient abstractions for developing and deploying functions that execute in response to events. In such Function-as-a-Service (FaaS) platforms, scheduling is an integral task, but current scheduling algorithms often struggle with maintaining balanced loads, minimizing cold starts, and adapting to commonly occurring bursty workloads. In this work, we propose pull-based scheduling as a novel scheduling algorithm for serverless computing. Our key idea is to decouple worker selection from task assignment, with idle workers requesting new tasks proactively. Experimental evaluation on an open-source FaaS platform shows that pull-based scheduling, compared to other existing scheduling algorithms, significantly improves the performance and load balancing of serverless workloads, especially under high concurrency. The proposed algorithm improves response latencies by 14.9% compared to hash-based scheduling, reduces the frequency of cold starts from 43% to 30%, increases throughput by 8.3%, and achieves a 12.9% more even load distribution, measured by the requests assigned per worker.

Accepted for publication in 25th IEEE International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2025)
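
The core mechanism, idle workers pull their next task instead of a scheduler pushing tasks at them, fits in a few lines. A minimal sketch with Python threads and a shared queue; the worker count and task payloads are illustrative, not Hiku's implementation:

```python
import queue
import threading
import time

task_queue = queue.Queue()

def worker(worker_id: int) -> None:
    # Pull-based scheduling: a worker requests its next task only when
    # idle, so load balances itself even when task durations vary.
    while True:
        task = task_queue.get()
        if task is None:          # sentinel: shut down
            task_queue.task_done()
            return
        time.sleep(0.01)          # placeholder for function execution
        print(f"worker {worker_id} finished {task}")
        task_queue.task_done()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for i in range(10):
    task_queue.put(f"invocation-{i}")
for _ in threads:
    task_queue.put(None)
task_queue.join()
```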

Towards Swift Serverless LLM Cold Starts with ParaServe 2025-02-21
Show

With the surge in number of large language models (LLMs), the industry turns to serverless computing for LLM inference serving. However, serverless LLM serving suffers from significant cold start latency and service level objective (SLO) violations due to the substantial model size, which leads to prolonged model fetching time from remote storage. We present ParaServe, a serverless LLM serving system that minimizes cold start latency through the novel use of pipeline parallelism. Our insight is that by distributing model parameters across multiple GPU servers, we can utilize their aggregated network bandwidth to concurrently fetch different parts of the model. ParaServe adopts a two-level hierarchical design. At the cluster level, ParaServe determines the optimal degree of parallelism based on user SLOs and carefully places GPU workers across servers to reduce network interference. At the worker level, ParaServe overlaps model fetching, loading, and runtime initialization to further accelerate cold starts. Additionally, ParaServe introduces pipeline consolidation, which merges parallel groups back to individual workers to maintain optimal performance for warm requests. Our comprehensive evaluations under diverse settings demonstrate that ParaServe reduces the cold start latency by up to 4.7x and improves SLO attainment by up to 1.74x compared to baselines.

It Takes Two to Tango: Serverless Workflow Serving via Bilaterally Engaged Resource Adaptation 2025-02-20
Show

Serverless platforms typically adopt an early-binding approach for function sizing, requiring developers to specify an immutable size for each function within a workflow beforehand. Accounting for potential runtime variability, developers must size functions for worst-case scenarios to ensure service-level objectives (SLOs), resulting in significant resource inefficiency. To address this issue, we propose Janus, a novel resource adaptation framework for serverless platforms. Janus employs a late-binding approach, allowing function sizes to be dynamically adapted based on runtime conditions. The main challenge lies in the information barrier between the developer and the provider: developers lack access to runtime information, while providers lack domain knowledge about the workflow. To bridge this gap, Janus allows developers to provide hints containing rules and options for resource adaptation. Providers then follow these hints to dynamically adjust resource allocation at runtime based on real-time function execution information, ensuring compliance with SLOs. We implement Janus and conduct extensive experiments with real-world serverless workflows. Our results demonstrate that Janus enhances resource efficiency by up to 34.7% compared to the state-of-the-art.

to be published in the 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS)

KiSS: A Novel Container Size-Aware Memory Management Policy for Serverless in Edge-Cloud Continuum 2025-02-18
Show

Serverless computing has revolutionized cloud architectures by enabling developers to deploy event-driven applications via lightweight, self-contained virtualized containers. However, serverless frameworks face critical cold-start challenges in resource-constrained edge environments, where traditional solutions fall short. The limitations are especially pronounced in edge environments, where heterogeneity and resource constraints exacerbate inefficiencies in resource utilization. This paper introduces KiSS (Keep it Separated Serverless), a static, container size-aware memory management policy tailored for the edge-cloud continuum. The design of KiSS is informed by a detailed workload analysis that identifies critical patterns in container size, invocation frequency, and memory contention. Guided by these insights, KiSS partitions memory pools into categories for small, frequently invoked containers and larger, resource-intensive ones, ensuring efficient resource utilization while minimizing cold starts and inter-function interference. Using a discrete-event simulator, we evaluate KiSS on edge-cluster environments with real-world-inspired workloads. Results show that KiSS reduces cold-start percentages by 60% and function drops by 56.5%, achieving significant performance gains in resource-constrained settings. This work underscores the importance of workload-driven design in advancing serverless efficiency at the edge.
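
The partitioning policy KiSS describes can be sketched as separate keep-warm pools keyed on container size. The size threshold, pool capacities, and LRU eviction below are our own illustrative assumptions, not values from the paper:

```python
from collections import OrderedDict

SMALL_LIMIT_MB = 256  # hypothetical threshold separating the pools

class Pool:
    """A keep-warm container pool with LRU eviction, bounded by memory."""
    def __init__(self, capacity_mb: int):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.containers: OrderedDict[str, int] = OrderedDict()

    def admit(self, fn: str, size_mb: int) -> None:
        if fn in self.containers:
            self.containers.move_to_end(fn)  # warm hit: refresh recency
            return
        while self.used_mb + size_mb > self.capacity_mb and self.containers:
            _, evicted = self.containers.popitem(last=False)
            self.used_mb -= evicted          # evict coldest container
        self.containers[fn] = size_mb
        self.used_mb += size_mb

# Separate pools stop large, rarely invoked containers from evicting
# small, frequently invoked ones -- the interference KiSS targets.
small_pool, large_pool = Pool(1024), Pool(4096)

def keep_warm(fn: str, size_mb: int) -> None:
    (small_pool if size_mb <= SMALL_LIMIT_MB else large_pool).admit(fn, size_mb)

keep_warm("thumbnailer", 128)
keep_warm("video-transcode", 2048)
print(len(small_pool.containers), len(large_pool.containers))
```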

Serverless Edge Computing: A Taxonomy, Systematic Literature Review, Current Trends and Research Challenges 2025-02-16
Show

In recent years, the rapid expansion of Internet of Things (IoT) nodes and devices has seamlessly integrated technology into everyday life, amplifying the demand for optimized computing solutions. To meet critical Quality of Service (QoS) requirements such as reduced latency, efficient bandwidth usage, swift reaction times, scalability, privacy, and security, serverless edge computing has emerged as a transformative paradigm. This systematic literature review explores the current landscape of serverless edge computing, analyzing recent studies to uncover the present state of this technology. The review identifies the essential features of serverless edge computing, focusing on architectural designs, QoS metrics, implementation specifics, practical applications, and communication modalities central to this paradigm. Furthermore, we propose a comprehensive taxonomy that categorizes existing research efforts, providing a comparative analysis based on these classifications. The paper concludes with an in-depth discussion of open research challenges and highlights promising future directions that hold potential for advancing serverless edge computing research.

20 pages, 8 images

λScale: Enabling Fast Scaling for Serverless Large Language Model Inference 2025-02-14
Show

Serverless computing has emerged as a compelling solution for cloud-based model inference. However, as modern large language models (LLMs) continue to grow in size, existing serverless platforms often face substantial model startup overhead. This poses a significant challenge in efficiently scaling model instances to accommodate dynamic, bursty workloads commonly observed in real-world inference services. In this paper, we introduce λScale, an efficient serverless inference system to achieve fast model scaling. The key idea behind λScale is to leverage high-speed RDMA networks between GPU nodes for fast model multicast, while enabling distributed inference execution during model transmission -- referred to as "execute-while-load". λScale proposes an efficient model scaling scheme, λPipe, which supports adaptive model multicast and dynamically constructs execution pipelines across receiving nodes for collaborative, distributed inference. Additionally, λScale supports efficient model management across GPU and host memory, allowing fast scaling for models across different storage tiers. Evaluation results show that λScale enables fast model scaling and effectively handles load spikes, achieving up to 5x tail-latency improvement and 31.3% cost reduction compared to state-of-the-art solutions on real-world LLM inference traces.

SCOPE: Performance Testing for Serverless Computing 2025-02-12
Show

Serverless computing is a popular cloud computing paradigm that has found widespread adoption across various online workloads. It allows software engineers to develop cloud applications as a set of functions (called serverless functions). However, accurately measuring the performance (i.e., end-to-end response latency) of serverless functions is challenging due to the highly dynamic nature of the environment in which they run. To tackle this problem, a potential solution is to apply performance testing techniques to determine how many repetitions of a given serverless function, across a range of inputs, are needed to account for performance fluctuation. However, the available literature lacks performance testing approaches designed explicitly for serverless computing. In this paper, we propose SCOPE, the first serverless computing-oriented performance testing approach. SCOPE takes into account the unique performance characteristics of serverless functions, such as their short execution durations and on-demand triggering. As such, SCOPE is designed as a fine-grained analysis approach. SCOPE incorporates an accuracy check and a consistency check to obtain the accurate and reliable performance of serverless functions. The evaluation shows that SCOPE provides testing results with 97.25% accuracy, 33.83 percentage points higher than the best currently available technique. Moreover, the superiority of SCOPE over the state-of-the-art holds on all functions that we study.

Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM)
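
The general shape of such an accuracy check, repeat measurements until the sample is stable enough, can be sketched with a coefficient-of-variation stopping rule. This is a generic illustration of the idea, with made-up thresholds, not SCOPE's actual criteria:

```python
import random
import statistics

def measure_until_stable(run, max_reps: int = 100,
                         min_reps: int = 10, cv_threshold: float = 0.05) -> list[float]:
    """Repeat run() until the coefficient of variation of the collected
    latencies drops below a threshold (or the repetition budget runs out)."""
    samples: list[float] = []
    for _ in range(max_reps):
        samples.append(run())
        if len(samples) >= min_reps:
            cv = statistics.stdev(samples) / statistics.mean(samples)
            if cv < cv_threshold:
                break
    return samples

def noisy_function() -> float:
    # Stand-in for invoking a serverless function and timing the response.
    return random.gauss(100.0, 5.0)

latencies = measure_until_stable(noisy_function)
print(len(latencies), round(statistics.mean(latencies), 1))
```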

Scalable Cosmic AI Inference using Cloud Serverless Computing with FMI 2025-02-09
Show

Large-scale astronomical image data processing and prediction is essential for astronomers, providing crucial insights into celestial objects, the universe's history, and its evolution. While modern deep learning models offer high predictive accuracy, they often demand substantial computational resources, making them resource-intensive and limiting accessibility. We introduce the Cloud-based Astronomy Inference (CAI) framework to address these challenges. This scalable solution integrates pre-trained foundation models with serverless cloud infrastructure through a Function-as-a-Service (FaaS) Message Interface (FMI). CAI enables efficient and scalable inference on astronomical images without extensive hardware. Using a foundation model for redshift prediction as a case study, our extensive experiments cover user devices, HPC (High-Performance Computing) servers, and Cloud. CAI's significant scalability improvement on large data sizes provides an accessible and effective tool for the astronomy community. The code is accessible at https://github.com/UVA-MLSys/AI-for-Astronomy.

LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World 2025-02-04
Show

Recently, the exponential growth in capability and pervasiveness of Large Language Models (LLMs) has led to significant work in the field of code generation. However, this generation has been limited to code snippets. Going one step further, our desideratum is to automatically generate architectural components. This would not only speed up development time, but would also enable us to eventually skip the development phase entirely, moving directly from design decisions to deployment. To this end, we conduct an exploratory study on the capability of LLMs to generate architectural components for Functions as a Service (FaaS), commonly known as serverless functions. The small size of their architectural components makes this architectural style amenable to generation using current LLMs, compared to other styles like monoliths and microservices. We perform the study by systematically selecting open source serverless repositories, masking a serverless function, and utilizing state-of-the-art LLMs provided with varying levels of context information about the overall system to generate the masked function. We evaluate correctness through existing tests present in the repositories and use metrics from the Software Engineering (SE) and Natural Language Processing (NLP) domains to evaluate code quality and the degree of similarity between human- and LLM-generated code, respectively. Along with our findings, we also present a discussion on the path forward for using GenAI in architectural component generation.

Accepted to IEEE International Conference on Software Architecture (ICSA) 2025 Main Track (https://conf.researchr.org/home/icsa-2025)

SQUASH: Serverless and Distributed Quantization-based Attributed Vector Similarity Search 2025-02-03
Show

Vector similarity search presents significant challenges in terms of scalability for large and high-dimensional datasets, as well as in providing native support for hybrid queries. Serverless computing and cloud functions offer attractive benefits such as elasticity and cost-effectiveness, but are difficult to apply to data-intensive workloads. Jointly addressing these two main challenges, we present SQUASH, the first fully serverless vector search solution with rich support for hybrid queries. It features OSQ, an optimized and highly parallelizable quantization-based approach for vectors and attributes. Its segment-based storage mechanism enables significant compression in resource-constrained settings and offers efficient dimensional extraction operations. SQUASH performs a single distributed pass to guarantee the return of sufficiently many vectors satisfying the filter predicate, achieving high accuracy and avoiding redundant computation for vectors which fail the predicate. A multi-level search workflow is introduced to prune most vectors early to minimize the load on Function-as-a-Service (FaaS) instances. SQUASH is designed to identify and utilize retention of relevant data in re-used runtime containers, which eliminates redundant I/O and reduces costs. Finally, we demonstrate a new tree-based method for rapid FaaS invocation, enabling the bi-directional flow of data via request/response payloads. Experiments comparing SQUASH with state-of-the-art serverless vector search solutions and server-based baselines on vector search benchmarks confirm significant performance improvements at a lower cost.
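
Quantization-based filtering of the kind OSQ builds on can be illustrated with plain scalar quantization: compress vectors to 8-bit codes, scan cheaply in the compressed domain under the attribute predicate, and re-check only a shortlist at full precision. A toy sketch of that pattern, not SQUASH's OSQ:

```python
import numpy as np

def train_quantizer(vectors: np.ndarray):
    # Per-dimension min/max scalar quantizer mapping floats to 0..255.
    lo, hi = vectors.min(axis=0), vectors.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    return lo, scale

def encode(vectors, lo, scale):
    return np.clip(np.round((vectors - lo) / scale), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
base = rng.normal(size=(10_000, 64)).astype(np.float32)
attrs = rng.integers(0, 10, size=10_000)          # per-vector attribute
lo, scale = train_quantizer(base)
codes = encode(base, lo, scale)

query = rng.normal(size=64).astype(np.float32)
qcode = encode(query[None, :], lo, scale)[0]

# Hybrid query: attribute predicate first, then approximate distances
# over the 8-bit codes, then exact re-ranking of the shortlist only.
cand = np.where(attrs == 3)[0]
approx = ((codes[cand].astype(np.int32) - qcode.astype(np.int32)) ** 2).sum(axis=1)
shortlist = cand[np.argsort(approx)[:100]]
exact = ((base[shortlist] - query) ** 2).sum(axis=1)
top10 = shortlist[np.argsort(exact)[:10]]
print(top10)
```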

Scalable and Cost-Efficient ML Inference: Parallel Batch Processing with Serverless Functions 2025-01-30
Show

As data-intensive applications grow, batch processing in limited-resource environments faces scalability and resource management challenges. Serverless computing offers a flexible alternative, enabling dynamic resource allocation and automatic scaling. This paper explores how serverless architectures can make large-scale ML inference tasks faster and cost-effective by decomposing monolithic processes into parallel functions. Through a case study on sentiment analysis using the DistilBERT model and the IMDb dataset, we demonstrate that serverless parallel processing can reduce execution time by over 95% compared to monolithic approaches, at the same cost.
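
The decomposition pattern behind that speedup, split a monolithic batch into independent chunks and invoke one function per chunk, can be sketched with boto3's Lambda client. The function name and payload schema below are hypothetical, not the paper's artifacts:

```python
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")
FUNCTION_NAME = "sentiment-inference"  # hypothetical deployed function

def invoke_chunk(chunk: list[str]) -> list:
    """Synchronously invoke one function instance on one chunk of inputs."""
    response = lambda_client.invoke(
        FunctionName=FUNCTION_NAME,
        Payload=json.dumps({"texts": chunk}).encode(),
    )
    return json.loads(response["Payload"].read())

def parallel_batch(texts: list[str], chunk_size: int = 100) -> list:
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    # Each chunk becomes an independent, concurrently running function,
    # so wall-clock time approaches the latency of a single chunk.
    with ThreadPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(invoke_chunk, chunks))
    return [pred for chunk_result in results for pred in chunk_result]
```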

Skyrise: Exploiting Serverless Cloud Infrastructure for Elastic Data Processing 2025-01-27
Show

Serverless computing offers elasticity unmatched by conventional server-based cloud infrastructure. Although modern data processing systems embrace serverless storage, such as Amazon S3, they continue to manage their compute resources as servers. This is challenging for unpredictable workloads, leaving clusters often underutilized. Recent research shows the potential of serverless compute resources, such as cloud functions, for elastic data processing, but also sees limitations in performance robustness and cost efficiency for long running workloads. These challenges require holistic approaches across the system stack. However, to the best of our knowledge, there is no end-to-end data processing system built entirely on serverless infrastructure. In this paper, we present Skyrise, our effort towards building the first fully serverless SQL query processor. Skyrise exploits the elasticity of its underlying infrastructure, while alleviating the inherent limitations with a number of adaptive and cost-aware techniques. We show that both Skyrise's performance and cost are competitive to other cloud data systems for terabyte-scale queries of the analytical TPC-H benchmark.

DeepFlow: Serverless Large Language Model Serving at Scale 2025-01-27
Show

This paper introduces DeepFlow, a scalable and serverless AI platform designed to efficiently serve large language models (LLMs) at scale in cloud environments. DeepFlow addresses key challenges such as resource allocation, serving efficiency, and cold start latencies through four main design components. First, it uses a simple serverless abstraction called the request-job-task model, which helps manage AI workloads across post-training and model serving tasks. Second, it builds an in-house serving engine, FlowServe, using a microkernel-inspired design, NPU-centric execution, and SPMD-based parallelism to optimize LLM serving. The system also includes novel scheduling policies tailored for both PD-disaggregated and PD-colocated configurations. With optimizations like pre-warmed pods, DRAM pre-loading, and NPU-fork, DeepFlow can scale up to 64 instances in seconds. DeepFlow has been in production for over a year, operating on a large Ascend NPU cluster and providing industry-standard APIs for fine-tuning, agent serving, and model serving to our customers.

Cost Optimization for Serverless Edge Computing with Budget Constraints using Deep Reinforcement Learning 2025-01-23
Show

Serverless computing adopts a pay-as-you-go billing model where applications are executed in stateless and short-lived containers triggered by events, resulting in a reduction of monetary costs and resource utilization. However, existing platforms do not provide an upper bound for the billing model, which makes the overall cost unpredictable, precluding many organizations from managing their budgets. Due to the diverse ranges of serverless functions and the heterogeneous capacity of edge devices, it is challenging to obtain near-optimal solutions for deployment cost in polynomial time. In this paper, we investigate the function scheduling problem with a budget constraint for serverless computing in wireless networks. Users and IoT devices send requests to edge nodes, improving the latency perceived by users. We propose two online scheduling algorithms based on reinforcement learning, incorporating several important characteristics of serverless functions. Via extensive simulations, we justify the superiority of the proposed algorithms by comparing them with an ILP solver (Midaco). Our results indicate that the proposed algorithms efficiently approximate the results of Midaco within a factor of 1.03, while our decision-making time is 5 orders of magnitude less than that of Midaco.

This paper has been accepted by IEEE ICC 2025

Affinity-aware Serverless Function Scheduling 2025-01-23
Show

Functions-as-a-Service (FaaS) is a Serverless Cloud paradigm where a platform manages the scheduling (e.g., resource allocation, runtime environments) of stateless functions. Recent work proposed using domain-specific languages to express per-function policies, e.g., policies that enforce the allocation on nodes that enjoy lower latencies to databases and services used by the function. Here, we focus on affinity-aware scenarios, i.e., where, for performance and functional requirements, the allocation of a function depends on the presence/absence of other functions on nodes. We present aAPP, an extension of a declarative, platform-agnostic language that captures affinity-aware scheduling at the FaaS level. We implement an aAPP-based prototype on Apache OpenWhisk. Besides proving that a FaaS platform can capture affinity awareness using aAPP and improve performance in affinity-aware scenarios, we use our prototype to show that aAPP imposes no noticeable overhead in scenarios without affinity constraints.

11 pages, 9 figures, 1 listing. arXiv admin note: substantial text overlap with arXiv:2407.14159
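
To make the affinity/anti-affinity idea concrete, the following Python sketch filters candidate nodes by the presence or absence of co-located functions. It is a toy stand-in, not aAPP's actual DSL or its OpenWhisk integration; all names are invented.

```python
# Illustrative affinity-aware node filtering, loosely inspired by the aAPP
# idea described above; the helper name and data shapes are invented.
def eligible_nodes(nodes, running, affinity=(), anti_affinity=()):
    """Return nodes where all `affinity` functions are already placed and
    no `anti_affinity` function is present.

    nodes:   iterable of node names
    running: dict node -> set of function names currently placed there
    """
    out = []
    for node in nodes:
        placed = running.get(node, set())
        if all(f in placed for f in affinity) and not placed & set(anti_affinity):
            out.append(node)
    return out

running = {"n1": {"db-cache"}, "n2": {"crypto-miner"}, "n3": set()}
# Place a function that wants to sit with "db-cache" but away from "crypto-miner":
print(eligible_nodes(["n1", "n2", "n3"], running,
                     affinity=["db-cache"], anti_affinity=["crypto-miner"]))
# -> ['n1']
```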

Serverless Computing: Architecture, Concepts, and Applications 2025-01-16
Show

Recently, serverless computing has gained recognition as a leading cloud computing method. By eliminating the need for direct server and infrastructure management, this technology addresses many problems of traditional models. Operational complexity and costs are therefore reduced, allowing developers to concentrate on writing and deploying software without worrying about server management. This chapter examines the advantages, disadvantages, and applications of serverless computing, its implementation environments, and the reasons for its use. Additionally, the integration of this computing paradigm with other technologies is examined to address the challenges of managing, securing, and implementing large amounts of data. The chapter aims to provide a comprehensive view of the potentials and limitations of serverless computing by comparing its applications in different industries and examining the future trends of this technology, and it closes with a comprehensive summary of the paradigm's applications and challenges.

An Empirical Evaluation of Serverless Cloud Infrastructure for Large-Scale Data Processing 2025-01-14
Show

Data processing systems are increasingly deployed in the cloud. While monolithic systems run fully on virtual servers, recent systems embrace cloud infrastructure and utilize the disaggregation of compute and storage to scale them independently. The introduction of serverless compute services, such as AWS Lambda, enables finer-grained and elastic scalability within these systems. Prior work shows the viability of serverless infrastructure for scalable data processing yet also sees limitations due to variable performance and cost overhead, in particular for networking and storage. In this paper, we perform a detailed analysis of the performance and cost characteristics of serverless infrastructure in the data processing context. We base our analysis on a large series of micro-benchmarks across different compute and storage services, as well as end-to-end workloads. To enable our analysis, we propose the Skyrise serverless evaluation platform. For the widely used serverless infrastructure of AWS, our analysis reveals distinct boundaries for performance variability in serverless networks and storage. We further present cost break-even points for serverless compute and storage. These insights provide guidance on when and how serverless infrastructure can be efficiently used for data processing.

Unveiling Overlooked Performance Variance in Serverless Computing 2025-01-11
Show

Serverless computing is an emerging cloud computing paradigm for developing applications at the function level, known as serverless functions. Due to the highly dynamic execution environment, multiple identical runs of the same serverless function can yield different performance, specifically in terms of end-to-end response latency. However, surprisingly, our analysis of serverless computing-related papers published in top-tier conferences highlights that the research community lacks awareness of the performance variance problem, with only 38.38% of these papers employing multiple runs to quantify it. To further investigate, we analyze the performance of 72 serverless functions collected from these papers. Our findings reveal that the performance of these serverless functions can differ by up to 338.76% (44.28% on average) across different runs. Moreover, 61.11% of these functions produce unreliable performance results under the low number of repetitions commonly employed in the serverless computing literature. Our study highlights a lack of awareness in the serverless computing community regarding the well-known performance variance problem in software engineering. The empirical results illustrate the substantial magnitude of this variance, emphasizing that ignoring it can affect research reproducibility and result reliability.

This work has been accepted for publication in Empirical Software Engineering!
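
A minimal way to quantify such run-to-run variance is the coefficient of variation over repeated measurements, as in this Python sketch; the latencies are made-up example data, not measurements from the paper.

```python
import statistics

def coefficient_of_variation(latencies_ms):
    """CV = stdev / mean; a simple, common dispersion measure for repeated
    end-to-end latency runs of the same serverless function."""
    mean = statistics.mean(latencies_ms)
    return statistics.stdev(latencies_ms) / mean

# Hypothetical latencies (ms) from 10 identical runs of one function.
runs = [412, 388, 395, 840, 401, 420, 399, 905, 410, 391]
cv = coefficient_of_variation(runs)
print(f"CV = {cv:.2f}")  # a high CV flags unreliable single-run results
```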

KTWIN: A Serverless Kubernetes-based Digital Twin Platform 2025-01-09
Show

Digital Twin (DT) systems are virtual representations of physical assets that allow organizations to gain insights and improve existing processes. In practice, DTs require proper modeling, coherent development, and seamless deployment across cloud and edge landscapes, relying on established patterns to reduce operational costs. In this work, we propose KTWIN, a Kubernetes-based serverless platform for Digital Twins. KTWIN was developed using state-of-the-art open-source Cloud Native tools, allowing DT operators to easily define models through open standards and configure details of the underlying services and infrastructure. The experiments carried out with the developed prototype show that KTWIN can provide a higher level of abstraction to model and deploy a Digital Twin use case without compromising the solution's scalability. The tests performed also show cost savings ranging between 60% and 80% compared to overprovisioned scenarios.

Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing 2025-01-09
Show

With the advancement of serverless computing, running machine learning (ML) inference services over a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. Mixture-of-Experts (MoE) models, with their parallel expert networks, have become a dominant architecture for enabling today's large models. Serving large MoE models on serverless computing is potentially beneficial but has been underexplored, due to substantial challenges in handling the skewed expert popularity and the scatter-gather communication bottleneck in MoE model execution, which must be addressed for cost-efficient serverless MoE deployment with performance guarantees. We study optimized MoE model deployment and distributed inference serving on a serverless platform that effectively predicts expert selection, pipelines communication with model execution, and minimizes the overall billed cost of serving MoE models. In particular, we propose a Bayesian optimization framework with multi-dimensional epsilon-greedy search to learn expert selections and an optimal MoE deployment achieving minimal billed cost, including: 1) a Bayesian decision-making method for predicting expert popularity; 2) flexibly pipelined scatter-gather communication; and 3) an optimal model deployment algorithm for distributed MoE serving. Extensive experiments on AWS Lambda show that our designs reduce the billed cost of all MoE layers by at least 75.67% compared to CPU clusters while maintaining satisfactory inference throughput. Compared to LambdaML in serverless computing, our designs achieve 43.41% lower cost with a throughput decrease of at most 18.76%.
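
As a toy illustration of expert-popularity prediction, the sketch below tracks historical gating decisions and pre-selects the most popular experts with an epsilon-greedy rule. It stands in for, but does not reproduce, the Bayesian decision-making method described above; all names and numbers are invented.

```python
import random
from collections import Counter

class ExpertPredictor:
    """Toy epsilon-greedy predictor of per-layer expert popularity, a
    stand-in for the paper's Bayesian method (not reproduced here)."""

    def __init__(self, num_experts, epsilon=0.1):
        self.counts = Counter()
        self.num_experts = num_experts
        self.epsilon = epsilon

    def observe(self, chosen_experts):
        self.counts.update(chosen_experts)   # record past gating decisions

    def predict_top_k(self, k):
        if random.random() < self.epsilon:   # explore rarely-seen experts
            return random.sample(range(self.num_experts), k)
        return [e for e, _ in self.counts.most_common(k)]  # exploit history

pred = ExpertPredictor(num_experts=8)
for routed in ([0, 3], [0, 5], [0, 3], [3, 7]):   # observed gating decisions
    pred.observe(routed)
print(pred.predict_top_k(2))   # likely [0, 3] -> pre-place these experts
```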

Shattering the Ephemeral Storage Cost Barrier for Data-Intensive Serverless Workflows 2025-01-08
Show

Serverless computing is a popular cloud deployment paradigm where developers implement applications as workflows of functions that invoke each other. Cloud providers automatically scale function instances on demand and forward workflow requests to appropriate instances. However, current serverless clouds lack efficient cross-function data transfer, limiting the execution of data-intensive applications. Functions often rely on third-party services like AWS S3, AWS ElastiCache, or multi-tier solutions for intermediate data transfers, which introduces inefficiencies. We demonstrate that such through-storage transfers make data-intensive deployments economically impractical, with storage costs comprising 24-99% of the total serverless bill. To address this, we introduce Zipline, a fast, API-preserving data communication method for serverless platforms. Zipline enables direct function-to-function transfers, where the sender function buffers payloads in memory and sends a reference to the receiver. The receiver retrieves the data directly from the sender's memory, guided by the load balancer and autoscaler. Zipline integrates seamlessly with existing autoscaling, maintains invocation semantics, and eliminates the costs and overheads of intermediate services. We prototype Zipline in vHive/Knative on AWS EC2 nodes, demonstrating significant improvements. Zipline reduces costs and enhances latency and bandwidth compared to AWS S3 (the lowest-cost solution) and ElastiCache (the highest-performance solution). On real-world applications, Zipline lowers costs by 2-5x and reduces execution times by 1.3-3.4x versus S3. Compared to ElastiCache, Zipline achieves 17-772x cost reductions while improving performance by 2-5%.

added cost reduction details
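
The reference-passing idea above can be sketched in a few lines of Python: the sender parks the payload in memory and hands the receiver only a small reference. This toy in-process registry merely mimics the mechanism; Zipline's actual load-balancer- and autoscaler-guided transfer is not shown.

```python
import uuid

_BUFFERS = {}  # stands in for the sender instance's in-memory buffer

def send(payload: bytes) -> str:
    ref = str(uuid.uuid4())
    _BUFFERS[ref] = payload          # sender keeps the payload in memory
    return ref                       # only the small reference travels

def receive(ref: str) -> bytes:
    return _BUFFERS.pop(ref)         # receiver pulls directly, no S3 hop

ref = send(b"intermediate-results")
print(receive(ref))                  # b'intermediate-results'
```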

Serverless Query Processing with Flexible Performance SLAs and Prices 2024-12-23
Show

Serverless query processing has become increasingly popular due to its auto-scaling, high elasticity, and pay-as-you-go pricing. It allows cloud data warehouse (or lakehouse) users to focus on data analysis without the burden of managing systems and resources. Accordingly, in serverless query services, users become more concerned about cost-efficiency under acceptable performance than performance under fixed resources. This poses new challenges for serverless query engine design in providing flexible performance service-level agreements (SLAs) and cost-efficiency (i.e., prices). In this paper, we first define the problem of flexible performance SLAs and prices in serverless query processing and discuss its significance. Then, we envision the challenges and solutions for solving this problem and the opportunities it raises for other database research. Finally, we present PixelsDB, an open-source prototype with three service levels supported by dedicated architectural designs. Evaluations show that PixelsDB reduces resource costs by 65.5% for near-real-world workloads generated by Cloud Analytics Benchmark (CAB) while not violating the pending time guarantees.

9 pages, 7 figures
PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices 2024-12-23
Show

Serverless query processing has become increasingly popular due to its advantages, including automated resource management, high elasticity, and pay-as-you-go pricing. For users who are not system experts, serverless query processing greatly reduces the cost of owning a data analytic system. However, it is still a significant challenge for non-expert users to transform their complex and evolving data analytic needs into proper SQL queries and select a serverless query service that delivers satisfactory performance and price for each type of query. This paper presents PixelsDB, an open-source data analytic system that allows users who lack system or SQL expertise to explore data efficiently. It allows users to generate and debug SQL queries using a natural language interface powered by fine-tuned language models. The queries are then executed by a serverless query engine that offers varying prices for different performance service levels (SLAs). The performance SLAs are natively supported by dedicated architecture design and heterogeneous resource scheduling that can apply cost-efficient resources to process non-urgent queries. We demonstrate that the combination of a serverless paradigm, a natural-language-aided interface, and flexible SLAs and prices will substantially improve the usability of cloud data analytic systems.

4 pages, 4 figures
Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters 2024-12-19
Show

Existing work is only effective for a given number of GPUs and often neglects the complexity of manually determining the specific types and quantities of GPUs needed, which can be a significant burden for developers. To address this issue, we propose Frenzy, a memory-aware serverless computing method for heterogeneous GPU clusters. Frenzy allows users to submit models without worrying about underlying hardware resources. First, Frenzy predicts the required number and type of GPUs by estimating the GPU memory usage of the LLM. Then, it employs a low-overhead heterogeneity-aware scheduling method to optimize training efficiency. We validated Frenzy's performance by conducting multi-task LLM training tests on a heterogeneous GPU cluster with three different GPU types. The results show that Frenzy's memory usage prediction accuracy exceeds 92%, its scheduling overhead is reduced by a factor of 10, and it reduces the average job completion time by 12% to 18% compared to state-of-the-art methods.
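
A back-of-the-envelope version of the first step, estimating GPU memory from model size and then picking a GPU type and count, might look like the Python sketch below. The constants (fp16 weights plus a 1.2x overhead factor) and the GPU price table are illustrative assumptions, not Frenzy's actual predictor.

```python
def estimate_gpu_memory_gb(num_params_b, bytes_per_param=2, overhead=1.2):
    """Rough memory estimate for serving/training an LLM: weights in fp16
    (2 bytes/param) plus a fudge factor for activations and optimizer or
    KV-cache state. The constants are illustrative, not Frenzy's model."""
    return num_params_b * bytes_per_param * overhead

def pick_gpus(num_params_b, gpu_types):
    """Choose the cheapest (cost, type, count) whose total memory fits."""
    need = estimate_gpu_memory_gb(num_params_b)
    options = []
    for name, (mem_gb, price_per_hour) in gpu_types.items():
        count = -(-need // mem_gb)            # ceiling division
        options.append((count * price_per_hour, name, int(count)))
    return min(options)

# Hypothetical GPU catalogue: name -> (memory GB, $/hour).
GPUS = {"A100-40G": (40, 3.0), "A100-80G": (80, 5.0), "V100-16G": (16, 1.5)}
print(pick_gpus(num_params_b=13, gpu_types=GPUS))  # e.g. a 13B model
```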

SeSeMI: Secure Serverless Model Inference on Sensitive Data 2024-12-16
Show

Model inference systems are essential for implementing end-to-end data analytics pipelines that deliver the benefits of machine learning models to users. Existing cloud-based model inference systems are costly, not easy to scale, and must be trusted in handling the models and user request data. Serverless computing presents a new opportunity, as it provides elasticity and fine-grained pricing. Our goal is to design a serverless model inference system that protects models and user request data from untrusted cloud providers. It offers high performance and low cost, while requiring no intrusive changes to the current serverless platforms. To realize our goal, we leverage trusted hardware. We identify and address three challenges in using trusted hardware for serverless model inference. These challenges arise from the high-level abstraction of serverless computing, the performance overhead of trusted hardware, and the characteristics of model inference workloads. We present SeSeMI, a secure, efficient, and cost-effective serverless model inference system. It adds three novel features non-intrusively to the existing serverless infrastructure and nothing else. The first feature is a key service that establishes secure channels between the user and the serverless instances, which also provides access control to models and users' data. The second is an enclave runtime that allows one enclave to process multiple concurrent requests. The final feature is a model packer that allows multiple models to be executed by one serverless instance. We build SeSeMI on top of Apache OpenWhisk, and conduct extensive experiments with three popular machine learning models. The results show that SeSeMI achieves low latency and low cost at scale for realistic workloads.

Raptor: Distributed Scheduling for Serverless Functions 2024-12-13
Show

To support parallelizable serverless workflows in applications like media processing, we have prototyped a distributed scheduler called Raptor that reduces both the end-to-end delay time and failure rate of parallelizable serverless workflows. As modern serverless frameworks are typically deployed to extremely large scale distributed computing environments by major cloud providers, Raptor is specifically designed to exploit the property of statistically independent function execution that tends to emerge at very large scales. To demonstrate the effect of horizontal scale on function execution, our evaluation demonstrates that mean delay time improvements provided by Raptor for RSA public-private key pair generation can be accurately predicted by mutually independent exponential random variables, but only once the serverless framework is deployed in a highly available configuration and horizontally scaled across three availability zones.

GoldFish: Serverless Actors with Short-Term Memory State for the Edge-Cloud Continuum 2024-12-03
Show

Serverless Computing is a computing paradigm that provides efficient infrastructure management and elastic scalability. Serverless functions scale up or down based on demand, which means that functions are not directly addressable and rely on platform-managed invocation. The stateless nature of serverless requires functions to leverage external services, such as object storage and KVS, to exchange data. Serverless actors have emerged as a solution to these issues. However, the state-of-the-art serverless lifecycle and event-trigger invocation force actors to leverage remote services to manage their state and exchange data, which impacts performance and incurs additional costs and a dependency on third-party services. To address these issues, in this paper, we introduce a novel serverless lifecycle model that allows short-term stateful actors, enabling actors to maintain their state between executions. Additionally, we propose a novel serverless invocation model that enables serverless actors to influence the processing of future messages. We present GoldFish, a lightweight WebAssembly short-term stateful serverless actor platform that provides a novel serverless actor lifecycle and invocation model. GoldFish leverages WebAssembly to provide the actors with lightweight sandbox isolation, making them suitable for the Edge-Cloud Continuum, where computational resources are limited. Experimental results show that GoldFish optimizes the data exchange latency by up to 92% and increases the throughput by up to 10x compared to OpenFaaS and Spin.

14th International Conference on the Internet of Things (IoT 2024), November 19--22, 2024, Oulu, Finland

FaaSRCA: Full Lifecycle Root Cause Analysis for Serverless Applications 2024-12-03
Show

Serverless is becoming popular as a novel computing paradigm for cloud-native services. However, the complexity and dynamic nature of serverless applications present significant challenges to ensuring system availability and performance. There are many root cause analysis (RCA) methods for microservice systems, but they are not suitable for precisely modeling serverless applications. This is because: (1) compared to microservices, serverless applications exhibit a highly dynamic nature; they have short lifecycles and only generate instantaneous, pulse-like data, lacking long-term continuous information. (2) Existing methods solely focus on analyzing the running stage and overlook other stages, failing to encompass the entire lifecycle of serverless applications. To address these limitations, we propose FaaSRCA, a full lifecycle root cause analysis method for serverless applications. It integrates multi-modal observability data generated on the platform and application sides using a global call graph. We train a Graph Attention Network (GAT) based graph auto-encoder to compute reconstruction scores for the nodes in the global call graph. Based on the scores, we determine the root cause at the granularity of the lifecycle stage of serverless functions. We conduct experimental evaluations on two serverless benchmarks; the results show that FaaSRCA outperforms other baseline methods with a top-k precision improvement ranging from 21.25% to 81.63%.

ISSRE 2024
Truffle: Efficient Data Passing for Data-Intensive Serverless Workflows in the Edge-Cloud Continuum 2024-11-25
Show

Serverless computing promises a scalable, reliable, and cost-effective solution for running data-intensive applications and workflows in the heterogeneous and limited-resource environment of the Edge-Cloud Continuum. However, building and running data-intensive serverless workflows also brings new challenges that can significantly degrade application performance. Cold start remains one of the main challenges that impact the total function execution time. Further, since serverless functions are not directly addressable, serverless workflows need to rely on external (storage) services to pass input data to downstream functions. Empirical evidence from our experiments shows that the cold start and the function data passing take up the most time in the function execution lifecycle. In this paper, we introduce Truffle, a novel model and architecture that enables efficient inter-function data passing in the Edge-Cloud Continuum by introducing mechanisms that separate computation and I/O, allowing serverless functions to leverage cold starts to their advantage. Truffle introduces a Smart Data Prefetch (SDP) mechanism that abstracts the retrieval of input data for serverless functions by triggering the data retrieval from external storage during the function's startup. Truffle's Cold Start Pass (CSP) mechanism optimizes inter-function data passing and data exchange within serverless workflows in the Edge-Cloud Continuum by hooking into the functions' scheduling lifecycle to trigger early data passing during the function's cold start. Experimental results show that by leveraging data prefetching and cold-start data passing, Truffle reduces the I/O latency impact on the total function execution time by up to 77%, improving the function execution time by up to 46% compared to state-of-the-art data passing approaches.

The IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2024)

Enabling Efficient Serverless Inference Serving for LLM (Large Language Model) in the Cloud 2024-11-23
Show

This review report discusses cold start latency in serverless inference and existing solutions. It particularly reviews the ServerlessLLM method, a system designed to address the cold start problem in serverless inference for large language models. Traditional serverless approaches struggle with high latency due to the size of LLM checkpoints and the overhead of initializing GPU resources. ServerlessLLM introduces a multi-tier checkpoint loading system, leveraging underutilized GPU memory and storage to reduce startup times by 6-8x compared to existing methods. It also proposes live inference migration and a startup-time-optimized model scheduler, ensuring efficient resource allocation and minimizing delays. This system significantly improves performance and scalability in serverless environments for LLM workloads. Besides ServerlessLLM, several other methods from the recent research literature, including Rainbowcake, are reviewed in this paper. Further discussions explore how FaaS providers tackle cold starts and possible future scopes.

12 pages, 7 figures, TUM Cloud Computing Seminar
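
The multi-tier loading idea generalizes to a simple fall-through loop over storage tiers, as in the Python sketch below. The tier paths, the file-copy "promotion", and the in-process design are invented for illustration; ServerlessLLM's actual loader is far more involved.

```python
import os

# Toy multi-tier checkpoint loader in the spirit of the idea above: try the
# fastest tier first and fall through to slower ones. Tier paths are
# placeholders, roughly DRAM -> local disk cache -> remote storage.
TIERS = ["/dev/shm/ckpts", "/var/cache/ckpts", "/mnt/remote/ckpts"]

def load_checkpoint(name: str) -> bytes:
    for tier in TIERS:
        path = os.path.join(tier, name)
        if os.path.exists(path):
            with open(path, "rb") as f:
                data = f.read()
            promote(name, data, tier)   # warm the faster tiers for next time
            return data
    raise FileNotFoundError(name)

def promote(name: str, data: bytes, found_tier: str):
    # Copy the checkpoint into every tier faster than where it was found.
    for tier in TIERS[: TIERS.index(found_tier)]:
        os.makedirs(tier, exist_ok=True)
        with open(os.path.join(tier, name), "wb") as f:
            f.write(data)
```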

Fast and Efficient Memory Reclamation For Serverless MicroVMs 2024-11-19
Show

Resource elasticity is one of the key defining characteristics of the Function-as-a-Service (FaaS) serverless computing paradigm. In order to provide strong multi-tenant isolation, FaaS providers commonly sandbox functions inside virtual machines (VMs or microVMs). While compute resources assigned to VM-sandboxed functions can be seamlessly adjusted on the fly, memory elasticity remains challenging, especially when scaling down. State-of-the-art mechanisms for VM memory elasticity suffer from increased reclaim latency when memory needs to be released, compounded by CPU and memory bandwidth overheads. We identify the obliviousness of the Linux memory manager to the virtually hotplugged memory as the key issue hindering hot-unplug performance, and design HotMem, a novel approach for fast and efficient VM memory hot(un)plug, targeting VM-sandboxed serverless functions. Our key insight is that by segregating virtually hotplugged memory regions from regular VM memory, we are able to bound the lifetimes of allocations within these regions thus enabling their fast and efficient reclamation. We implement HotMem in Linux v6.6 and our evaluation shows that it is an order of magnitude faster than state-of-practice to reclaim VM memory, while achieving the same P99 function latency with a model that statically over-provisions VMs.

Unveiling the Skills and Responsibilities of Serverless Practitioners: An Empirical Investigation 2024-11-15
Show

Enterprises are increasingly adopting serverless computing to enhance scalability, reduce costs, and improve efficiency. However, this shift introduces new responsibilities and necessitates a distinct set of skills for practitioners. This study aims to identify and organize the industry requirements for serverless practitioners by conducting a qualitative analysis of 141 job advertisements from seven countries. We developed comprehensive taxonomies of roles, responsibilities, and skills, categorizing 19 responsibilities into four themes: software development, infrastructure and operations, professional development and leadership, and software business. Additionally, we identified 28 hard skills mapped into seven themes and 32 soft skills mapped into eight themes, with the six most demanded soft skills being communication proficiency, continuous learning and adaptability, collaborative teamwork, problem-solving and analytical skills, leadership excellence, and project management. Our findings contribute to understanding the organizational structures and training requirements for effective serverless computing adoption.

In Serverless, OS Scheduler Choice Costs Money: A Hybrid Scheduling Approach for Cheaper FaaS 2024-11-13
Show

In Function-as-a-Service (FaaS) serverless, large applications are split into short-lived stateless functions. Deploying functions is mutually profitable: users need not be concerned with resource management, while providers can keep their servers at high utilization rates, running thousands of functions concurrently on a single machine. It is exactly this high concurrency that comes at a cost. The standard Linux Completely Fair Scheduler (CFS) switches often between tasks, which leads to prolonged execution times. We present evidence that relying on the default Linux CFS scheduler increases serverless workload costs by up to 10X. In this article, we raise awareness and make a case for rethinking OS-level scheduling in Linux for serverless workloads composed of many short-lived processes. To make serverless more affordable, we introduce a hybrid two-level scheduling approach that relies on FaaS characteristics. Short-running functions are executed in FIFO fashion without preemption, while longer-running functions are passed to CFS after a certain time period. We show that tailor-made OS scheduling is able to significantly reduce user-facing costs without adding any provider-facing overhead.

Accepted at Middleware 2024, author draft made available for timely dissemination
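
A user-space toy model of this two-level policy is sketched below: tasks first run FIFO for one time slice; anything still unfinished is demoted to a round-robin queue standing in for CFS. The slice length and task model are invented, and real kernel-level scheduling obviously works quite differently.

```python
import collections

SLICE_S = 0.020   # functions shorter than this time slice stay in FIFO

def run_hybrid(tasks):
    """tasks: list of (name, remaining_seconds). Short tasks run to
    completion FIFO without preemption; longer ones are demoted to a
    round-robin queue, mimicking a hand-off to a CFS-like fair scheduler."""
    fifo = collections.deque(tasks)
    fair = collections.deque()
    order = []
    while fifo or fair:
        if fifo:
            name, rem = fifo.popleft()
            if rem <= SLICE_S:                  # short function: run FIFO
                order.append(name)
                continue
            fair.append((name, rem - SLICE_S))  # demote after one slice
        else:
            name, rem = fair.popleft()          # round-robin the long ones
            rem -= SLICE_S
            if rem <= 0:
                order.append(name)
            else:
                fair.append((name, rem))
    return order

print(run_hybrid([("thumb", 0.005), ("video", 0.100), ("resize", 0.010)]))
# -> ['thumb', 'resize', 'video']: short functions finish first
```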

On-demand Cold Start Frequency Reduction with Off-Policy Reinforcement Learning in Serverless Computing 2024-11-13
Show

Function-as-a-Service (FaaS) is a cloud computing paradigm offering an event-driven execution model to applications. It features serverless attributes by eliminating resource management responsibilities from developers, and offers transparent and on-demand scalability of applications. To provide seamless on-demand scalability, new function instances are prepared to serve the incoming workload in the absence or unavailability of function instances. However, FaaS platforms are known to suffer from cold starts, where this function provisioning process introduces a non-negligible delay in function response and degrades the end-user experience. Therefore, the presented work focuses on reducing the frequent, on-demand cold starts on the platform by using Reinforcement Learning (RL). The proposed approach uses model-free Q-learning that considers function metrics such as CPU utilization, existing function instances, and response failure rate to proactively initialize functions in advance, based on the expected demand. The proposed solution is implemented on Kubeless and evaluated using an open-source function invocation trace applied to a matrix multiplication function. The evaluation results demonstrate favourable performance of the RL-based agent when compared to Kubeless' default policy and a function keep-alive policy, improving throughput by up to 8.81% and reducing computation load and resource wastage by up to 55% and 37%, respectively, as a direct outcome of reduced cold starts.

13 figures, 24 pages, 3 tables

Input-Based Ensemble-Learning Method for Dynamic Memory Configuration of Serverless Computing Functions 2024-11-12
Show

In today's Function-as-a-Service offerings, a programmer is usually responsible for configuring function memory for its successful execution, which allocates proportional function resources such as CPU and network. However, right-sizing the function memory forces developers to speculate about performance and make ad-hoc configuration decisions. Recent research has highlighted that a function's input characteristics, such as input size, type, and number of inputs, significantly impact its resource demand, run-time performance, and costs under fluctuating workloads. This correlation further makes memory configuration a non-trivial task. On that account, an input-aware function memory allocator not only improves developer productivity by completely hiding resource-related decisions but also creates an opportunity to reduce resource wastage and offer a finer-grained, cost-optimised pricing scheme. Therefore, we present MemFigLess, a serverless solution that estimates the memory requirement of a serverless function with input-awareness. The framework executes function profiling in an offline stage and trains a multi-output Random Forest Regression model on the collected metrics to invoke input-aware optimal configurations. We evaluate our work against state-of-the-art approaches on the AWS Lambda service and find that MemFigLess is able to capture the input-aware resource relationships and allocate up to 82% fewer resources and save up to 87% of run-time costs.

10 pages, 2 tables, 28 figures, accepted conference paper - UCC'24
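
A minimal version of the offline-profiling-plus-regression recipe can be put together with scikit-learn's multi-output random forest, as below. The features, profiling data, headroom factor, and 128 MB rounding are all illustrative assumptions, not MemFigLess's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Offline profiling data (invented): features are [input_size_mb, num_inputs],
# targets are [peak_memory_mb, duration_ms] per profiled invocation.
X = np.array([[1, 1], [5, 2], [10, 4], [50, 8], [100, 16]])
y = np.array([[128, 90], [256, 180], [512, 400], [1024, 1500], [2048, 5200]])

# Random forests natively support multi-output regression in scikit-learn.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def recommend_memory(input_size_mb, num_inputs, headroom=1.2):
    """Predict peak memory for this input and round up to 128 MB steps,
    mimicking Lambda-style memory size increments."""
    peak_mb, _ = model.predict([[input_size_mb, num_inputs]])[0]
    return int(np.ceil(peak_mb * headroom / 128) * 128)

print(recommend_memory(20, 4))
```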

FaaSTube: Optimizing GPU-oriented Data Transfer for Serverless Computing 2024-11-04
Show

Serverless computing has gained significant traction for machine learning inference applications, which are often deployed as serverless workflows consisting of multiple CPU and GPU functions with data dependencies. However, existing data-passing solutions for serverless computing primarily rely on host memory for fast data transfer, mandating substantial data movement and resulting in substantial I/O overhead. In this paper, we present FaaSTube, a GPU-efficient data-passing system for serverless inference. FaaSTube manages intermediate data within a GPU memory pool to facilitate direct data exchange between GPU functions. It enables fine-grained bandwidth sharing over PCIe and NVLink, minimizing data-passing latency for both host-to-GPU and GPU-to-GPU transfers while providing performance isolation between functions. Additionally, FaaSTube implements an elastic GPU memory pool that dynamically scales to accommodate varying data-passing demands. Evaluations on real-world applications show that FaaSTube reduces end-to-end latency by up to 90% and achieves up to 12x higher throughput compared to the state-of-the-art.

LLM-Based Misconfiguration Detection for AWS Serverless Computing 2024-11-01
Show

Serverless computing is an emerging cloud computing paradigm that enables developers to build applications at the function level, known as serverless applications. Amazon Web Services (AWS), the leading provider in this domain, provides the Serverless Application Model (AWS SAM), the most widely adopted configuration schema for configuring and managing serverless applications through a specified file. However, misconfigurations pose a significant challenge in serverless development. Traditional data-driven techniques may struggle with serverless applications because the complexity of serverless configurations hinders pattern recognition, and it is challenging to gather complete datasets that cover all possible configurations. Leveraging vast amounts of publicly available data during pre-training, LLMs have the potential to assist in identifying and explaining misconfigurations in serverless applications. In this paper, we introduce SlsDetector, the first framework leveraging LLMs to detect misconfigurations in serverless applications. SlsDetector utilizes effective prompt engineering with zero-shot learning to identify configuration issues. It designs multi-dimensional constraints specifically tailored to the configuration characteristics of serverless applications and leverages the Chain of Thought technique to enhance LLM inference. We evaluate SlsDetector on a curated dataset of 110 configuration files. Our results show that SlsDetector, based on ChatGPT-4o, achieves a precision of 72.88%, recall of 88.18%, and F1-score of 79.75%, outperforming state-of-the-art data-driven approaches by 53.82, 17.40, and 49.72 percentage points, respectively. Furthermore, we investigate the generalization capability of SlsDetector by applying recent LLMs, including Llama 3.1 (405B) Instruct Turbo and Gemini 1.5 Pro, with results showing consistently high effectiveness across these models.
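
To make the prompting approach concrete, here is a hedged Python sketch of a zero-shot, constraint-guided prompt for checking a SAM configuration file. The `call_llm` helper, the constraint wording, and the output format are hypothetical; SlsDetector's actual prompts are not reproduced here.

```python
# Zero-shot misconfiguration-detection prompt, loosely following the recipe
# described above. `call_llm` is a hypothetical stand-in for your LLM client.
CONSTRAINTS = """\
- Only discuss the configuration file provided; do not invent resources.
- Check every property's type and allowed values against the AWS SAM schema.
- Report each issue as: <path>: <problem>: <suggested fix>.
- Think step by step before giving the final list."""

def detect_misconfigurations(sam_yaml: str) -> str:
    prompt = (
        "You are an expert in AWS SAM serverless configurations.\n"
        "Constraints:\n" + CONSTRAINTS + "\n\n"
        "Configuration file (YAML):\n" + sam_yaml + "\n\n"
        "List all misconfigurations, or reply 'none found'."
    )
    return call_llm(prompt)  # hypothetical LLM call, e.g. a chat completion
```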

An approach to provide serverless scientific pipelines within the context of SKA 2024-10-29
Show

Function-as-a-Service (FaaS) is a type of serverless computing that allows developers to write and deploy code as individual functions, which can be triggered by specific events or requests. FaaS platforms automatically manage the underlying infrastructure, scaling it up or down as needed, making them highly scalable and cost-effective while offering a high level of abstraction. Prototypes being developed within the SKA Regional Center Network (SRCNet) are exploring models for data distribution, software delivery, and distributed computing, with the goal of moving computation to where the data is and executing it there. Since SKA will be the largest data producer on the planet, it will be necessary to distribute this massive volume of data to the SRCNet nodes that will serve as hubs for computing and analysis operations on the closest data. Within this context, in this work we want to validate the feasibility of designing and deploying functions and applications commonly used in radio interferometry workflows on a FaaS platform, to demonstrate the value of this computing model as an alternative for data processing in the distributed nodes of the SRCNet. We analyzed several FaaS platforms and successfully deployed one of them, into which we imported several functions using two different methods: microfunctions from the CASA framework, which are written in Python, and highly specific native applications like wsclean. We then designed a simple catalogue that can be easily scaled to provide all the key features of FaaS in highly distributed environments using orchestrators, as well as the ability to integrate them with workflows or APIs. This paper contributes to the ongoing discussion of the potential of FaaS models for scientific data processing, particularly in the context of large-scale, distributed projects such as SKA.

6
Histrio: a Serverless Actor System 2024-10-29
Show

In recent years, the serverless paradigm has been widely adopted to develop cloud applications, as it enables building scalable solutions while delegating operational concerns, such as infrastructure management and resource provisioning, to the serverless provider. Despite bringing undisputed advantages, the serverless model requires a change in programming paradigm that may add complexity to software development. In particular, in the Function-as-a-Service (FaaS) paradigm, functions are inherently stateless. As a consequence, developers carry the burden of directly interacting with external storage services and handling concurrency and state consistency across function invocations. This results in less time spent on solving the actual business problems they face. Starting from these premises, this paper proposes Histrio, a programming model and execution environment that simplifies the development of complex stateful applications in the FaaS paradigm. Histrio is grounded in the actor programming model and lifts concerns such as state management, database interaction, and concurrency handling from developers. It enriches the actor model with features that simplify and optimize the interaction with external storage. It guarantees exactly-once-processing consistency, meaning that the application always behaves as if any interaction with external clients was processed once and only once, masking failures. Histrio has been compared with a classical FaaS implementation to evaluate both the development time saved due to the guarantees the system offers and the applicability of Histrio in typical applications. In the evaluated scenarios, Histrio simplified the implementation by significantly reducing the amount of code needed to handle operational concerns. It proves to be scalable, and it provides configuration mechanisms to trade off performance and execution costs.

FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge 2024-10-28
Show

Edge AI computing boxes are a new class of computing devices aimed at revolutionizing the AI industry. These compact and robust hardware units bring the power of AI processing directly to the source of data, at the edge of the network. On the other hand, on-demand serverless inference services are becoming more and more popular as they minimize the infrastructural cost associated with hosting and running DNN models for small to medium-sized businesses. However, these computing devices are still constrained in terms of resource availability. As such, service providers need to load and unload models efficiently in order to meet the growing demand. In this paper, we introduce FusedInf to efficiently swap DNN models for on-demand serverless inference services on the edge. FusedInf combines multiple models into a single Directed Acyclic Graph (DAG) to efficiently load the models into GPU memory and make execution faster. Our evaluation of popular DNN models showed that creating a single DAG can make the execution of the models up to 14% faster while reducing the memory requirement by up to 17%. The prototype implementation is available at https://github.com/SifatTaj/FusedInf.

Cold Start Latency in Serverless Computing: A Systematic Review, Taxonomy, and Future Directions 2024-10-23
Show

Recently, academics and the corporate sector have paid attention to serverless computing, which enables dynamic scalability and an economic model. In serverless computing, users only pay for the time they actually use resources, enabling zero scaling to optimise cost and resource utilisation. However, this approach also introduces the serverless cold start problem. Researchers have developed various solutions to address the cold start problem, yet it remains an unresolved research area. In this article, we propose a systematic literature review on cold start latency in serverless computing. Furthermore, we create a detailed taxonomy of approaches to cold start latency, which we use to investigate existing techniques for reducing the cold start time and frequency. We have classified the current studies on cold start latency into several categories, such as caching and application-level optimisation-based solutions, as well as Artificial Intelligence (AI)/Machine Learning (ML)-based solutions. Moreover, we have analyzed the impact of cold start latency on quality of service, explored current cold start latency mitigation methods, datasets, and implementation platforms, and classified them into categories based on their common characteristics and features. Finally, we outline the open challenges and highlight possible future directions.

Preprint Version Accepted for Publication in ACM Computing Survey, 2024

HyperDrive: Scheduling Serverless Functions in the Edge-Cloud-Space 3D Continuum 2024-10-21
Show

The number of Low Earth Orbit (LEO) satellites has grown enormously in the past years. Their abundance and low orbits allow for low-latency communication with a satellite almost anywhere on Earth, and high-speed inter-satellite laser links (ISLs) enable a quick exchange of large amounts of data among satellites. As the computational capabilities of LEO satellites grow, they are becoming eligible as general-purpose compute nodes. In the 3D continuum, which combines Cloud and Edge nodes on Earth and satellites in space into a seamless computing fabric, workloads can be executed on any of the aforementioned compute nodes, depending on where it is most beneficial. However, scheduling on LEO satellites moving at approx. 27,000 km/h requires picking the satellite with the lowest latency to all data sources (ground and, possibly, Earth observation satellites). Dissipating heat from onboard hardware is challenging when facing the sun, and workloads must not drain the satellite's batteries. These factors make meeting SLOs more challenging than in the Edge-Cloud continuum, i.e., on Earth alone. We present HyperDrive, an SLO-aware scheduler for serverless functions specifically designed for the 3D continuum. It places functions on Cloud, Edge, or Space compute nodes, based on their availability and ability to meet the SLO requirements of the workflow. We evaluate HyperDrive using a wildfire disaster response use case with high Earth Observation data processing requirements and stringent SLOs, showing that it enables the design and execution of such next-generation 3D scenarios with 71% lower network latency than the best baseline scheduler.

2024 IEEE/ACM Symposium on Edge Computing(SEC)

Fusionize++: Improving Serverless Application Performance Using Dynamic Task Inlining and Infrastructure Optimization 2024-10-21
Show

The Function-as-a-Service (FaaS) execution model increases developer productivity by removing operational concerns such as managing hardware or software runtimes. Developers, however, still need to partition their applications into FaaS functions, which is error-prone and complex: encapsulating only the smallest logical unit of an application as a FaaS function maximizes flexibility and reusability, yet it also leads to invocation overheads, additional cold starts, and possibly increased cost due to double billing during synchronous invocations. Conversely, deploying an entire application as a single FaaS function avoids these overheads but decreases flexibility. In this paper, we present Fusionize, a framework that automates optimizing for this trade-off by automatically fusing application code into an optimized multi-function composition. Developers only need to write fine-grained application code following the serverless model, while Fusionize automatically fuses different parts of the application into FaaS functions, manages their interactions, and configures the underlying infrastructure. At runtime, it monitors application performance and adapts the composition to minimize request-response latency and costs. Real-world use cases show that Fusionize can improve the deployment artifacts of the application, reducing both the median request-response latency and the cost of an example IoT application by more than 35%.

Author copy of article accepted in IEEE Transactions on Cloud Computing with DOI 10.1109/TCC.2024.3451108

WarmSwap: Sharing Dependencies for Accelerating Cold Starts in Serverless Functions 2024-10-21
Show

This work presents WarmSwap, a novel provider-side cold-start optimization for serverless computing. This optimization reduces cold-start time when booting and loading dependencies at runtime inside a function container. Previous approaches to the optimization of cold starts tend to fall into two categories: optimizing the infrastructure of serverless computing to benefit all serverless functions; or function-specific tuning for individual serverless functions. In contrast, WarmSwap offers a broad middle ground, which optimizes entire categories of serverless functions. WarmSwap eliminates the need to initialize middleware or software dependencies when launching a new serverless container, by migrating a pre-initialized live dependency image to the new function instance. WarmSwap respects the provider's cache constraints, as a single pre-warmed dependency image in the cache is shared among all serverless functions requiring that software dependency image. WarmSwap has been tested on seven representative functions from FunctionBench. In those tests, WarmSwap accelerates dependency loading for serverless functions with large dependency requirements by a factor ranging from 2.2 to 3.2. Simulation experiments using Azure traces indicate that WarmSwap can save 88% of optimization space when sharing a dependency image among ten different functions.

15 pages, 7 figures
EcoLife: Carbon-Aware Serverless Function Scheduling for Sustainable Computing 2024-10-16
Show

This work introduces ECOLIFE, the first carbon-aware serverless function scheduler to co-optimize carbon footprint and performance. ECOLIFE builds on the key insight of intelligently exploiting multi-generation hardware to achieve high performance and lower carbon footprint. ECOLIFE designs multiple novel extensions to Particle Swarm Optimization (PSO) in the context of serverless execution environment to achieve high performance while effectively reducing the carbon footprint.
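
As a rough illustration of using PSO to balance carbon and performance, here is a minimal single-knob particle swarm in Python that tunes a keep-alive duration against a weighted carbon-plus-latency objective. The objective function, weights, and constants are invented; ECOLIFE's actual PSO extensions and multi-generation hardware modeling are not reproduced.

```python
import random

def objective(keep_alive_s):
    """Invented trade-off: longer keep-alive burns energy on idle pods
    (carbon term) but avoids cold starts (latency term)."""
    carbon = 0.02 * keep_alive_s
    latency = 500.0 / (1.0 + keep_alive_s)
    return 0.5 * carbon + 0.5 * latency

def pso(n=20, iters=50, lo=0.0, hi=600.0):
    """Plain-vanilla PSO over one dimension (keep-alive seconds)."""
    pos = [random.uniform(lo, hi) for _ in range(n)]
    vel = [0.0] * n
    best_p = pos[:]                       # per-particle best position
    best_g = min(pos, key=objective)      # global best position
    for _ in range(iters):
        for i in range(n):
            r1, r2 = random.random(), random.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (best_p[i] - pos[i])
                      + 1.5 * r2 * (best_g - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))   # clamp to bounds
            if objective(pos[i]) < objective(best_p[i]):
                best_p[i] = pos[i]
        best_g = min(best_p, key=objective)
    return best_g

print(f"chosen keep-alive: {pso():.1f} s")
```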

Orchestrating the Execution of Serverless Functions in Hybrid Clouds 2024-10-09
Show

In recent years, serverless computing, especially Function as a Service (FaaS), has been rapidly growing in popularity as a cloud programming model. The serverless computing model provides an intuitive interface for developing cloud-based applications, where the development and deployment of scalable microservices has become easier and cost-effective. An increasing number of batch-processing applications are deployed as pipelines that comprise a sequence of functions that must meet their deadline targets to be practical. In this paper, we present our Hybrid Cloud Scheduler (HCS) for orchestrating the execution of serverless batch-processing pipelines deployed over heterogeneous infrastructures. Our framework enables developers to (i) automatically schedule and execute batch-processing applications in heterogeneous environments such as private edge and public cloud serverless infrastructures, (ii) benefit from cost reductions through the utilization of their own resources in a private cluster, and (iii) significantly improve the probability of meeting the deadline requirements of their applications. Our experimental evaluation demonstrates the efficiency and benefits of our approach.

Energy Efficient Scheduling for Serverless Systems 2024-10-09
Show

Serverless computing, also referred to as Function-as-a-Service (FaaS), is a cloud computing model that has attracted significant attention and has been widely adopted in recent years. The serverless computing model offers an intuitive, event-based interface that makes the development and deployment of scalable cloud-based applications easier and cost-effective. An important aspect that has not been examined in these systems is their energy consumption during the application execution. One way to deal with this issue is to schedule the function invocations in an energy-efficient way. However, efficient scheduling of applications in a multi-tenant environment, like FaaS systems, poses significant challenges. The trade-off between the server's energy usage and the hosted functions' performance requirements needs to be taken into consideration. In this work, we propose an Energy Efficient Scheduler for orchestrating the execution of serverless functions so that it minimizes energy consumption while it satisfies the applications' performance demands. Our approach considers real-time performance measurements and historical data and applies a novel DVFS technique to minimize energy consumption. Our detailed experimental evaluation using realistic workloads on our local cluster illustrates the working and benefits of our approach.

Prediction-driven resource provisioning for serverless container runtimes 2024-10-09
Show

In recent years, Serverless Computing has emerged as a compelling cloud-based model for the development of a wide range of data-intensive applications. However, rapid container provisioning introduces non-trivial challenges for FaaS cloud providers, as (i) real-world FaaS workloads may exhibit highly dynamic request patterns, (ii) applications have service-level objectives (SLOs) that must be met, and (iii) container provisioning can be a costly process. In this paper, we present SLOPE, a prediction framework for serverless FaaS platforms to address the aforementioned challenges. Specifically, it trains a neural network model that utilizes knowledge from past runs in order to estimate the number of instances required to satisfy the invocation rate requirements of the serverless applications. In cases where a priori knowledge is not available, SLOPE makes predictions using a graph edit distance approach to capture the similarities among serverless applications. Our experimental results illustrate the efficiency and benefits of our approach, which can reduce the operating costs by 66.25% on average.

6 pages. arXiv admin note: substantial text overlap with arXiv:2410.18106

Serverless Cold Starts and Where to Find Them 2024-10-08
Show

This paper releases and analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from Huawei's serverless cloud platform. Our analysis spans workloads from five data centers. We focus on cold starts and provide a comprehensive examination of the underlying factors influencing the number and duration of cold starts. These factors include trigger types, request synchronicity, runtime languages, and function resource allocations. We investigate components of cold starts, including pod allocation time, code and dependency deployment time, and scheduling delays, and examine their relationships with runtime languages, trigger types, and resource allocation. We introduce the pod utility ratio to measure the pod's useful lifetime relative to its cold start time, giving a more complete picture of cold starts, and see that some pods with long cold start times have longer useful lifetimes. Our findings reveal the complexity and multifaceted origins of the number, duration, and characteristics of cold starts, driven by differences in trigger types, runtime languages, and function resource allocations. For example, cold starts in Region 1 take up to 7 seconds, dominated by dependency deployment time and scheduling. In Region 2, cold starts take up to 3 seconds and are dominated by pod allocation time. Based on this, we identify opportunities to reduce the number and duration of cold starts using strategies for multi-region scheduling. Finally, we suggest directions for future research to address these challenges and enhance the performance of serverless cloud platforms. Our datasets and code are available at https://github.com/sir-lab/data-release
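
Taking the definition above at face value, the pod utility ratio can be computed as useful lifetime divided by cold start time; the small Python helper below illustrates this reading, which may differ in detail from the paper's exact formula.

```python
def pod_utility_ratio(useful_lifetime_s, cold_start_s):
    """Useful lifetime relative to cold start time, per the description
    above; a sketch of the natural reading, not the paper's exact metric."""
    return useful_lifetime_s / cold_start_s

# A pod that took 7 s to cold-start but served traffic for 10 minutes:
print(pod_utility_ratio(600, 7))   # ~85.7: an expensive start, long payoff
```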

Komet: A Serverless Platform for Low-Earth Orbit Edge Services 2024-10-08
Show

Low-Earth orbit satellite networks can provide global broadband Internet access using constellations of thousands of satellites. Integrating edge computing resources in such networks can enable global low-latency access to compute services, supporting end users in rural areas, remote industrial applications, or the IoT. To achieve this, resources must be carefully allocated to various services from multiple tenants. Moreover, applications must navigate the dynamic nature of satellite networks, where orbital mechanics necessitate frequent client hand-offs. Therefore, managing applications on the low-Earth orbit edge will require the right platform abstractions. We introduce Komet, a serverless platform for low-Earth orbit edge computing. Komet integrates Function-as-a-Service compute with data replication, enabling on-demand elastic edge resource allocation and frequent service migration against satellite orbital trajectories to keep services deployed in the same geographic region. We implement Komet as a proof-of-concept prototype and demonstrate how its abstractions can be used to build low-Earth orbit edge applications with high availability despite constant mobility. Further, we propose simple heuristics for service migration scheduling in different application scenarios and evaluate them in simulation based on our experiment traces, showing the trade-off between selecting an optimal satellite server at every instance and minimizing service migration frequency.

15th ACM Symposium on Cloud Computing

Energy Efficiency Support for Software Defined Networks: a Serverless Computing Approach 2024-09-17
Show

Automatic network management strategies have become paramount for meeting the needs of innovative real-time and data-intensive applications, such as in the Internet of Things. However, meeting the ever-growing and fluctuating demands for data and services in such applications requires, more than ever, an efficient and scalable network resource management approach. Such an approach should enable the automated provisioning of services while incentivising energy-efficient resource usage that extends throughout the edge-to-cloud continuum. This paper is the first to realise the concept of modular Software-Defined Networks based on serverless functions in an energy-aware environment. By adopting Function as a Service, the approach enables on-demand deployment of network functions, resulting in cost reduction through fine resource provisioning granularity. An analytical model is presented to approximate the service delivery time and power consumption, as well as an open-source prototype implementation supported by an extensive experimental evaluation. The experiments demonstrate not only the practical applicability of the proposed approach but also a significant improvement in energy efficiency.

GreenWhisk: Emission-Aware Computing for Serverless Platform 2024-09-04
Show

Serverless computing is an emerging cloud computing abstraction wherein the cloud platform transparently manages all resources, including explicitly provisioning resources and geographically balancing load when the demand for service spikes. Users provide code as functions, and the cloud platform runs these functions, handling all aspects of function execution. While prior work has primarily focused on optimizing performance, this paper focuses on reducing the carbon footprint of these systems, making variations in grid carbon intensity and the intermittency of renewables transparent to the user. We introduce GreenWhisk, a carbon-aware serverless computing platform built upon Apache OpenWhisk, operating in two modes - grid-connected and grid-isolated - addressing intermittency challenges arising from renewables and the grid's carbon footprint. Moreover, we develop carbon-aware load balancing algorithms that leverage energy and carbon information to reduce the carbon footprint. Our evaluation results show that GreenWhisk can easily incorporate carbon-aware algorithms, thereby reducing the carbon footprint of functions without significantly impacting the performance of function execution. In doing so, our system design enables the integration of new carbon-aware strategies into a serverless computing platform.

11 pages, 13 figures, IC2E 2024

CASA: A Framework for SLO and Carbon-Aware Autoscaling and Scheduling in Serverless Cloud Computing 2024-08-31
Show

Serverless computing is an emerging cloud computing paradigm that can reduce costs for cloud providers and their customers. However, serverless cloud platforms have stringent performance requirements (due to the need to execute short duration functions in a timely manner) and a growing carbon footprint. Traditional carbon-reducing techniques such as shutting down idle containers can reduce performance by increasing cold-start latencies of containers required in the future. This can cause higher violation rates of service level objectives (SLOs). Conversely, traditional latency-reduction approaches of prewarming containers or keeping them alive when not in use can improve performance but increase the associated carbon footprint of the serverless cluster platform. To strike a balance between sustainability and performance, in this paper, we propose a novel carbon- and SLO-aware framework called CASA to schedule and autoscale containers in a serverless cloud computing cluster. Experimental results indicate that CASA reduces the operational carbon footprint of a FaaS cluster by up to 2.6x while also reducing the SLO violation rate by up to 1.4x compared to the state-of-the-art.

Empowering Volunteer Crowdsourcing Services: A Serverless-assisted, Skill and Willingness Aware Task Assignment Approach for Amicable Volunteer Involvement 2024-08-21
Show

Volunteer crowdsourcing (VCS) leverages citizen interaction to address challenges by utilizing individuals' knowledge and skills. Complex social tasks often require collaboration among volunteers with diverse skill sets, and their willingness to engage is crucial. Matching tasks with the most suitable volunteers remains a significant challenge. VCS platforms face unpredictable demands in terms of tasks and volunteer requests, complicating the prediction of resource requirements for the volunteer-to-task assignment process. To address these challenges, we introduce the Skill and Willingness-Aware Volunteer Matching (SWAM) algorithm, which allocates volunteers to tasks based on skills, willingness, and task requirements. We also developed a serverless framework to deploy SWAM. Our method outperforms conventional solutions, achieving a 71% improvement in end-to-end latency efficiency. We achieved a 92% task completion ratio and reduced task waiting time by 56%, with an overall utility gain 30% higher than state-of-the-art baseline methods. This framework contributes to generating effective volunteer and task matches, supporting grassroots community coordination and fostering citizen involvement, ultimately contributing to social good.

SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions 2024-08-21

As an emerging cloud computing deployment paradigm, serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources. However, a significant hurdle remains in the form of the cold start problem, causing latency when launching new function instances from scratch. Existing solutions tend to use over-simplistic strategies for function pre-loading/unloading without full invocation pattern exploitation, rendering unsatisfactory optimization of the trade-off between cold start latency and resource waste. To bridge this gap, we propose SPES, the first differentiated scheduler for runtime cold start mitigation by optimizing serverless function provision. Our insight is that the common architecture of serverless systems prompts the concentration of certain invocation patterns, leading to predictable invocation behaviors. This allows us to categorize functions and pre-load/unload proper function instances with finer-grained strategies based on accurate invocation prediction. Experiments demonstrate the success of SPES in optimizing serverless function provision on both sides: reducing the 75th-percentile cold start rates by 49.77% and the wasted memory time by 56.43%, compared to the state-of-the-art. By mitigating the cold start issue, SPES is a promising advancement in facilitating cloud services deployed on serverless architectures.

12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)
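
The sketch below illustrates the flavor of invocation-pattern-based provisioning that SPES builds on: functions with regular inter-arrival times are pre-warmed just before their predicted next invocation, while erratic ones fall back to on-demand starts. The categories and thresholds are invented for illustration.

```python
# Illustrative sketch of pattern-based pre-warming in the spirit of SPES
# (categories and thresholds are invented): functions with regular
# inter-arrival times get an instance warmed shortly before the predicted
# next invocation; erratic ones are left to on-demand cold starts.
import statistics

def classify(inter_arrivals_s: list[float]) -> str:
    if len(inter_arrivals_s) < 3:
        return "unknown"
    mean = statistics.mean(inter_arrivals_s)
    cv = statistics.stdev(inter_arrivals_s) / mean  # coefficient of variation
    return "periodic" if cv < 0.2 else "bursty" if cv < 1.0 else "erratic"

def next_prewarm_time(last_s: float, inter_arrivals_s: list[float],
                      warmup_s: float = 1.0) -> float | None:
    if classify(inter_arrivals_s) != "periodic":
        return None  # not predictable enough to justify pre-warming
    return last_s + statistics.mean(inter_arrivals_s) - warmup_s

print(next_prewarm_time(100.0, [60.0, 59.0, 61.0, 60.5]))  # ~159.1 s
```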

Context-aware Container Orchestration in Serverless Edge Computing 2024-08-14

Adopting serverless computing in edge networks benefits end-users through the pay-as-you-use billing model and flexible scaling of applications. This paradigm extends the boundaries of edge computing and remarkably improves the quality of services. However, due to the heterogeneous nature of computing and bandwidth resources in edge networks, it is challenging to dynamically allocate different resources while adapting to the burstiness and high concurrency of serverless workloads. This article focuses on serverless function provisioning in edge networks to optimize end-to-end latency, where the challenge lies in jointly allocating wireless bandwidth and computing resources among heterogeneous computing nodes. To address this challenge, we devised a context-aware learning framework that adaptively orchestrates a wide spectrum of resources and jointly considers them to avoid resource fragmentation. Extensive simulation results show that the proposed algorithm reduces convergence time by over 95% while keeping end-to-end delay comparable to the state of the art.

This paper has been accepted by the IEEE GLOBECOM 2024 Conference

Object as a Service: Simplifying Cloud-Native Development through Serverless Object Abstraction 2024-08-09

The function-as-a-service (FaaS) paradigm is envisioned as the next generation of cloud computing systems that mitigate the burden for cloud-native application developers by abstracting them from cloud resource management. However, it does not deal with the application data aspects. As such, developers have to intervene and undergo the burden of managing the application data, often via separate cloud storage services. To further streamline cloud-native application development, in this work, we propose a new paradigm, known as Object as a Service (OaaS) that encapsulates application data and functions into the cloud object abstraction. OaaS relieves developers from resource and data management burden while offering built-in optimization features. Inspired by OOP, OaaS incorporates access modifiers and inheritance into the serverless paradigm that: (a) prevents developers from compromising the system via accidentally accessing underlying data; and (b) enables software reuse in cloud-native application development. Furthermore, OaaS natively supports dataflow semantics. It enables developers to define function workflows while transparently handling data navigation, synchronization, and parallelism issues. To establish the OaaS paradigm, we develop a platform named Oparaca that offers state abstraction for structured and unstructured data with consistency and fault-tolerant guarantees. We evaluated Oparaca under real-world settings against state-of-the-art platforms with respect to the imposed overhead, scalability, and ease of use. The results demonstrate that the object abstraction provided by OaaS can streamline flexible and scalable cloud-native application development with an insignificant overhead on the underlying serverless system.

Detection of Compromised Functions in a Serverless Cloud Environment 2024-08-05

Serverless computing is an emerging cloud paradigm with serverless functions at its core. While serverless environments enable software developers to focus on developing applications without the need to actively manage the underlying runtime infrastructure, they open the door to a wide variety of security threats that can be challenging to mitigate with existing methods. Existing security solutions do not apply to all serverless architectures, since they require significant modifications to the serverless infrastructure or rely on third-party services for the collection of more detailed data. In this paper, we present an extendable serverless security threat detection model that leverages cloud providers' native monitoring tools to detect anomalous behavior in serverless applications. Our model aims to detect compromised serverless functions by identifying post-exploitation abnormal behavior related to different types of attacks on serverless functions, and therefore, it is a last line of defense. Our approach is not tied to any specific serverless application, is agnostic to the type of threats, and is adaptable through model adjustments. To evaluate our model's performance, we developed a serverless cybersecurity testbed in an AWS cloud environment, which includes two different serverless applications and simulates a variety of attack scenarios that cover the main security threats faced by serverless functions. Our evaluation demonstrates our model's ability to detect all implemented attacks while maintaining a negligible false alarm rate.

Caching Aided Multi-Tenant Serverless Computing 2024-08-01

One key to enabling high-performance serverless computing is to mitigate cold-starts. Current solutions utilize a warm pool to keep functions alive: a warm-start can be seen as analogous to a CPU cache-hit. However, modern CPU caches have multiple levels, and the last-level cache is shared among cores, whereas the warm pool is limited to a single tenant for security reasons. Also, the warm pool keep-alive policy can be further optimized using cache replacement algorithms. In this paper, we borrow practical optimizations from caching and design FaasCamp, a caching-aided multi-tenant serverless computing framework. FaasCamp extends the single-tier warm pool into multiple tiers, with a reclaim pool introduced to enable secure function instance sharing among tenants. Also, FaasCamp leverages machine learning to approximate the optimal cache replacement policy to improve the warm rate. We have implemented a prototype and conducted extensive experiments under multiple scenarios. The results show that FaasCamp can outperform existing platforms with minimal overhead.
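
A minimal sketch of the multi-tier warm pool idea, assuming a tenant-private tier backed by a shared reclaim tier of sanitized instances (the class and lookup order are illustrative, not FaasCamp's implementation):

```python
# Minimal sketch of a two-tier warm pool with a shared reclaim tier, loosely
# modeled on the idea above (APIs and eviction policy invented): look up the
# tenant's private warm pool first (like an L1 hit), then a shared reclaim
# pool whose instances are assumed sanitized before cross-tenant reuse.
class WarmPools:
    def __init__(self):
        self.private = {}   # (tenant, func) -> [warm instances]
        self.reclaim = {}   # func -> [sanitized instances shareable across tenants]

    def acquire(self, tenant: str, func: str):
        if self.private.get((tenant, func)):
            return self.private[(tenant, func)].pop(), "warm-hit"
        if self.reclaim.get(func):
            return self.reclaim[func].pop(), "reclaim-hit"
        return object(), "cold-start"   # stand-in for spawning a new sandbox

pools = WarmPools()
pools.reclaim["resize-image"] = ["sandbox-42"]
print(pools.acquire("tenant-a", "resize-image")[1])  # reclaim-hit
```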

Litmus: Fair Pricing for Serverless Computing 2024-08-01

Serverless computing has emerged as a market-dominant paradigm in modern cloud computing, benefiting both cloud providers and tenants. While service providers can optimize their machine utilization, tenants only need to pay for the resources they use. To maximize resource utilization, these serverless systems co-run numerous short-lived functions and endure frequent shifts in system conditions. When the system gets overcrowded, a tenant's function may suffer from disturbing slowdowns. Ironically, tenants also incur higher costs during these slowdowns, as commercial serverless platforms determine costs proportional to execution times. This paper argues that cloud providers should compensate tenants for losses incurred when the server is over-provisioned. However, estimating tenants' losses is challenging without pre-profiled information about their functions. Prior studies have indicated that assessing tenant losses leads to heavy overheads. As a solution, this paper introduces a new pricing model that offers discounts based on the machine's state while presuming the tenant's loss under that state. To monitor the machine state accurately, Litmus pricing frequently conducts Litmus tests, an effective and lightweight solution for measuring system congestion. Our experiments show that Litmus pricing can accurately gauge the impact of system congestion and offer nearly ideal prices, with only a 0.2% price difference on average, in a heavily congested system.
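
To make the pricing idea concrete, here is a hedged sketch: run a tiny fixed CPU probe, compare its runtime against an uncontended baseline, and discount the bill in proportion to the measured slowdown. The probe and discount formula are stand-ins, not the paper's exact mechanism.

```python
# Hedged sketch of state-dependent discount pricing in the spirit of Litmus
# (probe and discount formula are illustrative): a small fixed workload acts
# as the "litmus test"; its slowdown relative to an uncontended baseline
# estimates congestion, and the tenant pays only for effective speed.
import time

def litmus_probe(n: int = 200_000) -> float:
    start = time.perf_counter()
    s = 0
    for i in range(n):          # small, fixed CPU-bound probe
        s += i * i
    return time.perf_counter() - start

def discounted_price(base_price: float, probe_s: float, baseline_s: float) -> float:
    slowdown = max(1.0, probe_s / baseline_s)   # > 1 under congestion
    return base_price / slowdown

baseline = litmus_probe()                        # calibrate on an idle machine
print(discounted_price(1.00, litmus_probe(), baseline))
```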

Software Resource Disaggregation for HPC with Serverless Computing 2024-07-26

Aggregated HPC resources have rigid allocation systems and programming models which struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to efficiently use the large pools of unused memory and increase the utilization of idle computing resources. Prior work attempted to increase the throughput and efficiency of supercomputing systems through workload co-location and resource disaggregation. However, these methods fall short of providing a solution that can be applied to existing systems without major hardware modifications and performance losses. In this paper, we improve the utilization of supercomputers by employing the new cloud paradigm of serverless computing. We show how serverless functions provide fine-grained access to the resources of batch-managed cluster nodes. We present an HPC-oriented Function-as-a-Service (FaaS) that satisfies the requirements of high-performance applications. We demonstrate a software resource disaggregation approach where placing functions on unallocated and underutilized nodes allows idle cores and accelerators to be utilized while retaining near-native performance.

Accepted for publication in the 2024 International Parallel and Distributed Processing Symposium (IPDPS)

ServerlessLLM: Low-Latency Serverless Inference for Large Language Models 2024-07-25

This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs). By harnessing the substantial near-GPU storage and memory capacities of inference servers, ServerlessLLM achieves effective local checkpoint storage, minimizing the need for remote checkpoint downloads and ensuring efficient checkpoint loading. The design of ServerlessLLM features three core contributions: (i) \emph{fast multi-tier checkpoint loading}, featuring a new loading-optimized checkpoint format and a multi-tier loading system, fully utilizing the bandwidth of complex storage hierarchies on GPU servers; (ii) \emph{efficient live migration of LLM inference}, which enables newly initiated inferences to capitalize on local checkpoint storage while ensuring minimal user interruption; and (iii) \emph{startup-time-optimized model scheduling}, which assesses the locality statuses of checkpoints on each server and schedules the model onto servers that minimize the time to start the inference. Comprehensive evaluations, including microbenchmarks and real-world scenarios, demonstrate that ServerlessLLM dramatically outperforms state-of-the-art serverless systems, reducing latency by 10 - 200X across various LLM inference workloads.

18th USENIX Symposium on Operating Systems Design and Implementation
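
The sketch below illustrates startup-time-optimized scheduling in the spirit of contribution (iii): estimate, per server, how long the checkpoint would take to load from its fastest local storage tier, falling back to a remote download, and pick the minimum. Server metadata, bandwidths, and cache bookkeeping are invented for illustration.

```python
# Illustrative sketch of locality-aware model placement (numbers and server
# metadata invented): pick the server minimizing estimated checkpoint load
# time, preferring the fastest local tier where the checkpoint is cached.
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    tiers: dict = field(default_factory=dict)  # tier name -> bandwidth (GB/s) for local loads
    remote_gbps: float = 1.0                   # download bandwidth if not cached locally

def load_time_s(server: Server, model: str, ckpt_gb: float, cached_on: dict) -> float:
    local = [bw for tier, bw in server.tiers.items()
             if (server.name, tier) in cached_on.get(model, set())]
    bw = max(local) if local else server.remote_gbps
    return ckpt_gb / bw

cached = {"llama-13b": {("gpu-1", "nvme"), ("gpu-2", "dram")}}
servers = [Server("gpu-1", {"nvme": 6.0, "dram": 40.0}),
           Server("gpu-2", {"nvme": 6.0, "dram": 40.0}),
           Server("gpu-3", {})]
best = min(servers, key=lambda s: load_time_s(s, "llama-13b", 26.0, cached))
print(best.name)  # gpu-2: checkpoint already resident in DRAM
```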

Tutorial: Object as a Service (OaaS) Serverless Cloud Computing Paradigm 2024-07-24

While the first generation of cloud computing systems eased the burden on system administrators, the next generation of cloud computing systems is emerging to ease the burden on cloud developers -- facilitating the development of cloud-native applications. This paradigm shift is primarily happening through the offering of higher-level serverless abstractions, such as Function as a Service (FaaS). Although FaaS has successfully abstracted developers from the cloud resource management details, it falls short in abstracting the management of both data (i.e., state) and non-functional aspects, such as Quality of Service (QoS) requirements. The lack of such abstractions implies developer intervention and is counterproductive to the objective of mitigating the burden of cloud-native application development. To further streamline cloud-native application development, we present Object-as-a-Service (OaaS) -- a serverless paradigm that borrows object-oriented programming concepts to encapsulate application logic and data, in addition to non-functional requirements, into a single deployment package, thereby streamlining provider-agnostic cloud-native application development. We realized the OaaS paradigm through the development of an open-source platform called Oparaca. In this tutorial, we will present the concept and design of the OaaS paradigm and its implementation -- the Oparaca platform. Then, we give a tutorial on developing and deploying applications on the Oparaca platform and discuss its benefits and optimal configurations to avoid potential overheads.

FaaS Is Not Enough: Serverless Handling of Burst-Parallel Jobs 2024-07-19

Function-as-a-Service (FaaS) struggles with burst-parallel jobs due to needing multiple independent invocations to start a job. The lack of a group invocation primitive complicates application development and overlooks crucial aspects like locality and worker communication. We introduce a new serverless solution designed specifically for burst-parallel jobs. Unlike FaaS, our solution ensures job-level isolation using a group invocation primitive, allowing large groups of workers to be launched simultaneously. This method optimizes resource allocation by consolidating workers into fewer containers, speeding up their initialization and enhancing locality. Enhanced locality drastically reduces remote communication compared to FaaS, and combined with simultaneity, it enables workers to communicate synchronously via message passing and group collectives. This makes applications that are impractical with FaaS feasible. We implemented our solution on OpenWhisk, providing a communication middleware that efficiently uses locality with zero-copy messaging. Evaluations show that it reduces job invocation and communication latency, resulting in a 2$\times$ speed-up for TeraSort and a 98.5% reduction in remote communication for PageRank (13$\times$ speed-up) compared to traditional FaaS.

14 pages, 11 figures, article preprint

On the Complexity of Reachability Properties in Serverless Function Scheduling 2024-07-19

Functions-as-a-Service (FaaS) is a Serverless Cloud paradigm where a platform manages the execution scheduling (e.g., resource allocation, runtime environments) of stateless functions. Recent developments demonstrate the benefits of using domain-specific languages to express per-function scheduling policies, e.g., enforcing the allocation of functions on nodes that enjoy low data-access latencies thanks to proximity and connection pooling. We present aAPP, an affinity-aware extension of a platform-agnostic function scheduling language. We formalise its scheduling semantics and then study the complexity of statically checking reachability properties, e.g., useful to verify that trusted and untrusted functions cannot be co-located. Analysing different fragments of aAPP, we show that checking reachability of policies without affinity has linear complexity, while affinity makes the problem PSpace.

26 pages, 3 figures, 2 listings, appendix

Comprehensive Review of Performance Optimization Strategies for Serverless Applications on AWS Lambda 2024-07-15

This review paper synthesizes the latest research on performance optimization strategies for serverless applications deployed on AWS Lambda. By examining recent studies, we highlight the challenges, solutions, and best practices for enhancing the performance, cost efficiency, and scalability of serverless applications. The review covers a range of optimization techniques, including resource management, runtime selection, observability improvements, and workload-aware operations.

7 pages
Quantum Serverless Paradigm and Application Development using the QFaaS Framework 2024-07-03

Quantum computing has the potential to solve complex problems beyond the capabilities of classical computers. However, its practical use is currently limited due to early-stage quantum software engineering and the constraints of Noisy Intermediate-Scale Quantum (NISQ) devices. To address this issue, this chapter introduces the concept of serverless quantum computing with examples using QFaaS, a practical Quantum Function-as-a-Service framework. This framework utilizes the serverless computing model to simplify quantum application development and deployment by abstracting the complexities of quantum hardware and enhancing application portability across different quantum software development kits and quantum backends. The chapter provides comprehensive documentation and guidelines for deploying and using QFaaS, detailing the setup, component deployment, and examples of service-oriented quantum applications. This framework offers a promising approach to overcoming current limitations and advancing the practical software engineering of quantum computing.

Guidelines for deploying and using the QFaaS Framework (for the original paper, see https://doi.org/10.1016/j.future.2024.01.018)

Deploying AI-Based Applications with Serverless Computing in 6G Networks: An Experimental Study 2024-07-01

Future 6G networks are expected to heavily utilize machine learning capabilities in a wide variety of applications, with features and benefits for both the end user and the provider. While the options for utilizing these technologies are almost endless, from the perspective of network architecture and standardized services, the deployment decisions on where to execute the AI tasks are critical, especially considering the dynamic and heterogeneous nature of the processing and connectivity capabilities of 6G networks. On the other hand, conceptual and standardization work is still in its infancy as to how to categorize ML applications in 6G landscapes; some are part of network management functions, some target the inference itself, while many others emphasize model training. It is likely that future mobile services may all be in the AI domain, or combined with AI. This work makes a case for the serverless computing paradigm to be used to this end. We first provide an overview of different machine learning applications that are expected to be relevant in 6G networks. We then derive from them a set of general requirements for software engineering solutions executing these workloads, and propose and implement a high-level edge-focused architecture to execute such tasks. We then map the ML-serverless paradigm to the case study of the 6G architecture and experimentally test the resulting performance for a machine learning application against a setup created in a more traditional, cloud-based manner. Our results show that, while there is a trade-off between the predictability of response times and accuracy, the achieved median accuracy in the 6G setup remains the same, while the median response time decreases by around 25% compared to the cloud setup.

Submitted to https://ai-for-6g.com/

Imaginary Machines: A Serverless Model for Cloud Applications 2024-06-30

Serverless Function-as-a-Service (FaaS) platforms provide applications with resources that are highly elastic, quick to instantiate, accounted at fine granularity, and without the need for explicit runtime resource orchestration. This combination of core properties underpins the success and popularity of the serverless FaaS paradigm. However, these benefits are not available to most cloud applications because they are designed for networked virtual machine/container environments. Since such cloud applications cannot take advantage of the highly elastic resources of serverless and require run-time orchestration systems to operate, they suffer from lower resource utilization, additional management complexity, and costs relative to their FaaS serverless counterparts. We propose Imaginary Machines, a new serverless model for cloud applications. This model (1.) exposes the highly elastic resources of serverless platforms as the traditional network-of-hosts model that cloud applications expect, and (2.) it eliminates the need for explicit run-time orchestration by transparently managing application resources based on signals generated during cloud application executions. With the Imaginary Machines model, unmodified cloud applications become serverless applications. While still based on the network-of-hosts model, they benefit from the highly elastic resources and do not require runtime orchestration, just like their specialized serverless FaaS counterparts, promising increased resource utilization while reducing management costs.

Evaluating Serverless Machine Learning Performance on Google Cloud Run 2024-06-24

End-users can get functions-as-a-service from serverless platforms, which promise lower hosting costs, high availability, fault tolerance, and dynamic flexibility for hosting individual functions known as microservices. Machine learning tools have proven reliably useful, and the services created using these tools are in increasing demand on a large scale. Serverless platforms are uniquely suited for hosting these machine learning services for large-scale applications, being well known for their cost efficiency, fault tolerance, resource scaling, robust APIs for communication, and global reach. However, these platforms were originally designed to host web services, and machine learning services differ from web services in important ways. With our study, we aimed to understand how serverless platforms handle machine learning workloads. We examine machine learning performance on one such platform - Google Cloud Run, a GPU-less infrastructure that is not designed for machine learning application deployment.

5 pages, 12 figures
Hydra: Virtualized Multi-Language Runtime for High-Density Serverless Platforms 2024-06-20

Serverless is an attractive computing model that offers seamless scalability and elasticity; it takes the infrastructure management burden away from users and enables a pay-as-you-use billing model. As a result, serverless is becoming increasingly popular for supporting highly elastic and bursty workloads. However, existing platforms are supported by bloated virtualization stacks which, combined with bursty and irregular invocations, lead to high memory and latency overheads. To reduce the virtualization stack bloat, we propose Hydra, a virtualized multi-language serverless runtime capable of handling multiple invocations of functions written in different languages. To measure its impact in large platforms, we build a serverless platform that optimizes scheduling decisions to take advantage of Hydra by consolidating function invocations on a single instance, reducing the total infrastructure tax. Hydra improves the overall function density (ops/GB-sec) by 4.47$\times$ on average compared to NodeJS, JVM, and CPython, the state-of-the-art single-language runtimes used in most serverless platforms. When reproducing the Azure Functions trace, Hydra reduces the overall memory footprint by 2.1$\times$ and reduces the number of cold starts by between 4 and 48$\times$.

LibProf: A Python Profiler for Improving Cold Start Performance in Serverless Applications 2024-06-17

Serverless computing abstracts away server management, enabling automatic scaling and efficient resource utilization. However, cold-start latency remains a significant challenge, affecting end-to-end performance. Our preliminary study reveals that inefficient library initialization and usage are major contributors to this latency in Python-based serverless applications. We introduce LibProf, a Python profiler that uses dynamic program analysis to identify inefficient library initializations. LibProf collects library usage data through statistical sampling and call-path profiling, then generates a report to guide developers in addressing four types of inefficiency patterns. Systematic evaluations on 15 serverless applications demonstrate that LibProf effectively identifies inefficiencies. LibProf-guided optimizations yield up to a 2.26x speedup in cold-start execution time and a 1.51x reduction in memory usage.
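
A concrete instance of the cold-start inefficiency LibProf targets (the handler and library choice are illustrative): importing a heavy library at module scope pays its full initialization cost on every cold start, even for requests that never use it, whereas deferring the import moves that cost off the critical path.

```python
# Illustrative example of the pattern, not LibProf's own output or API.

# Eager style: a heavy import at module scope runs during every cold start,
# whether or not the request needs it.
# import pandas as pd

def handler(event: dict) -> dict:
    if event.get("needs_dataframe"):
        import pandas as pd  # lazy: pay the import cost only on this path
        return {"rows": len(pd.DataFrame(event["records"]))}
    return {"status": "fast path, no heavy imports"}

print(handler({"needs_dataframe": False}))  # cold start stays cheap
```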

sAirflow: Adopting Serverless in a Legacy Workflow Scheduler 2024-06-03

Serverless clouds promise efficient scaling, reduced toil and monetary costs. Yet, serverless-ing a complex, legacy application might require major refactoring and thus is risky. As a case study, we use Airflow, an industry-standard workflow system. To reduce migration risk, we propose to limit code modifications by relying on change data capture (CDC) and message queues for internal communication. To achieve serverless efficiency, we rely on Function-as-a-Service (FaaS). Our system, sAirflow, is the first adaptation of the control plane and workers to the serverless cloud - and it maintains the same interface and most of the code. Experimentally, we show that sAirflow delivers the key serverless benefits: scaling and cost reduction. We compare sAirflow to MWAA, a managed (SaaS) Airflow. On Alibaba benchmarks on warm systems, sAirflow performs similarly while halving the monetary cost. On highly parallel workflows on cold systems, sAirflow scales out in seconds to 125 workers, reducing makespan by 2x-7x.

GeoFF: Federated Serverless Workflows with Data Pre-Fetching 2024-05-22

Function-as-a-Service (FaaS) is a popular cloud computing model in which applications are implemented as workflows of multiple independent functions. While cloud providers usually offer composition services for such workflows, they do not support cross-platform workflows, forcing developers to hardcode the composition logic. Furthermore, FaaS workflows tend to be slow due to cascading cold starts, inter-function latency, and data download latency on the critical path. In this paper, we propose GeoFF, a serverless choreography middleware that executes FaaS workflows across different public and private FaaS platforms, including ad-hoc workflow recomposition. Furthermore, GeoFF supports function pre-warming and data pre-fetching. This minimizes end-to-end workflow latency by taking cold starts and data download latency off the critical path. In experiments with our proof-of-concept prototype and a realistic application, we were able to reduce end-to-end latency by more than 50%.
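
A hedged sketch of the overlap GeoFF exploits: while step N of a workflow runs, concurrently pre-warm step N+1's container and pre-fetch its input so that neither sits on the critical path. The helper functions and the URL are stand-ins for platform calls.

```python
# Hedged sketch of pre-warming plus data pre-fetching (helper functions are
# stand-ins, not GeoFF's API): overlap the next step's warm-up and input
# staging with the current step's execution.
from concurrent.futures import ThreadPoolExecutor
import time

def prewarm(fn: str):             # stand-in for a platform pre-warm call
    time.sleep(0.2)

def prefetch(url: str) -> bytes:  # stand-in for staging remote input data
    time.sleep(0.3)
    return b"data"

def run_step(name: str):          # stand-in for the currently running function
    time.sleep(0.5)

with ThreadPoolExecutor() as pool:
    warm = pool.submit(prewarm, "step-2")
    data = pool.submit(prefetch, "s3://bucket/input")  # hypothetical URL
    run_step("step-1")            # overlaps with warm-up and fetch above
    warm.result()
    payload = data.result()
print("step-2 starts warm, input already staged:", len(payload), "bytes")
```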

ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving 2024-05-17

With the increasing popularity of large language model (LLM) backend systems, it has become common and necessary to deploy stable serverless serving of LLMs on multi-GPU clusters with autoscaling. However, challenges arise because the diversity and co-location of applications in multi-GPU clusters lead to low service quality and low GPU utilization. To address them, we build ENOVA, a deployment, monitoring and autoscaling service for serverless LLM serving. ENOVA comprehensively deconstructs the execution process of an LLM service, based on which it designs a configuration recommendation module for automatic deployment on any GPU cluster and a performance detection module for autoscaling. On top of them, ENOVA implements a deployment execution engine for multi-GPU cluster scheduling. The experiment results show that ENOVA significantly outperforms other state-of-the-art methods and is suitable for wide deployment in large online systems.

Container

Title Date Abstract Comment
Efficient isogeometric Boundary Element simulation of elastic domains containing thin inclusions 2025-05-25

This paper is concerned with the Boundary Element simulation of elastic domains that contain thin inclusions whose elastic material properties differ from those of the domain. By thin inclusions we mean inclusions with extreme aspect ratios, i.e., where one dimension is much smaller than the others. Examples of this are reinforcements in civil/mechanical engineering or concrete linings in underground construction. The fact that an inclusion has an extreme aspect ratio poses a challenge to the numerical integration of the arising singular integrals, and novel approaches are presented to deal with it. Several examples demonstrate the efficiency and accuracy of the proposed methods and show that the results are in good agreement with analytical and other numerical solutions.

30 pages, 22 figures, 1 appendix

Containment for Guarded Monotone Strict NP 2025-05-16

Guarded Monotone Strict NP (GMSNP) extends Monotone Monadic Strict NP (MMSNP) by guarded existentially quantified predicates of arbitrary arities. We prove that the containment problem for GMSNP is decidable, thereby settling an open question of Bienvenu, ten Cate, Lutz, and Wolter, later restated by Bourhis and Lutz. Our proof also comes with a 2NEXPTIME upper bound on the complexity of the problem, which matches the lower bound for containment of MMSNP due to Bourhis and Lutz. In order to obtain these results, we significantly improve the state of knowledge of the model-theoretic properties of GMSNP. Bodirsky, Knäuer, and Starke previously showed that every GMSNP sentence defines a finite union of CSPs of $\omega$-categorical structures. We show that these structures can be used to obtain a reduction from the containment problem for GMSNP to the much simpler problem of testing the existence of a certain map called recolouring, albeit in a more general setting than GMSNP; a careful analysis of this yields said upper bound. As a secondary contribution, we refine the construction of Bodirsky, Knäuer, and Starke by adding a restricted form of homogeneity to the properties of these structures, making the logic amenable to future complexity classifications for query evaluation using techniques developed for infinite-domain CSPs.

Extending the Control Plane of Container Orchestrators for I/O Virtualization 2025-05-09

Single Root Input/Output Virtualization (SR-IOV) is a standard technology for forking a single PCI express device and providing it to applications while ensuring performance isolation. It enables container orchestrators to share a limited number of physical network interfaces without incurring significant virtualization overhead. The allocation of virtualized network devices to containers, however, needs to be more configurable based on the bandwidth needs of running applications. Moreover, container orchestrators' network control over the virtualized interfaces is limited by the abilities of SR-IOV. We explore the design considerations for a system with controlled SR-IOV virtualization and present ConRDMA, a novel architecture that enables fine control of RDMA virtualization for containers. Our evaluation shows that ConRDMA enables containers to use RDMA allocated bandwidth more efficiently and to select best-suited nodes to meet their varying communication requirements.

Performance Characterization of Containers in Edge Computing 2025-05-08

Edge computing addresses critical limitations of cloud computing such as high latency and network congestion by decentralizing processing from cloud to the edge. However, the need for software replication across heterogeneous edge devices introduces dependency and portability challenges, driving the adoption of containerization technologies like Docker. While containers offer lightweight isolation and deployment advantages, they introduce new bottlenecks in edge environments, including cold-start delays, memory constraints, network throughput variability, and inefficient IO handling when interfacing with embedded peripherals. This paper presents an empirical evaluation of Docker containers on resource-constrained edge devices, using Raspberry Pi as a representative platform. We benchmark performance across diverse workloads, including microbenchmarks (CPU, memory, network profiling) and macrobenchmarks (AI inference, sensor IO operations), to quantify the overheads of containerization in real-world edge scenarios. Our testbed comprises physical Raspberry Pi nodes integrated with environmental sensors and camera modules, enabling measurements of latency, memory faults, IO throughput, and cold start delays under varying loads. Key findings reveal trade-offs between container isolation and edge-specific resource limitations, with performance degradation observed in IO heavy and latency sensitive tasks. We identify configuration optimizations to mitigate these issues, providing actionable insights for deploying containers in edge environments while meeting real time and reliability requirements. This work advances the understanding of containerized edge computing by systematically evaluating its feasibility and pitfalls on low-power embedded systems.

SCU-Hand: Soft Conical Universal Robotic Hand for Scooping Granular Media from Containers of Various Sizes 2025-05-07

Automating small-scale experiments in materials science presents challenges due to the heterogeneous nature of experimental setups. This study introduces the SCU-Hand (Soft Conical Universal Robot Hand), a novel end-effector designed to automate the task of scooping powdered samples from various container sizes using a robotic arm. The SCU-Hand employs a flexible, conical structure that adapts to different container geometries through deformation, maintaining consistent contact without complex force sensing or machine learning-based control methods. Its reconfigurable mechanism allows for size adjustment, enabling efficient scooping from diverse container types. By combining soft robotics principles with a sheet-morphing design, our end-effector achieves high flexibility while retaining the necessary stiffness for effective powder manipulation. We detail the design principles, fabrication process, and experimental validation of the SCU-Hand. Experimental validation showed that the scooping capacity is about 20% higher than that of a commercial tool, with a scooping performance of more than 95% for containers of sizes between 67 mm and 110 mm. This research contributes to laboratory automation by offering a cost-effective, easily implementable solution for automating tasks such as materials synthesis and characterization processes.

2025 IEEE International Conference on Robotics and Automation (ICRA2025). Preprint. Accepted January 2025

Optimizing Intra-Container Communication with Memory Protection Keys: A Novel Approach to Secure and Efficient Microservice Interaction 2025-05-04

In modern cloud-native applications, microservices are commonly deployed in containerized environments to ensure scalability and flexibility. However, inter-process communication (IPC) between co-located microservices often suffers from significant overhead, especially when traditional networking protocols are employed within containers. This paper introduces a novel approach, MPKLink, leveraging Intel Memory Protection Keys (MPK) to enhance intra-container communication efficiency while ensuring security. By utilizing shared memory with MPK-based access control, we eliminate unnecessary networking latencies, leading to reduced resource consumption and faster response times. We present a comprehensive evaluation of MPKLink, demonstrating its superior performance over conventional methods such as REST and gRPC within microservice architectures. Furthermore, we explore the integration of this approach with existing container orchestration platforms, showcasing its seamless adoption in real-world deployment scenarios. This work provides a transformative solution for developers looking to optimize communication in microservices while maintaining the integrity and security of containerized applications.

7 pages, 3 figures, 1 table

Integrated optimization of operations and capacity planning under uncertainty for drayage procurement in container logistics 2025-05-03

We present an integrated framework for truckload procurement in container logistics, bridging strategic and operational aspects that are often treated independently in existing research. Drayage, the short-haul trucking of containers, plays a critical role in intermodal container logistics. Using dynamic programming, we identify optimal operational policies for allocating drayage volumes among capacitated carriers under uncertain container flows and spot rates. The computational complexity of optimization under uncertainty is mitigated through sample average approximation. These optimal policies serve as the basis for evaluating specific capacity arrangements. To optimize capacity reservations with strategic and spot carriers, we employ an efficient quasi-Newton method. Numerical experiments demonstrate significant cost-efficiency improvements, including a 21.2% cost reduction in a four-period scenario. Monte Carlo simulations further highlight the strong generalization capabilities of the proposed joint optimization method across out-of-sample scenarios. These findings underscore the importance of integrating strategic and operational decisions to enhance cost efficiency in truckload procurement under uncertainty.

Dynamic Dimensioning of Frequency Containment Reserves: The Case of the Nordic Grid 2025-05-02

One of the main responsibilities of a Transmission System Operator (TSO) operating an electric grid is to maintain a designated frequency (e.g., 50 Hz in Europe). To achieve this, TSOs have created several products called frequency-supporting ancillary services. The Frequency Containment Reserve (FCR) is one of these ancillary service products. This article focuses on the TSO problem of determining the volume procured for FCR. Specifically, we investigate the potential benefits and impact on grid security when transitioning from a traditionally \textit{static} procurement method to a \textit{dynamic} strategy for FCR volume. We take the Nordic synchronous area in Europe as a case study and use a diffusion model to capture its frequency development. We introduce a controlled mean reversal parameter to assess changes in FCR obligations, in particular for the Nordic FCR-N ancillary service product. We establish closed-form expressions for exceedance probabilities and use historical frequency data as input to calibrate the model. We show that a dynamic dimensioning approach for FCR has the potential to significantly reduce the exceedance probabilities (up to 37%) while maintaining the total yearly procured FCR volume equal to that of the current static approach. Alternatively, a dynamic dimensioning approach could significantly increase security at limited extra cost.

12 pages, 12 figures, submitted to IEEE Transactions on Power Systems
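
As a hedged illustration of the kind of closed-form exceedance probability such a model admits (the paper's calibrated diffusion may differ), take a mean-reverting Ornstein-Uhlenbeck model of the frequency deviation:

$$dX_t = -\theta X_t\,dt + \sigma\,dW_t, \qquad X_\infty \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{2\theta}\right), \qquad \Pr\left(|X_\infty| > x\right) = 2\,\Phi\!\left(-\frac{x\sqrt{2\theta}}{\sigma}\right),$$

where $\theta$ is the mean-reversion strength and $\Phi$ is the standard normal CDF. Increasing $\theta$ (e.g., by procuring more FCR when conditions demand it) shrinks the stationary variance and hence the exceedance probability, which is precisely the lever a dynamic dimensioning strategy adjusts over time.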

Weihrauch problems as containers 2025-04-22

We note that Weihrauch problems can be regarded as containers over the category of projective represented spaces and that Weihrauch reductions correspond exactly to container morphisms. We also show that Bauer's extended Weihrauch degrees and the posetal reflection of containers over partition assemblies are equivalent. Using this characterization, we show how a number of operators over Weihrauch degrees, such as the composition product, also arise naturally from the abstract theory of polynomial functors.

26 pages, minor edits following reviews for a conference version

An Enhanced Iterative Deepening Search Algorithm for the Unrestricted Container Rehandling Problem 2025-04-19

In container terminal yards, the Container Rehandling Problem (CRP) involves rearranging containers between stacks under specific operational rules, and it is a pivotal optimization challenge in intelligent container scheduling systems. Existing CRP studies primarily focus on minimizing reallocation costs using two-dimensional bay structures, considering factors such as container size, weight, arrival sequences, and retrieval priorities. This paper introduces an enhanced iterative deepening search algorithm integrated with improved lower bounds to boost search efficiency. To further reduce the search space, we design mutually consistent pruning rules to avoid excessive computational overhead. The proposed algorithm is validated on three widely used benchmark datasets for the Unrestricted Container Rehandling Problem (UCRP). Experimental results demonstrate that our approach outperforms state-of-the-art exact algorithms in solving the more general UCRP variant, particularly exhibiting superior efficiency when handling containers within the same priority group under strict time constraints.

Verification confirmed that this article used data without authorization from the original owners, violating ethical standards for scientific data sharing. To protect data copyright and maintain research integrity, the article is retracted. Future content will strictly follow data usage protocols

Robust Containment Queries over Collections of Trimmed NURBS Surfaces via Generalized Winding Numbers 2025-04-15

Efficient and accurate evaluation of containment queries for regions bound by trimmed NURBS surfaces is important in many graphics and engineering applications. However, the algebraic complexity of surface-surface intersections makes gaps and overlaps between surfaces difficult to avoid for in-the-wild surface models. By considering this problem through the lens of the generalized winding number (GWN), a mathematical construction that is indifferent to the arrangement of surfaces in the shape, we can define a containment query that is robust to model watertightness. Applying contemporary techniques for the 3D GWN on arbitrary curved surfaces would require some form of geometric discretization, potentially inducing containment misclassifications near boundary components. In contrast, our proposed method computes an accurate GWN directly on the curved geometry of the input model. We accomplish this using a novel reformulation of the relevant surface integral using Stokes' theorem, which in turn permits an efficient adaptive quadrature calculation on the boundary and trimming curves of the model. While this is sufficient for "far-field" query points that are distant from the surface, we augment this approach for "near-field" query points (i.e., within a bounding box) and even those coincident to the surface patches via a strategy that directly identifies and accounts for the jump discontinuity in the scalar field. We demonstrate that our method of evaluating the GWN field is robust to complex trimming geometry in a CAD model, and is accurate up to arbitrary precision at arbitrary distances from the surface. Furthermore, the derived containment query is robust to non-watertightness while respecting all curved features of the input shape.

20 Pages, 18 Figures, 2 Tables
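
For reference, the generalized winding number of a query point $q$ with respect to an oriented surface $S$ is the standard solid-angle integral (the paper's contribution is evaluating it accurately on trimmed NURBS geometry via a Stokes'-theorem reformulation):

$$w(q) = \frac{1}{4\pi}\int_{S}\frac{(x-q)\cdot n(x)}{\lVert x-q\rVert^{3}}\,dA(x).$$

For a watertight surface, $w(q)$ equals 1 inside and 0 outside; on non-watertight models it varies smoothly, so thresholding at $1/2$ yields a containment classification that degrades gracefully in the presence of gaps and overlaps.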

Container-level Energy Observability in Kubernetes Clusters 2025-04-14

Kubernetes has for a number of years been the default cloud orchestrator solution across multiple application and research domains. As such, optimizing the energy efficiency of Kubernetes-deployed workloads is of primary interest for controlling operational expenses by reducing energy consumption at the data center level and allocated resources at the application level. A lot of research in this direction aims at reducing the total energy usage of Kubernetes clusters without establishing an understanding of their workloads, i.e. the applications deployed on the cluster. This means that there are untapped potential improvements in energy efficiency that can be achieved through, for example, application refactoring or deployment optimization. For all these cases a prerequisite is establishing fine-grained observability down to the level of individual containers and their power draw over time. A state-of-the-art tool approved by the Cloud Native Computing Foundation, Kepler, aims to provide this functionality, but has not been assessed for its accuracy and therefore fitness for purpose. In this work we start by developing an experimental procedure to this goal, and we conclude that the energy usage metrics reported by Kepler are not at a satisfactory level. As a reaction to this, we develop KubeWatt as an alternative to Kepler for specific use case scenarios, and demonstrate its higher accuracy through the same experimental procedure as we used for Kepler.

11 pages, accepted for publication at ICT4S 2025

Managing Security Issues in Software Containers: From Practitioners Perspective 2025-04-10

Software development industries are increasingly adopting containers to enhance the scalability and flexibility of software applications. Security in containerized projects is a critical challenge that can lead to data breaches and performance degradation, thereby directly affecting the reliability and operations of the container services. Despite the ongoing effort to manage the security issues in containerized projects in software engineering (SE) research, more focused investigations are needed to explore the human perspective of security management and the technical approaches to security management in containerized projects. This research aims to explore security management in containerized projects by examining how SE practitioners perceive the security issues in containerized software projects and how they approach managing such issues. A clear understanding of security management in containerized projects will enable industries to develop robust security strategies that enhance software reliability and trust. To achieve this, we conducted two separate semi-structured interview studies to examine how practitioners approach security management. The first study focused on practitioners' perceptions of security challenges in containerized environments, where we interviewed 15 participants between December 2022 and October 2023. The second study explored how to enhance container security, with 20 participants interviewed between October 2024 and December 2024. Analyzing the data from both studies reveals how SE practitioners address the various security challenges in containerized projects. Our analysis also identified the technical and non-technical enablers that can be utilized to enhance security.

no comments
Formalising Inductive and Coinductive Containers 2025-04-07

Containers capture the concept of strictly positive data types in programming. The original development of containers is done in the internal language of locally cartesian closed categories (LCCCs) with disjoint coproducts and W-types, and uniqueness of identity proofs (UIP) is implicitly assumed throughout. Although it is claimed that these developments can also be interpreted in extensional Martin-Löf type theory, this interpretation is not made explicit. In this paper, we present a formalisation of the results that 'containers preserve least and greatest fixed points' in Cubical Agda, thereby giving a formulation in intensional type theory. Our proofs do not make use of UIP and thereby generalise the original results from talking about container functors on Set to container functors on the wild category of types. Our main incentive for using Cubical Agda is that its path type restores the equivalence between bisimulation and coinductive equality. Thus, besides developing container theory in a more general setting, we also demonstrate the usefulness of Cubical Agda's path type to coinductive proofs.

Continuous reasoning for adaptive container image distribution in the cloud-edge continuum 2025-04-05

Cloud-edge computing requires applications to operate across diverse infrastructures, often triggered by cyber-physical events. Containers offer a lightweight deployment option but pulling images from central repositories can cause delays. This article presents a novel declarative approach and open-source prototype for replicating container images across the cloud-edge continuum. Considering resource availability, network QoS, and storage costs, we leverage logic programming to (i) determine optimal initial placements via Answer Set Programming (ASP) and (ii) adapt placements using Prolog-based continuous reasoning. We evaluate our solution through simulations, showcasing how combining ASP and Prolog continuous reasoning can balance cost optimisation and prompt decision-making in placement adaptation at increasing infrastructure sizes.

Malware Detection in Docker Containers: An Image is Worth a Thousand Logs 2025-04-04

Malware detection is increasingly challenged by evolving techniques like obfuscation and polymorphism, limiting the effectiveness of traditional methods. Meanwhile, the widespread adoption of software containers has introduced new security challenges, including the growing threat of malicious software injection, where a container, once compromised, can serve as an entry point for further cyberattacks. In this work, we address these security issues by introducing a method to identify compromised containers through machine learning analysis of their file systems. We cast entire software containers into large RGB images via their tarball representations and propose to use established Convolutional Neural Network architectures in a streaming, patch-based manner. To support our experiments, we release the COSOCO dataset--the first of its kind--containing 3364 large-scale RGB images of benign and compromised software containers at https://huggingface.co/datasets/k3ylabs/cosoco-image-dataset. Our method detects more malware and achieves higher F1 and Recall scores than all individual VirusTotal engines and ensembles thereof, demonstrating its effectiveness and setting a new standard for identifying malware-compromised software containers.

Accepted at ICC-W
Bag Semantics Conjunctive Query Containment. Four Small Steps Towards Undecidability 2025-03-23

Query Containment Problem (QCP) is one of the most fundamental decision problems in database query processing and optimization. The complexity of QCP for conjunctive queries (QCP-CQ) has been fully understood since the 1970s. But, as Chaudhuri and Vardi noticed in their classical 1993 paper [1], this understanding is based on the assumption that query answers are sets of tuples, and it does not transfer to the situation when multi-set (bag) semantics is considered. Now, 30 years after [1] was written, decidability of QCP-CQ for bag semantics remains an open question, one of the most intriguing open questions in database theory. In this paper we show a series of undecidability results for some generalizations of bag-semantics QCP-CQ. We show, for example, that the problem whether, for two given Boolean conjunctive queries Q and Q' and a linear function F, the inequality $F(Q(D)) \leq Q'(D)$ holds for each database instance D, is undecidable.

28 pages
Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem 2025-03-21

In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk. Conventional reinforcement learning (RL) approaches struggle under delayed rewards, sparse critical events, and high-dimensional uncertainty -- failing to consistently balance higher-volume empties with the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising: (1) a curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance, and (2) an offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost. Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.
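
A minimal sketch of the inference-time safety filter idea (the collision model, its inputs, and the threshold are stand-ins, not the paper's components): before executing the policy's preferred action, skip any action whose predicted collision probability exceeds a safety bound.

```python
# Hedged sketch of inference-time action filtering: walk down the policy's
# preference order and take the first action the (stand-in) collision model
# deems safe. All names and numbers are invented for illustration.
import numpy as np

def collision_prob(action: int, risk: np.ndarray) -> float:
    # Stand-in for the offline pairwise collision model described above.
    return float(risk[action])

def safe_action(policy_logits: np.ndarray, risk: np.ndarray, p_max: float = 0.05) -> int:
    order = np.argsort(-policy_logits)            # policy's preference order
    for a in order:
        if collision_prob(int(a), risk) <= p_max:
            return int(a)                          # most-preferred safe action
    return int(order[-1])                          # fallback if nothing passes

logits = np.array([2.0, 1.5, 0.3])
risk = np.array([0.30, 0.02, 0.01])                # action 0 is predicted risky
print(safe_action(logits, risk))                   # 1: next-best safe action
```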

Distributive Laws of Monadic Containers 2025-03-21

Containers are used to carve out a class of strictly positive data types in terms of shapes and positions. They can be interpreted via a fully-faithful functor into endofunctors on Set. Monadic containers are those containers whose interpretation as a Set functor carries a monad structure. The category of containers is closed under container composition and is a monoidal category, whereas monadic containers do not in general compose. In this paper, we develop a characterisation of distributive laws of monadic containers. Distributive laws were introduced as a sufficient condition for the composition of the underlying functors of two monads to also carry a monad structure. Our development parallels Ahman and Uustalu's characterisation of distributive laws of directed containers, i.e. containers whose Set functor interpretation carries a comonad structure. Furthermore, by combining our work with theirs, we construct characterisations of mixed distributive laws (i.e. of directed containers over monadic containers and vice versa), thereby completing the 'zoo' of container characterisations of (co)monads and their distributive laws. We have found these characterisations amenable to development of existence and uniqueness proofs of distributive laws, particularly in the mechanised setting of Cubical Agda, in which most of the theory of this paper has been formalised.

15 pages main text, 11 pages references and appendices

Bag Semantics Query Containment: The CQ vs. UCQ Case and Other Stories 2025-03-20

Query Containment Problem (QCP) is a fundamental decision problem in query processing and optimization. While QCP has for a long time been completely understood for the case of set semantics, decidability of QCP for conjunctive queries under multi-set semantics ($QCP_{\text{CQ}}^{\text{bag}}$) remains one of the most intriguing open problems in database theory. Certain effort has been put, in last 30 years, to solve this problem and some decidable special cases of $QCP_{\text{CQ}}^{\text{bag}}$ were identified, as well as some undecidable extensions, including $QCP_{\text{UCQ}}^{\text{bag}}$. In this paper we introduce a new technique which produces, for a given UCQ $\Phi$, a CQ $\phi$ such that the application of $\phi$ to a database $D$ is, in some sense, an approximation of the application of $\Phi$ to $D$. Using this technique we could analyze the status of $QCP^{\text{bag}}$ when one of the queries in question is a CQ and the other is a UCQ, and we reached conclusions which surprised us a little bit. We also tried to use this technique to translate the known undecidability proof for $QCP_{\text{UCQ}}^{\text{bag}}$ into a proof of undecidability of $QCP_{\text{CQ}}^{\text{bag}}$. And, as you are going to see, we got stopped just one infinitely small $\varepsilon$ before reaching this ultimate goal.

Added funding information

Navigating Demand Uncertainty in Container Shipping: Deep Reinforcement Learning for Enabling Adaptive and Feasible Master Stowage Planning 2025-03-20

Reinforcement learning (RL) has shown promise in solving various combinatorial optimization problems. However, conventional RL faces challenges when dealing with real-world constraints, especially when action space feasibility is explicit and dependent on the corresponding state or trajectory. In this work, we focus on using RL in container shipping, often considered the cornerstone of global trade, by dealing with the critical challenge of master stowage planning. The main objective is to maximize cargo revenue and minimize operational costs while navigating demand uncertainty and various complex operational constraints, namely vessel capacity and stability, which must be dynamically updated along the vessel's voyage. To address this problem, we implement a deep reinforcement learning framework with feasibility projection to solve the master stowage planning problem (MPP) under demand uncertainty. The experimental results show that our architecture efficiently finds adaptive, feasible solutions for this multi-stage stochastic optimization problem, outperforming traditional mixed-integer programming and RL with feasibility regularization. Our AI-driven decision-support policy enables adaptive and feasible planning under uncertainty, optimizing operational efficiency and capacity utilization while contributing to sustainable and resilient global supply chains.

This ...

This paper is currently under review for IJCAI 2025

Vexed by VEX tools: Consistency evaluation of container vulnerability scanners 2025-03-18
Show

This paper presents a study that analyzed state-of-the-art vulnerability scanning tools applied to containers. We have focused the work on tools following the Vulnerability Exploitability eXchange (VEX) format, which has been introduced to complement Software Bills of Materials (SBOMs) with security advisories of known vulnerabilities. Being able to get an accurate understanding of vulnerabilities found in the dependencies of third-party software is critical for secure software development and risk analysis. Rather than taking on the overwhelming challenge of estimating the precise accuracy and precision of a vulnerability scanner, in this study we instead set out to explore how consistently different tools perform. By doing this, we aim to assess the maturity of the VEX tool field as a whole (rather than any particular tool). We have used the Jaccard and Tversky indices to produce similarity scores of tool performance for several different datasets created from container images. Overall, our results show a low level of consistency among the tools, thus indicating a low level of maturity in the VEX tool space. We have performed a number of experiments to find an explanation for our results, but they are largely inconclusive, and further research is needed to understand the underlying causes of our findings.

22 pa...

22 pages, 1 listing, 18 tables

Container late-binding in unprivileged dHTC pilot systems on Kubernetes resources 2025-03-17
Show

The scientific and research community has benefited greatly from containerized distributed High Throughput Computing (dHTC), both by enabling elastic scaling of user compute workloads to thousands of compute nodes, and by allowing for distributed ownership of compute resources. To effectively and efficiently deal with the dynamic nature of the setup, the most successful implementations use an overlay batch scheduling infrastructure fed by a pilot provisioning system. One fundamental property of these setups is the use of late binding of containerized user workloads. From a resource provider point of view, a compute resource is thus claimed before the user container image is selected. This paper provides a mechanism to implement this late-binding of container images on Kubernetes-managed resources, without requiring any elevated privileges.

8 pag...

8 pages, 6 figures, Accepted to PEARC25

Enable Time-Sensitive Applications in Kubernetes with Container Network Interface Plugin Agnostic Metadata Proxy 2025-03-17
Show

Application deployment in cloud environments is dominated by Kubernetes-orchestrated microservices. Kubernetes provides a secure environment, networking, storage, isolation, scheduling, and many other abstractions that can be easily extended to meet application needs. Time-Sensitive Applications (TSAs) have special requirements for compute and network. Deploying TSAs in Kubernetes is challenging because the networking implemented by Container Network Interface (CNI) plugins is not aware of the traffic characteristics required by Time-Sensitive Networking (TSN). Even if a network interface supports TSN features (e.g., Scheduled Traffic) and a modified CNI plugin is aware of this interface, the pod network isolation built on top of Linux deletes the metadata that TSN protocols require. We propose the TSN metadata proxy, a simple architecture that allows any TSA microservice to use the TSN capabilities of the physical NIC without any modification. This architecture is tightly integrated with the Kubernetes networking model, works with popular CNI plugins, and supports services such as ClusterIP, NodePort, or LoadBalancer without additional configuration. Unlike former proposals, this architecture requires neither bypassing the Linux kernel network stack, direct access to the physical NIC, escalated privileges for the TSA microservice, nor modification of the TSA.

Prese...

Presented at netdev 0x19 conference. Conference page: https://netdevconf.info/0x19/15

Towards a Digital Twin Modeling Method for Container Terminal Port 2025-03-14
Show

This paper introduces a novel strategy aimed at enhancing productivity and minimizing non-productive movements within container terminals, specifically focusing on container yards. It advocates for the implementation of a digital twin-based methodology to streamline the operations of stacking cranes (SCs) responsible for container handling. The proposed approach entails the creation of a virtual container yard that mirrors the physical yard within a digital twin system, facilitating real-time observation and validation. In addition, this article demonstrates, through simulation, the effectiveness of using a digital twin to reduce unproductive movements and improve productivity. It defines various operational strategies and takes into account different yard contexts, providing a comprehensive understanding of optimisation possibilities. By exploiting the capabilities of the digital twin, managers and operators are provided with crucial information on operational dynamics, enabling them to identify areas for improvement. This visualisation helps decision-makers make informed choices about their stacking strategies, thereby improving the efficiency of overall container terminal operations. Overall, this paper presents a digital twin solution for container terminal operations, offering a powerful tool for optimising productivity and minimising inefficiencies.

Forecasting Empty Container availability for Vehicle Booking System Application 2025-03-14
Show

Container terminals, pivotal nodes in the network of empty container movement, hold significant potential for enhancing operational efficiency within terminal depots through effective collaboration between transporters and terminal operators. This collaboration is crucial for achieving optimization, leading to streamlined operations and reduced congestion, thereby benefiting both parties. Consequently, there is a pressing need to develop the most suitable forecasting approaches to address this challenge. This study focuses on developing and evaluating a data-driven approach for forecasting empty container availability at container terminal depots within a Vehicle Booking System (VBS) framework. It addresses the gap in research concerning optimizing empty container dwell time and aims to enhance operational efficiencies in container terminal operations. Four forecasting models-Naive, ARIMA, Prophet, and LSTM-are comprehensively analyzed for their predictive capabilities, with LSTM emerging as the top performer due to its ability to capture complex time series patterns. The research underscores the significance of selecting appropriate forecasting techniques tailored to the specific requirements of container terminal operations, contributing to improved operational planning and management in maritime logistics.
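
As a hedged illustration of the kind of baseline comparison this abstract describes, the sketch below contrasts a naive last-value forecast with an ARIMA model on a synthetic daily series; the series, the (1, 1, 1) order, and the horizon are invented for the example and are not the paper's data or configuration:

```python
# Naive vs. ARIMA forecasting on a synthetic daily series (illustrative
# stand-in for empty-container availability; requires numpy, statsmodels).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.arange(200)
# Trend + weekly cycle + noise, mimicking daily depot availability counts.
series = 50 + 0.1 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 3, 200)
train, test = series[:180], series[180:]

# Naive baseline: repeat the last observed value across the horizon.
naive = np.full(len(test), train[-1])

# ARIMA baseline; the (1, 1, 1) order is an arbitrary illustrative choice.
arima = ARIMA(train, order=(1, 1, 1)).fit().forecast(steps=len(test))

def mae(forecast):
    return np.mean(np.abs(forecast - test))

print(f"naive MAE: {mae(naive):.2f}   ARIMA MAE: {mae(arima):.2f}")
```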

Propensity Formation-Containment Control of Fully Heterogeneous Multi-Agent Systems via Online Data-Driven Learning 2025-03-12
Show

This paper introduces an online data-driven learning scheme designed to address a novel problem in propensity formation and containment control for fully heterogeneous multi-agent systems. Unlike traditional approaches that rely on the eigenvalues of the Laplacian matrix, this problem considers the determination of follower positions based on propensity factors released by leaders. To address the challenge of incomplete utilization of leader information in existing multi-leader control methods, the concept of an influential transit formation leader (ITFL) is introduced. An adaptive observer is developed for the agents, including the ITFL, to estimate the state of the tracking leader or the leader's formation. Building on these observations, a model-based control protocol is proposed, elucidating the relationship between the regulation equations and control gains, ensuring the asymptotic convergence of the agent's state. To eliminate the necessity for model information throughout the control process, a new online data-driven learning algorithm is devised for the control protocol. Finally, numerical simulation results are given to verify the effectiveness of the proposed method.

Analyzing the temporal dynamics of linguistic features contained in misinformation 2025-03-10
Show

Consumption of misinformation can lead to negative consequences that impact the individual and society. To help mitigate the influence of misinformation on human beliefs, algorithmic labels providing context about content accuracy and source reliability have been developed. Since the linguistic features used by algorithms to estimate information accuracy can change across time, it is important to understand their temporal dynamics. This study therefore uses natural language processing to analyze PolitiFact statements spanning 2010 to 2024, quantifying how the sources and linguistic features of misinformation change between five-year time periods. The results show that statement sentiment has decreased significantly over time, reflecting a generally more negative tone in PolitiFact statements. Moreover, statements associated with misinformation exhibit significantly lower sentiment than accurate information. Additional analysis shows that recent time periods are dominated by sources from online social networks and other digital forums, such as blogs and viral images, which contain high levels of misinformation with negative sentiment. In contrast, most statements during early time periods are attributed to individual sources (i.e., politicians) that are relatively balanced in accuracy ratings and contain statements with neutral or positive sentiment. Named-entity recognition reveals that presidential incumbents and candidates are relatively more prevalent in statements containing misinformation, while US states tend to be present in accurate information. Finally, entity labels associated with people and organizations are more common in misinformation, while accurate statements are more likely to contain numeric entity labels, such as percentages and dates.

The Cure is in the Cause: A Filesystem for Container Debloating 2025-02-19
Show

Containers have become a standard for deploying applications due to their convenience, but they often suffer from significant software bloat-unused files that inflate image sizes, increase provisioning times, and waste resources. These inefficiencies are particularly problematic in serverless and edge computing scenarios, where resources are constrained, and performance is critical. Existing debloating tools are limited in scope and effectiveness, failing to address the widespread issue of container bloat at scale. In this paper, we conduct a large-scale evaluation of container bloat, analyzing the top 20 most downloaded containers on DockerHub. We evaluate two state-of-the-art debloating tools, identify their limitations, and propose a novel solution, BAFFS, which addresses bloat at the filesystem level by introducing a flexible debloating layer that preserves the layered structure of container filesystems. The debloating layer can be organized in different ways to meet diverse requirements. Our evaluation demonstrates that over 50% of the top-downloaded containers have more than 60% bloat, and BAFFS reduces container sizes significantly while maintaining functionality. For serverless functions, BAFFS reduces cold start latency by up to 68%. Additionally, when combined with lazy-loading snapshotters, BAFFS enhances provisioning efficiency, reducing conversion times by up to 93% and provisioning times by up to 19%.

Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective 2025-02-19
Show

Semi-supervised heterogeneous domain adaptation (SHDA) addresses learning across domains with distinct feature representations and distributions, where source samples are labeled while most target samples are unlabeled, with only a small fraction labeled. Moreover, there is no one-to-one correspondence between source and target samples. Although various SHDA methods have been developed to tackle this problem, the nature of the knowledge transferred across heterogeneous domains remains unclear. This paper delves into this question from an empirical perspective. We conduct extensive experiments on about 330 SHDA tasks, employing two supervised learning methods and seven representative SHDA methods. Surprisingly, our observations indicate that both the category and feature information of source samples do not significantly impact the performance of the target domain. Additionally, noise drawn from simple distributions, when used as source samples, may contain transferable knowledge. Based on this insight, we perform a series of experiments to uncover the underlying principles of transferable knowledge in SHDA. Specifically, we design a unified Knowledge Transfer Framework (KTF) for SHDA. Based on the KTF, we find that the transferable knowledge in SHDA primarily stems from the transferability and discriminability of the source domain. Consequently, ensuring those properties in source samples, regardless of their origin (e.g., image, text, noise), can enhance the effectiveness of knowledge transfer in SHDA tasks. The codes and datasets are available at https://github.com/yyyaoyuan/SHDA.

KiSS: A Novel Container Size-Aware Memory Management Policy for Serverless in Edge-Cloud Continuum 2025-02-18
Show

Serverless computing has revolutionized cloud architectures by enabling developers to deploy event-driven applications via lightweight, self-contained virtualized containers. However, serverless frameworks face critical cold-start challenges in resource-constrained edge environments, where traditional solutions fall short. The limitations are especially pronounced in edge environments, where heterogeneity and resource constraints exacerbate inefficiencies in resource utilization. This paper introduces KiSS (Keep it Separated Serverless), a static, container size-aware memory management policy tailored for the edge-cloud continuum. The design of KiSS is informed by a detailed workload analysis that identifies critical patterns in container size, invocation frequency, and memory contention. Guided by these insights, KiSS partitions memory pools into categories for small, frequently invoked containers and larger, resource-intensive ones, ensuring efficient resource utilization while minimizing cold starts and inter-function interference. Using a discrete-event simulator, we evaluate KiSS on edge-cluster environments with real-world-inspired workloads. Results show that KiSS reduces cold-start percentages by 60% and function drops by 56.5%, achieving significant performance gains in resource-constrained settings. This work underscores the importance of workload-driven design in advancing serverless efficiency at the edge.
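
To make the size-aware partitioning idea concrete, here is a hypothetical minimal sketch of a two-pool keep-warm policy; the class name, the 128 MB cutoff, and the budgets are invented for illustration and are not KiSS's actual design parameters:

```python
# Hypothetical size-aware keep-warm policy: one LRU pool per size class,
# so large containers can never evict small, frequently invoked ones.
from collections import OrderedDict

class SizeAwarePools:
    def __init__(self, small_budget_mb, large_budget_mb, small_cutoff_mb=128):
        self.cutoff = small_cutoff_mb
        self.pools = {
            "small": (OrderedDict(), small_budget_mb),
            "large": (OrderedDict(), large_budget_mb),
        }

    def invoke(self, func, size_mb):
        kind = "small" if size_mb <= self.cutoff else "large"
        pool, budget = self.pools[kind]
        if func in pool:                    # container kept warm: no cold start
            pool.move_to_end(func)
            return "warm"
        pool[func] = size_mb                # cold start: admit the container
        while sum(pool.values()) > budget:  # evict LRU within this class only
            pool.popitem(last=False)
        return "cold"

p = SizeAwarePools(small_budget_mb=512, large_budget_mb=2048)
print(p.invoke("thumbnail", 64), p.invoke("train-job", 1024), p.invoke("thumbnail", 64))
```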

pylevin: efficient numerical integration of integrals containing up to three Bessel functions 2025-02-17
Show

Integrals involving highly oscillatory Bessel functions are notoriously challenging to compute using conventional integration techniques. While several methods are available, they predominantly cater to integrals with at most a single Bessel function, resulting in specialised yet highly optimised solutions. Here we present pylevin, a Python package to efficiently compute integrals containing up to three Bessel functions of arbitrary order and arguments. The implementation makes use of Levin's method and allows for accurate and fast integration of these highly oscillatory integrals. In benchmarking pylevin against existing software for single Bessel function integrals, we find its speed comparable, usually within a factor of two, to specialised packages such as FFTLog. Furthermore, when dealing with integrals containing two or three Bessel functions, pylevin delivers performance up to four orders of magnitude faster than standard adaptive quadrature methods, while also exhibiting better stability for large Bessel function arguments. pylevin is available in source form via GitHub or directly from PyPI.

10 pa...

10 pages, 3 Figures, abridged version to be submitted to JOSS, comments welcome, code available via https://github.com/rreischke/levin_bessel and https://pypi.org/project/pylevin/
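
To see why such integrals call for specialised methods, the sketch below applies SciPy's generic adaptive quadrature to an integral with a single Bessel function; this is not pylevin's API (see the linked repository for actual usage), only a naive baseline whose cost and reliability degrade as the oscillation frequency grows. The integrand and limits are invented for illustration:

```python
# Generic adaptive quadrature on an oscillatory Bessel integral
# (the problem class pylevin targets; this is NOT pylevin's API).
import numpy as np
from scipy.integrate import quad
from scipy.special import jv

def integrand(x, k):
    # Damped kernel times J_0(k x): for large k the rapid oscillations
    # force the adaptive scheme into heavy subdivision.
    return np.exp(-x) * jv(0, k * x)

for k in (1.0, 100.0, 10000.0):
    val, err = quad(integrand, 0, 50, args=(k,), limit=2000)
    # Expect warnings and growing error estimates at large k -- exactly
    # the regime where Levin-type methods stay fast and stable.
    print(f"k={k:>8.0f}: integral ~ {val:+.6e} (abserr {err:.1e})")
```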

Partially Frozen Random Networks Contain Compact Strong Lottery Tickets 2025-02-08
Show

Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning--strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs could also be found within a randomly pruned source network. This phenomenon can be exploited to further compress the small memory size required by SLTs. However, their method is limited to SLTs that are even sparser than the source, leading to worse accuracy due to unintentionally high sparsity. This paper proposes a method for reducing the SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen by either permanently pruning them or locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with a better accuracy-to-model-size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing $70\%$ of a ResNet on ImageNet provides $3.3\times$ compression compared to the SLT found within a dense counterpart, raises accuracy by up to $14.12$ points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-model size trade-off than both.

Accepted at TMLR
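
A minimal sketch of the freezing idea described above (not the paper's Edge-Popup search itself; the shapes and the 70% fraction are illustrative):

```python
# Freeze a random subset of never-trained weights, either by pruning them
# or by locking them at their initial value; only the rest stay searchable.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # random init, no training

frozen = rng.random(w.shape) < 0.7                  # freeze 70% of the weights

# Option 1: permanently prune the frozen weights (no storage needed).
w_pruned = np.where(frozen, 0.0, w)

# Option 2: lock them as a fixed part of the ticket; the subnetwork search
# (e.g. Edge-Popup) then only scores the remaining 30%.
searchable = ~frozen
print(f"searchable weights: {searchable.mean():.0%} of {w.size}")
```
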
Containment Control Approach for Steering Opinion in a Social Network 2025-02-05
Show

The paper studies the problem of steering multi-dimensional opinion in a social network. Assuming the society of interest consists of stubborn and regular agents, stubborn agents are considered as leaders who specify the desired opinion distribution as a distributed reward or utility function. In this context, each regular agent is seen as a follower, updating its bias on the initial opinion and its influence weights by averaging its observations of the rewards its influencers have received. Assuming random graphs with reducible and irreducible topology specify the influences on regular agents, opinion evolution is represented as a containment control problem, for which stability and convergence to the final opinion are proven.

LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations 2025-02-04
Show

Security misconfigurations in Container Orchestrators (COs) can pose serious threats to software systems. While Static Analysis Tools (SATs) can effectively detect these security vulnerabilities, the industry currently lacks automated solutions capable of fixing these misconfigurations. The emergence of Large Language Models (LLMs), with their proven capabilities in code understanding and generation, presents an opportunity to address this limitation. This study introduces LLMSecConfig, an innovative framework that bridges this gap by combining SATs with LLMs. Our approach leverages advanced prompting techniques and Retrieval-Augmented Generation (RAG) to automatically repair security misconfigurations while preserving operational functionality. Evaluation of 1,000 real-world Kubernetes configurations achieved a 94% success rate while maintaining a low rate of introducing new misconfigurations. Our work makes a promising step towards automated container security management, reducing the manual effort required for configuration maintenance.

On the Shape Containment Problem within the Amoebot Model with Reconfigurable Circuits 2025-01-28
Show

In programmable matter, we consider a large number of tiny, primitive computational entities called particles that run distributed algorithms to control global properties of the particle structure. Shape formation problems, where the particles have to reorganize themselves into a desired shape using basic movement abilities, are particularly interesting. In the related shape containment problem, the particles are given the description of a shape $S$ and have to find maximally scaled representations of $S$ within the initial configuration, without movements. While the shape formation problem is being studied extensively, no attention has been given to the shape containment problem, which may have additional uses beside shape formation, such as detection of structural flaws. In this paper, we consider the shape containment problem within the geometric amoebot model for programmable matter, using its reconfigurable circuit extension to enable the instantaneous transmission of primitive signals on connected subsets of particles. We first prove a lower runtime bound of $\Omega(\sqrt{n})$ synchronous rounds for the general problem, where $n$ is the number of particles. Then, we construct the class of snowflake shapes and its subclass of star convex shapes, and present solutions for both. Let $k$ be the maximum scale of the considered shape in a given amoebot structure. If the shape is star convex, we solve it within $\mathcal{O}(\log^2 k)$ rounds. If it is a snowflake but not star convex, we solve it within $\mathcal{O}(\sqrt{n} \log n)$ rounds.

Containers as the Quantum Leap in Software Development 2025-01-13
Show

The goal of the project QLEAP (2022-24), funded by Business Finland and the participating organizations, was to study the use of containers as elements of architecture design. Such systems include containerized AI systems, containers in a hybrid setup (public/hybrid/private clouds), and related security concerns. The consortium consists of four companies that represent different concerns over using containers (Bittium, M-Files, Solita/ADE Insights, Vaadin) and one research organization (University of Jyväskylä). In addition, it has received support from two Veturi companies - Nokia and Tietoevry - who have also participated in steering the project. Moreover, the SW4E ecosystem has participated in the project. This document gathers the key lessons learned from the project.

On the Interaction in Transient Stability of Two-Inverter Power Systems containing GFL inverter Using Manifold Method 2025-01-10
Show

Many renewable energy resources are integrated into power systems via grid-following (GFL) inverters, which rely on a phase-locked loop (PLL) for grid synchronization. During severe grid faults, GFL inverters are vulnerable to transient instability, often leading to disconnection from the grid. This paper aims to elucidate the interaction mechanisms and define the stability boundaries of two-inverter systems comprising GFL, grid-forming (GFM), or grid-supporting (GSP) inverters. First, the generalized large-signal expression for the two-inverter system under various inverter combinations is derived, revealing that no energy function exists for systems containing GFL inverters. This implies that the traditional direct method cannot be applied to such systems. To overcome these challenges, a manifold method is employed to precisely determine the domain of attraction (DOA) of the system, and the transient stability margin is assessed by a new metric termed the critical clearing radius (CCR). A case study of the two-inverter system under various inverter combinations is conducted to explore large-signal interactions across different scenarios. Manifold analysis and simulation results reveal that GSP inverters using a PLL for grid synchronization exhibit behavior similar to GFM inverters when the droop coefficients in the terminal voltage control loop (TVC) are sufficiently large. Compared to GFL inverters, GSP inverters incorporating a TVC significantly enhance the transient stability of other inverters. In the STATCOM case, the optimal placement of the STATCOM, realized by GSP or GFM inverters, is identified to be at the midpoint of a transmission line. All findings in this paper are validated through electromagnetic transient (EMT) simulations.

Analysis of kinematics of mechanisms containing revolute joints 2025-01-06
Show

Kinematics of rigid bodies can be analyzed in many different ways. The advantage of using Euler parameters is that the resulting equations are polynomials, and hence computational algebra, in particular Gröbner bases, can be used to study them. The disadvantage of Gröbner basis methods is that the computational complexity grows quite fast, in the worst case, in the number of variables and the degree of the polynomials. In the present article we show how to simplify computations when the mechanism contains revolute joints. The idea is based on the fact that the ideal representing the constraints of the revolute joint is not prime. Choosing the appropriate prime component reduces the computational cost significantly. We illustrate the method by applying it to the well-known Bennett and Bricard mechanisms, but it can be applied to any mechanism that has revolute joints.
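
A toy version of the algebraic setup, assuming SymPy as the computer algebra system (the e1 = e2 = 0 constraints below model a hypothetical revolute joint about the z-axis and are far simpler than the paper's mechanism ideals):

```python
# Groebner basis of Euler-parameter constraints for a toy revolute joint.
from sympy import symbols, groebner

e0, e1, e2, e3 = symbols("e0 e1 e2 e3")

polys = [
    e0**2 + e1**2 + e2**2 + e3**2 - 1,  # unit-quaternion (Euler parameter) constraint
    e1,                                  # rotation axis fixed to z: the x and
    e2,                                  # y components of the axis vanish
]
print(groebner(polys, e0, e1, e2, e3, order="lex"))
# The basis reduces to e1, e2 and e0**2 + e3**2 - 1: a circle of
# rotations about z, as expected for a single revolute joint.
```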

Unlocking FedNL: Self-Contained Compute-Optimized Implementation 2024-12-12
Show

Federated Learning (FL) is an emerging paradigm that enables intelligent agents to collaboratively train Machine Learning (ML) models in a distributed manner, eliminating the need for sharing their local data. The recent work (arXiv:2106.02969) introduces a family of Federated Newton Learn (FedNL) algorithms, marking a significant step towards applying second-order methods to FL and large-scale optimization. However, the reference FedNL prototype exhibits three serious practical drawbacks: (i) it requires 4.8 hours to launch a single experiment on a server-grade workstation; (ii) the prototype only simulates a multi-node setting; (iii) prototype integration into resource-constrained applications is challenging. To bridge the gap between theory and practice, we present a self-contained implementation of FedNL, FedNL-LS, and FedNL-PP for single-node and multi-node settings. Our work resolves the aforementioned issues and reduces the wall-clock time by a factor of 1000. With this, FedNL outperforms alternatives for training logistic regression in a single-node setting -- CVXPY (arXiv:1603.00943) -- and in a multi-node setting -- Apache Spark (arXiv:1505.06807) and Ray/Scikit-Learn (arXiv:1712.05889). Finally, we propose two practically oriented compressors for FedNL -- adaptive TopLEK and cache-aware RandSeqK -- which fulfill the theory of FedNL.

55 pa...

55 pages, 12 figures, 12 tables

Strategic Bidding in the Frequency-Containment Ancillary Services Market 2024-12-12
Show

The vast integration of non-synchronous renewable energy sources compromises power system stability, increasing vulnerability to frequency deviations due to the lack of inertia. Current efforts to decarbonise electricity grids while maintaining frequency security still rely on the provision of Ancillary Services (AS), such as inertia and frequency response, by flexible synchronous generators, placing these types of units in an advantageous position in the AS market. However, in the ongoing transition to decarbonisation, not enough attention has been given to analysing market power in the frequency-containment AS market. This work presents a strategic bidding model designed to analyse market power in the coupled energy and frequency-containment AS market. Through a non-convex primal-dual bi-level formulation, we determine the interaction of a strategic market player with the rest of the market, which behaves competitively. The case study is based on Great Britain in 2030, demonstrating the capacity of the strategic player to influence prices. While this impact is perceived in the energy market, it is particularly pronounced in the AS market.

Edge System Design Using Containers and Unikernels for IoT Applications 2024-12-08
Show

Edge computing is emerging as a key enabler of low-latency, high-efficiency processing for the Internet of Things (IoT) and other real-time applications. To support these demands, containerization has gained traction in edge computing due to its lightweight virtualization and efficient resource management. However, there is currently no established framework to leverage both containers and unikernels on edge devices for optimized IoT deployments. This paper proposes a hybrid edge system design that leverages container and unikernel technologies to optimize resource utilization based on application complexity. Containers are employed for resource-intensive applications, e.g., computer vision, providing faster processing, flexibility, and ease of deployment. In contrast, unikernels are used for lightweight applications, offering enhanced resource performance with minimal overhead. Our system design also incorporates container orchestration to manage multiple instances across the edge efficiently, ensuring scalability and reliability. We demonstrate our hybrid approach's performance and efficiency advantages through real-world computer vision and data science applications on an ARM-powered edge device. Our results show that this hybrid approach improves resource utilization and reduces latency compared to traditional virtualized solutions. This work provides insights into optimizing edge infrastructures, enabling more efficient and specialized deployment strategies for diverse application workloads.

Self-contained relaxation-based dynamical Ising machines 2024-12-04
Show

Dynamical Ising machines are based on continuous dynamical systems evolving from a generic initial state to a state strongly related to the ground state of the classical Ising model on a graph. Reaching the ground state is equivalent to finding the maximum (weighted) cut of the graph, which presents Ising machines as an alternative way of solving and investigating NP-complete problems. Among the dynamical models, relaxation-based models are distinguished by their relation to guarantees of performance achieved in time scaling polynomially with the problem size. However, the terminal states of such machines are essentially non-binary, necessitating special post-processing relying on disparate computing. We show that an Ising machine implementing a special continuous dynamical system (called the $V_2$ model) solves the rounding problem dynamically. We prove that the $V_2$ model, starting from an arbitrary non-binary state, terminates in a state that trivially rounds to a binary state with a cut at least as large as that obtained by optimally rounding the initial state. Besides showing that relaxation-based dynamical Ising machines can be made self-contained, this result presents a non-Boolean realization of solving a non-trivial information processing task on Ising machines. Moreover, we prove that if the initial state of the $V_2$-machine is a random limited-amplitude perturbation of a binary state, the machine progresses to a state with a cut at least as high as that of the initial binary state. Since the probability of improving the cut is finite, this shows that the $V_2$-machine with random agitations converges to a maximum cut state almost surely.

23 pages, 7 figures
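
The rounding step that the $V_2$ model renders unnecessary can be stated in a few lines; the graph and the terminal state below are invented for illustration:

```python
# Sign-round a non-binary Ising-machine state to spins and score the cut.
import numpy as np

edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 1.0)]

def cut_value(spins):
    # An edge contributes its weight when its endpoints take opposite spins.
    return sum(w for i, j, w in edges if spins[i] != spins[j])

x = np.array([0.8, -0.3, 0.5, -0.9])  # non-binary terminal state
spins = np.sign(x)                    # simple sign rounding to +/-1
print("cut after rounding:", cut_value(spins))
```
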
Optimizing Container Loading and Unloading through Dual-Cycling and Dockyard Rehandle Reduction Using a Hybrid Genetic Algorithm 2024-12-04
Show

This paper addresses the optimization of container unloading and loading operations at ports, integrating quay-crane dual-cycling with dockyard rehandle minimization. We present a unified model encompassing both operations: unloading and loading ship containers by quay crane, and reducing dockyard rehandles while loading the ship. We recognize that optimizing one aspect in isolation can lead to suboptimal outcomes due to interdependencies. Specifically, optimizing unloading sequences for minimal operation time may inadvertently increase dockyard rehandles during loading, and vice versa. To address this NP-hard problem, we propose a hybrid genetic algorithm (GA), QCDC-DR-GA, comprising one-dimensional and two-dimensional GA components. Our model consistently outperforms four state-of-the-art methods in maximizing dual cycles and minimizing dockyard rehandles. Compared to those methods, it reduced total operation time by 15-20% for large vessels. Statistical validation through a two-tailed paired t-test confirms the superiority of QCDC-DR-GA at a 5% significance level. The approach effectively combines QCDC optimization with dockyard rehandle minimization, optimizing the total unloading-loading time. The results underscore the inefficiency of optimizing QCDC and dockyard rehandles separately. Fragmented approaches, such as QCDC scheduling optimized by bi-level GA and GA-ILSRS (Scenario 2), show limited improvement compared to QCDC-DR-GA. Likewise, neglecting dual-cycle optimization, as in GA-ILSRS (Scenario 1), leads to performance inferior to QCDC-DR-GA. This emphasizes the necessity of simultaneously considering both aspects for optimal resource utilization and overall operational efficiency.
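
For readers unfamiliar with the machinery, a generic permutation-GA skeleton is sketched below; the selection, crossover, and mutation operators and the toy fitness are illustrative stand-ins, not the paper's QCDC-DR-GA encoding or objective:

```python
# Generic permutation GA: order crossover + swap mutation + elitist survival.
import random

def evolve(fitness, n_items, pop_size=50, generations=200, mut_rate=0.2):
    pop = [random.sample(range(n_items), n_items) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_items)
            # Order crossover: prefix from parent a, rest in parent b's order.
            child = a[:cut] + [g for g in b if g not in a[:cut]]
            if random.random() < mut_rate:  # swap mutation
                i, j = random.sample(range(n_items), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: prefer sequences close to the identity ordering (a stand-in
# for the real dual-cycle/rehandle objective, which needs vessel/yard state).
best = evolve(lambda s: -sum(abs(i - g) for i, g in enumerate(s)), n_items=12)
print(best)
```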

Robust Model Predictive Control for Constrained Uncertain Systems Based on Concentric Container and Varying Tube 2024-12-04
Show

This paper proposes a novel robust model predictive control (RMPC) method for the stabilization of constrained systems subject to additive disturbance (AD) and multiplicative disturbance (MD). Concentric containers are introduced to facilitate the characterization of MD, and varying tubes are constructed to bound reachable states. By restricting states and the corresponding inputs in containers with free sizes and a fixed shape, feasible MDs, which are the products of model uncertainty with states and inputs, are restricted into polytopes with free sizes. Then, tubes with different centers and shapes are constructed based on the nominal dynamics and the knowledge of AD and MD. The free sizes of containers allow for a more accurate characterization of MD, while the fixed shape reduces online computational burden, making the proposed method less conservative and computationally efficient. Moreover, the shape of containers is optimized to further reduce conservativeness. Compared to the RMPC method using homothetic tubes, the proposed method has a larger region of attraction while involving fewer decision variables and constraints in the online optimization problem.

13 pages, 6 figures
A refined graph container lemma and applications to the hard-core model on bipartite expanders 2024-11-23
Show

We establish a refined version of a graph container lemma due to Galvin and discuss several applications related to the hard-core model on bipartite expander graphs. Given a graph $G$ and $\lambda>0$, the hard-core model on $G$ at activity $\lambda$ is the probability distribution $\mu_{G,\lambda}$ on independent sets in $G$ given by $\mu_{G,\lambda}(I)\propto \lambda^{|I|}$.
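
On a graph small enough to enumerate, the distribution just defined can be computed exactly; a minimal brute-force sketch (the 4-cycle and the activity value are arbitrary):

```python
# Exact hard-core distribution on a tiny graph: enumerate independent
# sets I and weight each by lam ** |I|; container lemmas matter at scale.
from itertools import combinations

edges = {(0, 1), (1, 2), (2, 3), (3, 0)}  # a 4-cycle
nodes = range(4)
lam = 2.0

def independent(S):
    return all((u, v) not in edges and (v, u) not in edges
               for u, v in combinations(S, 2))

ind_sets = [S for r in range(5) for S in combinations(nodes, r) if independent(S)]
Z = sum(lam ** len(S) for S in ind_sets)  # partition function
for S in ind_sets:
    print(S, lam ** len(S) / Z)           # mu_{G,lambda}(I)
```
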
DCSim: Computing and Networking Integration based Container Scheduling Simulator for Data Centers 2024-11-21
Show

The increasing prevalence of cloud-native technologies, particularly containers, has led to the widespread adoption of containerized deployments in data centers. The advancement of deep neural network models has increased the demand for container-based distributed model training and inference, where frequent data transmission among nodes has emerged as a significant performance bottleneck. However, traditional container scheduling simulators often overlook the influence of network modeling on the efficiency of container scheduling, concentrating primarily on modeling computational resources. In this paper, we propose DCSim, a container scheduling simulator for data centers built around the collaboration between computing and networking. The simulator consists of several modules: a data center module, a network simulation module, a container scheduling module, a discrete event-driven module, and a data collection and analysis module. Together, these modules provide heterogeneous computing power modeling and dynamic network simulation capabilities. We design a discrete event model using SimPy to represent various aspects of container processing, including container requests, scheduling, execution, pauses, communication, migration, and termination within data centers. Lightweight virtualization technology based on Mininet is employed to construct a software-defined network. An experimental environment for container scheduling simulation was established, and functional and performance tests were conducted on the simulator to validate its scheduling simulation capabilities.
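
As a hedged illustration of the discrete-event style the abstract refers to (SimPy is the library the paper names; the lifecycle stages, names, and durations below are invented, not DCSim's):

```python
# Tiny SimPy sketch of a container lifecycle as discrete events:
# request -> schedule -> execute -> terminate.
import simpy

def container(env, name, queue_delay, run_time):
    print(f"{env.now:5.1f}  {name}: requested")
    yield env.timeout(queue_delay)   # waiting on the scheduler
    print(f"{env.now:5.1f}  {name}: scheduled, executing")
    yield env.timeout(run_time)      # execution; network transfers would be
    print(f"{env.now:5.1f}  {name}: terminated")  # further timed events

env = simpy.Environment()
env.process(container(env, "job-a", queue_delay=1.0, run_time=4.0))
env.process(container(env, "job-b", queue_delay=2.5, run_time=1.5))
env.run()
```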

TEEMATE: Fast and Efficient Confidential Container using Shared Enclave 2024-11-18
Show

Confidential containers are becoming increasingly popular as they meet both the cloud providers' need for efficient resource management and the cloud users' need for data protection. Specifically, confidential containers integrate the container and the enclave, aiming to inherit the design-wise advantages of both (i.e., resource management and data protection). However, current confidential containers suffer from large performance overheads caused by i) a larger startup latency due to enclave creation, and ii) a larger memory footprint due to the non-shareable nature of enclave memory. This paper explores a design conundrum of confidential containers, examining why they impose such large performance overheads. Surprisingly, we found a universal misconception that an enclave can only be used by the single (containerized) process that created it. However, an enclave can be shared across multiple processes, because an enclave is merely a set of physical resources while the process is an abstraction constructed by the host kernel. To this end, we introduce TeeMate, a new approach to utilizing enclaves on the host system. Specifically, TeeMate designs primitives to i) share the enclave memory between processes, thus preserving memory abstraction, and ii) assign the threads in an enclave between processes, thus preserving thread abstraction. We concretized TeeMate on Intel SGX, and implemented confidential serverless computing and a confidential database on top of TeeMate-based confidential containers. The evaluation clearly demonstrated the strong practical impact of TeeMate, achieving at least 4.5 times lower latency and 2.8 times lower memory usage compared to applications built on conventional confidential containers.

Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry 2024-11-17
Show

Comprehensively understanding surgical scenes in Surgical Visual Question Answering (Surgical VQA) requires reasoning over multiple objects. Previous approaches address this task using cross-modal fusion strategies to enhance reasoning ability. However, these methods often struggle with limited scene understanding and question comprehension, and some rely on external resources (e.g., pre-extracted object features), which can introduce errors and generalize poorly across diverse surgical environments. To address these challenges, we propose SCAN, a simple yet effective memory-augmented framework that leverages Multimodal LLMs to improve surgical context comprehension via Self-Contained Inquiry. SCAN operates autonomously, generating two types of memory for context augmentation: Direct Memory (DM), which provides multiple candidates (or hints) to the final answer, and Indirect Memory (IM), which consists of self-contained question-hint pairs to capture broader scene context. DM directly assists in answering the question, while IM enhances understanding of the surgical scene beyond the immediate query. Reasoning over these object-aware memories enables the model to accurately interpret images and respond to questions. Extensive experiments on three publicly available Surgical VQA datasets demonstrate that SCAN achieves state-of-the-art performance, offering improved accuracy and robustness across various surgical scenarios.

UAV survey coverage path planning of complex regions containing exclusion zones 2024-11-13
Show

This article addresses the challenge of UAV survey coverage path planning for areas that are complex concave polygons, containing exclusion zones or obstacles. While standard drone path planners typically generate coverage paths for simple convex polygons, this study proposes a method to manage more intricate regions, including boundary splits, merges, and interior holes. To achieve this, polygonal decomposition techniques are used to partition the target area into convex sub-regions. The sub-polygons are then merged using a depth-first search algorithm, followed by the generation of continuous Boustrophedon paths based on connected components. Polygonal offset by the straight skeleton method was used to ensure a constant safe distance from the exclusion zones. This approach allows UAV path planning in environments with complex geometric constraints.
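
A minimal sketch of the Boustrophedon step over a single convex cell, assuming Shapely for the geometry (the rectangle, lane spacing, and offsets are illustrative; the paper additionally decomposes concave regions with holes and offsets boundaries via the straight skeleton):

```python
# Boustrophedon lanes for one convex cell: clip parallel sweep lines to
# the cell and alternate their direction.
from shapely.geometry import LineString, Polygon

cell = Polygon([(0, 0), (10, 0), (10, 6), (0, 6)])
spacing = 1.5                       # lane spacing, e.g. from camera footprint
minx, miny, maxx, maxy = cell.bounds

lanes, flip = [], False
y = miny + spacing / 2
while y < maxy:
    seg = cell.intersection(LineString([(minx - 1, y), (maxx + 1, y)]))
    if not seg.is_empty:
        a, b = list(seg.coords)     # a single segment, since the cell is convex
        lanes.append((b, a) if flip else (a, b))
        flip = not flip             # reverse every other lane
    y += spacing

for start, end in lanes:
    print(start, "->", end)
```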

Segmentized quarantine policy for managing a tradeoff between containment of infectious disease and social cost of quarantine 2024-11-09
Show

By the end of 2021, COVID-19 had spread to over 230 countries, with over 5.4 million deaths. To contain its spread, many countries implemented non-pharmaceutical interventions, notably contact tracing and self-quarantine policies. However, these measures came with significant social costs, highlighting the need for more sustainable approaches that minimize disruptions to economic and societal activities. This research explores a segmentized quarantine policy, applying different quarantine measures for various population segments to better balance the benefits and costs of containment. Different groups, like students versus working adults, have distinct societal activity patterns, posing varied risks for disease spread. We define segmentized quarantine policy across two dimensions-contact tracing range and quarantine period-and optimize these parameters for each segment to minimize total infection cases and quarantine days. Using an Agent-Based Epidemic Simulation and an Evolutionary Algorithm to derive the Pareto front, we demonstrate that segmentized policies can be more effective than uniform policies, with specific segments benefiting from tailored measures. The findings support segmentized quarantine as a viable, efficient, and sustainable approach, offering a valuable framework for public health policy in future pandemics.

Comparing Security and Efficiency of WebAssembly and Linux Containers in Kubernetes Cloud Computing 2024-11-02
Show

This study investigates the potential of WebAssembly as a more secure and efficient alternative to Linux containers for executing untrusted code in cloud computing with Kubernetes. Specifically, it evaluates the security and performance implications of this shift. Security analyses demonstrate that both Linux containers and WebAssembly have attack surfaces when executing untrusted code, but WebAssembly presents a reduced attack surface due to an additional layer of isolation. The performance analysis further reveals that while WebAssembly introduces overhead, particularly in startup times, it could be negligible in long-running computations. However, WebAssembly enhances the core principle of containerization, offering better security through isolation and platform-agnostic portability compared to Linux containers. This research demonstrates that WebAssembly is not a silver bullet for all security concerns or performance requirements in a Kubernetes environment, but typical attacks are less likely to succeed and the performance loss is relatively small.

Frequency Control and Disturbance Containment Using Grid-Forming Embedded Storage Networks 2024-10-18
Show

The paper discusses fast frequency control in bulk power systems using embedded networks of grid-forming energy storage resources. Differing from their traditional roles of regulating reserves, the storage resources in this work operate as fast-acting grid assets shaping transient dynamics. The storage resources in the network are autonomously controlled using local measurements for distributed frequency support during disturbance events. Further, the grid-forming inverter systems interfacing with the storage resources, are augmented with fast-acting safety controls designed to contain frequency transients within a prescribed tolerance band. The control action, derived from the storage network, improves the frequency nadirs in the system and prevents the severity of a disturbance from propagating far from the source. The paper also presents sensitivity studies to evaluate the impacts of storage capacity and inverter controller parameters on the dynamic performance of frequency control and disturbance localization. The performance of the safety-constrained grid-forming control is also compared with the more common grid-following control. The results are illustrated through case studies on an IEEE test system.

accep...

accepted at the IEEE PES Electrical Energy Storage Applications and Technologies Conference (EESAT)

Prediction-driven resource provisioning for serverless container runtimes 2024-10-09
Show

In recent years, serverless computing has emerged as a compelling cloud-based model for the development of a wide range of data-intensive applications. However, rapid container provisioning introduces non-trivial challenges for FaaS cloud providers, as (i) real-world FaaS workloads may exhibit highly dynamic request patterns, (ii) applications have service-level objectives (SLOs) that must be met, and (iii) container provisioning can be a costly process. In this paper, we present SLOPE, a prediction framework for serverless FaaS platforms that addresses the aforementioned challenges. Specifically, it trains a neural network model that utilizes knowledge from past runs in order to estimate the number of instances required to satisfy the invocation rate requirements of serverless applications. In cases where a priori knowledge is not available, SLOPE makes predictions using a graph edit distance approach to capture the similarities among serverless applications. Our experimental results illustrate the efficiency and benefits of our approach, which can reduce the operating costs by 66.25% on average.

6 pag...

6 pages. arXiv admin note: substantial text overlap with arXiv:2410.18106
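
The graph-similarity fallback mentioned above can be illustrated in a few lines with NetworkX; the tiny graphs stand in for application workflow DAGs, and this is a sketch of the idea, not SLOPE's implementation:

```python
# Pick the structurally closest known application via graph edit distance.
import networkx as nx

def most_similar(new_app, known_apps):
    # Smaller edit distance = more similar workflow structure.
    return min(known_apps, key=lambda g: nx.graph_edit_distance(new_app, g))

known = [nx.path_graph(3), nx.star_graph(3)]  # previously profiled apps
new = nx.path_graph(4)                        # unseen application
match = most_similar(new, known)
print("closest known app has", match.number_of_nodes(), "functions")
```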

Who should pay for frequency-containment ancillary services? Making responsible units bear the cost to shape investment in generation and loads 2024-10-07
Show

While the operating cost of electricity grids based on thermal generation was largely driven by the cost of fuel, as renewable penetration increases, ancillary services represent an increasingly large proportion of the running costs. Electric frequency is an important magnitude in highly renewable grids, as it becomes more volatile and therefore the cost related to maintaining it within safe bounds has significantly increased. So far, costs for frequency-containment ancillary services have been socialised in most countries, but it has become relevant to rethink this regulatory arrangement. In this paper, we discuss the issue of cost allocation for these services, highlighting the need to evolve towards a causation-based regulatory framework. We argue that parties responsible for creating the need for ancillary services should bear these costs. However, this would imply an important change in electricity market policy, therefore it is necessary to understand the impact on current and future investments on generation, as well as on electricity tariffs. Here we provide a mostly qualitative analysis of this issue, defining guidelines for practical implementation and further study.

Publi...

Published in journal Energy Policy
