- Sponsor: sigops
WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations
- Kezhao Huang,
- Jidong Zhai,
- Liyan Zheng,
- Haojie Wang,
- Yuyang Jin,
- Qihao Zhang,
- Runqing Zhang,
- Zhen Zheng,
- Youngmin Yi,
- Xipeng Shen
Graph Neural Network (GNN) has emerged as an important workload for learning on graphs. With the size of graph data and the complexity of GNN model architectures increasing, developing an efficient GNN system grows more important. As GNN has heavy neural ...
Core Graph: Exploiting Edge Centrality to Speedup the Evaluation of Iterative Graph Queries
When evaluating an iterative graph query over a large graph, systems incur significant overheads due to repeated graph transfer across the memory hierarchy coupled with repeated (redundant) propagation of values over the edges in the graph. An approach ...
LSGraph: A Locality-centric High-performance Streaming Graph Engine
Streaming graph has been broadly employed across various application domains. It involves updating edges to the graph and then performing analytics on the updated graph. However, existing solutions either suffer from poor data locality and high ...
Contigra: Graph Mining with Containment Constraints
While graph mining systems employ efficient task-parallel strategies to quickly explore subgraphs of interest (or matches), they remain oblivious to containment constraints like maximality and minimality, resulting in expensive constraint checking on ...
Halflife: An Adaptive Flowlet-based Load Balancer with Fading Timeout in Data Center Networks
- Sen Liu,
- Yongbo Gao,
- Zixuan Chen,
- Jiarui Ye,
- Haiyang Xu,
- Furong Liang,
- Wei Yan,
- Zerui Tian,
- Quanwei Sun,
- Zehua Guo,
- Yang Xu
Modern data centers (DCs) employ various traffic load balancers to achieve high bisection bandwidth. Among them, flowlet switching has shown remarkable performance in both load balancing and upper-layer protocol (e.g., TCP) friendliness. However, flowlet-...
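As background for the flowlet mechanism mentioned above, the sketch below shows generic flowlet switching: a flow's packets are re-hashed to a (possibly different) path whenever the gap since its previous packet exceeds a timeout, so rerouting cannot reorder packets within a burst. This is only an illustrative sketch with assumed names and a fixed timeout, not Halflife's adaptive fading-timeout design.

```python
import random
from collections import defaultdict

# Minimal sketch of generic flowlet switching (illustrative only; not Halflife's
# adaptive fading timeout). A flow starts a new flowlet, and may be re-hashed to
# a different path, whenever its inter-packet gap exceeds FLOWLET_TIMEOUT_US.
FLOWLET_TIMEOUT_US = 100          # assumed timeout; practical values depend on path-delay skew
NUM_PATHS = 4                     # assumed number of equal-cost paths

last_seen_us = defaultdict(lambda: None)   # flow id -> timestamp of last packet
path_of_flow = {}                          # flow id -> currently assigned path

def route_packet(flow_id, now_us):
    """Return the path for this packet, starting a new flowlet if the gap is large."""
    last = last_seen_us[flow_id]
    if last is None or now_us - last > FLOWLET_TIMEOUT_US:
        # Flowlet boundary: pick a (possibly different) path; the long gap means
        # packets of the previous burst cannot be reordered by the path change.
        path_of_flow[flow_id] = random.randrange(NUM_PATHS)
    last_seen_us[flow_id] = now_us
    return path_of_flow[flow_id]

# Example: a burst, a long pause, then another burst of the same flow.
print([route_packet("flowA", t) for t in (0, 10, 20, 500, 510)])
# e.g. [2, 2, 2, 0, 0]: a new flowlet starts after the 480 us gap
```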
Hoda: a High-performance Open vSwitch Dataplane with Multiple Specialized Data Paths
Open vSwitch (OvS) has been widely used in cloud networks in view of its programmability and flexibility. However, we observe a huge performance drop when it loads practical cloud networking services (e.g., tunneling and firewalling). Our further ...
Astraea: Towards Fair and Efficient Learning-based Congestion Control
Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance over traditional TCP schemes. However, they fail to provide consistently good convergence properties, including fairness, ...
Unison: A Parallel-Efficient and User-Transparent Network Simulation Kernel
- Songyuan Bai,
- Hao Zheng,
- Chen Tian,
- Xiaoliang Wang,
- Chang Liu,
- Xin Jin,
- Fu Xiao,
- Qiao Xiang,
- Wanchun Dou,
- Guihai Chen
Discrete-event simulation (DES) is a prevalent tool for evaluating network designs. Although DES offers full fidelity and generality, its slow performance limits its application. To speed up DES, many network simulators employ parallel discrete-event ...
Serialization/Deserialization-free State Transfer in Serverless Workflows
Serialization and deserialization play a dominant role in the state transfer time of serverless workflows, leading to substantial performance penalties during workflow execution. We identify the key reason as a lack of ability to efficiently access the (...
Occam: A Programming System for Reliable Network Management
The complexity of large networks makes their management a daunting task. State-of-the-art network management tools use workflow systems for automation, but they do not adequately address the substantial challenges in operation reliability. This paper ...
Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation
Many parallel mechanisms, including data parallelism, tensor parallelism, and pipeline parallelism, have been proposed and combined together to support training increasingly large deep neural networks (DNN) on massive GPU devices. Given a DNN model and ...
Totoro: A Scalable Federated Learning Engine for the Edge
Federated Learning (FL) is an emerging distributed machine learning (ML) technique that enables in-situ model training and inference on decentralized edge devices. We propose Totoro, a novel scalable FL engine that enables massive FL applications to run ...
FLOAT: Federated Learning Optimizations with Automated Tuning
Federated Learning (FL) has emerged as a powerful approach that enables collaborative distributed model training without the need for data sharing. However, FL grapples with inherent heterogeneity challenges leading to issues such as stragglers, dropouts,...
DeTA: Minimizing Data Leaks in Federated Learning via Decentralized and Trustworthy Aggregation
Federated learning (FL) relies on a central authority to oversee and aggregate model updates contributed by multiple participating parties in the training process. This centralization of sensitive model updates naturally raises concerns about the ...
ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling
In recent years, large-scale models can be easily scaled to trillions of parameters with sparsely activated mixture-of-experts (MoE), which significantly improves the model quality while only requiring a sub-linear increase in computational costs. ...
Dashing and Star: Byzantine Fault Tolerance with Weak Certificates
State-of-the-art Byzantine fault-tolerant (BFT) protocols assuming partial synchrony such as SBFT and HotStuff use regular certificates obtained from 2f + 1 (partial) signatures. We show that one can use weak certificates obtained from only f + 1 ...
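To make the certificate sizes concrete: with n = 3f + 1 replicas and at most f Byzantine, a regular certificate carries 2f + 1 matching signatures, while a weak certificate carries only f + 1, which still includes at least one signature from a correct replica. The sketch below is a generic threshold check under those assumptions, not the Dashing or Star protocol logic.

```python
# Certificate thresholds in a BFT system with n = 3f + 1 replicas, at most f Byzantine.
# A regular certificate needs 2f + 1 matching signatures; a weak certificate needs
# only f + 1, which still guarantees at least one signature from a correct replica.
# Illustrative sketch only, not the Dashing/Star protocol logic.

def regular_quorum(f: int) -> int:
    return 2 * f + 1

def weak_quorum(f: int) -> int:
    return f + 1

def is_regular_certificate(num_sigs: int, f: int) -> bool:
    return num_sigs >= regular_quorum(f)

def is_weak_certificate(num_sigs: int, f: int) -> bool:
    return num_sigs >= weak_quorum(f)

if __name__ == "__main__":
    f = 1                                  # n = 4 replicas tolerate 1 Byzantine fault
    print(is_regular_certificate(2, f))    # False: 2 < 2f + 1 = 3
    print(is_weak_certificate(2, f))       # True:  2 >= f + 1 = 2
```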
Bandle: Asynchronous State Machine Replication Made Efficient
State machine replication (SMR) uses consensus as its core component for reaching agreement among a group of processes, in order to provide fault-tolerant services. Most SMR protocols, such as Paxos and Raft, are designed in the partial synchrony model. ...
Characterization and Reclamation of Frozen Garbage in Managed FaaS Workloads
FaaS (function-as-a-service) is becoming a popular workload in cloud environments due to its virtues such as auto-scaling and pay-as-you-go. High-level languages like JavaScript and Java are commonly used in FaaS for programmability, but their managed ...
Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts
Serverless computing allows developers to deploy and scale stateless functions in ephemeral workers easily. As a result, serverless computing has been widely used for many applications, such as computer vision, video processing, and HTML generation. ...
Improving Resource and Energy Efficiency for Cloud 3D through Excessive Rendering Reduction
The rise of cloud gaming makes interactive 3D applications an emerging type of data center workload. However, the excessive rendering in current cloud 3D systems leads to large gaps between the cloud and client frame rates (FPS, frames per second), thus ...
Draconis: Network-Accelerated Scheduling for Microsecond-Scale Workloads
We present Draconis, a novel scheduler for workloads in the range of tens to hundreds of microseconds. Draconis challenges the popular belief that programmable switches cannot house the complex data structures, such as queues, needed to support an in-...
Snatch: Online Streaming Analytics at the Network Edge
In recent years, we have witnessed a growing trend of content hyper-giants deploying server infrastructure and services close to end-users, in "eyeball" networks. Still, one of the services that remained largely unaffected by this trend is online ...
Blaze: Holistic Caching for Iterative Data Processing
Modern data processing workloads, such as machine learning and graph processing, involve iterative computations to converge generated models into higher accuracy. An effective caching mechanism is vital to expedite iterative computations since the ...
TTLs Matter: Efficient Cache Sizing with TTL-Aware Miss Ratio Curves and Working Set Sizes
In-memory caches play a pivotal role in optimizing distributed systems by significantly reducing query response times. Correctly sizing these caches is critical, especially considering that prominent organizations use terabytes and even petabytes of DRAM ...
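One way to see why TTLs matter for sizing is that even an unbounded cache cannot serve a re-access whose gap from the previous access exceeds the object's TTL. The sketch below, a hedged illustration with a made-up trace rather than the paper's TTL-aware MRC construction, computes that upper bound on the hit ratio.

```python
# Minimal sketch of why TTLs bound the achievable hit ratio regardless of cache size.
# With an unbounded cache, a re-access can only hit if the previous access happened
# within the object's TTL; reuses spaced further apart are compulsory misses.
# Illustrative only, not the paper's TTL-aware MRC algorithm; trace and TTL are made up.

def max_hit_ratio(trace, ttl):
    """trace: iterable of (timestamp, key); ttl: seconds an item stays valid."""
    last_access = {}
    hits = total = 0
    for t, key in trace:
        total += 1
        prev = last_access.get(key)
        if prev is not None and t - prev <= ttl:
            hits += 1                     # could hit in a sufficiently large cache
        last_access[key] = t
    return hits / total if total else 0.0

trace = [(0, "a"), (5, "a"), (100, "a"), (101, "b"), (102, "b")]
print(max_hit_ratio(trace, ttl=10))       # 0.4: the reuse of "a" at t=100 exceeds the TTL
```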
Trinity: A Fast Compressed Multi-attribute Data Store
With the proliferation of attribute-rich machine-generated data, emerging real-time monitoring, diagnosis, and visualization tools ingest and analyze such data across multiple attributes simultaneously. Due to the sheer volume of the data, applications ...
FLOWS: Balanced MRC Profiling for Heterogeneous Object-Size Cache
While Miss Ratio Curve (MRC) profiling methods based on spatial sampling are effective in modeling cache behaviors, previous MRC studies lack in-depth analysis of profiling errors and primarily target homogeneous object-size scenarios. This has caused ...
CCL-BTree: A Crash-Consistent Locality-Aware B+-Tree for Reducing XPBuffer-Induced Write Amplification in Persistent Memory
In a persistent B+-Tree, random updates of small key-value (KV) pairs cause severe XPBuffer-induced write amplification (XBI-amplification) because the CPU cacheline size is smaller than the media access granularity in persistent memory (PM). We observe that ...
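A rough calculation illustrates the effect: if the persistent-memory device internally writes 256 B media blocks (as Intel Optane DCPMM does) while the CPU flushes 64 B cachelines, a random 16 B key-value update still triggers a full 256 B media write, i.e., roughly 16x amplification. The numbers below are illustrative assumptions, not the paper's measurements.

```python
# Rough arithmetic for XPBuffer-induced write amplification (illustrative assumptions).
MEDIA_GRANULARITY = 256   # bytes written internally per PM media access (e.g., Optane DCPMM)
CACHELINE = 64            # bytes per CPU cacheline flushed by clwb/clflushopt
KV_SIZE = 16              # bytes of a small key-value update

# One random small update dirties one cacheline, but the device writes a whole
# 256 B media block, so only KV_SIZE of those bytes are useful payload.
amplification = MEDIA_GRANULARITY / KV_SIZE
print(f"~{amplification:.0f}x media write amplification for {KV_SIZE} B updates")  # ~16x
```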
Wormhole Filters: Caching Your Hash on Persistent Memory
- Hancheng Wang,
- Haipeng Dai,
- Rong Gu,
- Youyou Lu,
- Jiaqi Zheng,
- Jingsong Dai,
- Shusen Chen,
- Zhiyuan Chen,
- Shuaituan Li,
- Guihai Chen
Approximate membership query (AMQ) data structures can approximately determine whether an element is in the set with high efficiency. They are widely used in distributed systems, database systems, bioinformatics, IoT applications, data stream mining, ...
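For readers unfamiliar with AMQ structures, the sketch below is a textbook Bloom filter, the simplest such structure: it may return false positives but never false negatives. It is only a generic illustration with arbitrary parameters and is unrelated to the wormhole filter design itself.

```python
import hashlib

class BloomFilter:
    """Textbook Bloom filter: a generic AMQ sketch (false positives possible,
    false negatives impossible). Not the paper's wormhole filter."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"alpha")
print(bf.may_contain(b"alpha"), bf.may_contain(b"beta"))   # True, (almost surely) False
```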
Dordis: Efficient Federated Learning with Dropout-Resilient Differential Privacy
Federated learning (FL) is increasingly deployed among multiple clients to train a shared model over decentralized data. To address privacy concerns, FL systems need to safeguard the clients' data from disclosure during training and control data leakage ...
Accelerating Privacy-Preserving Machine Learning With GeniBatch
Cross-silo privacy-preserving machine learning (PPML) adopts Partial Homomorphic Encryption (PHE) for secure data combination and high-quality model training across multiple organizations (e.g., medical and financial). However, PHE introduces significant ...
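A common PHE choice in such pipelines is the Paillier cryptosystem, whose ciphertexts can be multiplied to add the underlying plaintexts, which is what lets parties combine encrypted gradients or statistics without decrypting them. The sketch below is a toy, textbook Paillier example with insecure key sizes, intended only to show the additive homomorphism; it is not GeniBatch's optimization and not safe for real use.

```python
import math
import random

# Toy textbook Paillier (additively homomorphic PHE). Demo-sized primes; not secure.

def keygen(p=104729, q=99991):                    # small primes, illustration only
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                          # valid decryption constant when g = n + 1
    return (n, n + 1), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    l = (pow(c, lam, n * n) - 1) // n             # L(x) = (x - 1) / n
    return (l * mu) % n

pub, priv = keygen()
n = pub[0]
c_sum = (encrypt(pub, 20) * encrypt(pub, 22)) % (n * n)   # multiplying ciphertexts...
print(decrypt(priv, c_sum))                                # ...adds the plaintexts: 42
```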
Acceptance Rates
| Year | Submitted | Accepted | Rate |
| --- | --- | --- | --- |
| EuroSys '21 | 181 | 38 | 21% |
| EuroSys '20 | 234 | 43 | 18% |
| EuroSys '18 | 262 | 43 | 16% |
| EuroSys '16 | 180 | 38 | 21% |
| EuroSys '14 | 147 | 27 | 18% |
| EuroSys '13 | 143 | 28 | 20% |
| EuroSys '11 | 161 | 24 | 15% |
| Overall | 1,308 | 241 | 18% |