Acceleration of Multi-Body Molecular Dynamics With Customized Parallel Dataflow
- Quan Deng,
- Qiang Liu,
- Ming Yuan,
- Xiaohui Duan,
- Lin Gan,
- Jinzhe Yang,
- Wenlai Zhao,
- Zhenxiang Zhang,
- Guiming Wu,
- Wayne Luk,
- Haohuan Fu,
- Guangwen Yang
FPGAs are drawing increasing attention for solving molecular dynamics (MD) problems, and have already been applied to problems such as two-body potentials and force fields composed of these potentials. Competitive performance is obtained compared with ...
Optimizing I/O Performance Through Effective vCPU Scheduling Interference Management
Virtual machines (VMs) rely heavily on virtual CPU (vCPU) scheduling to achieve efficient I/O performance. vCPU scheduling interference can cause inconsistent scheduling latency and degraded I/O performance, potentially compromising ...
Two-Timescale Joint Optimization of Task Scheduling and Resource Scaling in Multi-Data Center System Based on Multi-Agent Deep Reinforcement Learning
As a new computing paradigm, multi-data center computing enables service providers to deploy their applications close to the users. However, due to the spatio-temporal changes in workloads, it is challenging to coordinate multiple distributed data centers ...
Fair Coflow Scheduling via Controlled Slowdown
- Francesco De Pellegrini,
- Vaibhav Kumar Gupta,
- Rachid El Azouzi,
- Serigne Gueye,
- Cedric Richier,
- Jeremie Leguay
The average coflow completion time (CCT) is the standard performance metric in coflow scheduling. However, standard CCT minimization may introduce unfairness between the data transfer phases of different computing jobs. Thus, while progress guarantees have ...
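For context, the average CCT objective and slowdown-based fairness can be written as follows; this is the standard textbook formulation with illustrative notation, not necessarily the exact model used in the paper.

```latex
% Average coflow completion time over N coflows, where C_k is the
% completion time of coflow k under a given schedule:
\bar{C} \;=\; \frac{1}{N} \sum_{k=1}^{N} C_k
% Slowdown of coflow k: its completion time relative to the completion
% time C_k^{\mathrm{iso}} it would achieve with the fabric to itself:
s_k \;=\; \frac{C_k}{C_k^{\mathrm{iso}}}
% A fairness-aware schedule minimizes \bar{C} subject to a cap on each s_k.
```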
GeoDeploy: Geo-Distributed Application Deployment Using Benchmarking
- Devki Nandan Jha,
- Yinhao Li,
- Zhenyu Wen,
- Graham Morgan,
- Prem Prakash Jayaraman,
- Maciej Koutny,
- Omer F. Rana,
- Rajiv Ranjan
Geo-distributed web-applications (GWA) can be deployed across multiple geographically separated datacenters to reduce the latency of access for users. Finding a suitable deployment for a GWA is challenging due to the requirement to consider a number of ...
Efficient Schedule Construction for Distributed Execution of Large DNN Models
Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices ...
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach
With the rapid development of artificial intelligence (AI) and the Internet of Things (IoT), intelligent information services have showcased unprecedented capabilities in acquiring and analysing information. Conventional task processing platforms rely ...
VisionAGILE: A Versatile Domain-Specific Accelerator for Computer Vision Tasks
The emergence of diverse machine learning (ML) models has led to groundbreaking revolutions in computer vision (CV). These ML models include convolutional neural networks (CNNs), graph neural networks (GNNs), and vision transformers (ViTs). However, ...
FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs
Sparse Matrix-Vector Multiplication (SpMV) on GPUs has gained significant attention because of SpMV's importance in modern applications and the increasing computing power of GPUs in the last decade. Previous studies have emphasized the importance ...
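As a point of reference, the computation whose data loading FastLoad targets can be illustrated with a minimal CSR-based SpMV on the CPU; the SciPy-based setup and array names below are illustrative and are not taken from the paper.

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Build a small random sparse matrix in CSR form: row_ptr delimits each row's
# slice of (col_idx, values), which is the data a GPU kernel must load.
A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.random.rand(1000)

def spmv_csr(row_ptr, col_idx, values, x):
    """Reference SpMV: y[i] = sum_j A[i, j] * x[j], one row at a time."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Loading col_idx/values (matrix) and the gathered x entries (vector)
        # is the irregular memory traffic that dominates SpMV on GPUs.
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

y = spmv_csr(A.indptr, A.indices, A.data, x)
assert np.allclose(y, A @ x)
```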
BCB-SpTC: An Efficient Sparse High-Dimensional Tensor Contraction Employing Tensor Core Acceleration
Sparse tensor contraction (SpTC) is an important operator in tensor networks, which tends to generate a large amount of sparse high-dimensional data, placing higher demands on the computational performance and storage bandwidth of the processor. Using ...
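To make the operator concrete, a coordinate-format sparse tensor contraction over one shared mode can be sketched as below; the dictionary-of-coordinates representation is purely illustrative and far simpler than the blocked, Tensor-Core-oriented scheme the paper proposes.

```python
from collections import defaultdict

def sptc_contract(A, B):
    """Contract sparse tensors A[i, j, k] and B[k, l, m] over mode k.

    A and B map coordinate tuples to values (COO-style dictionaries);
    the result maps (i, j, l, m) to sum_k A[i, j, k] * B[k, l, m].
    """
    # Group B's nonzeros by the contracted index k for fast lookup.
    b_by_k = defaultdict(list)
    for (k, l, m), val in B.items():
        b_by_k[k].append((l, m, val))

    C = defaultdict(float)
    for (i, j, k), a_val in A.items():
        for l, m, b_val in b_by_k.get(k, []):
            C[(i, j, l, m)] += a_val * b_val
    return dict(C)

A = {(0, 0, 1): 2.0, (0, 1, 2): 3.0}
B = {(1, 0, 0): 4.0, (2, 5, 7): 1.0}
print(sptc_contract(A, B))   # {(0, 0, 0, 0): 8.0, (0, 1, 5, 7): 3.0}
```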
Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery Network
As the demand for faster and more reliable content delivery escalates, Content Delivery Networks (CDNs) face significant challenges in managing content placement across their increasingly complex, multi-tiered structures to balance performance, complexity,...
A Survey on Performance Modeling and Prediction for Distributed DNN Training
Recent breakthroughs in large-scale DNNs have attracted significant attention from both academia and industry toward distributed DNN training techniques. Due to the time-consuming and expensive execution process of large-scale distributed DNN training, it is ...
TrieKV: A High-Performance Key-Value Store Design With Memory as Its First-Class Citizen
Key-value (KV) stores based on the log-structured merge tree (LSM-tree) have been extensively studied and deployed in major information technology infrastructures. Because this type of system is tailored to KV stores that access disks, a limited disk bandwidth ...
Mitosis: A Scalable Sharding System Featuring Multiple Dynamic Relay Chains
Sharding is a prevalent approach for addressing performance issues in blockchain. To reduce governance complexities and ensure system security, a common practice involves a relay chain to coordinate cross-shard transactions. However, with a growing number ...
Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting
Federated Learning (FL) enables multiple devices to collaboratively train a shared model while preserving data privacy. Ever-increasing model complexity coupled with limited memory resources on the participating devices severely bottlenecks the deployment ...
TARIS: Scalable Incremental Processing of Time-Respecting Algorithms on Streaming Graphs
Temporal graphs change with time and have a lifespan associated with each vertex and edge. These graphs are suited to time-respecting algorithms, where the traversed edges must have monotonic timestamps. The Interval-centric Computing Model (ICM) is ...
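The time-respecting constraint mentioned here can be illustrated with a small single-source earliest-arrival sketch over a temporal edge list; the edge-list format and function name are illustrative only and are not TARIS's API.

```python
def earliest_arrival(edges, source, t0=0):
    """Earliest-arrival times over temporal edges (u, v, t): an edge may only
    be traversed at time t >= the time we reached u, so every path uses
    monotonically non-decreasing timestamps. Edges are assumed instantaneous;
    ties at the same timestamp are processed in input order."""
    # Process edges in timestamp order so each relaxation is time-respecting.
    arrival = {source: t0}
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and t >= arrival[u]:
            arrival[v] = min(arrival.get(v, float("inf")), t)
    return arrival

edges = [("a", "b", 1), ("b", "c", 3), ("c", "a", 2), ("a", "d", 5)]
print(earliest_arrival(edges, "a"))   # {'a': 0, 'b': 1, 'c': 3, 'd': 5}
```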
MoltDB: Accelerating Blockchain via Ancient State Segregation
Blockchains store states in Log-Structured Merge (LSM) tree-based databases. Due to blockchain traceability, the growing ancient states are inevitably stored in the databases. Unfortunately, by default, this process mixes *current* and ...
Efficient Distributed Edge Computing for Dependent Delay-Sensitive Tasks in Multi-Operator Multi-Access Networks
We study the problem of distributed computing in the *multi-operator multi-access edge computing* (MEC) network for *dependent tasks*. Every task comprises several *sub-tasks*, which are executed based on ...
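The dependency structure among sub-tasks is the familiar task DAG; a minimal ordering pass over such a DAG (Kahn's algorithm) is sketched below. The function name and dependency encoding are illustrative only, and the offloading decisions across operators that are the paper's actual subject are not modeled here.

```python
from collections import deque

def schedule_order(num_subtasks, deps):
    """Topological order of sub-tasks given dependency pairs (a, b),
    meaning sub-task b can only start after a finishes (Kahn's algorithm)."""
    succ = [[] for _ in range(num_subtasks)]
    indeg = [0] * num_subtasks
    for a, b in deps:
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(i for i in range(num_subtasks) if indeg[i] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order

# Sub-task 2 depends on 0 and 1; sub-task 3 depends on 2.
print(schedule_order(4, [(0, 2), (1, 2), (2, 3)]))   # e.g. [0, 1, 2, 3]
```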
PeakFS: An Ultra-High Performance Parallel File System via Computing-Network-Storage Co-Optimization for HPC Applications
Emerging high-performance computing (HPC) applications with diverse workload characteristics impose greater demands on parallel file systems (PFSs). PFSs also require more efficient software designs to fully utilize the performance of modern hardware, ...
Design and Performance Evaluation of Linearly Extensible Cube-Triangle Network for Multicore Systems
High-performance interconnection networks are currently being used to design massively parallel computers. Selecting the set of nodes on which parallel tasks execute plays a vital role in the performance of such systems. These networks, when deployed to ...
HybRAID: A High-Performance Hybrid RAID Storage Architecture for Write-Intensive Applications in All-Flash Storage Systems
With the ever-increasing demand for higher I/O performance and reliability in data-intensive applications, *solid-state drives* (SSDs) typically configured as *redundant array of independent disks* (RAID) are broadly used in ...
DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV
Sparse matrix-vector multiplication (SpMV) is crucial in many scientific and engineering applications. Regarding the effectiveness of different sparse matrix storage formats on various architectures, no single format excels across all ...
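For illustration, a format-selection step driven by coarse structural features of the matrix might look like the sketch below; the features, thresholds, and format names are a hypothetical heuristic, not the dynamic-labeling classifier proposed in the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix

def pick_format(A: csr_matrix) -> str:
    """Choose a storage format from coarse structural features.
    Hypothetical thresholds, for illustration only."""
    nnz_per_row = np.diff(A.indptr)
    density = A.nnz / (A.shape[0] * A.shape[1])
    if density > 0.1:
        return "DENSE"   # few zeros: dense kernels win
    if nnz_per_row.std() < 0.2 * max(nnz_per_row.mean(), 1.0):
        return "ELL"     # balanced rows: padded ELLPACK suits GPUs
    return "CSR"         # irregular rows: general-purpose CSR

A = csr_matrix(np.eye(1000))
print(pick_format(A))    # "ELL" for a perfectly balanced diagonal matrix
```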