Acceleration of Multi-Body Molecular Dynamics With Customized Parallel Dataflow
- Quan Deng,
- Qiang Liu,
- Ming Yuan,
- Xiaohui Duan,
- Lin Gan,
- Jinzhe Yang,
- Wenlai Zhao,
- Zhenxiang Zhang,
- Guiming Wu,
- Wayne Luk,
- Haohuan Fu,
- Guangwen Yang
FPGAs are drawing increasing attention for solving molecular dynamics (MD) problems, and have already been applied to problems such as two-body potentials and force fields composed of these potentials. Competitive performance is obtained compared with ...
Optimizing I/O Performance Through Effective vCPU Scheduling Interference Management
Virtual machines (VMs) rely heavily on virtual CPU (vCPU) scheduling to achieve efficient I/O performance. vCPU scheduling interference can cause inconsistent scheduling latency and degraded I/O performance, potentially compromising ...
Two-Timescale Joint Optimization of Task Scheduling and Resource Scaling in Multi-Data Center System Based on Multi-Agent Deep Reinforcement Learning
As a new computing paradigm, multi-data center computing enables service providers to deploy their applications close to the users. However, due to the spatio-temporal changes in workloads, it is challenging to coordinate multiple distributed data centers ...
Fair Coflow Scheduling via Controlled Slowdown
- Francesco De Pellegrini,
- Vaibhav Kumar Gupta,
- Rachid El Azouzi,
- Serigne Gueye,
- Cedric Richier,
- Jeremie Leguay
The average coflow completion time (CCT) is the standard performance metric in coflow scheduling. However, standard CCT minimization may introduce unfairness between the data transfer phases of different computing jobs. Thus, while progress guarantees have ...
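For context, the average CCT objective and slowdown-based fairness can be written as follows; this is the standard textbook formulation with illustrative notation, not necessarily the exact model used in the paper.

```latex
% Average coflow completion time over N coflows, where C_k is the
% completion time of coflow k under a given schedule:
\bar{C} \;=\; \frac{1}{N} \sum_{k=1}^{N} C_k
% Slowdown of coflow k: its completion time relative to the completion
% time C_k^{\mathrm{iso}} it would achieve with the fabric to itself:
s_k \;=\; \frac{C_k}{C_k^{\mathrm{iso}}}
% A fairness-aware schedule minimizes \bar{C} subject to a cap on each s_k.
```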
GeoDeploy: Geo-Distributed Application Deployment Using Benchmarking
- Devki Nandan Jha,
- Yinhao Li,
- Zhenyu Wen,
- Graham Morgan,
- Prem Prakash Jayaraman,
- Maciej Koutny,
- Omer F. Rana,
- Rajiv Ranjan
Geo-distributed web-applications (GWA) can be deployed across multiple geographically separated datacenters to reduce the latency of access for users. Finding a suitable deployment for a GWA is challenging due to the requirement to consider a number of ...
Efficient Schedule Construction for Distributed Execution of Large DNN Models
Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices ...
Distributed Task Processing Platform for Infrastructure-Less IoT Networks: A Multi-Dimensional Optimization Approach
With the rapid development of artificial intelligence (AI) and the Internet of Things (IoT), intelligent information services have showcased unprecedented capabilities in acquiring and analysing information. Conventional task processing platforms rely ...
VisionAGILE: A Versatile Domain-Specific Accelerator for Computer Vision Tasks
The emergence of diverse machine learning (ML) models has led to groundbreaking revolutions in computer vision (CV). These ML models include convolutional neural networks (CNNs), graph neural networks (GNNs), and vision transformers (ViTs). However, ...
FastLoad: Speeding Up Data Loading of Both Sparse Matrix and Vector for SpMV on GPUs
Sparse Matrix-Vector Multiplication (SpMV) on GPUs has gained significant attention because of SpMV's importance in modern applications and the increasing computing power of GPUs in the last decade. Previous studies have emphasized the importance ...
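As a point of reference, the computation whose data loading FastLoad targets can be illustrated with a minimal CSR-based SpMV on the CPU; the SciPy-based setup and array names below are illustrative and are not taken from the paper.

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Build a small random sparse matrix in CSR form: row_ptr delimits each row's
# slice of (col_idx, values), which is the data a GPU kernel must load.
A = sparse_random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.random.rand(1000)

def spmv_csr(row_ptr, col_idx, values, x):
    """Reference SpMV: y[i] = sum_j A[i, j] * x[j], one row at a time."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Loading col_idx/values (matrix) and the gathered x entries (vector)
        # is the irregular memory traffic that dominates SpMV on GPUs.
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

y = spmv_csr(A.indptr, A.indices, A.data, x)
assert np.allclose(y, A @ x)
```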
BCB-SpTC: An Efficient Sparse High-Dimensional Tensor Contraction Employing Tensor Core Acceleration
Sparse tensor contraction (SpTC) is an important operator in tensor networks, which tends to generate a large amount of sparse high-dimensional data, placing higher demands on the computational performance and storage bandwidth of the processor. Using ...
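To make the operator concrete, a coordinate-format sparse tensor contraction over one shared mode can be sketched as below; the dictionary-of-coordinates representation is purely illustrative and far simpler than the blocked, Tensor-Core-oriented scheme the paper proposes.

```python
from collections import defaultdict

def sptc_contract(A, B):
    """Contract sparse tensors A[i, j, k] and B[k, l, m] over mode k.

    A and B map coordinate tuples to values (COO-style dictionaries);
    the result maps (i, j, l, m) to sum_k A[i, j, k] * B[k, l, m].
    """
    # Group B's nonzeros by the contracted index k for fast lookup.
    b_by_k = defaultdict(list)
    for (k, l, m), val in B.items():
        b_by_k[k].append((l, m, val))

    C = defaultdict(float)
    for (i, j, k), a_val in A.items():
        for l, m, b_val in b_by_k.get(k, []):
            C[(i, j, l, m)] += a_val * b_val
    return dict(C)

A = {(0, 0, 1): 2.0, (0, 1, 2): 3.0}
B = {(1, 0, 0): 4.0, (2, 5, 7): 1.0}
print(sptc_contract(A, B))   # {(0, 0, 0, 0): 8.0, (0, 1, 5, 7): 3.0}
```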
Competitive Analysis of Online Elastic Caching of Transient Data in Multi-Tiered Content Delivery Network
As the demand for faster and more reliable content delivery escalates, Content Delivery Networks (CDNs) face significant challenges in managing content placement across their increasingly complex, multi-tiered structures to balance performance, complexity,...
A Survey on Performance Modeling and Prediction for Distributed DNN Training
Recent breakthroughs in large-scale DNNs have attracted significant attention from both academia and industry toward distributed DNN training techniques. Due to the time-consuming and expensive execution process of large-scale distributed DNN training, it is ...
TrieKV: A High-Performance Key-Value Store Design With Memory as Its First-Class Citizen
Key-value (KV) stores based on the log-structured merge tree (LSM-tree) have been extensively studied and deployed in major information technology infrastructures. Because this type of system is tailored to KV stores that access disks, a limited disk bandwidth ...
Mitosis: A Scalable Sharding System Featuring Multiple Dynamic Relay Chains
Sharding is a prevalent approach for addressing performance issues in blockchain. To reduce governance complexities and ensure system security, a common practice involves a relay chain to coordinate cross-shard transactions. However, with a growing number ...
Breaking the Memory Wall for Heterogeneous Federated Learning via Model Splitting
Federated Learning (FL) enables multiple devices to collaboratively train a shared model while preserving data privacy. Ever-increasing model complexity coupled with limited memory resources on the participating devices severely bottlenecks the deployment ...
TARIS: Scalable Incremental Processing of Time-Respecting Algorithms on Streaming Graphs
Temporal graphs change with time and have a lifespan associated with each vertex and edge. These graphs are suited to time-respecting algorithms, where the traversed edges must have monotonic timestamps. The Interval-centric Computing Model (ICM) is ...
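The time-respecting constraint mentioned here can be illustrated with a small single-source earliest-arrival sketch over a temporal edge list; the edge-list format and function name are illustrative only and are not TARIS's API.

```python
def earliest_arrival(edges, source, t0=0):
    """Earliest-arrival times over temporal edges (u, v, t): an edge may only
    be traversed at time t >= the time we reached u, so every path uses
    monotonically non-decreasing timestamps. Edges are assumed instantaneous;
    ties at the same timestamp are processed in input order."""
    # Process edges in timestamp order so each relaxation is time-respecting.
    arrival = {source: t0}
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and t >= arrival[u]:
            arrival[v] = min(arrival.get(v, float("inf")), t)
    return arrival

edges = [("a", "b", 1), ("b", "c", 3), ("c", "a", 2), ("a", "d", 5)]
print(earliest_arrival(edges, "a"))   # {'a': 0, 'b': 1, 'c': 3, 'd': 5}
```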
MoltDB: Accelerating Blockchain via Ancient State Segregation
Blockchains store states in Log-Structured Merge (LSM) tree-based databases. Due to blockchain traceability, the growing ancient states are inevitably stored in the databases. Unfortunately, by default, this process mixes *current* and ...
Efficient Distributed Edge Computing for Dependent Delay-Sensitive Tasks in Multi-Operator Multi-Access Networks
We study the problem of distributed computing in the *multi-operator multi-access edge computing* (MEC) network for *dependent tasks*. Every task comprises several *sub-tasks*, which are executed based on ...
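The dependency structure among sub-tasks is the familiar task DAG; a minimal ordering pass over such a DAG (Kahn's algorithm) is sketched below. The function name and dependency encoding are illustrative only, and the offloading decisions across operators that are the paper's actual subject are not modeled here.

```python
from collections import deque

def schedule_order(num_subtasks, deps):
    """Topological order of sub-tasks given dependency pairs (a, b),
    meaning sub-task b can only start after a finishes (Kahn's algorithm)."""
    succ = [[] for _ in range(num_subtasks)]
    indeg = [0] * num_subtasks
    for a, b in deps:
        succ[a].append(b)
        indeg[b] += 1
    ready = deque(i for i in range(num_subtasks) if indeg[i] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order

# Sub-task 2 depends on 0 and 1; sub-task 3 depends on 2.
print(schedule_order(4, [(0, 2), (1, 2), (2, 3)]))   # e.g. [0, 1, 2, 3]
```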
PeakFS: An Ultra-High Performance Parallel File System via Computing-Network-Storage Co-Optimization for HPC Applications
Emerging high-performance computing (HPC) applications with diverse workload characteristics impose greater demands on parallel file systems (PFSs). PFSs also require more efficient software designs to fully utilize the performance of modern hardware, ...
Design and Performance Evaluation of Linearly Extensible Cube-Triangle Network for Multicore Systems
High-performance interconnection networks are currently being used to design massively parallel computers. Selecting the set of nodes on which parallel tasks execute plays a vital role in the performance of such systems. These networks, when deployed to ...
HybRAID: A High-Performance Hybrid RAID Storage Architecture for Write-Intensive Applications in All-Flash Storage Systems
With the ever-increasing demand for higher I/O performance and reliability in data-intensive applications, *solid-state drives* (SSDs) typically configured as *redundant array of independent disks* (RAID) are broadly used in ...
DyLaClass: Dynamic Labeling Based Classification for Optimal Sparse Matrix Format Selection in Accelerating SpMV
Sparse matrix-vector multiplication (SpMV) is crucial in many scientific and engineering applications. Regarding the effectiveness of different sparse matrix storage formats on various architectures, no single format excels across all ...
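For illustration, a format-selection step driven by coarse structural features of the matrix might look like the sketch below; the features, thresholds, and format names are a hypothetical heuristic, not the dynamic-labeling classifier proposed in the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix

def pick_format(A: csr_matrix) -> str:
    """Choose a storage format from coarse structural features.
    Hypothetical thresholds, for illustration only."""
    nnz_per_row = np.diff(A.indptr)
    density = A.nnz / (A.shape[0] * A.shape[1])
    if density > 0.1:
        return "DENSE"   # few zeros: dense kernels win
    if nnz_per_row.std() < 0.2 * max(nnz_per_row.mean(), 1.0):
        return "ELL"     # balanced rows: padded ELLPACK suits GPUs
    return "CSR"         # irregular rows: general-purpose CSR

A = csr_matrix(np.eye(1000))
print(pick_format(A))    # "ELL" for a perfectly balanced diagonal matrix
```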