TACO: Vol 21, No 3

Volume 21, Issue 3, September 2024

Editor:

David Kaeli
Northeastern University, USA

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1544-3566

EISSN: 1544-3973

research-article

Open Access

Cross-core Data Sharing for Energy-efficient GPUs

Article No.: 42, Pages 1–32, https://doi.org/10.1145/3653019

Graphics Processing Units (GPUs) are the accelerator of choice in a variety of application domains, because they can accelerate massively parallel workloads and can be easily programmed using general-purpose programming frameworks such as CUDA and OpenCL. ...

research-article

Open Access

ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors

Article No.: 43, Pages 1–24, https://doi.org/10.1145/3653363

Systolic array architecture has significantly accelerated deep neural networks (DNNs). A systolic array comprises multiple processing elements (PEs) that can perform multiply-accumulate (MAC). Traditionally, the systolic array can execute a certain amount ...

research-article

Open Access

An Example of Parallel Merkle Tree Traversal: Post-Quantum Leighton-Micali Signature on the GPU

Article No.: 44, Pages 1–25, https://doi.org/10.1145/3659209

The hash-based signature (HBS) is the most conservative and time-consuming among many post-quantum cryptography (PQC) algorithms. Two HBSs, LMS and XMSS, are the only PQC algorithms standardised by the National Institute of Standards and Technology (NIST) ...

research-article

Open Access

Knowledge-Augmented Mutation-Based Bug Localization for Hardware Design Code

Article No.: 45, Pages 1–26, https://doi.org/10.1145/3660526

Verification of hardware design code is crucial for the quality assurance of hardware products. Being an indispensable part of verification, localizing bugs in the hardware design code is significant for hardware development but is often regarded as a ...

research-article

Open Access

D²Comp: Efficient Offload of LSM-tree Compaction with Data Processing Units on Disaggregated Storage

Article No.: 46, Pages 1–22, https://doi.org/10.1145/3656584

LSM-based key-value stores suffer from sub-optimal performance due to their slow and heavy background compactions. The compaction brings severe CPU and network overhead on high-speed disaggregated storage. This article further reveals that data-intensive ...

research-article

Open Access

iSwap: A New Memory Page Swap Mechanism for Reducing Ineffective I/O Operations in Cloud Environments

Article No.: 47, Pages 1–24, https://doi.org/10.1145/3653302

This article proposes iSwap, a new memory page swap mechanism that reduces the ineffective I/O swap operations and improves the QoS for applications with a high priority in cloud environments. iSwap works in the OS kernel. iSwap accurately learns the ...

research-article

Open Access

GraphSER: Distance-Aware Stream-Based Edge Repartition for Many-Core Systems

Article No.: 48, Pages 1–25, https://doi.org/10.1145/3661998

With the explosive growth of graph data, distributed graph processing has become popular, and many graph hardware accelerators use distributed frameworks. Graph partitioning is the foundation of distributed graph processing. However, dynamic changes in graph ...

research-article

Open Access

COER: A Network Interface Offloading Architecture for RDMA and Congestion Control Protocol Codesign

Article No.: 49, Pages 1–26, https://doi.org/10.1145/3660525

RDMA (Remote Direct Memory Access) networks require efficient congestion control to maintain their high throughput and low latency characteristics. However, congestion control protocols deployed at the software layer suffer from slow response times due to ...

research-article

Open Access

Intermediate Address Space: virtual memory optimization of heterogeneous architectures for cache-resident workloads

Article No.: 50, Pages 1–23, https://doi.org/10.1145/3659207

The increasing demand for computing power and the emergence of heterogeneous computing architectures have driven the exploration of innovative techniques to address current limitations in both the compute and memory subsystems. One such solution is the ...

research-article

Open Access

CoolDC: A Cost-Effective Immersion-Cooled Datacenter with Workload-Aware Temperature Scaling

Article No.: 51, Pages 1–27, https://doi.org/10.1145/3664925

For datacenter architects, it is the most important goal to minimize the datacenter’s total cost of ownership for the target performance (i.e., TCO/performance). As the major component of a datacenter is a server farm, the most effective way of reducing ...

research-article

Open Access

Stripe-schedule Aware Repair in Erasure-coded Clusters with Heterogeneous Star Networks

Article No.: 52, Pages 1–24, https://doi.org/10.1145/3664926

More and more storage systems use erasure code to tolerate faults. It takes pieces of data blocks as input and encodes a small number of parity blocks as output, where these blocks form a stripe. When reconsidering the recovery problem in the multi-stripe ...

research-article

Open Access

Fixed-point Encoding and Architecture Exploration for Residue Number Systems

Article No.: 53, Pages 1–27, https://doi.org/10.1145/3664923

Residue Number Systems (RNS) demonstrate the fascinating potential to serve integer addition/multiplication-intensive applications. The complexity of Artificial Intelligence (AI) models has grown enormously in recent years. From a computer system’s ...

research-article

Open Access

Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs

Article No.: 54, Pages 1–27, https://doi.org/10.1145/3664924

AMG is one of the most efficient and widely used methods for solving sparse linear systems. The computational process of AMG mainly consists of a series of iterative calculations of generalized sparse matrix-matrix multiplication (SpGEMM) and sparse ...

research-article

Open Access

Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

Article No.: 55, Pages 1–28, https://doi.org/10.1145/3663479

The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions. However, far memory presents new performance challenges because its access ...

research-article

Open Access

SAL: Optimizing the Dataflow of Spin-based Architectures for Lightweight Neural Networks

Article No.: 56, Pages 1–27, https://doi.org/10.1145/3673654

As the Convolutional Neural Network (CNN) goes deeper and more complex, the network becomes memory-intensive and computation-intensive. To address this issue, the lightweight neural network reduces parameters and Multiplication-and-Accumulation (MAC) ...

research-article

Open Access

Scythe: A Low-latency RDMA-enabled Distributed Transaction System for Disaggregated Memory

Article No.: 57, Pages 1–26, https://doi.org/10.1145/3666004

Disaggregated memory separates compute and memory resources into independent pools connected by RDMA (Remote Direct Memory Access) networks, which can improve memory utilization, reduce cost, and enable elastic scaling of compute and memory resources. ...

research-article

Open Access

Lavender: An Efficient Resource Partitioning Framework for Large-Scale Job Colocation

Article No.: 58, Pages 1–23, https://doi.org/10.1145/3674736

Workload consolidation is a widely used approach to enhance resource utilization in modern data centers. However, the concurrent execution of multiple jobs on a shared server introduces contention for essential shared resources such as CPU cores, Last ...

research-article

Open Access

Achieving Tunable Erasure Coding with Cluster-Aware Redundancy Transitioning

Article No.: 59, Pages 1–24, https://doi.org/10.1145/3672077

Erasure coding has been demonstrated as a storage-efficient means against failures, yet its tunability remains a challenging issue in data centers, which is prone to induce substantial cross-cluster traffic. In this article, we present ClusterRT, a ...

research-article

Open Access

Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture

Article No.: 60, Pages 1–29, https://doi.org/10.1145/3673653

Modern computing systems access data in main memory at coarse granularity (e.g., at 512-bit cache block granularity). Coarse-grained access leads to wasted energy because the system does not use all individually accessed small portions (e.g., words, each ...

research-article

Open Access

ReIPE: Recycling Idle PEs in CNN Accelerator for Vulnerable Filters Soft-Error Detection

Article No.: 61, Pages 1–26, https://doi.org/10.1145/3674909

To satisfy prohibitively massive computational requirements of current deep Convolutional Neural Networks (CNNs), CNN-specific accelerators are widely deployed in large-scale systems. Caused by high-energy neutrons and α-particle strikes, soft error may ...

research-article

Open Access

Characterizing and Optimizing LDPC Performance on 3D NAND Flash Memories

Article No.: 62, Pages 1–26, https://doi.org/10.1145/3663478

With the development of NAND flash memories’ bit density and stacking technologies, while storage capacity keeps increasing, the issue of reliability becomes increasingly prominent. Low-density parity check (LDPC) code, as a robust error-correcting code, ...

research-article

Open Access

ReHarvest: An ADC Resource-Harvesting Crossbar Architecture for ReRAM-Based DNN Accelerators

Article No.: 63, Pages 1–26, https://doi.org/10.1145/3659208

ReRAM-based Processing-In-Memory (PIM) architectures have been increasingly explored to accelerate various Deep Neural Network (DNN) applications because they can achieve extremely high performance and energy-efficiency for in-situ analog Matrix-Vector ...

research-article

Open Access

Time-Aware Spectrum-Based Bug Localization for Hardware Design Code with Data Purification

Article No.: 64, Pages 1–25, https://doi.org/10.1145/3678009

The verification of hardware design code is a critical aspect in ensuring the quality and reliability of hardware products. Finding bugs in hardware design code is important for hardware development and is frequently considered as a notoriously ...

ACM Transactions on Architecture and Code Optimization
