TACO: Vol 21, No 1

Volume 21, Issue 1, March 2024

Editor:

David Kaeli
Northeastern University, USA

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1544-3566

EISSN: 1544-3973

research-article

Open Access

Critical Data Backup with Hybrid Flash-Based Consumer Devices

Article No.: 1, Pages 1–23, https://doi.org/10.1145/3631529

Hybrid flash-based storage constructed with high-density and low-cost flash memory has become increasingly popular in consumer devices in the last decade due to its low cost. However, its poor reliability is one of the major concerns. To protect critical ...

research-article

Open Access

DAG-Order: An Order-Based Dynamic DAG Scheduling for Real-Time Networks-on-Chip

Article No.: 2, Pages 1–24, https://doi.org/10.1145/3631527

With the high-performance requirement of safety-critical real-time tasks, the platforms of many-core processors with high parallelism are widely utilized, where network-on-chip (NoC) is generally employed for inter-core communication due to its ...

research-article

Open Access

JiuJITsu: Removing Gadgets with Safe Register Allocation for JIT Code Generation

Article No.: 3, Pages 1–26, https://doi.org/10.1145/3631526

Code-reuse attacks have the capability to craft malicious instructions from small code fragments, commonly referred to as “gadgets.” These gadgets are generated by JIT (Just-In-Time) engines as integral components of native instructions, with the ...

research-article

Open Access

Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations

Article No.: 4, Pages 1–25, https://doi.org/10.1145/3631709

Leveraging the SIMD capability of modern CPU architectures is mandatory to take full advantage of their increased performance. To exploit this capability, binary executables must be vectorized, either manually by developers or automatically by a tool. For ...

research-article

Open Access

Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs

Article No.: 5, Pages 1–26, https://doi.org/10.1145/3632956

Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite its effectiveness in accelerating convolutional neural networks, ...

research-article

Open Access

QoS-pro: A QoS-enhanced Transaction Processing Framework for Shared SSDs

Article No.: 6, Pages 1–25, https://doi.org/10.1145/3632955

Solid State Drives (SSDs) are widely used in data-intensive scenarios due to their high performance and decreasing cost. However, in shared environments, concurrent workloads can interfere with each other, leading to a violation of Quality of Service (QoS) ...

research-article

Open Access

SAC: An Ultra-Efficient Spin-based Architecture for Compressed DNNs

Article No.: 7, Pages 1–26, https://doi.org/10.1145/3632957

Deep Neural Networks (DNNs) have achieved great progress in academia and industry. But they have become computational and memory intensive with the increase of network depth. Previous designs seek breakthroughs in software and hardware levels to mitigate ...

research-article

Open Access

Efficient Cross-platform Multiplexing of Hardware Performance Counters via Adaptive Grouping

Article No.: 8, Pages 1–26, https://doi.org/10.1145/3629525

Collecting sufficient microarchitecture performance data is essential for performance evaluation and workload characterization. There are many events to be monitored in a modern processor while only a few hardware performance monitoring counters (PMCs) ...

research-article

Open Access

QuCloud+: A Holistic Qubit Mapping Scheme for Single/Multi-programming on 2D/3D NISQ Quantum Computers

Article No.: 9, Pages 1–27, https://doi.org/10.1145/3631525

Qubit mapping for NISQ superconducting quantum computers is essential to fidelity and resource utilization. The existing qubit mapping schemes meet challenges, e.g., crosstalk, SWAP overheads, diverse device topologies, etc., leading to qubit resource ...

research-article

Open Access

Abakus: Accelerating k-mer Counting with Storage Technology

Article No.: 10, Pages 1–26, https://doi.org/10.1145/3632952

This work seeks to leverage Processing-with-storage-technology (PWST) to accelerate a key bioinformatics kernel called k-mer counting, which involves processing large files of sequence data on the disk to build a histogram of fixed-size genome sequence ...

research-article

Open Access

ISP Agent: A Generalized In-storage-processing Workload Offloading Framework by Providing Multiple Optimization Opportunities

Article No.: 11, Pages 1–24, https://doi.org/10.1145/3632951

As solid-state drives (SSDs) with sufficient computing power have recently become the dominant devices in modern computer systems, in-storage processing (ISP), which processes data within the storage without transferring it to the host memory, is being ...

research-article

Open Access

COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loop

Article No.: 12, Pages 1–26, https://doi.org/10.1145/3633331

Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the threads, using a programmer-specified scheduling policy. While the existing scheduling policies perform reasonably well in the context of balanced workloads, in ...

research-article

Open Access

Hardware-hardened Sandbox Enclaves for Trusted Serverless Computing

Article No.: 13, Pages 1–25, https://doi.org/10.1145/3632954

In cloud-based serverless computing, an application consists of multiple functions provided by mutually distrusting parties. For secure serverless computing, the hardware-based trusted execution environment (TEE) can provide strong isolation among ...

research-article

Open Access

Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory

Article No.: 14, Pages 1–24, https://doi.org/10.1145/3632953

The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for ease of use provided by system-managed memory with a moderate-to-high performance ...

research-article

Open Access

Rcmp: Reconstructing RDMA-Based Memory Disaggregation via CXL

Article No.: 15, Pages 1–26, https://doi.org/10.1145/3634916

Memory disaggregation is a promising architecture for modern datacenters that separates compute and memory resources into independent pools connected by ultra-fast networks, which can improve memory utilization, reduce cost, and enable elastic scaling of ...

research-article

Open Access

WA-Zone: Wear-Aware Zone Management Optimization for LSM-Tree on ZNS SSDs

Article No.: 16, Pages 1–23, https://doi.org/10.1145/3637488

ZNS SSDs divide the storage space into sequential-write zones, reducing costs of DRAM utilization, garbage collection, and over-provisioning. The sequential-write feature of zones is well-suited for LSM-based databases, where random writes are organized ...

research-article

Open Access

Improving Utilization of Dataflow Unit for Multi-Batch Processing

Article No.: 17, Pages 1–26, https://doi.org/10.1145/3637906

Dataflow architectures can achieve much better performance and higher efficiency than general-purpose cores, approaching the performance of a specialized design while retaining programmability. However, advanced application scenarios place higher demands ...

research-article

Open Access

Extension VM: Interleaved Data Layout in Vector Memory

Article No.: 18, Pages 1–23, https://doi.org/10.1145/3631528

Vector architecture is widely employed in processors for neural networks, signal processing, and high-performance computing; however, its performance is limited by inefficient column-major memory access. The column-major access limitation originates ...

research-article

Open Access

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis

Article No.: 19, Pages 1–29, https://doi.org/10.1145/3632950

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states ...

research-article

Open Access

Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs

Article No.: 20, Pages 1–20, https://doi.org/10.1145/3633462

An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and applications. SpTM is a multi-dimensional analog to sparse-matrix-dense-matrix multiplication (SpMM). In this article, we ...

ACM Transactions on Architecture and Code Optimization
