Critical Data Backup with Hybrid Flash-Based Consumer Devices
Hybrid flash-based storage built from high-density, low-cost flash memory has become increasingly popular in consumer devices over the last decade. However, its poor reliability is a major concern. To protect critical ...
DAG-Order: An Order-Based Dynamic DAG Scheduling for Real-Time Networks-on-Chip
With the high-performance requirements of safety-critical real-time tasks, many-core processor platforms with high parallelism are widely used, where a network-on-chip (NoC) is generally employed for inter-core communication due to its ...
JiuJITsu: Removing Gadgets with Safe Register Allocation for JIT Code Generation
Code-reuse attacks have the capability to craft malicious instructions from small code fragments, commonly referred to as “gadgets.” These gadgets are generated by JIT (Just-In-Time) engines as integral components of native instructions, with the ...
Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations
Leveraging the SIMD capability of modern CPU architectures is mandatory to take full advantage of their increased performance. To exploit this capability, binary executables must be vectorized, either manually by developers or automatically by a tool. For ...
Fast Convolution Meets Low Precision: Exploring Efficient Quantized Winograd Convolution on Modern CPUs
Low-precision computation has emerged as one of the most effective techniques for accelerating convolutional neural networks and has garnered widespread support on modern hardware. Despite its effectiveness in accelerating convolutional neural networks, ...
QoS-pro: A QoS-enhanced Transaction Processing Framework for Shared SSDs
- Hao Fan,
- Yiliang Ye,
- Shadi Ibrahim,
- Zhuo Huang,
- Xingru Li,
- Weibin Xue,
- Song Wu,
- Chen Yu,
- Xuanhua Shi,
- Hai Jin
Solid State Drives (SSDs) are widely used in data-intensive scenarios due to their high performance and decreasing cost. However, in shared environments, concurrent workloads can interfere with each other, leading to a violation of Quality of Service (QoS)...
SAC: An Ultra-Efficient Spin-based Architecture for Compressed DNNs
Deep Neural Networks (DNNs) have achieved great progress in academia and industry. However, they have become computationally and memory intensive as network depth increases. Previous designs seek breakthroughs at the software and hardware levels to mitigate ...
Efficient Cross-platform Multiplexing of Hardware Performance Counters via Adaptive Grouping
Collecting sufficient microarchitecture performance data is essential for performance evaluation and workload characterization. There are many events to be monitored in a modern processor while only a few hardware performance monitoring counters (PMCs) ...
Abakus: Accelerating k-mer Counting with Storage Technology
This work seeks to leverage Processing-with-storage-technology (PWST) to accelerate a key bioinformatics kernel called k-mer counting, which involves processing large files of sequence data on the disk to build a histogram of fixed-size genome sequence ...
ISP Agent: A Generalized In-storage-processing Workload Offloading Framework by Providing Multiple Optimization Opportunities
- Seokwon Kang,
- Jongbin Kim,
- Gyeongyong Lee,
- Jeongmyung Lee,
- Jiwon Seo,
- Hyungsoo Jung,
- Yong Ho Song,
- Yongjun Park
As solid-state drives (SSDs) with sufficient computing power have recently become the dominant devices in modern computer systems, in-storage processing (ISP), which processes data within the storage without transferring it to the host memory, is being ...
COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loop
Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the threads, using a programmer-specified scheduling policy. While the existing scheduling policies perform reasonably well in the context of balanced workloads, in ...
Hardware-hardened Sandbox Enclaves for Trusted Serverless Computing
In cloud-based serverless computing, an application consists of multiple functions provided by mutually distrusting parties. For secure serverless computing, the hardware-based trusted execution environment (TEE) can provide strong isolation among ...
Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory
The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for ease of use provided by system-managed memory with a moderate-to-high performance ...
Rcmp: Reconstructing RDMA-Based Memory Disaggregation via CXL
Memory disaggregation is a promising architecture for modern datacenters that separates compute and memory resources into independent pools connected by ultra-fast networks, which can improve memory utilization, reduce cost, and enable elastic scaling of ...
WA-Zone: Wear-Aware Zone Management Optimization for LSM-Tree on ZNS SSDs
- Linbo Long,
- Shuiyong He,
- Jingcheng Shen,
- Renping Liu,
- Zhenhua Tan,
- Congming Gao,
- Duo Liu,
- Kan Zhong,
- Yi Jiang
ZNS SSDs divide the storage space into sequential-write zones, reducing costs of DRAM utilization, garbage collection, and over-provisioning. The sequential-write feature of zones is well-suited for LSM-based databases, where random writes are organized ...
Improving Utilization of Dataflow Unit for Multi-Batch Processing
Dataflow architectures can achieve much better performance and higher efficiency than general-purpose cores, approaching the performance of a specialized design while retaining programmability. However, advanced application scenarios place higher demands ...
Extension VM: Interleaved Data Layout in Vector Memory
Vector architectures are widely employed in processors for neural networks, signal processing, and high-performance computing; however, their performance is limited by inefficient column-major memory access. The column-major access limitation originates ...
ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis
- Can Firtina,
- Kamlesh Pillai,
- Gurpreet S. Kalsi,
- Bharathwaj Suresh,
- Damla Senol Cali,
- Jeremie S. Kim,
- Taha Shahroodi,
- Meryem Banu Cavlak,
- Joël Lindegger,
- Mohammed Alser,
- Juan Gómez Luna,
- Sreenivas Subramoney,
- Onur Mutlu
Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states ...
Exploring Data Layout for Sparse Tensor Times Dense Matrix on GPUs
An important sparse tensor computation is sparse-tensor-dense-matrix multiplication (SpTM), which is used in tensor decomposition and applications. SpTM is a multi-dimensional analog to sparse-matrix-dense-matrix multiplication (SpMM). In this article, we ...