- research-article, November 2024
OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No.: 259, Pages 1–6, https://doi.org/10.1145/3649329.3657323
To overcome the burden on memory size and bandwidth due to the ever-increasing size of large language models (LLMs), aggressive weight quantization has recently been studied, while research on quantizing activations is lacking. In this paper, we present a ...
- research-article, April 2024
NDPipe: Exploiting Near-data Processing for Scalable Inference and Continuous Training in Photo Storage
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 689–707, https://doi.org/10.1145/3620666.3651345
This paper proposes a novel photo storage system called NDPipe, which accelerates the performance of training and inference for image data by leveraging near-data processing in photo storage servers. NDPipe distributes storage servers with inexpensive ...
- research-article, September 2023
FlexBlock: A Flexible DNN Training Accelerator With Multi-Mode Block Floating Point Support
IEEE Transactions on Computers (ITCO), Volume 72, Issue 9, Pages 2522–2535, https://doi.org/10.1109/TC.2023.3253050
When training deep neural networks (DNNs), expensive floating point arithmetic units are used in GPUs or custom neural processing units (NPUs). To reduce the burden of floating point arithmetic, the community has started exploring the use of more efficient ...
- research-article, June 2022
- research-article, June 2021
High-throughput Near-Memory Processing on CNNs with 3D HBM-like Memory
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 26, Issue 6, Article No.: 48, Pages 1–20, https://doi.org/10.1145/3460971
This article discusses the high-performance near-memory neural network (NN) accelerator architecture utilizing the logic die in three-dimensional (3D) High Bandwidth Memory (HBM)-like memory. As most of the previously reported 3D memory-based near-memory ...
- research-article, June 2019
Peregrine: A Flexible Hardware Accelerator for LSTM with Limited Synaptic Connection Patterns
DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019, Article No.: 209, Pages 1–6, https://doi.org/10.1145/3316781.3317879
In this paper, we present an integrated solution to design a high-performance LSTM accelerator. We propose a fast and flexible hardware architecture, named Peregrine, supported by a stack of innovations from algorithm to hardware design. Peregrine first ...
- article, June 2018
Efficient Object Detection Using Embedded Binarized Neural Networks
Journal of Signal Processing Systems (JSPS), Volume 90, Issue 6, Pages 877–890, https://doi.org/10.1007/s11265-017-1255-5
Memory performance is a key bottleneck for deep learning systems. Binarization of both activations and weights is one promising approach that can best scale to realize the highest energy efficient system using the lowest possible precision. In this ...
- research-article, June 2017
A Programmable Hardware Accelerator for Simulating Dynamical Systems
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, Pages 403–415, https://doi.org/10.1145/3079856.3080252
The fast and energy-efficient simulation of dynamical systems defined by coupled ordinary/partial differential equations has emerged as an important problem. The accelerated simulation of coupled ODE/PDE is critical for analysis of physical systems as ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 45 Issue 2
- research-article, March 2017
Adaptive weight compression for memory-efficient neural networks
Neural networks generally require significant memory capacity/bandwidth to store/access a large number of synaptic weights. This paper presents an application of JPEG image encoding to compress the weights by exploiting the spatial locality and ...
- research-article, August 2016
Dynamic Approximation with Feedback Control for Energy-Efficient Recurrent Neural Network Hardware
ISLPED '16: Proceedings of the 2016 International Symposium on Low Power Electronics and Design, Pages 168–173, https://doi.org/10.1145/2934583.2934626
This paper presents a methodology of feedback-controlled dynamic approximation to enable an energy-accuracy trade-off in a digital recurrent neural network (RNN). A low-power digital RNN engine is presented that employs the proposed dynamic approximation. The ...
- research-article, June 2016
Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory
ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture, Pages 380–392, https://doi.org/10.1109/ISCA.2016.41
This paper presents a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with logic tier for efficient neural computing. The proposed architecture consists of clusters of processing engines, connected ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 44 Issue 3
- research-article, July 2015
On the Impact of Energy-Accuracy Tradeoff in a Digital Cellular Neural Network for Image Processing
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCADICS), Volume 34, Issue 7, Pages 1070–1081, https://doi.org/10.1109/TCAD.2015.2406853
This paper studies the opportunities of energy-accuracy tradeoff in a cellular neural network (CNN). Algorithmic characteristics of the CNN are coupled with the hardware-induced error distribution of a digital CNN cell to evaluate energy-accuracy tradeoff for simple ...
- research-article, June 2011
Thermal signature: a simple yet accurate thermal index for floorplan optimization
DAC '11: Proceedings of the 48th Design Automation Conference, Pages 108–113, https://doi.org/10.1145/2024724.2024748
Floorplanning has the potential to reduce chip temperature due to the conductive nature of heat. If floorplan optimization, which is usually based on simulated annealing, is employed to reduce temperature, its evaluation should be done extremely fast ...