Author: Kung, Jaeha

research-article

Open Access

OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No. 259, Pages 1–6, https://doi.org/10.1145/3649329.3657323

To overcome the burden on memory size and bandwidth caused by the ever-increasing size of large language models (LLMs), aggressive weight quantization has recently been studied, while research on quantizing activations remains lacking. In this paper, we present a ...

research-article

Open Access

NDPipe: Exploiting Near-data Processing for Scalable Inference and Continuous Training in Photo Storage

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 689–707, https://doi.org/10.1145/3620666.3651345

This paper proposes a novel photo storage system called NDPipe, which accelerates the performance of training and inference for image data by leveraging near-data processing in photo storage servers. NDPipe distributes storage servers with inexpensive ...

research-article

FlexBlock: A Flexible DNN Training Accelerator With Multi-Mode Block Floating Point Support

IEEE Transactions on Computers (ITCO), Volume 72, Issue 9, Pages 2522–2535, https://doi.org/10.1109/TC.2023.3253050

When training deep neural networks (DNNs), expensive floating-point arithmetic units are used in GPUs or custom neural processing units (NPUs). To reduce the burden of floating-point arithmetic, the community has started exploring the use of more efficient ...

research-article

Implication of Optimizing NPU Dataflows on Neural Architecture Search for Mobile Devices

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 27, Issue 5, Article No. 48, Pages 1–24, https://doi.org/10.1145/3513085

Recent advances in deep learning have made it possible to implement artificial intelligence in mobile devices. Many studies have put a lot of effort into developing lightweight deep learning models optimized for mobile devices. To overcome the performance ...

research-article

High-throughput Near-Memory Processing on CNNs with 3D HBM-like Memory

ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 26, Issue 6, Article No. 48, Pages 1–20, https://doi.org/10.1145/3460971

This article discusses a high-performance near-memory neural network (NN) accelerator architecture that uses the logic die in three-dimensional (3D) High Bandwidth Memory (HBM)-like memory. As most of the previously reported 3D memory-based near-memory ...

research-article

Peregrine: A Flexible Hardware Accelerator for LSTM with Limited Synaptic Connection Patterns

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019, Article No. 209, Pages 1–6, https://doi.org/10.1145/3316781.3317879

In this paper, we present an integrated solution to design a high-performance LSTM accelerator. We propose a fast and flexible hardware architecture, named Peregrine, supported by a stack of innovations from algorithm to hardware design. Peregrine first ...

article

Efficient Object Detection Using Embedded Binarized Neural Networks

Journal of Signal Processing Systems (JSPS), Volume 90, Issue 6, Pages 877–890, https://doi.org/10.1007/s11265-017-1255-5

Memory performance is a key bottleneck for deep learning systems. Binarizing both activations and weights is a promising approach that scales best toward the most energy-efficient system, because it uses the lowest possible precision. In this ...

research-article

Public Access

A Programmable Hardware Accelerator for Simulating Dynamical Systems

ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, Pages 403–415, https://doi.org/10.1145/3079856.3080252

The fast and energy-efficient simulation of dynamical systems defined by coupled ordinary/partial differential equations has emerged as an important problem. The accelerated simulation of coupled ODE/PDE is critical for analysis of physical systems as ...

Also Published in:

ACM SIGARCH Computer Architecture News: Volume 45 Issue 2

research-article

Free

Adaptive weight compression for memory-efficient neural networks

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe, Pages 199–204

Neural networks generally require significant memory capacity/bandwidth to store/access a large number of synaptic weights. This paper presents an application of JPEG image encoding to compress the weights by exploiting the spatial locality and ...

research-article

Public Access

Dynamic Approximation with Feedback Control for Energy-Efficient Recurrent Neural Network Hardware

ISLPED '16: Proceedings of the 2016 International Symposium on Low Power Electronics and Design, Pages 168–173, https://doi.org/10.1145/2934583.2934626

This paper presents a methodology of feedback-controlled dynamic approximation that enables an energy-accuracy trade-off in a digital recurrent neural network (RNN). A low-power digital RNN engine that employs the proposed dynamic approximation is presented. The ...

research-article

Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory

ISCA '16: Proceedings of the 43rd International Symposium on Computer Architecture, Pages 380–392, https://doi.org/10.1109/ISCA.2016.41

This paper presents a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing. The proposed architecture consists of clusters of processing engines, connected ...

Also Published in:

ACM SIGARCH Computer Architecture News: Volume 44 Issue 3

research-article

On the Impact of Energy-Accuracy Tradeoff in a Digital Cellular Neural Network for Image Processing

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCADICS), Volume 34, Issue 7, Pages 1070–1081, https://doi.org/10.1109/TCAD.2015.2406853

This paper studies the opportunities for energy-accuracy tradeoff in a cellular neural network (CNN). The algorithmic characteristics of the CNN are coupled with the hardware-induced error distribution of a digital CNN cell to evaluate the energy-accuracy tradeoff for simple ...

research-article

Thermal signature: a simple yet accurate thermal index for floorplan optimization

DAC '11: Proceedings of the 48th Design Automation Conference, Pages 108–113, https://doi.org/10.1145/2024724.2024748

Floorplanning has the potential to reduce chip temperature because of the conductive nature of heat. If floorplan optimization, which is usually based on simulated annealing, is employed to reduce temperature, its evaluation should be done extremely fast ...
