DOI: 10.5555/3130379.3130723

A novel zero weight/activation-aware hardware architecture of convolutional neural network

Published: 27 March 2017

Abstract

It is imperative to accelerate convolutional neural networks (CNNs) because their application areas keep widening, from servers and mobile devices to IoT devices. Based on the observation that CNNs exhibit a significant number of zero values in both kernel weights and activations, we propose a novel hardware accelerator for CNNs that exploits zero weights and activations. We also report a zero-induced load imbalance problem, which exists in zero-aware parallel CNN hardware architectures, and present a zero-aware kernel allocation as a solution. According to our experiments with a cycle-accurate simulation model, RTL, and layout design of the proposed architecture running two real deep CNNs, pruned AlexNet [1] and VGG-16 [2], our architecture offers 4x/1.8x (AlexNet) and 5.2x/2.1x (VGG-16) speedups compared with state-of-the-art zero-agnostic/zero-activation-aware architectures, respectively.
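
The two mechanisms named in the abstract, skipping multiply-accumulates whose weight or activation operand is zero, and balancing non-zero work across parallel processing elements (PEs) so that sparsity does not create stragglers, can be sketched in software. The Python sketch below is a hypothetical illustration under assumed parameters (8 PEs, 64 pruned 3x3x128 kernels, a greedy largest-kernel-first allocation rule); it is not the paper's RTL design, and all function and variable names are invented for the example.

    # Hypothetical illustration (not the paper's design) of zero-skipping and
    # zero-aware kernel allocation; all names here are invented for the sketch.
    import numpy as np

    def zero_aware_mac_count(weights, activations):
        """Count only the multiply-accumulates where both operands are non-zero."""
        return int(np.count_nonzero((weights != 0) & (activations != 0)))

    def round_robin_allocation(kernels, num_pes):
        """Zero-agnostic baseline: deal kernels to PEs in order, ignoring sparsity."""
        return [kernels[pe::num_pes] for pe in range(num_pes)]

    def zero_aware_allocation(kernels, num_pes):
        """Greedy balancing: hand the kernel with the most non-zero weights to the
        PE with the smallest accumulated non-zero workload (an LPT-style rule)."""
        buckets = [[] for _ in range(num_pes)]
        load = [0] * num_pes
        for k in sorted(kernels, key=np.count_nonzero, reverse=True):
            pe = int(np.argmin(load))      # least-loaded PE so far
            buckets[pe].append(k)
            load[pe] += int(np.count_nonzero(k))
        return buckets

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # 64 pruned 3x3x128 kernels whose sparsity varies from ~60% to ~95% zeros.
        kernels = []
        for _ in range(64):
            k = rng.standard_normal((128, 3, 3))
            k[rng.random(k.shape) < rng.uniform(0.6, 0.95)] = 0.0
            kernels.append(k)

        # Zero-skipping: of the 128*3*3 nominal MACs for one output pixel, only
        # those whose weight AND activation are both non-zero are executed.
        act = rng.standard_normal((128, 3, 3))
        act[rng.random(act.shape) < 0.5] = 0.0        # ~50% zero activations
        print(f"MACs executed: {zero_aware_mac_count(kernels[0], act)}/{kernels[0].size}")

        # Load imbalance: with zero-skipping, a PE's cycle count tracks its
        # non-zero work, so the most loaded PE bounds the layer's throughput.
        for name, alloc in (("round-robin", round_robin_allocation(kernels, 8)),
                            ("zero-aware", zero_aware_allocation(kernels, 8))):
            loads = [sum(np.count_nonzero(k) for k in pe) for pe in alloc]
            print(f"{name:>11}: max/mean PE load = {max(loads) / np.mean(loads):.3f}")

Printing the max/mean PE load for both allocations shows why a zero-aware assignment matters: with zero-skipping hardware, the most heavily loaded PE sets the layer's runtime, so a ratio near 1.0 should translate directly into higher utilization.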

References

[1]
A. Krizhevsky et al., "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, 2012.
[2]
K. Simonyan et al., "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[3]
C. Szegedy et al., "Going deeper with convolutions," in Proc. CVPR, 2015.
[4]
P. Sermanet et al., "OverFeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint arXiv:1312.6229, 2013.
[5]
R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. CVPR, 2014.
[6]
S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. NIPS, 2015.
[7]
K. He et al., "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.
[8]
Y. LeCun et al., "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, pp. 2278--2324, 1998.
[9]
I. Kipyatkova et al., "Recurrent neural network-based language modeling for an automatic Russian speech recognition system," in Proc. AINL-ISMW FRUCT, 2015.
[10]
J. Cong et al., "Minimizing computation in convolutional neural networks," in Proc. ICANN, 2014.
[11]
S. Han et al., "Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding," in Proc. ICLR, 2016.
[12]
J. Albericio et al., "Cnvlutin: Ineffectual-neuron-free deep neural network computing," in Proc. ISCA, 2016.
[13]
Y. H. Chen et al., "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," in Proc. ISSCC, 2016.
[14]
T. Chen et al., "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in Proc. ASPLOS, 2014.
[15]
Y. Chen et al., "DaDianNao: A machine-learning supercomputer," in Proc. MICRO, 2014.
[16]
Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," in Proc. ISCA, 2015.
[17]
S. Han et al., "EIE: Efficient inference engine on compressed deep neural network," in Proc. ISCA, 2016.
[18]
D. Miyashita et al., "Convolutional neural networks using logarithmic data representation," arXiv preprint arXiv:1603.01025, 2016.
[19]
CACTI v6.0, http://www.hpl.hp.com/research/cacti/.

Cited By

  • (2019) Laconic deep learning inference acceleration. Proceedings of the 46th International Symposium on Computer Architecture, pp. 304-317. DOI: 10.1145/3307650.3322255. Online publication date: 22 Jun 2019.
  • (2018) Supporting compressed-sparse activations and weights on SIMD-like accelerator for sparse convolutional neural networks. Proceedings of the 23rd Asia and South Pacific Design Automation Conference, pp. 105-110. DOI: 10.5555/3201607.3201630. Online publication date: 22 Jan 2018.
  • (2018) High-Efficiency Convolutional Ternary Neural Networks with Custom Adder Trees and Weight Compression. ACM Transactions on Reconfigurable Technology and Systems, 11(3), pp. 1-24. DOI: 10.1145/3270764. Online publication date: 12 Dec 2018.

Published In

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe
March 2017
1814 pages

Publisher

European Design and Automation Association

Leuven, Belgium

Author Tags

  1. accelerator
  2. activation
  3. convolutional neural network
  4. kernel
  5. zero value

Qualifiers

  • Research-article
