DOI: 10.5555/3130379.3130723

A novel zero weight/activation-aware hardware architecture of convolutional neural network

Published: 27 March 2017

Abstract

It is imperative to accelerate convolutional neural networks (CNNs) because their application areas keep widening, from servers and mobile devices to IoT devices. Based on the observation that CNNs exhibit a significant number of zero values in both kernel weights and activations, we propose a novel hardware accelerator for CNNs that exploits zero weights and activations. We also report a zero-induced load imbalance problem, which exists in zero-aware parallel CNN hardware architectures, and present a zero-aware kernel allocation as a solution. According to our experiments with a cycle-accurate simulation model, RTL, and layout design of the proposed architecture running two real deep CNNs, pruned AlexNet [1] and VGG-16 [2], our architecture offers 4x/1.8x (AlexNet) and 5.2x/2.1x (VGG-16) speedups compared with state-of-the-art zero-agnostic/zero-activation-aware architectures, respectively.
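
The two mechanisms named in the abstract, skipping multiply-accumulates whose weight or activation operand is zero, and balancing non-zero work across parallel processing elements (PEs) so that sparsity does not create stragglers, can be sketched in software. The Python sketch below is a hypothetical illustration under assumed parameters (8 PEs, 64 pruned 3x3x128 kernels, a greedy largest-kernel-first allocation rule); it is not the paper's RTL design, and all function and variable names are invented for the example.

    # Hypothetical illustration (not the paper's design) of zero-skipping and
    # zero-aware kernel allocation; all names here are invented for the sketch.
    import numpy as np

    def zero_aware_mac_count(weights, activations):
        """Count only the multiply-accumulates where both operands are non-zero."""
        return int(np.count_nonzero((weights != 0) & (activations != 0)))

    def round_robin_allocation(kernels, num_pes):
        """Zero-agnostic baseline: deal kernels to PEs in order, ignoring sparsity."""
        return [kernels[pe::num_pes] for pe in range(num_pes)]

    def zero_aware_allocation(kernels, num_pes):
        """Greedy balancing: hand the kernel with the most non-zero weights to the
        PE with the smallest accumulated non-zero workload (an LPT-style rule)."""
        buckets = [[] for _ in range(num_pes)]
        load = [0] * num_pes
        for k in sorted(kernels, key=np.count_nonzero, reverse=True):
            pe = int(np.argmin(load))      # least-loaded PE so far
            buckets[pe].append(k)
            load[pe] += int(np.count_nonzero(k))
        return buckets

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # 64 pruned 3x3x128 kernels whose sparsity varies from ~60% to ~95% zeros.
        kernels = []
        for _ in range(64):
            k = rng.standard_normal((128, 3, 3))
            k[rng.random(k.shape) < rng.uniform(0.6, 0.95)] = 0.0
            kernels.append(k)

        # Zero-skipping: of the 128*3*3 nominal MACs for one output pixel, only
        # those whose weight AND activation are both non-zero are executed.
        act = rng.standard_normal((128, 3, 3))
        act[rng.random(act.shape) < 0.5] = 0.0        # ~50% zero activations
        print(f"MACs executed: {zero_aware_mac_count(kernels[0], act)}/{kernels[0].size}")

        # Load imbalance: with zero-skipping, a PE's cycle count tracks its
        # non-zero work, so the most loaded PE bounds the layer's throughput.
        for name, alloc in (("round-robin", round_robin_allocation(kernels, 8)),
                            ("zero-aware", zero_aware_allocation(kernels, 8))):
            loads = [sum(np.count_nonzero(k) for k in pe) for pe in alloc]
            print(f"{name:>11}: max/mean PE load = {max(loads) / np.mean(loads):.3f}")

Printing the max/mean PE load for both allocations shows why a zero-aware assignment matters: with zero-skipping hardware, the most heavily loaded PE sets the layer's runtime, so a ratio near 1.0 should translate directly into higher utilization.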

References

[1]
A. Krizhevsky et al., "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, 2012.
[2]
K. Simonyan et al., "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[3]
C. Szegedy et al., "Going deeper with convolutions," in Proc. CVPR, 2015.
[4]
P. Sermanet et al., "OverFeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint arXiv:1312.6229, 2013.
[5]
R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. CVPR, 2014.
[6]
S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. NIPS, 2015.
[7]
K. He et al., "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.
[8]
Y. LeCun et al., "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, pp. 2278--2324, 1998.
[9]
I. Kipyatkova et al., "Recurrent neural network-based language modeling for an automatic Russian speech recognition system," in Proc. AINL-ISMW FRUCT, 2015.
[10]
J. Cong et al., "Minimizing computation in convolutional neural networks," in Proc. ICANN, 2014.
[11]
S. Han et al., "Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding," in Proc. ICLR, 2016.
[12]
J. Albericio et al., "Cnvlutin: Ineffectual-neuron-free deep neural network computing," in Proc. ISCA, 2016.
[13]
Y. H. Chen et al., "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," in Proc. ISSCC, 2016.
[14]
T. Chen et al., "DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning," in Proc. ASPLOS, 2014.
[15]
Y. Chen et al., "DaDianNao: A machine-learning supercomputer," in Proc. MICRO, 2014.
[16]
Z. Du et al., "ShiDianNao: Shifting vision processing closer to the sensor," in Proc. ISCA, 2015.
[17]
S. Han et al., "EIE: Efficient inference engine on compressed deep neural network," in Proc. ISCA, 2016.
[18]
D. Miyashita et al., "Convolutional neural networks using logarithmic data representation," arXiv preprint arXiv:1603.01025, 2016.
[19]
CACTI v6.0, http://www.hpl.hp.com/research/cacti/.

Cited By

  • (2019) Laconic deep learning inference acceleration. Proceedings of the 46th International Symposium on Computer Architecture, pp. 304-317. DOI: 10.1145/3307650.3322255. Online publication date: 22 Jun 2019.
  • (2018) Supporting compressed-sparse activations and weights on SIMD-like accelerator for sparse convolutional neural networks. Proceedings of the 23rd Asia and South Pacific Design Automation Conference, pp. 105-110. DOI: 10.5555/3201607.3201630. Online publication date: 22 Jan 2018.
  • (2018) High-Efficiency Convolutional Ternary Neural Networks with Custom Adder Trees and Weight Compression. ACM Transactions on Reconfigurable Technology and Systems, 11(3), pp. 1-24. DOI: 10.1145/3270764. Online publication date: 12 Dec 2018.

Published In

DATE '17: Proceedings of the Conference on Design, Automation & Test in Europe
March 2017
1814 pages

Publisher

European Design and Automation Association

Leuven, Belgium

Author Tags

  1. accelerator
  2. activation
  3. convolutional neural network
  4. kernel
  5. zero value

Qualifiers

  • Research-article
