
FP-BNN: Binarized neural network on FPGA

Published: 31 January 2018

Abstract

Deep neural networks (DNNs) have attracted significant attention for their excellent accuracy, especially in areas such as computer vision and artificial intelligence. To enhance their performance, technologies for their hardware acceleration are being studied. FPGA technology is a promising choice for hardware acceleration: its low power consumption and high flexibility make it particularly suitable for embedded systems. However, complex DNN models may need more computing and memory resources than many current FPGAs provide. This paper presents FP-BNN, a binarized neural network (BNN) for FPGAs, which drastically cuts hardware consumption while maintaining acceptable accuracy. We introduce a Resource-Aware Model Analysis (RAMA) method, remove the multiplier bottleneck with bit-level XNOR and shift operations, and remove the parameter-access bottleneck through data quantization and optimized on-chip storage. We evaluate FP-BNN accelerator designs for an MNIST multi-layer perceptron (MLP), a Cifar-10 ConvNet, and AlexNet on a Stratix-V FPGA system. Inference performance at the level of tera-operations per second (TOPS) is obtained with acceptable accuracy loss, showing improvements in speed and energy efficiency over other computing platforms.
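
As background for the multiplier-free arithmetic mentioned above: in a BNN with weights and activations constrained to {+1, -1}, a dot product needs no multipliers at all. Packing the signs of each vector into a machine word, matching signs XNOR to 1 and mismatching signs to 0, so dot(a, b) = 2*popcount(xnor(a, b)) - n for length-n vectors. Below is a minimal Python sketch of this identity; the function names and bit-packing convention are illustrative assumptions, not the paper's FPGA implementation (which realizes the same arithmetic in logic, with the population count built from adder trees rather than a software loop).

    # Minimal sketch (assumed names, software-only): BNN dot product via
    # XNOR + popcount instead of multiply-accumulate.

    def binarize(values):
        """Pack the signs of a real vector into a bit mask (bit = 1 means +1, 0 means -1)."""
        mask = 0
        for i, v in enumerate(values):
            if v >= 0:
                mask |= 1 << i
        return mask

    def bnn_dot(a_bits, b_bits, n):
        """Dot product of two {+1, -1} vectors of length n from their bit packings.
        XNOR flags positions with matching signs; each match contributes +1 and
        each mismatch -1, hence dot = 2 * popcount(xnor) - n."""
        xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)  # mask to the n valid bits
        return 2 * bin(xnor).count("1") - n

    # Example: [+1, -1, +1] . [+1, +1, +1] = +1 - 1 + 1 = 1
    a = binarize([0.7, -0.2, 1.5])
    b = binarize([0.3, 0.9, 0.1])
    assert bnn_dot(a, b, 3) == 1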




Information

Published In

Neurocomputing, Volume 275, Issue C
January 2018
2070 pages

Publisher

Elsevier Science Publishers B. V., Netherlands


Author Tags

  1. Binarized neural network
  2. FPGA
  3. Hardware accelerator

Qualifiers

  • Research-article


Cited By

  • (2024) Binary Optical Machine Learning: Million-Scale Physical Neural Networks with Nano Neurons. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pp. 603-617. DOI: 10.1145/3636534.3649384. Online publication date: 29-May-2024.
  • (2024) HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks. ACM Transactions on Reconfigurable Technology and Systems, 17(2), pp. 1-24. DOI: 10.1145/3631610. Online publication date: 30-Apr-2024.
  • (2024) Binarizing by Classification: Is Soft Function Really Necessary? IEEE Transactions on Circuits and Systems for Video Technology, 34(2), pp. 973-982. DOI: 10.1109/TCSVT.2023.3288572. Online publication date: 1-Feb-2024.
  • (2024) Extending Neural Processing Unit and Compiler for Advanced Binarized Neural Networks. Proceedings of the 29th Asia and South Pacific Design Automation Conference, pp. 115-120. DOI: 10.1109/ASP-DAC58780.2024.10473822. Online publication date: 22-Jan-2024.
  • (2024) A Reconfigurable Coarse-to-Fine Approach for the Execution of CNN Inference Models in Low-Power Edge Devices. IET Computers & Digital Techniques, Volume 2024. DOI: 10.1049/cdt2/6214436. Online publication date: 1-Jan-2024.
  • (2024) Review of neural network model acceleration techniques based on FPGA platforms. Neurocomputing, 610(C). DOI: 10.1016/j.neucom.2024.128511. Online publication date: 28-Dec-2024.
  • (2024) FPGA-based UAV and UGV for search and rescue applications. Computers and Electrical Engineering, 119(PA). DOI: 10.1016/j.compeleceng.2024.109491. Online publication date: 1-Oct-2024.
  • (2024) Floating-Point Quantization Analysis of Multi-Layer Perceptron Artificial Neural Networks. Journal of Signal Processing Systems, 96(4-5), pp. 301-312. DOI: 10.1007/s11265-024-01911-0. Online publication date: 1-May-2024.
  • (2023) ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks. ACM Transactions on Architecture and Code Optimization, 20(4), pp. 1-24. DOI: 10.1145/3629522. Online publication date: 25-Oct-2023.
  • (2023) TinyM2Net-V2: A Compact Low-power Software Hardware Architecture for Multimodal Deep Neural Networks. ACM Transactions on Embedded Computing Systems, 23(3), pp. 1-23. DOI: 10.1145/3595633. Online publication date: 3-May-2023.
