
PLACID: A Platform for FPGA-Based Accelerator Creation for DCNNs

Published: 18 September 2017

Abstract

Deep Convolutional Neural Networks (DCNNs) exhibit remarkable performance in a number of pattern recognition and classification tasks. Modern DCNNs involve many millions of parameters and billions of operations. Inference using such DCNNs, if implemented as software running on an embedded processor, results in considerable execution time and energy consumption, which is prohibitive in many mobile applications. Field-programmable gate array (FPGA)-based acceleration of DCNN inference is a promising approach to improve both energy consumption and classification throughput. However, the engineering effort required for development and verification of an optimized FPGA-based architecture is significant.
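
For a sense of the scale involved (the numbers below are illustrative and not taken from the article), a single mid-network convolutional layer already costs on the order of 10^8 multiply-accumulate operations, so a full network easily reaches billions of operations per inference:

```python
# Back-of-the-envelope MAC count for one convolutional layer, using the
# standard formula MACs = K*K*C_in*C_out*H_out*W_out. The layer shape
# below is illustrative (similar in shape to AlexNet's conv3), not a
# figure from the article.
def conv_layer_macs(k, c_in, c_out, h_out, w_out):
    return k * k * c_in * c_out * h_out * w_out

macs = conv_layer_macs(k=3, c_in=256, c_out=384, h_out=13, w_out=13)
print(f"{macs:,} MACs")  # 149,520,384 -- roughly 150M MACs for one layer
```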
In this article, we present PLACID, an automated PLatform for Accelerator CreatIon for DCNNs. PLACID uses an analytical approach to characterize and explore the implementation space, enabling generation of the highest-throughput accelerator for a given DCNN on a specific target FPGA platform. Subsequently, it generates an RTL-level architecture in Verilog, which can be passed to commercial tools for FPGA implementation. PLACID is fully automated and reduces the accelerator design time from a few months down to a few hours. Experimental results show that architectures synthesized by PLACID yield 2× higher throughput density than the best competing approach.
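
The analytical models behind this exploration are not reproduced on this page, but the shape of the search can be sketched. The following is a minimal, assumption-laden illustration of roofline-style design-space exploration for convolutional layers: the layer shapes, the DSP budget, and the cycle model are all hypothetical, and this is not PLACID's actual cost function.

```python
# Minimal sketch of analytical design-space exploration for a DCNN
# accelerator. All numbers and the cost model are assumptions for
# illustration; they are not PLACID's actual models.
import math

# Per-layer shapes: (out_channels, in_channels, kernel, out_h, out_w).
# Hypothetical three-layer network.
LAYERS = [(96, 3, 11, 55, 55), (256, 96, 5, 27, 27), (384, 256, 3, 13, 13)]
DSP_BUDGET = 2800  # assumed DSP-slice budget of the target FPGA

def layer_cycles(layer, tm, tn):
    """Estimated cycles for one layer with tm x tn parallel MAC units
    (tm-way output-channel, tn-way input-channel parallelism)."""
    m, n, k, h, w = layer
    return math.ceil(m / tm) * math.ceil(n / tn) * k * k * h * w

best = None
for tm in range(1, 65):            # candidate output-channel tile sizes
    for tn in range(1, 65):        # candidate input-channel tile sizes
        if tm * tn > DSP_BUDGET:   # one DSP per MAC unit (assumed)
            continue               # design point does not fit the device
        cycles = sum(layer_cycles(l, tm, tn) for l in LAYERS)
        if best is None or cycles < best[0]:
            best = (cycles, tm, tn)

cycles, tm, tn = best
print(f"best tiling: Tm={tm}, Tn={tn}, ~{cycles:,} cycles per image")
```

A complete flow such as PLACID would additionally model on-chip buffer capacity and off-chip bandwidth before emitting Verilog for the selected design point.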






      Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 13, Issue 4
November 2017, 362 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3129737
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 September 2017
      Accepted: 01 July 2017
      Revised: 01 April 2017
      Received: 01 November 2016
      Published in TOMM Volume 13, Issue 4


      Author Tags

      1. Convolutional neural networks
      2. accelerator design
      3. deep learning
      4. design automation

      Qualifiers

      • Research-article
      • Research
      • Refereed


      Article Metrics

• Downloads (Last 12 months): 16
• Downloads (Last 6 weeks): 2
      Reflects downloads up to 11 Dec 2024

Cited By

• (2024) Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey. IEEE Transactions on Sustainable Computing 9:6, 830-847. DOI: 10.1109/TSUSC.2024.3353176. Online publication date: Nov-2024.
• (2024) Review of neural network model acceleration techniques based on FPGA platforms. Neurocomputing, Article 128511. DOI: 10.1016/j.neucom.2024.128511. Online publication date: Aug-2024.
• (2024) A Survey of Artificial Neural Network Computing Systems. Cognitive Computation 17:1. DOI: 10.1007/s12559-024-10383-0. Online publication date: 22-Nov-2024.
• (2024) Hardware Accelerators for Classification of Thoracic Disorders: A Survey. Proceedings of Ninth International Congress on Information and Communication Technology, 169-184. DOI: 10.1007/978-981-97-3299-9_14. Online publication date: 30-Jul-2024.
• (2023) A time domain 2D OaA-based convolutional neural networks accelerator. Memories - Materials, Devices, Circuits and Systems 4, Article 100041. DOI: 10.1016/j.memori.2023.100041. Online publication date: Jul-2023.
• (2023) FSS: algorithm and neural network accelerator for style transfer. Science China Information Sciences 67:2. DOI: 10.1007/s11432-022-3676-2. Online publication date: 10-Oct-2023.
• (2022) Design Space Exploration of a Sparse MobileNetV2 Using High-Level Synthesis and Sparse Matrix Techniques on FPGAs. Sensors 22:12, Article 4318. DOI: 10.3390/s22124318. Online publication date: 7-Jun-2022.
• (2022) Low-Latency In Situ Image Analytics With FPGA-Based Quantized Convolutional Neural Network. IEEE Transactions on Neural Networks and Learning Systems 33:7, 2853-2866. DOI: 10.1109/TNNLS.2020.3046452. Online publication date: Jul-2022.
• (2022) An Overlap-and-Add Based Time Domain Acceleration of CNNs on FPGA-CPU Systems. VLSI Design and Test, 573-583. DOI: 10.1007/978-3-031-21514-8_47. Online publication date: 17-Dec-2022.
• (2021) Design and Optimization of 1D-CNN for Spectrum Recognition of Underwater Targets. Integrated Ferroelectrics 218:1, 164-179. DOI: 10.1080/10584587.2021.1911338. Online publication date: 15-Aug-2021.
