
Reconfigurable Convolutional Kernels for Neural Networks on FPGAs

Published: 20 February 2019

Abstract

Convolutional neural networks (CNNs) have achieved great success in machine learning applications, and much attention has been paid to their acceleration on field-programmable gate arrays (FPGAs). Most of the computational complexity of CNNs lies in the convolutional layers, which account for about 90% of the total operations. Since the parameters of convolutional layers do not change over long time intervals in weight-stationary CNNs, reconfiguration can be used to reduce resource requirements. This work proposes several alternative reconfiguration schemes that significantly reduce the complexity of sum-of-products operations. The proposed direct configuration schemes provide the lowest resource requirements and fast reconfiguration times of 32 clock cycles, but require additional memory for the pre-computed configurations. The proposed online reconfiguration scheme avoids this memory overhead by computing the LUT contents online. Finally, a scheme that duplicates the reconfigurable LUTs is proposed, for which the reconfiguration time can be completely hidden in the computation time. Combined with a few online reconfiguration circuits, this achieves the same configuration memory and configuration time as a conventional parallel kernel while offering large resource reductions of up to 80% of the LUTs.
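The idea behind the reconfiguration schemes described above can be illustrated in software. The following sketch (not taken from the paper; all function names are hypothetical) emulates LUT-based constant multiplication, the building block of weight-stationary kernels: for a fixed weight, a small table maps each 4-bit input chunk to a partial product, so a sum-of-products needs only table lookups and shifted additions. Changing a weight means rewriting the table contents, which corresponds to the paper's "online reconfiguration" step.

```python
def build_lut(weight, chunk_bits=4):
    # "Online reconfiguration": compute the LUT contents for a new weight.
    # On an FPGA this table would be written into a reconfigurable LUT.
    return [weight * x for x in range(1 << chunk_bits)]

def lut_multiply(lut, value, chunk_bits=4, chunks=4):
    # Multiply an unsigned value by the weight baked into `lut` by
    # summing shifted partial products, one per input chunk.
    mask = (1 << chunk_bits) - 1
    acc = 0
    for i in range(chunks):
        chunk = (value >> (i * chunk_bits)) & mask
        acc += lut[chunk] << (i * chunk_bits)
    return acc

def sum_of_products(weights, inputs):
    # A convolution kernel as a sum of LUT-based constant multiplications.
    luts = [build_lut(w) for w in weights]  # reconfiguration step
    return sum(lut_multiply(lut, x) for lut, x in zip(luts, inputs))

# 3x3 kernel: matches an ordinary sum of products.
weights = [1, 2, 3, 4, 5, 6, 7, 8, 9]
inputs  = [9, 8, 7, 6, 5, 4, 3, 2, 1]
assert sum_of_products(weights, inputs) == sum(w * x for w, x in zip(weights, inputs))
```

In hardware, the trade-off the paper explores is where these table contents come from: stored pre-computed (extra memory), computed online (extra logic), or written into a duplicate LUT while the other copy keeps computing (hidden reconfiguration time).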




Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019
360 pages
ISBN:9781450361378
DOI:10.1145/3289602
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. sop
  2. sum of product

Qualifiers

  • Research-article

Conference

FPGA '19

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%


Article Metrics

  • Downloads (Last 12 months)43
  • Downloads (Last 6 weeks)8
Reflects downloads up to 07 Jan 2025


Cited By

  • (2023) VLSI-Friendly Filtering Algorithms for Deep Neural Networks. Applied Sciences 13(15), 9004. DOI: 10.3390/app13159004. Online publication date: 6-Aug-2023.
  • (2023) A Memory Efficient Run-time Re-configurable Convolution IP Core for Deep Neural Networks Inference on FPGA Devices. 2023 IEEE International Symposium on Smart Electronic Systems (iSES), 409-412. DOI: 10.1109/iSES58672.2023.00091. Online publication date: 18-Dec-2023.
  • (2023) Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review. Proceedings of the IEEE 111(1), 42-91. DOI: 10.1109/JPROC.2022.3226481. Online publication date: Jan-2023.
  • (2023) Field Programmable Gate Arrays. Application-Specific Arithmetic, 87-100. DOI: 10.1007/978-3-031-42808-1_4. Online publication date: 23-Aug-2023.
  • (2023) Multiplication by Constants. Application-Specific Arithmetic, 365-426. DOI: 10.1007/978-3-031-42808-1_12. Online publication date: 23-Aug-2023.
  • (2022) FPGA-based Acceleration of Time Series Similarity Prediction: From Cloud to Edge. ACM Transactions on Reconfigurable Technology and Systems 16(1), 1-27. DOI: 10.1145/3555810. Online publication date: 22-Dec-2022.
  • (2021) Accelerating Neural Network Inference on FPGA-Based Platforms—A Survey. Electronics 10(9), 1025. DOI: 10.3390/electronics10091025. Online publication date: 25-Apr-2021.
  • (2021) Fast Algorithms for Quaternion-Valued Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 32(1), 457-462. DOI: 10.1109/TNNLS.2020.2979682. Online publication date: Jan-2021.
  • (2021) MAFIA: Machine Learning Acceleration on FPGAs for IoT Applications. 2021 31st International Conference on Field-Programmable Logic and Applications (FPL), 347-354. DOI: 10.1109/FPL53798.2021.00067. Online publication date: Aug-2021.
  • (2021) FA-LAMP: FPGA-Accelerated Learned Approximate Matrix Profile for Time Series Similarity Prediction. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 40-49. DOI: 10.1109/FCCM51124.2021.00013. Online publication date: May-2021.
