
Reconfigurable Convolutional Kernels for Neural Networks on FPGAs

Published: 20 February 2019

Abstract

Convolutional neural networks (CNNs) have achieved great success in machine learning applications, and much attention has been paid to their acceleration on field-programmable gate arrays (FPGAs). Most of the computational complexity of CNNs lies in the convolutional layers, which account for about 90% of the total operations. Since the parameters of convolutional layers do not change over long time intervals in weight-stationary CNNs, reconfiguration can be used to reduce resource requirements. This work proposes several alternative reconfiguration schemes that significantly reduce the complexity of sum-of-products operations. The proposed direct configuration schemes provide the lowest resource requirements and fast reconfiguration times of 32 clock cycles, but require additional memory for the pre-computed configurations. The proposed online reconfiguration scheme avoids this memory overhead by computing the LUT contents online. Finally, a scheme that duplicates the reconfigurable LUTs is proposed, for which the reconfiguration time can be completely hidden in the computation time. Combined with a few online reconfiguration circuits, this achieves the same configuration memory and configuration time as a conventional parallel kernel while offering large resource reductions of up to 80% of the LUTs.
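The idea behind the reconfiguration schemes described above can be illustrated in software. The following sketch (not taken from the paper; all function names are hypothetical) emulates LUT-based constant multiplication, the building block of weight-stationary kernels: for a fixed weight, a small table maps each 4-bit input chunk to a partial product, so a sum-of-products needs only table lookups and shifted additions. Changing a weight means rewriting the table contents, which corresponds to the paper's "online reconfiguration" step.

```python
def build_lut(weight, chunk_bits=4):
    # "Online reconfiguration": compute the LUT contents for a new weight.
    # On an FPGA this table would be written into a reconfigurable LUT.
    return [weight * x for x in range(1 << chunk_bits)]

def lut_multiply(lut, value, chunk_bits=4, chunks=4):
    # Multiply an unsigned value by the weight baked into `lut` by
    # summing shifted partial products, one per input chunk.
    mask = (1 << chunk_bits) - 1
    acc = 0
    for i in range(chunks):
        chunk = (value >> (i * chunk_bits)) & mask
        acc += lut[chunk] << (i * chunk_bits)
    return acc

def sum_of_products(weights, inputs):
    # A convolution kernel as a sum of LUT-based constant multiplications.
    luts = [build_lut(w) for w in weights]  # reconfiguration step
    return sum(lut_multiply(lut, x) for lut, x in zip(luts, inputs))

# 3x3 kernel: matches an ordinary sum of products.
weights = [1, 2, 3, 4, 5, 6, 7, 8, 9]
inputs  = [9, 8, 7, 6, 5, 4, 3, 2, 1]
assert sum_of_products(weights, inputs) == sum(w * x for w, x in zip(weights, inputs))
```

In hardware, the trade-off the paper explores is where these table contents come from: stored pre-computed (extra memory), computed online (extra logic), or written into a duplicate LUT while the other copy keeps computing (hidden reconfiguration time).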




Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019
360 pages
ISBN:9781450361378
DOI:10.1145/3289602
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. sop
  2. sum of product

Qualifiers

  • Research-article

Conference

FPGA '19

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%


Article Metrics

  • Downloads (Last 12 months)43
  • Downloads (Last 6 weeks)8
Reflects downloads up to 07 Jan 2025


Cited By

  • (2023) VLSI-Friendly Filtering Algorithms for Deep Neural Networks. Applied Sciences 13(15), 9004. DOI: 10.3390/app13159004. Online publication date: 6-Aug-2023.
  • (2023) A Memory Efficient Run-time Re-configurable Convolution IP Core for Deep Neural Networks Inference on FPGA Devices. 2023 IEEE International Symposium on Smart Electronic Systems (iSES), 409-412. DOI: 10.1109/iSES58672.2023.00091. Online publication date: 18-Dec-2023.
  • (2023) Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review. Proceedings of the IEEE 111(1), 42-91. DOI: 10.1109/JPROC.2022.3226481. Online publication date: Jan-2023.
  • (2023) Field Programmable Gate Arrays. Application-Specific Arithmetic, 87-100. DOI: 10.1007/978-3-031-42808-1_4. Online publication date: 23-Aug-2023.
  • (2023) Multiplication by Constants. Application-Specific Arithmetic, 365-426. DOI: 10.1007/978-3-031-42808-1_12. Online publication date: 23-Aug-2023.
  • (2022) FPGA-based Acceleration of Time Series Similarity Prediction: From Cloud to Edge. ACM Transactions on Reconfigurable Technology and Systems 16(1), 1-27. DOI: 10.1145/3555810. Online publication date: 22-Dec-2022.
  • (2021) Accelerating Neural Network Inference on FPGA-Based Platforms—A Survey. Electronics 10(9), 1025. DOI: 10.3390/electronics10091025. Online publication date: 25-Apr-2021.
  • (2021) Fast Algorithms for Quaternion-Valued Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 32(1), 457-462. DOI: 10.1109/TNNLS.2020.2979682. Online publication date: Jan-2021.
  • (2021) MAFIA: Machine Learning Acceleration on FPGAs for IoT Applications. 2021 31st International Conference on Field-Programmable Logic and Applications (FPL), 347-354. DOI: 10.1109/FPL53798.2021.00067. Online publication date: Aug-2021.
  • (2021) FA-LAMP: FPGA-Accelerated Learned Approximate Matrix Profile for Time Series Similarity Prediction. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 40-49. DOI: 10.1109/FCCM51124.2021.00013. Online publication date: May-2021.
