A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA

Authors: Kamel Abdelouahab, Cédric Bourrasset, François Berry, Jean-Charles Quinton, Jocelyn Serot

ICDSC '16: Proceedings of the 10th International Conference on Distributed Smart Camera

Pages 69 - 75

https://doi.org/10.1145/2967413.2967430

Published: 12 September 2016

Abstract

Deep Neural Networks are becoming the de facto standard models for image understanding and, more generally, for computer vision tasks. Because they involve highly parallelizable computations, Convolutional Neural Networks (CNNs) are well suited to current fine-grained programmable logic devices, and multiple CNN accelerators have been successfully implemented on Field-Programmable Gate Arrays (FPGAs). Unfortunately, FPGA resources such as logic elements and Digital Signal Processing (DSP) units remain limited. This work presents a holistic method that relies on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on an FPGA. The method was tested by implementing a reconfigurable Optical Character Recognition (OCR) convolutional neural network on an Altera Stratix V device, varying both the data representation and the CNN topology to find the best combination in terms of DSP block utilization and classification accuracy. This exploration generated dataflow architectures for 76 CNN topologies with 5 different fixed-point representations. The most efficient implementation performs 883 classifications/sec at 256 × 256 resolution using 8% of the available DSP blocks.
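
The kind of exploration the abstract describes can be illustrated with a small Python sketch. It is not the authors' tool flow. The 3 × 3 kernel size, the cost proxy (multiplier count times word width), the candidate lists and the `accuracy_of` callback are all hypothetical placeholders chosen for illustration; the paper's actual DSP mapping and accuracy measurements are not given on this page.

```python
from itertools import product


def quantize(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value, saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))  # integer code word
    return q / scale


def multiplier_count(topology, kernel=3):
    """Hypothetical cost model: one multiplier per kernel weight.

    `topology` lists the number of feature maps in each convolutional
    layer; the input image is assumed to have a single channel.
    """
    count, in_maps = 0, 1
    for out_maps in topology:
        count += in_maps * out_maps * kernel * kernel
        in_maps = out_maps
    return count


def explore(topologies, bit_widths, accuracy_of, min_accuracy):
    """Return the cheapest (cost, topology, bits) meeting the accuracy target.

    `accuracy_of(topology, bits)` is a caller-supplied (hypothetical)
    function, for example one that evaluates a quantized network.
    """
    best = None
    for topo, bits in product(topologies, bit_widths):
        if accuracy_of(topo, bits) < min_accuracy:
            continue
        cost = multiplier_count(topo) * bits  # assumed proxy for DSP usage
        if best is None or cost < best[0]:
            best = (cost, topo, bits)
    return best


if __name__ == "__main__":
    # Toy accuracy model, purely illustrative: 5 bits or more is "good enough".
    toy_accuracy = lambda topo, bits: 0.9 if bits >= 5 else 0.5
    print(explore([(3, 5), (6, 10)], [3, 5, 7], toy_accuracy, 0.8))
```

In a real flow, the accuracy callback would run the quantized network on a validation set, and the cost would come from synthesis reports rather than a closed-form proxy.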

A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches

Published In

ICDSC '16: Proceedings of the 10th International Conference on Distributed Smart Camera

September 2016

242 pages

ISBN: 9781450347860

DOI: 10.1145/2967413

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICDSC '16

ICDSC '16: 10th international conference on distributed smart camera

September 12 - 15, 2016

Paris, France

Acceptance Rates

Overall Acceptance Rate 92 of 117 submissions, 79%

Bibliometrics

Total Citations: 19
Total Downloads: 233
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 1

Reflects downloads up to 10 Dec 2024

Cited By

Mohd B, Ahmad Yousef K, AlMajali A, Hayajneh T (2024). Quantization-Based Optimization Algorithm for Hardware Implementation of Convolution Neural Networks. Electronics 13:9 (1727). Online publication date: 30-Apr-2024.
https://doi.org/10.3390/electronics13091727
Goel S, Kedia R, Sen R, Balakrishnan M (2024). EXPRESS: A Framework for Execution Time Prediction of Concurrent CNNs on Xilinx DPU Accelerator. ACM Transactions on Embedded Computing Systems 24:1 (1-31). Online publication date: 3-Oct-2024.
https://dl.acm.org/doi/10.1145/3697835
Chandrasekaran S, Chandran S, Selvam I (2024). FPGA‐Based Implementation of Real‐Time Cardiologist‐Level Arrhythmia Detection and Classification in Electrocardiograms Using Novel Deep Learning. International Journal of Circuit Theory and Applications. Online publication date: 29-Sep-2024.
https://doi.org/10.1002/cta.4289
Pacini T, Rapuano E, Fanucci L (2023). FPG-AI: A Technology-Independent Framework for the Automation of CNN Deployment on FPGAs. IEEE Access 11 (32759-32775). Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3263392
Feng L, Liu W, Guo C, Tang K, Zhuo C, Wang Z (2022). GANDSE: Generative Adversarial Network-based Design Space Exploration for Neural Network Accelerator Design. ACM Transactions on Design Automation of Electronic Systems 28:3 (1-20). Online publication date: 9-Nov-2022.
https://dl.acm.org/doi/10.1145/3570926
Rahimifar M, Granger C, Wingering Q, Gouin-Ferland B, Rahali H, Corbeil Therrien A (2022). A Survey of Machine Learning to FPGA Tool-Flows for Instrumentation. 2022 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC) (1-4). Online publication date: 5-Nov-2022.
https://doi.org/10.1109/NSS/MIC44845.2022.10399017
Philip N, Sivamangai N (2022). Review of FPGA-Based Accelerators of Deep Convolutional Neural Networks. 2022 6th International Conference on Devices, Circuits and Systems (ICDCS) (183-189). Online publication date: 21-Apr-2022.
https://doi.org/10.1109/ICDCS54290.2022.9780689
Cho M, Kim Y (2021). FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit. Electronics 10:22 (2859). Online publication date: 19-Nov-2021.
https://doi.org/10.3390/electronics10222859
Al Koutayni M, Rybalkin V, Malik J, Elhayek A, Weis C, Reis G, Wehn N, Stricker D (2020). Real-Time Energy Efficient Hand Pose Estimation: A Case Study. Sensors 20:10 (2828). Online publication date: 16-May-2020.
https://doi.org/10.3390/s20102828
Goel S, Kedia R, Balakrishnan M, Sen R (2020). INFER: INterFerence-aware Estimation of Runtime for Concurrent CNN Execution on DPUs. 2020 International Conference on Field-Programmable Technology (ICFPT) (66-71). Online publication date: Dec-2020.
https://doi.org/10.1109/ICFPT51103.2020.00018
