DOI: 10.1145/2967413.2967430
research-article

A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA

Published: 12 September 2016

Abstract

Deep Neural Networks are becoming the de facto standard models for image understanding and, more generally, for computer vision tasks. Because they involve highly parallelizable computations, Convolutional Neural Networks (CNNs) are well suited to current fine-grained programmable logic devices, and several CNN accelerators have accordingly been implemented on Field-Programmable Gate Arrays (FPGAs). However, FPGA resources such as logic elements and Digital Signal Processing (DSP) units remain limited. This work presents a holistic method that relies on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on FPGA. The method was evaluated by implementing a reconfigurable Optical Character Recognition (OCR) CNN on an Altera Stratix V device while varying both the data representation and the CNN topology, in order to find the combination that best balances DSP block utilization and classification accuracy. This exploration generated dataflow architectures for 76 CNN topologies using 5 different fixed-point representations. The most efficient implementation performs 883 classifications per second at 256 × 256 resolution while using 8% of the available DSP blocks.
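Varying the "data representation" amounts to choosing a fixed-point word length for the network's weights and activations. The following is a minimal sketch, assuming a signed two's-complement format with `total_bits` bits of which `frac_bits` are fractional. The helper names are hypothetical and not taken from the paper, and the rounding mode the authors used is not stated in the abstract:

```python
from typing import Iterable


def to_fixed_point(value: float, total_bits: int, frac_bits: int) -> int:
    """Quantize `value` to a signed two's-complement fixed-point code.

    Rounds to nearest (Python's round(), ties to even), then saturates
    to the range representable with `total_bits` bits.
    """
    if not 0 <= frac_bits < total_bits:
        raise ValueError("expected 0 <= frac_bits < total_bits")
    code = round(value * (1 << frac_bits))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, code))


def from_fixed_point(code: int, frac_bits: int) -> float:
    """Map an integer code back to the real value it represents."""
    return code / (1 << frac_bits)


def quantization_error(values: Iterable[float], total_bits: int, frac_bits: int) -> float:
    """Largest absolute error introduced by quantizing `values`."""
    return max(
        abs(v - from_fixed_point(to_fixed_point(v, total_bits, frac_bits), frac_bits))
        for v in values
    )


if __name__ == "__main__":
    weights = [0.73, -0.12, 0.05, -0.98]  # illustrative values, not from the paper
    for bits in (4, 6, 8):
        frac = bits - 1  # assumes all weights lie in [-1, 1)
        codes = [to_fixed_point(w, bits, frac) for w in weights]
        print(bits, codes, quantization_error(weights, bits, frac))
```

Narrower words generally mean smaller multipliers, which is the lever an exploration can pull to trade DSP blocks against accuracy. The weight error computed here is only a proxy: the abstract reports classification accuracy, which has to be measured on the quantized network itself.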
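Finding "the best combination in terms of DSP block utilization and classification accuracy" can be framed as keeping only non-dominated (Pareto-optimal) design points. The sketch below assumes that each topology and fixed-point variant has already been synthesized and evaluated. The `Candidate` fields, the function names, and the budget-based selection rule are illustrative assumptions, not the paper's stated criterion:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One synthesized design point (all fields are hypothetical)."""
    topology: str      # label identifying a CNN topology
    total_bits: int    # fixed-point word length used for this build
    dsp_blocks: int    # DSP blocks reported after synthesis
    accuracy: float    # classification accuracy on a test set, in [0, 1]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` uses no more DSP blocks and is no less accurate than `b`,
    and is strictly better on at least one of the two."""
    no_worse = a.dsp_blocks <= b.dsp_blocks and a.accuracy >= b.accuracy
    strictly_better = a.dsp_blocks < b.dsp_blocks or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Non-dominated candidates, ordered by increasing DSP usage."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: (c.dsp_blocks, -c.accuracy))


def best_under_budget(cands: List[Candidate], dsp_budget: int,
                      min_accuracy: float) -> Optional[Candidate]:
    """Design with the fewest DSP blocks that fits the budget and meets the
    accuracy floor, or None if no candidate qualifies."""
    for c in pareto_front(cands):
        if c.dsp_blocks <= dsp_budget and c.accuracy >= min_accuracy:
            return c
    return None
```

With 76 topologies and 5 fixed-point representations, the exploration covers at most 76 × 5 = 380 candidates, so the quadratic dominance check is cheap. Searching only the Pareto front is safe: any feasible design that is dominated is matched or beaten by a front member that is also feasible.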



    Information

    Published In

    ACM Other conferences
    ICDSC '16: Proceedings of the 10th International Conference on Distributed Smart Camera
    September 2016
    242 pages
    ISBN:9781450347860
    DOI:10.1145/2967413
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICDSC '16

    Acceptance Rates

    Overall Acceptance Rate 92 of 117 submissions, 79%

    Bibliometrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 10 Dec 2024

    Cited By

    • (2024) Quantization-Based Optimization Algorithm for Hardware Implementation of Convolution Neural Networks. Electronics 13:9 (1727). DOI: 10.3390/electronics13091727. Online publication date: 30-Apr-2024
    • (2024) EXPRESS: A Framework for Execution Time Prediction of Concurrent CNNs on Xilinx DPU Accelerator. ACM Transactions on Embedded Computing Systems 24:1 (1-31). DOI: 10.1145/3697835. Online publication date: 3-Oct-2024
    • (2024) FPGA‐Based Implementation of Real‐Time Cardiologist‐Level Arrhythmia Detection and Classification in Electrocardiograms Using Novel Deep Learning. International Journal of Circuit Theory and Applications. DOI: 10.1002/cta.4289. Online publication date: 29-Sep-2024
    • (2023) FPG-AI: A Technology-Independent Framework for the Automation of CNN Deployment on FPGAs. IEEE Access 11 (32759-32775). DOI: 10.1109/ACCESS.2023.3263392. Online publication date: 2023
    • (2022) GANDSE: Generative Adversarial Network-based Design Space Exploration for Neural Network Accelerator Design. ACM Transactions on Design Automation of Electronic Systems 28:3 (1-20). DOI: 10.1145/3570926. Online publication date: 9-Nov-2022
    • (2022) A Survey of Machine Learning to FPGA Tool-Flows for Instrumentation. 2022 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC) (1-4). DOI: 10.1109/NSS/MIC44845.2022.10399017. Online publication date: 5-Nov-2022
    • (2022) Review of FPGA-Based Accelerators of Deep Convolutional Neural Networks. 2022 6th International Conference on Devices, Circuits and Systems (ICDCS) (183-189). DOI: 10.1109/ICDCS54290.2022.9780689. Online publication date: 21-Apr-2022
    • (2021) FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit. Electronics 10:22 (2859). DOI: 10.3390/electronics10222859. Online publication date: 19-Nov-2021
    • (2020) Real-Time Energy Efficient Hand Pose Estimation: A Case Study. Sensors 20:10 (2828). DOI: 10.3390/s20102828. Online publication date: 16-May-2020
    • (2020) INFER: INterFerence-aware Estimation of Runtime for Concurrent CNN Execution on DPUs. 2020 International Conference on Field-Programmable Technology (ICFPT) (66-71). DOI: 10.1109/ICFPT51103.2020.00018. Online publication date: Dec-2020
