Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs

Published: 18 June 2017

Abstract

Convolutional neural networks (CNNs) are widely used in deep learning applications. In recent years, FPGA implementations of CNNs have attracted much attention because of their high performance and energy efficiency. However, existing implementations struggle to fully exploit the computation power of the latest FPGAs. In this paper we implement CNNs on an FPGA using a systolic array architecture, which achieves a high clock frequency at high resource utilization. We provide an analytical model for performance and resource utilization, and develop an automatic design space exploration framework together with a source-to-source compiler that transforms a C program into a systolic array CNN implementation. Experimental results show that our framework generates accelerators for real-life CNN models, achieving up to 461 GFLOPS for the floating-point data type and 1.2 TOPS for 8-16 bit fixed point.


Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
June 2017
533 pages
ISBN:9781450349277
DOI:10.1145/3061639

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DAC '17
Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%



Cited By

  • "Highly Fault-Tolerant Systolic-Array-Based Matrix Multiplication," Electronics, 13(9):1780, 5 May 2024. DOI: 10.3390/electronics13091780
  • "High-Speed CNN Accelerator SoC Design Based on a Flexible Diagonal Cyclic Array," Electronics, 13(8):1564, 19 Apr 2024. DOI: 10.3390/electronics13081564
  • "Auto WS: Automate Weights Streaming in Layer-Wise Pipelined DNN Accelerators," 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6, 25 Mar 2024. DOI: 10.23919/DATE58400.2024.10546621
  • "ONE-SA: Enabling Nonlinear Operations in Systolic Arrays For Efficient and Flexible Neural Network Inference," 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6, 25 Mar 2024. DOI: 10.23919/DATE58400.2024.10546535
  • "Lightweight Deep Learning for Resource-Constrained Environments: A Survey," ACM Computing Surveys, 56(10):1-42, 24 Jun 2024. DOI: 10.1145/3657282
  • "An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs," ACM Transactions on Architecture and Code Optimization, 21(2):1-26, 8 Jan 2024. DOI: 10.1145/3639823
  • "Flexible Systolic Array Platform on Virtual 2-D Multi-FPGA Plane," Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 84-94, 18 Jan 2024. DOI: 10.1145/3635035.3637285
  • "POPA: Expressing High and Portable Performance across Spatial and Vector Architectures for Tensor Computations," Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 199-210, 1 Apr 2024. DOI: 10.1145/3626202.3637566
  • "Fully Pipelined FPGA Acceleration of Binary Convolutional Neural Networks with Neural Architecture Search," Journal of Circuits, Systems and Computers, 33(10), 14 Feb 2024. DOI: 10.1142/S0218126624501706
  • "Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey," IEEE Transactions on Sustainable Computing, 9(6):830-847, Nov 2024. DOI: 10.1109/TSUSC.2024.3353176
