Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs

Published: 18 June 2017

Abstract

Convolutional neural networks (CNNs) are widely used in deep learning applications. In recent years, FPGA implementations of CNNs have attracted much attention because of their high performance and energy efficiency. However, existing implementations struggle to fully exploit the computation power of the latest FPGAs. In this paper we implement CNNs on an FPGA using a systolic array architecture, which achieves a high clock frequency at high resource utilization. We provide an analytical model for performance and resource utilization, and develop an automatic design space exploration framework together with a source-to-source compiler that transforms a C program into a systolic array CNN implementation. Experimental results show that our framework generates accelerators for real-life CNN models, achieving up to 461 GFLOPS for the floating-point data type and 1.2 TOPS for 8-16 bit fixed point.


Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
June 2017
533 pages
ISBN:9781450349277
DOI:10.1145/3061639

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DAC '17
Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%



Cited By

  • "Highly Fault-Tolerant Systolic-Array-Based Matrix Multiplication," Electronics, 13(9):1780, 5 May 2024. DOI: 10.3390/electronics13091780
  • "High-Speed CNN Accelerator SoC Design Based on a Flexible Diagonal Cyclic Array," Electronics, 13(8):1564, 19 Apr 2024. DOI: 10.3390/electronics13081564
  • "Auto WS: Automate Weights Streaming in Layer-Wise Pipelined DNN Accelerators," 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6, 25 Mar 2024. DOI: 10.23919/DATE58400.2024.10546621
  • "ONE-SA: Enabling Nonlinear Operations in Systolic Arrays For Efficient and Flexible Neural Network Inference," 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6, 25 Mar 2024. DOI: 10.23919/DATE58400.2024.10546535
  • "Lightweight Deep Learning for Resource-Constrained Environments: A Survey," ACM Computing Surveys, 56(10):1-42, 24 Jun 2024. DOI: 10.1145/3657282
  • "An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs," ACM Transactions on Architecture and Code Optimization, 21(2):1-26, 8 Jan 2024. DOI: 10.1145/3639823
  • "Flexible Systolic Array Platform on Virtual 2-D Multi-FPGA Plane," Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 84-94, 18 Jan 2024. DOI: 10.1145/3635035.3637285
  • "POPA: Expressing High and Portable Performance across Spatial and Vector Architectures for Tensor Computations," Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 199-210, 1 Apr 2024. DOI: 10.1145/3626202.3637566
  • "Fully Pipelined FPGA Acceleration of Binary Convolutional Neural Networks with Neural Architecture Search," Journal of Circuits, Systems and Computers, 33(10), 14 Feb 2024. DOI: 10.1142/S0218126624501706
  • "Advancements in Accelerating Deep Neural Network Inference on AIoT Devices: A Survey," IEEE Transactions on Sustainable Computing, 9(6):830-847, Nov 2024. DOI: 10.1109/TSUSC.2024.3353176
