
RiSA: A Reinforced Systolic Array for Depthwise Convolutions and Embedded Tensor Reshaping

Published: 17 September 2021

Abstract

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computation load and the number of parameters compared to conventional convolution layers. Many deep neural network (DNN) accelerators adopt architectures that exploit the high data-reuse factor of DNN computations, such as systolic arrays. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts PE utilization for depthwise convolutions on a systolic array with minimal overhead.
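To make the computation and reuse gap concrete, here is a minimal cost-model sketch (an illustration, not taken from the paper; the layer shape is a hypothetical example) comparing a standard convolution with the depthwise-separable combination that MobileNet-style networks use. Note that the depthwise stage also removes the input-channel reduction dimension, which is exactly the dimension a GEMM-style systolic array exploits for data reuse.

```python
# Illustrative cost model (hypothetical layer shapes, not from the paper).
# Compares MACs and weights of a standard KxK convolution against a
# depthwise KxK convolution followed by a 1x1 pointwise convolution.

def standard_conv_cost(h, w, c_in, c_out, k):
    """MACs and weight count for a k x k standard convolution."""
    macs = h * w * c_in * c_out * k * k
    params = c_in * c_out * k * k
    return macs, params

def depthwise_separable_cost(h, w, c_in, c_out, k):
    """MACs and weight count for depthwise (one k x k filter per input
    channel) plus pointwise (a 1x1 convolution that mixes channels)."""
    dw_macs = h * w * c_in * k * k
    pw_macs = h * w * c_in * c_out
    params = c_in * k * k + c_in * c_out
    return dw_macs + pw_macs, params

# Hypothetical example: 112x112 feature map, 32 -> 64 channels, 3x3 kernel.
std_macs, std_params = standard_conv_cost(112, 112, 32, 64, 3)
sep_macs, sep_params = depthwise_separable_cost(112, 112, 32, 64, 3)
print(f"standard : {std_macs:,} MACs, {std_params:,} weights")
print(f"separable: {sep_macs:,} MACs, {sep_params:,} weights")
print(f"MAC reduction: {std_macs / sep_macs:.1f}x")
```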
In addition, the PEs in a systolic array can be used efficiently only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange data items and manage data movement during DNN computations. RiSA instead provides a lightweight set of tensor management tasks within the PE array itself, eliminating the need for a separate tensor reshaping module. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining high area efficiency.
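For context on the reshaping problem, the sketch below (a NumPy illustration of the conventional off-array approach, not RiSA's embedded mechanism; the helper name and shapes are hypothetical) shows the im2col rearrangement that a GEMM-style systolic array typically needs before each convolution layer. RiSA's contribution is to absorb this class of data-movement task into the PE array itself.

```python
import numpy as np

def im2col_rows(x, k, stride=1):
    """Unfold a (C, H, W) activation tensor into a matrix with one row per
    output pixel, the layout a GEMM-style systolic array consumes.
    (Hypothetical helper for illustration; assumes no padding.)"""
    c, h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = np.empty((out_h * out_w, c * k * k), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Copy the k x k receptive field across all channels into one row.
            patch = x[:, i*stride:i*stride + k, j*stride:j*stride + k]
            cols[i * out_w + j] = patch.ravel()
    return cols

x = np.arange(3 * 6 * 6, dtype=np.float32).reshape(3, 6, 6)
m = im2col_rows(x, k=3)
print(m.shape)  # (16, 27): 4x4 output pixels, 3*3*3 weight taps per pixel
```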
Compared to Eyeriss v2, RiSA improves the area and energy efficiency for MobileNet-V1 inference by 1.91× and 1.31×, respectively.

References

[1]
Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA).
[2]
Yu-Hsin Chen, Tien-Ju Yang, Joel S. Emer, and Vivienne Sze. 2019. Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, 2 (2019), 292–308.
[3]
Hyungmin Cho, Pyeongseok Oh, Jiyoung Park, Wookeun Jung, and Jaejin Lee. 2019. FA3C: FPGA-accelerated deep reinforcement learning. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[4]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018).
[5]
Hasan Genc et al. 2019. Gemmini: An agile systolic array generator enabling systematic evaluations of deep-learning architectures. arXiv:1911.09925 (2019).
[6]
Sumanth Gudaparthi et al. 2019. Wire-aware architecture and dataflow for CNN accelerators. In Proceedings of the Annual International Symposium on Microarchitecture (MICRO).
[7]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
[8]
Andrew Howard et al. 2019. Searching for MobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
[9]
Benoit Jacob et al. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
[10]
Norman P. Jouppi et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA).
[11]
H. T. Kung, Bradley McDanel, and Sai Qian Zhang. 2019. Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). https://doi.org/10.1145/3297858.3304028
[12]
Hyoukjun Kwon, Prasanth Chatarasi, Michael Pellauer, Angshuman Parashar, Vivek Sarkar, and Tushar Krishna. 2019. Understanding reuse, performance, and hardware cost of DNN dataflow: A data-centric approach. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[13]
Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2017. Rethinking NoCs for spatial neural network accelerators. In Proceedings of the Eleventh IEEE/ACM International Symposium on Networks-on-Chip (NOCS).
[14]
Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling flexible dataflow mapping over DNN accelerators via programmable interconnects. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).
[15]
Renjie Liu. 2020. Higher accuracy on vision models with EfficientNet-Lite. https://blog.tensorflow.org/2020/03/higher-accuracy-on-vision-models-with-efficientnet-lite.html.
[16]
Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. 2017. FlexFlow: A flexible dataflow accelerator architecture for convolutional neural networks. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA).
[17]
NVIDIA. 2017. NVDLA Deep Learning Accelerator. http://nvdla.org.
[18]
Dianne P. O’Leary. 1987. Systolic arrays for matrix transpose and other reorderings. IEEE Trans. Comput. 36, 1 (Jan. 1987), 117–122.
[19]
Junhao Pan and Deming Chen. 2021. Accelerate non-unit stride convolutions with Winograd algorithms. In Proceedings of the 26th Asia and South Pacific Design Automation Conference (ASP-DAC).
[20]
Angshuman Parashar et al. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. In Proceedings of the International Symposium on Computer Architecture (ISCA).
[21]
Eric Qin et al. 2020. SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA). https://doi.org/10.1109/HPCA47549.2020.00015
[22]
Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. SCALE-Sim: Systolic CNN accelerator simulator. arXiv:1811.02883 (2018).
[23]
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
[24]
Yakun Sophia Shao et al. 2019. Simba: Scaling deep-learning inference with multi-chip-module-based architecture. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[25]
Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML).
[26]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS).
[27]
Xuechao Wei et al. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In Proceedings of the 54th Annual Design Automation Conference (DAC). https://doi.org/10.1145/3061639.3062207
[28]
Juan Yepez and Seok-Bum Ko. 2020. Stride 2 1-D, 2-D, and 3-D Winograd for convolutional neural networks. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 28, 4 (2020), 853–863.




Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
October 2021, 1367 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3481713
Editor: Tulika Mitra

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 17 September 2021
Accepted: 01 July 2021
Revised: 01 June 2021
Received: 01 April 2021
Published in TECS Volume 20, Issue 5s

Author Tags

  1. Accelerators
  2. deep neural networks
  3. depthwise convolution

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Institute of Information & Communications Technology Planning & Evaluation (IITP)
  • Korea government (MSIT): Research on CPU vulnerability detection and validation; Development of high-speed encryption data processing technology that guarantees privacy based on hardware
  • IC Design Education Center (IDEC), Korea


Cited By

  • (2024) Frequency-Domain and Spatial-Domain MLMVN-Based Convolutional Neural Networks. Algorithms 17, 8, Article 361. https://doi.org/10.3390/a17080361. Online publication date: 17-Aug-2024.
  • (2024) BiRD: Bi-Directional Input Reuse Dataflow for Enhancing Depthwise Convolution Performance on Systolic Arrays. IEEE Transactions on Computers 73, 12, 2708–2721. https://doi.org/10.1109/TC.2024.3449103. Online publication date: Dec-2024.
  • (2023) A Survey of Design and Optimization for Systolic Array-based DNN Accelerators. ACM Computing Surveys 56, 1, 1–37. https://doi.org/10.1145/3604802. Online publication date: 25-Aug-2023.
  • (2023) Accelerating Attention Mechanism on FPGAs based on Efficient Reconfigurable Systolic Array. ACM Transactions on Embedded Computing Systems 22, 6, 1–22. https://doi.org/10.1145/3549937. Online publication date: 9-Nov-2023.
  • (2023) A High-Throughput Full-Dataflow MobileNetv2 Accelerator on Edge FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 5, 1532–1545. https://doi.org/10.1109/TCAD.2022.3198246. Online publication date: 1-May-2023.
  • (2023) Morphable CIM: Improving Operation Intensity and Depthwise Capability for SRAM-CIM Architecture. In 2023 60th ACM/IEEE Design Automation Conference (DAC), 1–6. https://doi.org/10.1109/DAC56929.2023.10247750. Online publication date: 9-Jul-2023.
  • (2022) FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 846–860. https://doi.org/10.1109/HPCA53966.2022.00067. Online publication date: Apr-2022.
  • (2022) U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search. In Computer Vision – ECCV 2022, 173–190. https://doi.org/10.1007/978-3-031-19775-8_11. Online publication date: 23-Oct-2022.
