ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors
Abstract
1 Introduction
2 Characterizations On Systolic Array
2.1 Latency on Different DNN Dataflows
Parameter | Description |
---|---|
Ow | The width of the output feature map |
Oh | The height of the output feature map |
kw | The width of a filter |
kh | The height of a filter |
F | The number of filters |
C | The number of channels |
N | The width and height of the (square) systolic array |
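To make the symbols concrete, the sketch below computes a first-order latency and spatial PE utilization for one convolution layer under each dataflow. It assumes the analytical runtime model popularized by the SCALE-Sim simulator, in which a layer occupies an S_R × S_C region of the N × N array, streams for T steps per fold, and each fold costs about 3N + T − 2 cycles. This is an illustrative model, not necessarily the latency model derived in this paper. The names `mapping`, `folds`, `latency`, `utilization`, S_R/S_C/T, and the `"OS"`/`"WS"`/`"IS"` strings are all hypothetical.

```python
import math


def mapping(ow, oh, kw, kh, f, c, dataflow):
    """Return (S_R, S_C, T): spatial rows, spatial columns, temporal depth.

    Assumed SCALE-Sim-style mapping of a convolution onto the array.
    """
    window = kw * kh * c  # MACs that contribute to one output value
    pixels = ow * oh      # output pixels produced per filter
    if dataflow == "OS":  # output-stationary: outputs pinned to PEs
        return pixels, f, window
    if dataflow == "WS":  # weight-stationary: filter windows pinned to PEs
        return window, f, pixels
    if dataflow == "IS":  # input-stationary: input windows pinned to PEs
        return window, pixels, f
    raise ValueError(f"unknown dataflow: {dataflow!r}")


def folds(s_r, s_c, n):
    """Number of N x N tiles needed to cover an S_R x S_C mapping."""
    return math.ceil(s_r / n) * math.ceil(s_c / n)


def latency(ow, oh, kw, kh, f, c, n, dataflow):
    """Cycles for one layer: per-fold cost of (3N - 2) fixed overhead plus T streaming steps."""
    s_r, s_c, t = mapping(ow, oh, kw, kh, f, c, dataflow)
    return (3 * n + t - 2) * folds(s_r, s_c, n)


def utilization(ow, oh, kw, kh, f, c, n, dataflow):
    """Average fraction of the N x N PEs that hold useful work across all folds."""
    s_r, s_c, _ = mapping(ow, oh, kw, kh, f, c, dataflow)
    return (s_r * s_c) / (folds(s_r, s_c, n) * n * n)


if __name__ == "__main__":
    # Hypothetical tiny layer: 7x7 output, 3x3 filters, 64 channels, 64 filters.
    for df in ("OS", "WS", "IS"):
        print(df, latency(7, 7, 3, 3, 64, 64, 128, df),
              round(utilization(7, 7, 3, 3, 64, 64, 128, df), 3))
```

For this tiny layer on a 128 × 128 array the model gives 958 cycles for OS, 2,155 for WS and 2,230 for IS, and even the fastest mapping (OS) keeps only 3,136 of 16,384 PEs busy (about 19%), which illustrates how a single large array can sit mostly idle on tiny tensors.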
2.2 Latency Analysis on Different Dataflows
2.3 The PE Utilization of Systolic Array
2.4 Memory System for Multiple Tiny Tensors
2.5 Conventional Tensor Mapping on Systolic Architecture
2.6 Tensor Mapping on Spatially Partitioned Systolic Architecture
3 ReSA System Architecture
3.1 Overview of ReSA’s Architecture
3.2 The ReSA Control Command
3.3 The ReSA Host Layer Scheduler
4 Hardware-aware Layer Splitting and Data Mapping
4.1 The ReSA Tensor Splitting Policy
4.2 The ReSA Sub-array Fission
5 ReSA Micro-architecture and Design
5.1 Micro-architecture of the ReSA Heterogeneous Dataflow PE
PE port | Weight-stationary | Input-stationary | Output-stationary |
---|---|---|---|
Preload | Weight | Input | X |
Input_0 | X | X | Weight |
Input_1 | Partial sum | Partial sum | X |
Input_2 | Input | Weight | Input |
Output_0 | X | X | Weight |
Output_1 | Partial sum′ | Partial sum′ | X |
Output_2 | Input | Weight | Input |

X: port not used in that dataflow.
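The port assignment in the table can be read as a small behavioural model of one PE. The sketch below is a hypothetical illustration, not the paper's RTL: `Dataflow`, `HeteroPE`, `preload` and `step` are invented names, each port is modelled as a scalar, `None` stands for an unused port (X), and pipeline registers and timing are ignored.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

Port = Optional[float]  # None models an unused port ("X" in the table)


class Dataflow(Enum):
    WS = "weight-stationary"
    IS = "input-stationary"
    OS = "output-stationary"


@dataclass
class HeteroPE:
    mode: Dataflow
    stationary: float = 0.0  # preloaded weight (WS) or input (IS); unused in OS
    acc: float = 0.0         # local accumulator; only used in OS

    def preload(self, value: float) -> None:
        if self.mode is Dataflow.OS:
            raise ValueError("output-stationary mode has no preload operand")
        self.stationary = value

    def step(self, in0: Port = None, in1: Port = None,
             in2: Port = None) -> Tuple[Port, Port, Port]:
        """Advance one cycle and return (Output_0, Output_1, Output_2)."""
        if self.mode is Dataflow.OS:
            # Input_0 carries a weight and Input_2 an input; the product is
            # accumulated locally and both operands are forwarded unchanged.
            self.acc += in0 * in2
            return in0, None, in2
        # WS and IS: Input_1 is the incoming partial sum and Input_2 the
        # streamed operand (input in WS, weight in IS). Output_1 carries the
        # updated partial sum; the streamed operand is forwarded on Output_2.
        psum_out = in1 + self.stationary * in2
        return None, psum_out, in2


if __name__ == "__main__":
    pe = HeteroPE(Dataflow.WS)
    pe.preload(2.0)                     # stationary weight
    print(pe.step(in1=1.0, in2=3.0))    # (None, 7.0, 3.0)
```

As the table suggests, the three dataflows reuse the same three input and three output ports. Only the meaning of each port changes, so a mode switch rather than separate datapaths is enough to move between them.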
5.2 Micro-architecture of the ReSA Data Path Controller
5.3 The ReSA Distributed On-chip Buffer
5.4 The ReSA Interleaving Memory Access Policy
6 Methodology
Parameter | Value |
---|---|
Number of PEs | 128 × 128 |
Technology of PEs | 45 nm |
PE Clock Frequency | 700 MHz |
Number of PEs on a Sub-array | 32 × 32 |
Total Size of On-chip Buffer | 12 MB |
Input/Weight Precision | 16 bits |
Energy per On-chip Memory Access | 0.61 pJ/bit |
Energy per Off-chip Memory Access | 12.5 pJ/bit |
On-chip Memory Bandwidth | 16,384 bits/cycle |
Off-chip Memory Bandwidth | 2,864 bits/cycle |
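Several quantities follow directly from the configuration table. The sketch below derives them under one stated assumption (each PE performs one MAC per cycle). `ReSAConfig` and `derived_metrics` are hypothetical names; the field values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReSAConfig:
    # Values copied from the methodology configuration table.
    array_dim: int = 128             # PEs per side of the full array
    subarray_dim: int = 32           # PEs per side of one sub-array
    clock_hz: float = 700e6          # PE clock frequency
    precision_bits: int = 16         # input/weight precision
    onchip_pj_per_bit: float = 0.61
    offchip_pj_per_bit: float = 12.5
    onchip_bits_per_cycle: int = 16_384
    offchip_bits_per_cycle: int = 2_864


def derived_metrics(cfg: ReSAConfig) -> dict:
    return {
        # (128 / 32)^2 sub-arrays tile the full array
        "num_subarrays": (cfg.array_dim // cfg.subarray_dim) ** 2,
        # Assumption: one MAC per PE per cycle
        "peak_tmacs": cfg.array_dim ** 2 * cfg.clock_hz / 1e12,
        # bits/cycle * cycles/s / 8 bits per byte / 1e9 -> GB/s
        "onchip_gb_per_s": cfg.onchip_bits_per_cycle * cfg.clock_hz / 8 / 1e9,
        "offchip_gb_per_s": cfg.offchip_bits_per_cycle * cfg.clock_hz / 8 / 1e9,
        # Energy to move one 16-bit operand
        "onchip_pj_per_word": cfg.onchip_pj_per_bit * cfg.precision_bits,
        "offchip_pj_per_word": cfg.offchip_pj_per_bit * cfg.precision_bits,
        # Whole 16-bit operands each memory can deliver per cycle
        "onchip_words_per_cycle": cfg.onchip_bits_per_cycle // cfg.precision_bits,
        "offchip_words_per_cycle": cfg.offchip_bits_per_cycle // cfg.precision_bits,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(ReSAConfig()).items():
        print(f"{name}: {value}")
```

On these numbers the array splits into 16 sub-arrays and peaks at roughly 11.5 TMAC/s. The on-chip path delivers 1,024 16-bit operands per cycle against 179 off-chip, and each off-chip bit costs about 20× the energy of an on-chip bit (12.5 / 0.61 ≈ 20.5).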
7 Evaluation
7.1 Performance of DNN Inference
7.2 Analysis of Memory Wait Latency
7.3 Analysis of Energy Efficiency
7.4 Analysis of Different ReSA’s Sub-array Size
7.5 Analysis of Area of ReSA’s Sub-array
Design | Area (μm²) | Normalized Area | Normalized Speedup |
---|---|---|---|
Planaria [13] | 5,280,739.909 | 1 | 1 |
ReSA | 53,586,325.428 | 1.02 | 1.23 |
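Reading the two normalized columns together gives a rough speedup-per-area figure for ReSA relative to Planaria, with both normalized to Planaria = 1. This is plain arithmetic on the reported ratios, not a metric the table itself states:

```latex
\frac{\text{normalized speedup}}{\text{normalized area}} = \frac{1.23}{1.02} \approx 1.21
```

On these figures, ReSA would deliver roughly 21% more throughput per unit of silicon area than Planaria.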
8 Related Work
9 Conclusion
References
Publisher: Association for Computing Machinery, New York, NY, United States