DOI: 10.1145/3625223.3649265
research-article
Open access

Fast and Accurate Virtual Prototyping of an NPU with Analytical Memory Modeling

Published: 21 June 2024

Abstract

As the application area of convolutional neural networks (CNNs) expands rapidly, demand is increasing for customized hardware accelerators, called neural processing units (NPUs), that process CNNs efficiently in terms of execution time and energy consumption. In NPU design, a fast and accurate virtual prototype lets us develop the compiler concurrently with the hardware and explore the micro-architectural design space. Because memory access latency strongly affects performance, the virtual prototype must model memory access overhead accurately. In this work, we propose a novel analytical model of memory access latency that accounts for the effect of the NPU's memory access patterns on latency, significantly improving performance estimation accuracy over the previous state-of-the-art analytical model. The proposed high-level virtual prototype estimates execution time to within 6.8% of the RTL simulation result. To demonstrate the usefulness of a fast and accurate prototype, we also propose a compiler optimization technique and new DMA logic tailored to the NPU that further improve performance.
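
The abstract does not give the model's equations. As a purely illustrative sketch, and not the authors' model, the Python snippet below shows why the memory access pattern matters for latency: it assumes a single-bank DRAM with an open-page policy, where each burst costs a fixed number of cycles and every switch to a new DRAM row adds a precharge/activate penalty. The names `DramParams` and `tile_transfer_cycles` and all timing values are hypothetical, and the sketch ignores bank parallelism, refresh, and bus turnaround.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramParams:
    # All values are hypothetical placeholders, not taken from the paper.
    burst_bytes: int = 64    # bytes delivered per burst
    row_bytes: int = 2048    # DRAM row (page) size
    t_burst: int = 4         # cycles per burst when the row is already open
    t_row_miss: int = 30     # extra cycles to precharge and activate a new row


def tile_transfer_cycles(base, lines, line_bytes, stride_bytes, p=DramParams()):
    """Estimate cycles to fetch `lines` chunks of `line_bytes` bytes,
    each starting `stride_bytes` after the previous one (single bank, open page)."""
    bursts = 0
    row_misses = 0
    open_row = None  # no row is open before the first access
    for i in range(lines):
        start = base + i * stride_bytes
        # Burst-aligned indices covering [start, start + line_bytes).
        first = start // p.burst_bytes
        last = (start + line_bytes - 1) // p.burst_bytes
        for b in range(first, last + 1):
            row = (b * p.burst_bytes) // p.row_bytes
            if row != open_row:  # row-buffer miss: a different row must be opened
                row_misses += 1
                open_row = row
            bursts += 1
    return bursts * p.t_burst + row_misses * p.t_row_miss


if __name__ == "__main__":
    # Same 4 KiB of data, two access patterns.
    print(tile_transfer_cycles(0, 16, 256, 256))   # contiguous lines
    print(tile_transfer_cycles(0, 16, 256, 4096))  # each line in a different row
```

With these placeholder parameters, both calls move 64 bursts, but the contiguous fetch opens only 2 rows (316 cycles) while the strided fetch opens 16 (736 cycles). A latency model that counts only bytes transferred would report the same cost for both, which is the kind of pattern-dependent error the abstract says its model addresses.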

Published In

RSP '23: Proceedings of the 34th International Workshop on Rapid System Prototyping
September 2023
82 pages
ISBN:9798400704109
DOI:10.1145/3625223
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. virtual prototype
  2. NPU
  3. cycle accuracy
  4. memory access pattern
  5. analytical model
