research-article

Open access

Fast and Accurate Virtual Prototyping of an NPU with Analytical Memory Modeling

Authors:

Choonghoon Park,

Soonhoi Ha

RSP '23: Proceedings of the 34th International Workshop on Rapid System Prototyping

Article No.: 01, Pages 1-7

https://doi.org/10.1145/3625223.3649265

Published: 21 June 2024

Abstract

As the application area of convolutional neural networks (CNNs) expands rapidly, demand is growing for customized hardware accelerators, called neural processing units (NPUs), that process CNNs efficiently in terms of execution time and energy consumption. In NPU design, a fast and accurate virtual prototype lets the compiler be developed concurrently with the hardware and makes it possible to explore the micro-architectural design space. Because memory access latency has a large effect on performance, the virtual prototype must model memory access overhead accurately. In this work, we propose a novel analytical model of memory access latency that accounts for the effect of the NPU's memory access patterns on latency. This significantly improves performance estimation accuracy compared with the previous state-of-the-art analytical model. The execution time estimated by the proposed high-level virtual prototype is within 6.8% of the RTL simulation result. To demonstrate the usefulness of a fast and accurate prototype, we also propose a compiler optimization technique and new DMA logic tailored to the NPU that further improve performance.
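The abstract states that memory access patterns affect latency but does not give the model itself. The sketch below is a minimal, hypothetical illustration of that effect. It is not the analytical model proposed in the paper. It compares two estimates:

- a pattern-oblivious estimate, which is simply bytes divided by peak bandwidth;
- a simple pattern-aware counting estimate for a 2-D strided DMA transfer, which charges a row-activation penalty whenever the access stream moves to a different DRAM row.

All parameter values and names (`DramParams`, `naive_cycles`, `strided_tile_cycles`, `relative_error`) are assumptions chosen for illustration. The relative-error helper shows one common way a figure such as "within 6.8% of RTL simulation" is computed. That the paper uses exactly this definition is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramParams:
    # Hypothetical DRAM/bus parameters, chosen only for illustration.
    burst_bytes: int = 64       # bytes moved per burst
    row_bytes: int = 2048       # size of one DRAM row (page)
    cycles_per_burst: int = 4   # data-transfer cycles per burst
    row_miss_cycles: int = 30   # extra cycles to open a different row


def naive_cycles(total_bytes: int, p: DramParams) -> int:
    """Pattern-oblivious estimate: bytes / peak bandwidth, in cycles."""
    bursts = -(-total_bytes // p.burst_bytes)  # ceiling division
    return bursts * p.cycles_per_burst


def strided_tile_cycles(base: int, row_len: int, num_rows: int,
                        stride: int, p: DramParams) -> int:
    """Pattern-aware estimate for a hypothetical 2-D strided DMA transfer.

    The tile is `num_rows` contiguous segments of `row_len` bytes, each
    starting `stride` bytes after the previous one. Every burst costs
    `cycles_per_burst`; a burst that falls in a DRAM row different from
    the currently open row also pays `row_miss_cycles`.
    """
    cycles = 0
    open_row = None
    for r in range(num_rows):
        start = base + r * stride
        end = start + row_len  # exclusive end address
        # Bursts are aligned to burst_bytes boundaries.
        first_burst = start // p.burst_bytes
        last_burst = (end - 1) // p.burst_bytes
        for b in range(first_burst, last_burst + 1):
            dram_row = (b * p.burst_bytes) // p.row_bytes
            if dram_row != open_row:
                cycles += p.row_miss_cycles  # row activation penalty
                open_row = dram_row
            cycles += p.cycles_per_burst
    return cycles


def relative_error(estimate: float, reference: float) -> float:
    """|estimate - reference| / reference, e.g. against RTL simulation."""
    return abs(estimate - reference) / reference


if __name__ == "__main__":
    p = DramParams()
    # Both transfers move 4096 bytes in total.
    print(naive_cycles(4096, p))                         # 256
    print(strided_tile_cycles(0, 4096, 1, 4096, p))      # 316 (contiguous)
    print(strided_tile_cycles(0, 64, 64, 4096, p))       # 2176 (strided)
```

Under these made-up parameters, the same 4 KiB costs 256 cycles in the bandwidth-only estimate. It costs 316 cycles when read contiguously and 2176 cycles when read as 64-byte pieces spaced 4 KiB apart. A model that ignores the access pattern cannot distinguish these two cases.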

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies

Published In

RSP '23: Proceedings of the 34th International Workshop on Rapid System Prototyping

September 2023

82 pages

ISBN:9798400704109

DOI:10.1145/3625223

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Ministry of Science and ICT, South Korea

Conference

RSP '23

RSP '23: 34th International Workshop on Rapid System Prototyping

September 21, 2023

Hamburg, Germany
