
research-article

Open access

Physically Accurate Learning-based Performance Prediction of Hardware-accelerated ML Algorithms

Authors:

Hadi Esmaeilzadeh,

Soroush Ghodrati,

Andrew B. Kahng,

Joon Kyung Kim,

Rohan Mahapatra,

Susmita Dey Manasi,

Sachin S. Sapatnekar,

Ziqing Zeng

MLCAD '22: Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD

Pages 119-126

https://doi.org/10.1145/3551901.3556489

Published: 12 September 2022

Abstract

Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It employs a unified methodology that couples backend power, performance, and area (PPA) analysis with frontend performance simulation, thus achieving realistic estimation of both backend PPA and system metrics (runtime and energy). Experimental studies show that the approach provides excellent predictions for both ASIC (in a 12nm commercial process) and FPGA implementations on the VTA and VeriGOOD-ML platforms.
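The coupling of backend PPA with frontend simulation described above can be illustrated with a minimal Python sketch. It assumes (these details are not taken from the paper) that a learned model maps a numeric accelerator configuration to backend metrics (achievable clock frequency, power, area), and that a frontend simulator reports a cycle count for a workload; system metrics then follow as runtime = cycles / f_clk and energy = power × runtime. The names `BackendPPA`, `knn_predict`, and `system_metrics`, the configuration features, and the k-nearest-neighbor regressor are hypothetical stand-ins, not the authors' models or code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class BackendPPA:
    """Backend metrics for one accelerator configuration (hypothetical fields)."""
    f_clk_hz: float   # achievable clock frequency after place-and-route
    power_w: float    # average total power at that frequency
    area_mm2: float   # placed-and-routed area


def knn_predict(train: list[tuple[tuple[float, ...], BackendPPA]],
                config: tuple[float, ...], k: int = 2) -> BackendPPA:
    """Stand-in learned model: average the PPA of the k nearest training configs."""
    if not train or k < 1:
        raise ValueError("need at least one training sample and k >= 1")
    # Rank training samples by Euclidean distance in configuration space.
    nearest = sorted(train, key=lambda s: math.dist(s[0], config))[:k]
    n = len(nearest)
    return BackendPPA(
        f_clk_hz=sum(p.f_clk_hz for _, p in nearest) / n,
        power_w=sum(p.power_w for _, p in nearest) / n,
        area_mm2=sum(p.area_mm2 for _, p in nearest) / n,
    )


def system_metrics(ppa: BackendPPA, cycles: int) -> tuple[float, float]:
    """Combine backend PPA with a frontend cycle count into (runtime_s, energy_j)."""
    if ppa.f_clk_hz <= 0:
        raise ValueError("clock frequency must be positive")
    runtime_s = cycles / ppa.f_clk_hz   # runtime = cycles / f_clk
    energy_j = ppa.power_w * runtime_s  # energy = power * runtime
    return runtime_s, energy_j
```

For example, with hypothetical training samples keyed by (PE-array dimension, buffer size in KiB), `knn_predict(train, (24.0, 96.0))` averages the two nearest configurations, and `system_metrics(ppa, cycles)` converts a simulated cycle count into seconds and joules. The k-NN regressor is used only because it fits in a few lines; any regressor trained on backend place-and-route samples could be substituted.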

Cited By

Esmaeilzadeh H, Ghodrati S, Kahng A, Kim J, Kinzer S, Kundu S, Mahapatra R, Manasi S, Sapatnekar S, Wang Z, Zeng Z (2024). An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators. ACM Transactions on Design Automation of Electronic Systems 29(4), pp. 1-33. Online publication date: 11 May 2024.
https://dl.acm.org/doi/10.1145/3664652

Index Terms

Hardware

Published In

MLCAD '22: Proceedings of the 2022 ACM/IEEE Workshop on Machine Learning for CAD

September 2022

181 pages

ISBN:9781450394864

DOI:10.1145/3551901

General Chairs: Paul Franzon (North Carolina State University, USA), Andrew B. Kahng (University of California at San Diego, USA)

Program Chairs: Hai (Helen) Li (Duke University, USA), Bing Li (Technical University of Munich, Germany)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MLCAD '22

Sponsor:

SIGDA

MLCAD '22: 2022 ACM/IEEE Workshop on Machine Learning for CAD

September 12-13, 2022

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 35 of 83 submissions, 42%

Bibliometrics & Citations

Total Citations: 1

Total Downloads: 418

Downloads (last 12 months): 205

Downloads (last 6 weeks): 23

Reflects downloads up to 11 Dec 2024
