
Flydeling: Streamlined Performance Models for Hardware Acceleration of CNNs through System Identification

Published: 18 July 2023

Abstract

The introduction of deep learning algorithms such as Convolutional Neural Networks (CNNs) into many near-sensor embedded systems opens new challenges in terms of energy efficiency and hardware performance. An emerging solution to address these challenges is to use tailored heterogeneous hardware accelerators that combine processing elements of different architectural natures, such as Central Processing Unit (CPU), Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA), or Application Specific Integrated Circuit (ASIC). To progress towards such heterogeneity, a great asset would be an automated design space exploration tool that chooses, for each accelerated partition of a CNN, the most appropriate architecture given the available resources. To feed such a design space exploration process, models are required that provide very fast yet precise evaluations of alternative architectures or alternative forms of CNNs. Quick estimation of a configuration can be achieved with only a few model parameters identified from representative input sequences. This article studies a solution called flydeling (a contraction of flyweight modeling) that obtains such models by drawing inspiration from the black-box System Identification (SI) domain. We refer to models derived using the proposed approach as flyweight models (flydels).
A methodology is proposed to generate these flydels, using CNN properties as predictor features together with SI techniques driven by a stochastic excitation input at the level of feature map dimensions. For an embedded CPU-FPGA-GPU heterogeneous platform, it is demonstrated that these Key Performance Indicator (KPI) flydels can be learned at an early design stage from high-level application features. For latency, energy, and resource utilization, flydels reach estimation errors between 5% and 10% with fewer model parameters than state-of-the-art solutions, and they are built automatically from platform measurements.
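To make the identification idea concrete, the following minimal sketch illustrates one plausible reading of the methodology: random convolutional layer configurations act as a stochastic excitation, a measurement function returns a KPI for each configuration, and a least-squares fit identifies a small set of model parameters mapping high-level layer features to that KPI. This is an assumption-laden illustration, not the authors' tooling; the feature set, the measure_latency_ms stub, and the linear model form are all hypothetical choices.

```python
# Illustrative flyweight-model identification sketch (assumed workflow,
# not the paper's implementation).
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_layer_config():
    """Stochastic excitation: draw random feature-map and kernel dimensions."""
    return {
        "h": int(rng.integers(8, 225)),      # feature map height
        "w": int(rng.integers(8, 225)),      # feature map width
        "c_in": int(rng.integers(3, 513)),   # input channels
        "c_out": int(rng.integers(8, 513)),  # output channels
        "k": int(rng.choice([1, 3, 5, 7])),  # square kernel size
    }

def features(cfg):
    """High-level CNN properties used as predictors (an illustrative choice):
    multiply-accumulates, input/output volumes, weight count, and a bias term."""
    macs = cfg["h"] * cfg["w"] * cfg["c_in"] * cfg["c_out"] * cfg["k"] ** 2
    in_vol = cfg["h"] * cfg["w"] * cfg["c_in"]
    out_vol = cfg["h"] * cfg["w"] * cfg["c_out"]
    weights = cfg["c_in"] * cfg["c_out"] * cfg["k"] ** 2
    return np.array([macs, in_vol, out_vol, weights, 1.0])

def measure_latency_ms(cfg):
    """Hypothetical stub: on a real platform this would run the layer on the
    target device (CPU, GPU, or FPGA) and time it. A synthetic ground truth
    with measurement noise stands in here."""
    true_coeffs = np.array([2e-9, 4e-8, 3e-8, 1e-7, 0.5])
    return float(features(cfg) @ true_coeffs + rng.normal(0.0, 0.02))

# Identification campaign: excite, measure, then solve least squares.
configs = [sample_layer_config() for _ in range(200)]
X = np.stack([features(c) for c in configs])
y = np.array([measure_latency_ms(c) for c in configs])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Use the identified flydel to predict the KPI of an unseen configuration.
test = sample_layer_config()
print(f"predicted: {features(test) @ coeffs:.3f} ms, "
      f"measured: {measure_latency_ms(test):.3f} ms")
```

A real campaign would replace the stub with timed runs on the target platform and could substitute a nonlinear regressor for the linear fit; the sketch only shows the simplest SI instance of excite, measure, and identify.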


Cited By

  • Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming. Journal of Signal Processing Systems 95, 10 (2023), 1203–1218. DOI: 10.1007/s11265-023-01898-0. Online publication date: 2 November 2023.



Published In

ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 8, Issue 3
September 2023
140 pages
ISSN: 2376-3639
EISSN: 2376-3647
DOI: 10.1145/3592472

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023
Online AM: 12 May 2023
Accepted: 04 April 2023
Revised: 04 February 2023
Received: 12 May 2022
Published in TOMPECS Volume 8, Issue 3


Author Tags

  1. Heterogeneous computing
  2. Convolutional Neural Networks
  3. Model performance estimation

Qualifiers

  • Research-article

Funding Sources

  • European Union’s Horizon 2020
  • Marie Skłodowska-Curie


