
Block-Level Surrogate Models for Inference Time Estimation in Hardware-Aware Neural Architecture Search

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13717)

Abstract

Hardware-Aware Neural Architecture Search (HA-NAS) is an attractive approach for discovering network architectures that balance task accuracy and deployment efficiency. In an iterative search algorithm, inference time is typically determined at every step by directly profiling architectures on hardware. This imposes limitations on the scalability of search processes because access to specialized devices for profiling is required. As such, the ability to assess inference time without hardware access is an important aspect to enable deep learning on resource-constrained embedded devices. Previous work estimates inference time by summing individual contributions of the architecture’s parts. In this work, we propose using block-level inference time estimators to find the network-level inference time. Individual estimators are trained on collected datasets of independently sampled and profiled architecture block instances. Our experiments on isolated blocks commonly found in classification architectures show that gradient boosted decision trees serve as an accurate surrogate for inference time. More specifically, their Spearman correlation coefficient exceeds 0.98 on all tested platforms. When such blocks are connected in sequence, the sum of all block estimations correlates with the measured network inference time, having Spearman correlation coefficients above 0.71 on evaluated CPUs and an accelerator platform. Furthermore, we demonstrate the applicability of our Surrogate Model (SM) methodology in its intended HA-NAS context. To this end, we evaluate and compare two HA-NAS processes: one that relies on profiling via hardware-in-the-loop and one that leverages block-level surrogate models. We find that both processes yield similar Pareto-optimal architectures. This shows that our method facilitates a similar task-performance outcome without relying on hardware access for profiling during architecture search.
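To make the estimation pipeline concrete, the following is a minimal sketch, assuming scikit-learn gradient boosted regressors as the block-level surrogates and a synthetic stand-in for the profiled latencies; the block feature set (kernel size, channel counts, stride), the toy latency function, and the three-block network are illustrative assumptions rather than the authors' implementation.

import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def make_block_dataset(n=500):
    # Hypothetical block configurations: [kernel_size, in_channels, out_channels, stride].
    X = np.column_stack([
        rng.choice([3, 5, 7], n),
        rng.integers(16, 256, n),
        rng.integers(16, 256, n),
        rng.choice([1, 2], n),
    ])
    # Stand-in for on-device latency in ms; in practice y would come from profiling
    # independently sampled block instances on the target hardware.
    y = 1e-5 * X[:, 0] ** 2 * X[:, 1] * X[:, 2] / X[:, 3] ** 2 + rng.normal(0.0, 0.05, n)
    return X, y

# Train one surrogate per block position/type in the search space.
surrogates = []
for _ in range(3):
    X, y = make_block_dataset()
    surrogates.append(GradientBoostingRegressor().fit(X, y))

def estimate_network_latency(block_configs):
    # Network-level estimate = sum of the per-block surrogate predictions.
    return sum(
        model.predict(np.asarray(cfg, dtype=float).reshape(1, -1))[0]
        for model, cfg in zip(surrogates, block_configs)
    )

print(estimate_network_latency([[3, 32, 64, 1], [5, 64, 128, 2], [3, 128, 128, 1]]))

# Rank-correlation check of the kind reported above (Spearman's rho).
X_test, y_test = make_block_dataset(200)
rho, _ = spearmanr(surrogates[0].predict(X_test), y_test)
print(f"block-level Spearman rho: {rho:.3f}")

Within a HA-NAS loop, per-block estimates of this kind replace hardware-in-the-loop profiling when ranking candidate architectures by inference time.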


Notes

  1. Note that the experiments are meant to demonstrate our methodology. The experiments are not intended to benchmark specific hardware and deployment toolchains, hence those details are left out.


Author information


Corresponding author

Correspondence to Willem Sanberg.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Stolle, K., Vogel, S., van der Sommen, F., Sanberg, W. (2023). Block-Level Surrogate Models for Inference Time Estimation in Hardware-Aware Neural Architecture Search. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science (LNAI), vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_28


  • DOI: https://doi.org/10.1007/978-3-031-26419-1_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26418-4

  • Online ISBN: 978-3-031-26419-1

  • eBook Packages: Computer Science, Computer Science (R0)
