Abstract
Deep learning models often require intensive effort in neural architecture search and hyperparameter optimization. Conventional hyperparameter optimization methods are inefficient because they are multi-trial: many configurations are trained in parallel to find the best one. In this paper, we propose ST-NAS, an efficient framework for joint neural architecture and hyperparameter optimization. ST-NAS generalizes the weight-sharing strategy from architecture search to hyperparameter optimization, so both the architecture and the hyperparameters can be optimized jointly in a single training phase. At its core, we design a new module, the ST-layer, based on STN, and further extend it to an ST-super-net. With the ST-layer, the hyperparameter configuration of each sub-model is updated simultaneously through the shared weights during training. With these designs, ST-NAS couples efficiently with its architecture search and hyperparameter schedulers. Extensive experiments show that ST-NAS works well with three different NAS algorithms and consistently improves performance across search spaces, datasets, and tasks.
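To make the idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how a hyperparameter-conditioned layer with shared weights could look, in the spirit of Self-Tuning Networks: the layer's effective weights respond to a sampled hyperparameter vector, so different hyperparameter configurations can be evaluated with a single set of shared parameters. The class name STLayer, the hparam_dim argument, and the simple per-unit gating are assumptions made for illustration only.

```python
# Illustrative sketch only: a hyperparameter-conditioned linear layer with
# shared weights, loosely following the Self-Tuning Networks idea.
# STLayer, hparam_dim, and the gating scheme are assumptions, not the paper's code.
import torch
import torch.nn as nn


class STLayer(nn.Module):
    """Linear layer whose effective weights respond to a hyperparameter vector."""

    def __init__(self, in_dim: int, out_dim: int, hparam_dim: int):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)           # shared weights
        # Maps hyperparameters to a per-output-unit scale (a crude
        # best-response approximation; STN uses a richer parameterization).
        self.response = nn.Linear(hparam_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, hparams: torch.Tensor) -> torch.Tensor:
        scale = 1.0 + self.response(hparams)             # hyperparameter-dependent gate
        return self.base(x) * scale


# Usage: during super-net training, sample a hyperparameter configuration per
# step so the shared weights learn to respond across the configuration space.
layer = STLayer(in_dim=32, out_dim=64, hparam_dim=3)
x = torch.randn(8, 32)                                   # batch of 8 inputs
hparams = torch.rand(3)                                  # e.g. normalized regularization settings
y = layer(x, hparams)
print(y.shape)                                           # torch.Size([8, 64])
```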
J. Cai and Y. Ou contributed equally to this work.
Acknowledgments
This research was supported in part by the National Natural Science Foundation of China (Grant No. 41876098), the National Key R&D Program of China (Grant No. 2020AAA0108303), and the Shenzhen Science and Technology Project (Grant No. JCYJ20200109143041798).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, J., Ou, Y., Li, X., Wang, H. (2021). ST-NAS: Efficient Optimization of Joint Neural Architecture and Hyperparameter. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_32
DOI: https://doi.org/10.1007/978-3-030-92307-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5
eBook Packages: Computer Science, Computer Science (R0)