Abstract
Deep learning models often require intensive effort in neural architecture search and hyperparameter optimization. Conventional hyperparameter optimization methods are inefficient because they are multi-trial: many configurations are trained in parallel to find the best one. In this paper, we propose ST-NAS, an efficient framework for joint neural architecture and hyperparameter optimization. ST-NAS generalizes the weight-sharing strategy from architecture search to hyperparameter optimization, so both the architecture and the hyperparameters can be optimized jointly in a single training phase. At its core, we design a new module, the ST-layer, based on STN, and further extend it to an ST-super-net. With the ST-layer, the hyperparameter configuration of each sub-model is updated simultaneously through the shared weights during training. With these designs, ST-NAS couples efficiently with its architecture search and hyperparameter schedulers. Extensive experiments show that ST-NAS works well with three different NAS algorithms and consistently improves performance across search spaces, datasets, and tasks.
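To make the idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how a hyperparameter-conditioned layer with shared weights could look, in the spirit of Self-Tuning Networks: the layer's effective weights respond to a sampled hyperparameter vector, so different hyperparameter configurations can be evaluated with a single set of shared parameters. The class name STLayer, the hparam_dim argument, and the simple per-unit gating are assumptions made for illustration only.

```python
# Illustrative sketch only: a hyperparameter-conditioned linear layer with
# shared weights, loosely following the Self-Tuning Networks idea.
# STLayer, hparam_dim, and the gating scheme are assumptions, not the paper's code.
import torch
import torch.nn as nn


class STLayer(nn.Module):
    """Linear layer whose effective weights respond to a hyperparameter vector."""

    def __init__(self, in_dim: int, out_dim: int, hparam_dim: int):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)           # shared weights
        # Maps hyperparameters to a per-output-unit scale (a crude
        # best-response approximation; STN uses a richer parameterization).
        self.response = nn.Linear(hparam_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, hparams: torch.Tensor) -> torch.Tensor:
        scale = 1.0 + self.response(hparams)             # hyperparameter-dependent gate
        return self.base(x) * scale


# Usage: during super-net training, sample a hyperparameter configuration per
# step so the shared weights learn to respond across the configuration space.
layer = STLayer(in_dim=32, out_dim=64, hparam_dim=3)
x = torch.randn(8, 32)                                   # batch of 8 inputs
hparams = torch.rand(3)                                  # e.g. normalized regularization settings
y = layer(x, hparams)
print(y.shape)                                           # torch.Size([8, 64])
```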
J. Cai and Y. Ou contributed equally to this work.
Acknowledgments
This research was supported in part by the National Natural Science Foundation of China (Grant No. 41876098), the National Key R&D Program of China (Grant No. 2020AAA0108303), and the Shenzhen Science and Technology Project (Grant No. JCYJ20200109143041798).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, J., Ou, Y., Li, X., Wang, H. (2021). ST-NAS: Efficient Optimization of Joint Neural Architecture and Hyperparameter. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_32
DOI: https://doi.org/10.1007/978-3-030-92307-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5
eBook Packages: Computer Science, Computer Science (R0)