
ST-NAS: Efficient Optimization of Joint Neural Architecture and Hyperparameter

  • Conference paper
Neural Information Processing (ICONIP 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1516)


Abstract

Deep learning models often require intensive effort in neural architecture search and hyperparameter optimization. Conventional hyperparameter optimization methods are inefficient because they rely on multiple trials: many configurations are trained in parallel to find the best one. In this paper, we propose ST-NAS, an efficient framework for joint optimization of neural architecture and hyperparameters. ST-NAS generalizes the efficient weight-sharing strategy from architecture search to hyperparameter optimization, so both architecture and hyperparameters can be optimized jointly in a single training phase. Specifically, we design a new module, the ST-layer, based on STN, and further extend it to an ST-super-net. With the ST-layer, the hyperparameter configuration of each sub-model can be updated simultaneously using the shared weights during training. These designs allow ST-NAS to couple its architecture search and hyperparameter schedulers efficiently. Extensive experiments show that ST-NAS combines well with three different NAS algorithms and consistently improves performance across search spaces, datasets, and tasks.
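
The abstract's mechanism (shared weights serving both architecture and hyperparameter samples) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example of the general idea only: a layer whose activations are modulated by the current hyperparameter value, loosely in the spirit of the ST-layer, wrapped in a toy super-net that samples one candidate operation and one hyperparameter per training step. The class names, the use of a dropout rate as the searched hyperparameter, and all shapes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the authors' code: shared weights are trained under
# varying (architecture, hyperparameter) samples in a single run.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class STLayer(nn.Module):
    """Linear layer whose output is scaled/shifted as a function of a scalar
    hyperparameter (here a dropout rate), so one set of shared weights can
    serve many hyperparameter configurations."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)    # shared weights
        self.hyper = nn.Linear(1, 2 * d_out)  # hyperparameter -> (scale, shift)

    def forward(self, x, hp):
        scale, shift = self.hyper(hp.view(1, 1)).chunk(2, dim=-1)
        out = self.base(x) * (1.0 + scale) + shift
        return F.dropout(out, p=float(hp), training=self.training)


class STSuperNet(nn.Module):
    """Toy super-net: two candidate operations built from ST-layers; each step
    samples one operation (architecture) and one dropout rate (hyperparameter)."""

    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.ops = nn.ModuleList([STLayer(d_in, d_hidden), STLayer(d_in, d_hidden)])
        self.head = nn.Linear(d_hidden, n_classes)

    def forward(self, x, op_idx, hp):
        return self.head(torch.relu(self.ops[op_idx](x, hp)))


# One shared-weight training step with a randomly sampled sub-model and hyperparameter.
model = STSuperNet(d_in=32, d_hidden=64, n_classes=10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
op_idx = random.randrange(2)   # architecture sample
hp = torch.rand(1)             # hyperparameter sample (dropout rate in [0, 1))
loss = F.cross_entropy(model(x, op_idx, hp), y)
opt.zero_grad()
loss.backward()
opt.step()
```

In the paper itself, the architecture and hyperparameter samples are driven by ST-NAS's search and scheduling components rather than uniform random sampling; the sketch only shows why a single training phase suffices once the shared weights are conditioned on the hyperparameter.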

J. Cai and Y. Ou contributed equally.

Acknowledgments

This research was partly supported by the National Natural Science Foundation of China (Grant No. 41876098), the National Key R&D Program of China (Grant No. 2020AAA0108303), and the Shenzhen Science and Technology Project (Grant No. JCYJ20200109143041798).

Author information

Corresponding authors

Correspondence to Xiu Li or Haoqian Wang.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Cai, J., Ou, Y., Li, X., Wang, H. (2021). ST-NAS: Efficient Optimization of Joint Neural Architecture and Hyperparameter. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_32

  • DOI: https://doi.org/10.1007/978-3-030-92307-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92306-8

  • Online ISBN: 978-3-030-92307-5

  • eBook Packages: Computer Science, Computer Science (R0)
