NDARTS: A Differentiable Architecture Search Based on the Neumann Series
List of Figures

Figure 1. CIFAR-10 dataset.
Figure 2. CIFAR-100 dataset.
Figure 3. ImageNet dataset.
Figure 4. Cell structure of the DARTS search space.
Figure 5. Cell structure of the NAS-Bench-201 search space.
Figure 6. The impact of T on NDARTS.
Figure 7. Comparison with DARTS when T = 1, 2, 3, 4.
Figure 8. The impact of K on NDARTS.
Figure 9. Comparison with DARTS when K = 0, 1, 2, 3.
Figure 10. The impact of γ on NDARTS.
Figure 11. The impact of γ_α on NDARTS.
Figure 12. Performance of different algorithms on the CIFAR-10 dataset.
Figure 13. Performance of different algorithms on the CIFAR-100 dataset.
Figure 14. Normal cell model found by NDARTS in the DARTS search space.
Figure 15. Reduction cell model found by NDARTS in the DARTS search space.
Figure 16. Performance of different algorithms on the CIFAR-10 dataset in the NAS-Bench-201 search space.
Figure 17. Performance of different algorithms on the CIFAR-100 dataset in the NAS-Bench-201 search space.
Figure 18. Performance of different algorithms on the ImageNet dataset in the NAS-Bench-201 search space.
Figure 19. Test accuracy of NDARTS and GDAS on the three datasets.
Figure 20. Cells found by NDARTS in the NAS-Bench-201 search space.
Abstract
1. Introduction
- Our proposed method, named NDARTS, expresses the super-gradient through the implicit function theorem and approximates it with a truncated Neumann-series expansion (the underlying series identity is sketched after this list). Our experimental results demonstrate that NDARTS approximates this gradient more accurately than the baseline algorithm, DARTS.
- We use an improved evolutionary strategy to optimize the weight parameters between architecture search steps, which fits the weight-sharing hyper-network design better than gradient methods do. The training procedure uses small random batches of samples to reduce the computational complexity of training.
- Our proposed algorithm achieves state-of-the-art performance on multiple datasets: CIFAR-10, CIFAR-100, and ImageNet.
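For context, a standard form of the Neumann-series identity behind this expansion is sketched below (general background with generic notation; γ and K correspond to the step-size and truncation-order hyperparameters studied in the ablation experiments, but the exact formulation in the paper may differ):

```latex
% Neumann series: if the spectral radius of A is less than 1, then
%   (I - A)^{-1} = \sum_{k=0}^{\infty} A^k .
% Taking A = I - \gamma \nabla_w^2 L_{train} (for a suitably small \gamma) gives
\[
\left(\nabla_w^{2} L_{\mathrm{train}}\right)^{-1}
  = \gamma \sum_{k=0}^{\infty}\left(I - \gamma\,\nabla_w^{2} L_{\mathrm{train}}\right)^{k}
  \approx \gamma \sum_{k=0}^{K}\left(I - \gamma\,\nabla_w^{2} L_{\mathrm{train}}\right)^{k},
\]
% so the inverse Hessian that appears in the implicit-function-theorem expression
% of the super-gradient can be replaced by a finite sum of K + 1 terms.
```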
2. Methods
2.1. Overview: DARTS
- Create a mixed operation ō^(i,j) parameterized by the architecture weights α^(i,j) for each edge (i, j); initialize the hyper-net architecture weights α and the operation weights w;
- Train on the training dataset D_train and update w by descending ∇_w L_train(w, α);
- Train on the validation dataset D_val and update α by descending ∇_α L_val(w − ξ∇_w L_train(w, α), α), where w − ξ∇_w L_train(w, α) approximates the optimal weights w*(α);
- Stop when the termination condition is met; otherwise, return to step 2 (a minimal sketch of this alternating loop is given after the list).
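A minimal PyTorch-style sketch of this alternating loop is shown below (first-order variant; `model.loss`, the optimizers, and the batch objects are hypothetical names, not the authors' code):

```python
import torch

def darts_search_step(model, alpha, train_batch, val_batch,
                      w_optimizer, alpha_optimizer):
    """One alternating DARTS update (first-order approximation).

    `model` holds the operation weights w, `alpha` the architecture
    parameters; `model.loss` and both optimizers are illustrative names.
    """
    # Step 2: update the operation weights w on the training data.
    w_optimizer.zero_grad()
    loss_train = model.loss(train_batch, alpha)   # L_train(w, alpha)
    loss_train.backward()
    w_optimizer.step()

    # Step 3: update the architecture parameters alpha on the validation data.
    alpha_optimizer.zero_grad()
    loss_val = model.loss(val_batch, alpha)       # L_val(w, alpha)
    loss_val.backward()
    alpha_optimizer.step()
```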
2.2. DARTS Optimization Algorithm Based on Neumann Series Approximation (NDARTS)
2.2.1. NDARTS
- Create a mixed operation ō^(i,j) parameterized by the architecture weights α^(i,j) for each edge (i, j); initialize the hyper-net architecture weights α and the operation weights w;
- Train on the training dataset D_train, updating w by descending ∇_w L_train(w, α) (repeated for T steps);
- Train on the validation dataset D_val, updating α with the K-term Neumann-series approximation (with coefficient γ) of the super-gradient ∇_α L_val(w*(α), α), using the architecture learning rate γ_α;
- Stop when the termination condition is met; otherwise, return to step 2 (a sketch of the approximate super-gradient computation is given after the list).
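The sketch below shows one way the truncated Neumann series can be turned into an approximate super-gradient using Hessian-vector products (standard implicit-differentiation machinery; the function name, argument names, and the exact update are illustrative assumptions, not the authors' implementation):

```python
import torch

def neumann_hypergradient(loss_train, loss_val, w_params, alpha_params, K, gamma):
    """Approximate dL_val/dalpha with a K-term Neumann series.

    Replaces the inverse Hessian of L_train by
    gamma * sum_{k=0..K} (I - gamma * H)^k and uses Hessian-vector
    products instead of explicit second-derivative matrices.
    """
    # v = dL_val/dw, the vector the approximate inverse Hessian acts on.
    v = torch.autograd.grad(loss_val, w_params, retain_graph=True)
    # dL_train/dw, kept differentiable so Hessian-vector products are possible.
    dtrain_dw = torch.autograd.grad(loss_train, w_params, create_graph=True)

    cur = [vi.detach().clone() for vi in v]   # (I - gamma*H)^k v, starting at k = 0
    acc = [vi.detach().clone() for vi in v]   # running sum over k = 0..K
    for _ in range(K):
        # Hessian-vector product H @ cur, where H = d^2 L_train / dw^2.
        hvp = torch.autograd.grad(dtrain_dw, w_params, grad_outputs=cur,
                                  retain_graph=True)
        cur = [c - gamma * h for c, h in zip(cur, hvp)]
        acc = [a + c for a, c in zip(acc, cur)]

    # Mixed-derivative term: (d^2 L_train / dalpha dw) @ (gamma * acc).
    indirect = torch.autograd.grad(dtrain_dw, alpha_params,
                                   grad_outputs=[gamma * a for a in acc])
    # Direct term: dL_val/dalpha.
    direct = torch.autograd.grad(loss_val, alpha_params)
    # Approximate super-gradient, one tensor per architecture parameter.
    return [d - i for d, i in zip(direct, indirect)]
```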
2.2.2. Proof
2.3. Evolutionary Strategy
- Select two individuals from the current generation as parents for recombination; the choice of parents is unbiased.
- Generate a new individual through the recombination of the selected parents.
- Perform mutation and evaluation on the new individuals. At the end of the iteration, select the superior individuals from the combined set of offspring and parents to form the new generation.
- Randomly sample noise terms ε_1, ε_2, … from the Gaussian distribution N(0, 1);
- Calculate the loss value corresponding to each perturbed parameter, and add each individual together with its fitness value to the population;
- Take the individual with the smallest loss and use it to update the parameter (a compact sketch of this update is given after the list);
- Stop when the termination condition is met, otherwise return to step 1.
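A compact sketch of the noise-based update described above is shown below (the population size, noise scale, and the choice to move toward the single best perturbation are illustrative assumptions; the paper's exact update rule may differ):

```python
import numpy as np

def es_update(w, loss_fn, sigma=0.1, lam=8, lr=1.0):
    """One iteration of the noise-based strategy sketched in the steps above."""
    noises = [np.random.randn(*w.shape) for _ in range(lam)]   # eps_i ~ N(0, 1)
    candidates = [w + sigma * eps for eps in noises]           # perturbed individuals
    losses = [loss_fn(c) for c in candidates]                  # fitness of each individual
    best = int(np.argmin(losses))                              # index of the smallest loss
    return w + lr * sigma * noises[best]                       # move toward the best individual

# Toy usage: minimize a simple quadratic loss.
w = np.array([3.0, -2.0])
for _ in range(200):
    w = es_update(w, loss_fn=lambda x: float(np.sum(x ** 2)))
```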
3. Experiment
3.1. Simulation Experiment
3.1.1. Dataset
3.1.2. Algorithm Settings
- When neural network weights are optimized with gradient descent, the shallower parameters are updated through layer-by-layer back-propagation starting from the output layer, while the optimization of the deeper parameters also depends on the feature data produced by the shallower layers. When sub-networks with different structures are trained in this way, it is difficult to preserve this dependency between shallow and deep layers, which easily leads to large deviations. During training with an evolutionary algorithm, in contrast, each weight parameter of the network is treated as an independent variable and can, to a certain extent, be trained and optimized individually, so different sub-networks obtain better evaluation results when they inherit parameters from the super-network. Therefore, in the experiments on the DARTS search space, the improved evolutionary strategy from Section 2.3 is used to optimize the network weight parameters.
- Each epoch is trained on a random small batch of samples [37], which reduces the computational complexity of each epoch while maintaining the training effect (a minimal illustration is given below). At the same time, the algorithm is not limited by the size of the sample data and can be extended to large data volumes.
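A minimal illustration of this per-epoch random-subset training is given below (the subset fraction, batch size, and helper name are illustrative assumptions, not the paper's settings):

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

def epoch_loader(dataset, fraction=0.1, batch_size=64):
    """Build a DataLoader over a fresh random subset of the data for one epoch."""
    n = len(dataset)
    idx = torch.randperm(n)[: int(fraction * n)]     # random small sample of the data
    return DataLoader(dataset, batch_size=batch_size,
                      sampler=SubsetRandomSampler(idx.tolist()))
```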
3.1.3. Search Space
3.1.4. Ablation Experiment
3.1.5. Performance Experiment
Comparison of NAS algorithms on the CIFAR-10 dataset (DARTS search space):

Algorithm | Val Error (%) | Test Error (%) | Params (M) | Search Space | Search Method |
---|---|---|---|---|---|
RandomNAS [39] * | —— | 2.65 | 4.3 | DARTS | Random search |
AmoebaNet * | —— | 3.34 | 5.1 | NASNet | EA |
NASNet * | —— | 2.65 | 5.3 | NASNet | RL |
PNAS * | —— | 3.41 | 5.1 | DARTS | Sequential |
ENAS * | —— | 2.89 | 4.6 | NASNet | RL |
GDAS * [40] | —— | 2.93 | 3.4 | DARTS | Gradient |
SETN * [41] | —— | 2.69 | 4.6 | DARTS | Gradient |
FairDARTS [38] | 3.82 | 2.54 | 2.83 | DARTS | Gradient |
PDARTS | 3.99 | 2.50 | 3.4 | DARTS | Gradient |
PC-DARTS | 3.88 | 2.57 | 3.6 | DARTS | Gradient |
DARTS | 4.38 | 2.76 | 3.4 | DARTS | Gradient |
NDARTS | 3.98 | 2.37 | 3.8 | DARTS | Gradient |
Comparison of NAS algorithms on the CIFAR-100 dataset (DARTS search space):

Algorithm | Val Error (%) | Test Error (%) | Params (M) | Search Space | Search Method |
---|---|---|---|---|---|
RandomNAS * | —— | 17.63 | 4.3 | DARTS | Random |
NASNet-A * | —— | 17.81 | 3.3 | NASNet | RL |
PNAS * | —— | 17.63 | 3.2 | DARTS | Sequential |
GDAS * | —— | 18.38 | 3.4 | DARTS | Gradient |
SETN * | —— | 17.25 | 4.6 | DARTS | Gradient |
PDARTS | 19.97 | 17.4 | 3.4 | DARTS | Gradient |
PC-DARTS | 20.25 | 17.11 | 3.6 | DARTS | Gradient |
DARTS | 21.42 | 17.54 | 3.4 | DARTS | Gradient |
NDARTS | 16.17 | 16.02 | 3.8 | DARTS | Gradient |
Comparison of NAS algorithms on the ImageNet dataset (DARTS search space):

Algorithm | Top-1 Test Error (%) | Top-5 Test Error (%) | Params (M) | Search Space | Search Method |
---|---|---|---|---|---|
RandomNAS * | 27.1 | —— | 4.3 | DARTS | Random |
AmoebaNet-A * | 25.5 | 8.0 | 5.1 | NASNet | EA |
NASNet-A * | 26.0 | 8.4 | 3.3 | NASNet | RL |
PNAS * | 25.8 | 8.1 | 3.2 | DARTS | Sequential |
GDAS * | 26.0 | 8.5 | 3.4 | DARTS | Gradient |
SETN * | 25.7 | 8.0 | 4.6 | DARTS | Gradient |
SharpDARTS [42] | 25.1 | 7.8 | 4.9 | DARTS | Gradient |
PDARTS * | 24.4 | 7.4 | 3.4 | DARTS | Gradient |
PC-DARTS * | 25.1 | 7.8 | 3.6 | DARTS | Gradient |
DARTS | 26.9 | 8.7 | 3.4 | DARTS | Gradient |
NDARTS | 24.3 | 7.3 | 3.8 | DARTS | Gradient |
3.2. Experiment Results
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
NAS | Neural Architecture Search |
RL | Reinforcement Learning |
EA | Evolutionary Algorithm |
MDPI | Multidisciplinary Digital Publishing Institute |
Appendix A
References
- Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv 2016, arXiv:1609.08144. [Google Scholar]
- Chen, M.X.; Firat, O.; Bapna, A.; Johnson, M.; Macherey, W.; Foster, G.; Jones, L.; Parmar, N.; Schuster, M.; Chen, Z.; et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. arXiv 2018, arXiv:1804.09849. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, Computational and Biological Learning Society, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation Tech Report (v5). arXiv 2014, arXiv:1311.2524. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Dong, X.; Huang, J.; Yang, Y.; Yan, S. More is Less: A More Complicated Network with Less Inference Complexity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1997–2017. [Google Scholar]
- Baker, B.; Gupta, O.; Naik, N.; Raskar, R. Designing Neural Network Architectures using Reinforcement Learning. arXiv 2016, arXiv:1611.02167. [Google Scholar]
- Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Comput. Surv. 2022, 54, 1–34. [Google Scholar] [CrossRef]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. arXiv 2017, arXiv:1707.07012. [Google Scholar]
- Zoph, B.; Le, Q.V. Neural Architecture Search with Reinforcement Learning. arXiv 2016, arXiv:1611.01578. [Google Scholar]
- Cai, H.; Chen, T.; Zhang, W.; Yu, Y.; Wang, J. Efficient Architecture Search by Network Transformation. arXiv 2017, arXiv:1707.04873. [Google Scholar] [CrossRef]
- Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized Evolution for Image Classifier Architecture Search. Proc. AAAI Conf. Artif. Intell. 2019, 33, 4780–4789. [Google Scholar] [CrossRef]
- Liu, H.; Simonyan, K.; Vinyals, O.; Fernando, C.; Kavukcuoglu, K. Hierarchical Representations for Efficient Architecture Search. arXiv 2018, arXiv:1711.00436. [Google Scholar]
- Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y.L.; Tan, J.; Le, Q.V.; Kurakin, A. Large-Scale Evolution of Image Classifiers. arXiv 2017, arXiv:1703.01041. [Google Scholar]
- Stanley, K.O.; Miikkulainen, R. Evolving neural networks through augmenting topologies. Evol. Comput. 2002, 10, 99–127. [Google Scholar] [CrossRef] [PubMed]
- Xie, L.; Yuille, A. Genetic CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Liu, C.; Zoph, B.; Neumann, M.; Shlens, J.; Wei, H.; Li, L.-J.; Li, F.-F.; Yuille, A.; Huang, J.; Murphy, K. Progressive Neural Architecture Search. arXiv 2017, arXiv:1712.00559. [Google Scholar]
- Kandasamy, K.; Neiswanger, W.; Schneider, J.; Póczos, B.; Xing, E.P. Neural Architecture Search with Bayesian Optimisation and Optimal Transport. arXiv 2018, arXiv:1802.07191. [Google Scholar]
- Negrinho, R.; Gordon, G. DeepArchitect: Automatically Designing and Training Deep Architectures. arXiv 2017, arXiv:1704.08792. [Google Scholar]
- Liu, H.; Simonyan, K.; Yang, Y. Darts: Differentiable architecture search. arXiv 2018, arXiv:1806.09055. [Google Scholar]
- Chen, X.; Xie, L.; Wu, J.; Tian, Q. Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation. arXiv 2019, arXiv:1904.12760. [Google Scholar]
- Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.J.; Tian, Q.; Xiong, H. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search. arXiv 2019, arXiv:1907.05737. [Google Scholar]
- Cai, H.; Zhu, L.; Han, S. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. arXiv 2018, arXiv:1812.00332. [Google Scholar]
- Zela, A.; Elsken, T.; Saikia, T.; Marrakchi, Y.; Brox, T.; Hutter, F. Understanding and Robustifying Differentiable Architecture Search. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Chen, X.; Hsieh, C.J. Stabilizing Differentiable Architecture Search via Perturbation-based Regularization. arXiv 2020, arXiv:2002.05283. [Google Scholar]
- Xie, S.; Zheng, H.; Liu, C.; Lin, L. SNAS: Stochastic Neural Architecture Search. arXiv 2018, arXiv:1812.09926. [Google Scholar]
- Liang, H.; Zhang, S.; Sun, J.; He, X.; Huang, W.; Zhuang, K.; Li, Z. DARTS+: Improved Differentiable Architecture Search with Early Stopping. arXiv 2019, arXiv:1909.06035. [Google Scholar]
- Hou, P.; Jin, Y.; Chen, Y. Single-DARTS: Towards stable architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 373–382. [Google Scholar]
- Zhu, X.; Li, J.; Liu, Y.; Wang, W. Improving Differentiable Architecture Search via Self-Distillation. arXiv 2023, arXiv:2302.05629. [Google Scholar] [CrossRef]
- Salimans, T.; Ho, J.; Chen, X.; Sidor, S.; Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv 2017, arXiv:1703.03864. [Google Scholar]
- Dong, X.; Yang, Y. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search. arXiv 2020, arXiv:2001.00326. [Google Scholar]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009; Volume 1. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 7 October 2023).
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009. [Google Scholar]
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar]
- Chu, X.; Zhou, T.; Zhang, B.; Li, J. Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search. arXiv 2019, arXiv:1911.12126. [Google Scholar]
- Li, L.; Talwalkar, A. Random Search and Reproducibility for Neural Architecture Search. arXiv 2019, arXiv:1902.07638. [Google Scholar]
- Dong, X.; Yang, Y. Searching for A Robust Neural Architecture in Four GPU Hours. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Dong, X.; Yang, Y. One-Shot Neural Architecture Search via Self-Evaluated Template Network. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019 ICCV, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Hundt, A.; Jain, V.; Hager, G.D. SharpDARTS: Faster and More Accurate Differentiable Architecture Search. arXiv 2019, arXiv:1903.09900. [Google Scholar]
Comparison of NAS algorithms on the CIFAR-10 dataset (NAS-Bench-201 search space):

Algorithm | Val Acc (%) | Test Acc (%) | Params (M) | Search Space | Search Method |
---|---|---|---|---|---|
RandomNAS | 72.42 | 76.10 | 0.4 | NAS-Bench-201 | Random |
RL | 89.83 | 92.20 | 0.4 | NAS-Bench-201 | RL |
ENAS | 39.77 | 54.30 | 0.1 | NAS-Bench-201 | RL |
EA | 91.18 | 93.40 | 1.0 | NAS-Bench-201 | EA |
GDAS | 89.78 | 93.49 | 0.8 | NAS-Bench-201 | Gradient |
SETN | 84.24 | 87.67 | 0.3 | NAS-Bench-201 | Gradient |
DARTS | 39.77 | 54.30 | 0.1 | NAS-Bench-201 | Gradient |
NDARTS | 89.94 | 93.67 | 1.3 | NAS-Bench-201 | Gradient |
Comparison of NAS algorithms on the CIFAR-100 dataset (NAS-Bench-201 search space):

Algorithm | Val Acc (%) | Test Acc (%) | Params (M) | Search Space | Search Method |
---|---|---|---|---|---|
RandomNAS | 47.54 | 46.94 | 0.4 | NAS-Bench-201 | Random |
RL | 68.72 | 67.80 | 0.4 | NAS-Bench-201 | RL |
ENAS | 15.03 | 15.61 | 0.1 | NAS-Bench-201 | RL |
EA | 70.88 | 71.00 | 1.0 | NAS-Bench-201 | EA |
GDAS | 71.28 | 70.28 | 0.8 | NAS-Bench-201 | Gradient |
SETN | 58.81 | 58.87 | 0.3 | NAS-Bench-201 | Gradient |
DARTS | 15.03 | 15.61 | 0.1 | NAS-Bench-201 | Gradient |
NDARTS | 71.37 | 70.91 | 1.3 | NAS-Bench-201 | Gradient |
Comparison of NAS algorithms on the ImageNet dataset (NAS-Bench-201 search space):

Algorithm | Val Acc (%) | Test Acc (%) | Params (M) | Search Space | Search Method |
---|---|---|---|---|---|
RandomNAS | 16.53 | 15.63 | 0.4 | NAS-Bench-201 | Random |
RL | 41.90 | 42.23 | 0.4 | NAS-Bench-201 | RL |
ENAS | 16.43 | 16.32 | 0.1 | NAS-Bench-201 | RL |
EA | 44.03 | 44.23 | 1.0 | NAS-Bench-201 | EA |
GDAS | 43.47 | 43.10 | 0.8 | NAS-Bench-201 | Gradient |
SETN | 33.04 | 32.37 | 0.3 | NAS-Bench-201 | Gradient |
DARTS | 16.43 | 16.32 | 0.1 | NAS-Bench-201 | Gradient |
NDARTS | 40.66 | 41.02 | 1.3 | NAS-Bench-201 | Gradient |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).