Abstract
To improve the performance of gradient descent learning algorithms, the impact of different types of norms on deep neural network training is studied. The performance of different norm types in both finite-time and fixed-time convergence algorithms is compared. The accuracy of a multiclass classification task realized by three typical algorithms using different types of norms is reported, and the improvement of Cortés's finite-time algorithm when combined with momentum or Nesterov accelerated gradient is also studied. Numerical experiments show that the infinity norm provides better performance in finite-time gradient descent algorithms and exhibits strong robustness across different network structures.
The authors were supported by the Australian Research Council (ARC) under Discovery Program Grant DP200101197.
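To make the norm comparison concrete, here is a minimal Python sketch (not from the paper; the function names, step sizes, and the toy quadratic are illustrative assumptions) of one forward-Euler step of the normalized gradient flow dx/dt = -∇f(x)/||∇f(x)||_q that underlies finite-time schemes such as Cortés's, together with a heavy-ball momentum variant of the same step. Setting q = np.inf selects the infinity norm that the experiments favour.

```python
import numpy as np

def normalized_gd_step(x, grad, lr=0.1, q=np.inf):
    """One forward-Euler step of the normalized gradient flow
    dx/dt = -grad f(x) / ||grad f(x)||_q, the flow behind
    finite-time convergence results such as Cortes (2006).
    q = np.inf selects the infinity (max-absolute-value) norm."""
    g_norm = np.linalg.norm(grad, ord=q)
    if g_norm == 0.0:                 # stationary point: stop moving
        return x
    return x - lr * grad / g_norm

def normalized_gd_momentum_step(x, v, grad, lr=0.1, beta=0.9, q=np.inf):
    """Heavy-ball momentum applied to the normalized direction; one
    plausible way to add momentum to the finite-time scheme (the
    paper's exact combination may differ)."""
    g_norm = np.linalg.norm(grad, ord=q)
    d = grad / g_norm if g_norm > 0.0 else np.zeros_like(grad)
    v = beta * v - lr * d             # velocity update
    return x + v, v

# Toy usage on f(x) = 0.5 * ||x||_2^2, whose gradient is x itself.
x = np.array([3.0, -4.0])
for _ in range(100):
    x = normalized_gd_step(x, grad=x, lr=0.05, q=np.inf)
print(x)  # ends within ~lr of the origin; fixed-step normalized
          # gradient descent chatters near the minimizer
```

Swapping q between 1, 2, and np.inf reproduces the kind of norm comparison the paper runs, though on a toy quadratic rather than a deep network.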
References
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Olah, C.: Neural networks, manifolds, and topology. Blog post (2014)
Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Advances in Neural Information Processing Systems 20 (2007)
Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Netw. 12(1), 145–151 (1999)
Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2). In: Doklady AN USSR, vol. 269, pp. 543–547 (1983)
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(7) (2011)
Tieleman, T., Hinton, G.: Neural networks for machine learning. Technical report (2011). http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Cortés, J.: Finite-time convergent gradient flows with applications to network consensus. Automatica 42(11), 1993–2000 (2006)
Wibisono, A., Wilson, A.C., Jordan, M.I.: A variational perspective on accelerated methods in optimization. Proc. Natl. Acad. Sci. 113(47), E7351–E7358 (2016)
Romero, O., Benosman, M.: Finite-time convergence in continuous-time optimization. In: International Conference on Machine Learning, pp. 8200–8209. PMLR (2020)
Garg, K., Panagou, D.: Fixed-time stable gradient flows: applications to continuous-time optimization. IEEE Trans. Autom. Control 66(5), 2002–2015 (2020)
Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. Academic Press (2014)
Pugh, C.C.: Real Mathematical Analysis. Springer, New York (2002). https://doi.org/10.1007/978-0-387-21684-3
Weisstein, E.W.: Vector norm (2002). https://mathworld.wolfram.com/
Ng, A.Y.: Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 78 (2004)
Wassermann, A.J.: Functional analysis (1999)
Conrad, K.: Equivalence of norms. Expository paper, University of Connecticut, Storrs (2018)
Golub, G.H., Van Loan, C.F.: Matrix Computations. JHU Press, Baltimore (2013)
Gongqing, Z., Yuanqu, L.: Functional Analysis Lecture Notes. Peking University Press (1990). (in Chinese)
Karpathy, A.: CS231n: convolutional neural networks for visual recognition (2017). http://cs231n.github.io
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, L., Yu, X., Li, C., Eberhard, A., Nguyen, L.T., Doan, C.T. (2022). Impact of Mathematical Norms on Convergence of Gradient Descent Algorithms for Deep Neural Networks Learning. In: Aziz, H., Corrêa, D., French, T. (eds) AI 2022: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol. 13728. Springer, Cham. https://doi.org/10.1007/978-3-031-22695-3_10
DOI: https://doi.org/10.1007/978-3-031-22695-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22694-6
Online ISBN: 978-3-031-22695-3