Abstract
For deep model training, an optimization technique is required that minimizes loss and maximizes accuracy, and the development of effective optimization methods remains an important research area. The diffGrad optimizer exploits the change in gradients across optimization steps but does not update the second-order moment on the basis of the first-order moment, whereas the AngularGrad optimizer uses the angular value of the gradient, which requires additional computation. Because of these factors, both approaches follow zigzag trajectories that take a long time, and extra computation, to reach the global minimum. To overcome these limitations, a novel adaptive deep learning optimization method based on the square of the first momentum (sqFm) is proposed. By updating the second-order moment on the basis of the first-order moment and adjusting the step size according to the current gradient through a non-negative function, the proposed sqFm delivers a smoother trajectory and better image classification accuracy. Empirical comparisons of sqFm with Adam, diffGrad, and AngularGrad on non-convex functions demonstrate that the proposed method achieves the best convergence and parameter values. On the Rosenbrock function, sqFm reaches the global minimum gradually and with less overshoot than SGD, Adam, diffGrad, RAdam, and AngularGrad(tan). It is also demonstrated that sqFm gives consistently good classification accuracy when training CNN networks (VGG16, ResNet50, ResNet34, ResNet18, and DenseNet121) on the CIFAR10, CIFAR100, and MNIST datasets, in contrast to SGDM, diffGrad, Adam, AngularGrad(cos), and AngularGrad(tan). On the ImageNet dataset with the ResNet18 network, the proposed method also achieves better classification accuracy than SGD, Adam, AdaBelief, Yogi, RAdam, and AngularGrad. Source code link: https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.
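The abstract does not give the exact update equations, so the following is a minimal sketch of an sqFm-style step under two assumptions drawn from the description above: the second-order moment is fed by the square of the first moment (rather than by the raw gradient), and the step size is scaled by a non-negative function of the current gradient. The function names (rosenbrock_grad, sqfm_like_step), the sigmoid-based scaling term, and all hyperparameter values are illustrative choices, not the authors' published rule; the reference implementation is in the repository linked above.

```python
import numpy as np

def rosenbrock_grad(xy, a=1.0, b=100.0):
    """Gradient of the Rosenbrock function f(x, y) = (a - x)^2 + b*(y - x^2)^2."""
    x, y = xy
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x * x)
    dy = 2.0 * b * (y - x * x)
    return np.array([dx, dy])

def sqfm_like_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of a hypothetical sqFm-style rule (an assumption, not the published method):
    the second moment tracks the square of the first moment, and the step is damped by a
    non-negative factor derived from the current gradient."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-order moment, as in Adam
    v = beta2 * v + (1.0 - beta2) * (m * m)     # second-order moment fed by m^2 (assumed)
    m_hat = m / (1.0 - beta1 ** t)              # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    xi = 1.0 / (1.0 + np.exp(-np.abs(grad)))    # non-negative, gradient-dependent scaling (assumed)
    theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy run on the Rosenbrock function; its global minimum is at (1, 1).
theta = np.array([-1.5, 2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 20001):
    g = rosenbrock_grad(theta)
    theta, m, v = sqfm_like_step(theta, g, m, v, t)
print(theta)  # expected to approach the minimum at [1.0, 1.0]
```

This toy loop mirrors the qualitative Rosenbrock test mentioned in the abstract; it is intended only to illustrate the style of update, not to reproduce the paper's results.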
Availability of data and materials
Data will be made available on reasonable request.
Code availability
Custom code is available at https://github.com/UtpalNandi/sqFm-A-novel-adaptive-optimization-scheme-for-deep-learning-model.
Acknowledgements
We would like to thank the Department of Computer Science, Vidyasagar University, Paschim Medinipur, Midnapore 721102, West Bengal, India, for providing the infrastructure to carry out our experiments.
Funding
Not applicable.
Author information
Contributions
Shubhankar Bhakta: Conceptualization, Implementation and Drafting; Utpal Nandi: Conceptualization, Investigation, Methodology, Analysis, and Supervision; Others: Review and Editing.
Ethics declarations
Conflict of interest
There are no conflicts of interest or competing interests.
Ethical approval
The authors confirm that the research presented in this paper was conducted in accordance with the principles of ethical and professional conduct.
Consent to participate
Not applicable.
Consent for publication
Not applicable; the authors used only publicly available data and provide the corresponding references.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bhakta, S., Nandi, U., Mondal, M. et al. sqFm: a novel adaptive optimization scheme for deep learning model. Evol. Intel. 17, 2479–2494 (2024). https://doi.org/10.1007/s12065-023-00897-1