Abstract
Multi-level methods are widely used for the solution of large-scale problems because of their computational advantages and their exploitation of the complementarity between the involved sub-problems. After re-interpreting multi-level methods from a block-coordinate point of view, we propose a multi-level algorithm for the solution of nonlinear optimization problems and analyze its evaluation complexity. We apply it to the solution of partial differential equations using physics-informed neural networks (PINNs) and consider two different types of neural architectures, a generic feedforward network and a frequency-aware network. We show that our approach is particularly effective when coupled with these specialized architectures and that this coupling results in better solutions and significant computational savings.
Data availability
The tests presented in this manuscript do not rely on any particular dataset, and they can be reproduced from the information provided in the numerical results sections.
Notes
This assumption is standard in complexity analysis and prevents arbitrarily large changes in the gradient over short steps. In particular, it prevents the occurrence of infinite gradients at iterates, which would cause the algorithm to fail. It is more general than assuming bounded gradients, as can be seen by considering convex quadratics. See also the discussion at the end of the present section.
Technically, the objective function must be bounded below, have a bounded directional subgradient map and a finite nonconvexity modulus.
References
Briggs, W.L., Henson, V.E., McCormick, S.F.: A Multigrid Tutorial, 2nd edn. SIAM, Philadelphia, USA (2000)
Trottenberg, U., Oosterlee, C.W., Schüller, A.: Multigrid. Elsevier, Amsterdam, The Netherlands (2001)
Nash, S.G.: A multigrid approach to discretized optimization problems. Optim. Methods Softw. 14, 99–116 (2000)
Gratton, S., Sartenaer, A., Toint, Ph.L.: On recursive multiscale trust-region algorithms for unconstrained minimization. In: Jarre, F., Lemaréchal, C., Zowe, J. (eds.) Oberwolfach Reports: Optimization and Applications (2005)
Gratton, S., Sartenaer, A., Toint, Ph.L.: Second-order convergence properties of trust-region methods using incomplete curvature information, with an application to multigrid optimization. J. Comput. Appl. Math. 24(6), 676–692 (2006)
Gratton, S., Sartenaer, A., Toint, Ph.L.: Recursive trust-region methods for multiscale nonlinear optimization. SIAM J. Optim. 19(1), 414–444 (2008)
Calandra, H., Gratton, S., Riccietti, E., Vasseur, X.: On high-order multilevel optimization strategies. SIAM J. Optim. 31(1), 307–330 (2021)
Kopaničáková, A., Krause, R.: Globally convergent multilevel training of deep residual networks. SIAM J. Sci. Comput. 45, S254–S280 (2022)
Haber, E., Ruthotto, L., Holtham, E., Jun, S.-H.: Learning across scales—multiscale methods for convolution neural networks. In: Thirty-Second AAAI Conference on Artificial Intelligence, pp. 3142–3148 (2018)
Wu, C.-Y., Girshick, R., He, K., Feichtenhofer, Ch., Krahenbuhl, Ph.: A multigrid method for efficiently training video models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 153–162 (2020)
von Planta, C., Kopaničáková, A., Krause, R.: Training of deep residual networks with stochastic MG/OPT. arXiv:2108.04052 (2021)
Haber, E., Ruthotto, L.: Stable architectures for deep neural networks. Inverse Prob. 34(1), 014004 (2017)
Ke, T.-W., Maire, M., Yu, S.X.: Multigrid neural architectures. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6665–6673 (2017)
Chung, J., Ahn, S., Bengio, Y.: Hierarchical multiscale recurrent neural networks. arXiv:1609.01704 (2016)
Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)
Mishra, S., Molinaro, R.: Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. 42, 981–1022 (2022)
Shin, Y.: On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs. Commun. Comput. Phys. 28(5), 2042–2074 (2020)
Cai, S., Mao, Z., Wang, Z., Yin, M., Karniadakis, G.E.: Physics-informed neural networks (PINNs) for fluid mechanics: a review. Acta. Mech. Sin. 37(12), 1727–1738 (2021)
Wang, S., Wang, H., Perdikaris, P.: On the eigenvector bias of Fourier feature networks: from regression to solving multi-scale PDEs with physics-informed neural networks. Comput. Methods Appl. Mech. Eng. 384, 113938 (2021)
Xu, Z.-Q.J.: Frequency principle: Fourier analysis sheds light on deep neural networks. Commun. Comput. Phys. 28(5), 1746–1767 (2020)
Rahaman, N., Baratin, A., Arpit, D., Draxler, F., Lin, M., Hamprecht, F., Bengio, Y., Courville, A.: On the spectral bias of neural networks. In: International Conference on Machine Learning, pp. 5301–5310 (2019)
Ronen, B., Jacobs, D., Kasten, Y., Kritchman, S.: The convergence rate of neural networks for learned functions of different frequencies. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 4761–4771 (2019)
Wang, S., Yu, X., Perdikaris, P.: When and why PINNs fail to train: a neural tangent kernel perspective. J. Comput. Phys. 449, 110768 (2022)
Farhani, G., Kazachek, A., Wang, B.: Momentum diminishes the effect of spectral bias in physics-informed neural networks. arXiv:2206.14862 (2022)
Liu, Z., Cai, W., Xu, Z.-Q.J.: Multi-scale deep neural network (MscaleDNN) for solving Poisson-Boltzmann equation in complex domains. Commun. Comput. Phys. 28(5), 1970–2001 (2020)
Li, X.-A.: A multi-scale DNN algorithm for nonlinear elliptic equations with multiple scales. Commun. Comput. Phys. 28(5), 1886–1906 (2020)
Hackbusch, W.: Multi-Grid Methods and Applications. Springer Series in Computational Mathematics, vol. 4. Springer, Berlin, Heidelberg, New York (1995)
Borzi, A., Kunisch, K.: A globalisation strategy for the multigrid solution of elliptic optimal control problems. Optim. Methods Softw. 21(3), 445–459 (2006)
Gross, Ch., Krause, R.: On the globalization of ASPIN employing trust-region control strategies - convergence analysis and numerical examples. Technical Report 2011-03, Università della Svizzera italiana, Lugano (2011)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
Nesterov, Y.: Efficiency of coordinate descent methods for huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)
Powell, M.J.D.: On search directions for minimization algorithms. Math. Program. 4, 193–201 (1973)
Wright, S.J.: Coordinate descent algorithms. Math. Program., Series A 151(1), 3–34 (2015)
Amaral, V.S., Andreani, R., Birgin, E.G., Marcondes, D.S., Martínez, J.M.: On complexity and convergence of high-order coordinate descent algorithms for smooth nonconvex box-constrained minimization. J. Global Optim. 84(3), 527–561 (2022)
Cartis, C., Gould, N.I.M., Toint, Ph.L.: Evaluation Complexity of Algorithms for Nonconvex Optimization. MOS-SIAM Series on Optimization, vol. 30. SIAM, Philadelphia, USA (2022)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Applied Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands (2004)
Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
Bellavia, S., Della Santa, F., Papini, A.: Alternate training of shared and task-specific parameters for multi-task neural networks. arXiv:2312.16340 (2023)
Jordan, M.I., Lin, T., Zampetakis, M.: On the complexity of deterministic nonsmooth and nonconvex optimization. arXiv:2209.12403 (2022)
Tian, L., So, A.M.-C.: Computing Goldstein \((\epsilon,\delta)\)-stationary points of Lipschitz functions in \(\mathcal{O}(\epsilon^{-3}\delta^{-1})\) iterations via random conic perturbation. arXiv:2112.09002 (2021)
Zhang, J., Lin, H., Jegelka, S., Sra, S., Jadbabaie, A.: Complexity of finding stationary points of nonconvex nonsmooth functions. In: Proceedings of the 37th International Conference on Machine Learning, PMLR, vol. 119, pp. 11173–11182 (2020)
Kong, S., Lewis, A.S.: The cost of nonconvexity in nonconvex nonsmooth optimization. arXiv:2210.00652 (2022)
Goldstein, A.A.: Optimization of Lipschitz continuous functions. Math. Program., Series A 13(1), 14–22 (1977)
Rapaka, N.R., Samtaney, R.: An efficient Poisson solver for complex embedded boundary domains using the multi-grid and fast multipole methods. J. Comput. Phys. 410, 109387 (2020)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)
Tancik, M., Srinivasan, P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J., Ng, R.: Fourier features let networks learn high frequency functions in low dimensional domains. Adv. Neural Inf. Process. Syst. 33, 7537–7547 (2020)
Li, X.-A., Xu, Z.-Q.J., Zhang, L.: Subspace decomposition based DNN algorithm for elliptic type multi-scale PDEs. arXiv:2112.06660 (2021)
E, W., Yu, B.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6, 1–12 (2018)
Funding
Work partially supported by the 3IA Artificial and Natural Intelligence Toulouse Institute (ANITI) under the French "Investing for the Future - PIA3" program, grant agreement ANR-19-PI3A-0004, and by the GDR ISIS project MOMIGS.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix
Proofs of convergence theorems
1.1 Proof of Theorem 3.1
Consider iteration k of ML-BCD and suppose that this iteration occurs in the minimization of subproblem \(\mathcal{A}_i\) on the variables given by \(x_i\). From the Lipschitz continuity of the gradient and (15), we obtain that
Since the minimization of subproblem \(\mathcal{A}_i\) has not yet terminated, we must have that \(\Vert \nabla ^1_{x_i}f(x_k)\Vert \ge \frac{\epsilon }{\sqrt{2}}\), and thus
Summing this inequality on all iterations, we deduce that, for all k before termination of the ML-BCD algorithm,
This implies that the algorithm cannot take more than
iterations before it terminates, proving the theorem with
\(\square \)
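For the reader's convenience, the chain of inequalities used in this proof can be sketched in generic form. The display below is an illustrative reconstruction only: it assumes a gradient step of length \(1/L\) on the selected block, so the exact constants produced by (15) may differ from those shown here. Under this assumption, the descent lemma gives
\[
f(x_k)-f(x_{k+1}) \;\ge\; \frac{1}{2L}\,\Vert \nabla^1_{x_i} f(x_k)\Vert^2 \;\ge\; \frac{\epsilon^2}{4L},
\]
so that summing over the first \(k\) iterations yields \(f(x_0)-f_{\rm low}\ge k\,\epsilon^2/(4L)\), where \(f_{\rm low}\) denotes a lower bound on \(f\), and hence \(k\le 4L\,(f(x_0)-f_{\rm low})\,\epsilon^{-2}\), a bound of the \(\mathcal{O}(\epsilon^{-2})\) type established in the theorem.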
1.2 Proof of Theorem 3.2
Consider iteration k of ML-BCD and suppose that this iteration occurs in the minimization of subproblem \(\mathcal{A}_i\) on the variables given by \(x_i\). From the Lipschitz continuity of the gradient and the SG update (16) of the selected block, we obtain:
Using assumption (17) and the bound on \(\alpha \) we deduce that
and, if k is a major iteration, we have that
Taking the total expectation gives that
Consider now a major iteration k and denote by \(x_{k+\ell }\) (for \(\ell \in \{1,\dots ,\ell _k\}\) and \(\ell _k\le {\bar{\ell }}\)) the iterates produced on the selected subproblem. We deduce from (28) that
If the minimization of the subproblem is not stopped, this means that \(\Vert g_i(x_{k+\ell },\xi _{k+\ell })\Vert \ge c_\epsilon \), so that \(\mathbb {E}_{\xi _{k+\ell }}(\Vert g_{i}(x_{k+\ell },\xi _{k+\ell })\Vert ^2)\ge c_\epsilon ^2\). From (17), we have that
and, since k is a major iteration, from the condition for the choice of the blocks we have \( \Vert \nabla ^1_{x_i}f(x_k)\Vert >\tau \Vert \nabla ^1_x f(x_k)\Vert . \) Thus
Taking into account the fact that f is bounded from below, summing over K major iterations and using the assumption \(c_\epsilon >2\sigma _1^2\), we obtain:
Thus
where the last inequality follows from the fact that \({\bar{\ell }} \sigma _1^2 \left( \frac{\alpha L}{\mu }- \frac{1}{\sigma _2^2}\right) <0\) by the assumption on \(\alpha \). Taking \({\bar{\ell }}\) large, we can thus improve the bound, as long as \(\epsilon \) remains larger than the variance. \(\square \)
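In the same spirit, the expected-decrease mechanism exploited at the start of this proof can be sketched generically. The display below is illustrative only: it assumes that the stochastic block gradient satisfies \(\mathbb{E}_{\xi_k}\big[\nabla^1_{x_i}f(x_k)^\top g_i(x_k,\xi_k)\big]\ge \mu\,\Vert\nabla^1_{x_i}f(x_k)\Vert^2\) and \(\mathbb{E}_{\xi_k}\big[\Vert g_i(x_k,\xi_k)\Vert^2\big]\le \sigma_1^2+\sigma_2^2\,\Vert\nabla^1_{x_i}f(x_k)\Vert^2\), which may differ from the precise form of (17). Combining the descent lemma with the update \(x_{k+1}=x_k-\alpha\, g_i(x_k,\xi_k)\) then gives
\[
\mathbb{E}_{\xi_k}\big[f(x_{k+1})\big]\;\le\; f(x_k)\;-\;\alpha\Big(\mu-\frac{\alpha L \sigma_2^2}{2}\Big)\Vert\nabla^1_{x_i}f(x_k)\Vert^2\;+\;\frac{\alpha^2 L}{2}\,\sigma_1^2,
\]
so that, for \(\alpha\) small enough, the expected decrease is proportional to \(\Vert\nabla^1_{x_i}f(x_k)\Vert^2\) up to a variance term of order \(\alpha^2\sigma_1^2\); it is this trade-off that drives the summation over major iterations above.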
A Mscale FAML variant and its performance
1.1 Another frequency-aware architecture
This appendix reports on tests conducted using a frequency-aware network architecture based on the MscaleDNN defined in [25], instead of the WWP networks used in Sect. 5. A large part of the success of Mscale networks is due to their wavelet-inspired activation functions. These compactly supported functions have good scale-separation properties and are constructed so that the bandwidth of their Fourier transform increases with their input scaling. Several activation functions have been proposed in [25], but the most efficient one from a practical point of view was introduced in [26] and is given by
where z is the input scaling. This is a continuous function which decays faster and has better localization properties in the frequency domain than the previously proposed sReLU. The amplitude peaks of the differently scaled s2ReLU functions are well separated in the Fourier domain. They are indicated by black stars in Fig. 11, and we observe their expected monotonic growth with the scaling.
As in Sect. 5, we use these functions to create a neural network composed of a sum of subnetworks, each targeting a different frequency range (analogously to Fig. 4). In this modified architecture, the lower-frequency subnetworks still use \(\tanh \) activation functions, while the higher-frequency subnetworks now use (31). For each subnetwork, the first activation function is an SFM function associated with a fixed input scaling, as in standard Mscale networks. The details of this architecture are given in Table 4. Each of the subnetworks is composed of 3 hidden layers of 100 neurons each.
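To make the construction concrete, the following PyTorch-style sketch shows one possible implementation of such a sum of scaled subnetworks. It is illustrative only: the exact form (31) of the activation, the scalings of Table 4 and the sinusoidal stand-in for the SFM first layer are assumptions made here for the sake of the example (s2ReLU is taken as \(\sin(2\pi x)\,\mathrm{ReLU}(x)\,\mathrm{ReLU}(1-x)\), following the MscaleDNN literature), not a verbatim description of the networks used in our experiments.

import math
import torch
import torch.nn as nn


def s2relu(x):
    # Assumed form of the compactly supported activation (31); see the remark above.
    return torch.sin(2 * math.pi * x) * torch.relu(x) * torch.relu(1 - x)


class SubNet(nn.Module):
    # One subnetwork: fixed input scaling, an SFM-like first activation (a sine is used
    # here as a stand-in), then hidden layers with tanh (low frequency) or s2ReLU (high frequency).
    def __init__(self, dim_in, scale, width=100, depth=3, high_freq=False):
        super().__init__()
        self.scale = scale
        self.act = s2relu if high_freq else torch.tanh
        self.layers = nn.ModuleList(
            [nn.Linear(dim_in, width)] + [nn.Linear(width, width) for _ in range(depth - 1)]
        )
        self.out = nn.Linear(width, 1)

    def forward(self, x):
        h = self.scale * x                        # fixed input scaling z
        for i, layer in enumerate(self.layers):
            h = layer(h)
            h = torch.sin(h) if i == 0 else self.act(h)
        return self.out(h)


class MscaleFANet(nn.Module):
    # Sum of subnetworks, each targeting a different frequency range.
    def __init__(self, dim_in=1, scales=(1, 2, 4, 8)):
        super().__init__()
        self.nets = nn.ModuleList([SubNet(dim_in, s, high_freq=(s > 1)) for s in scales])

    def forward(self, x):
        return sum(net(x) for net in self.nets)

The additive output of MscaleFANet reproduces the structure of Fig. 4; the scales (1, 2, 4, 8) are placeholders for the values listed in Table 4.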
Combined with the ML-BCD algorithm (exactly as in Sect. 5), this modified architecture defines an Mscale variant of the FAML approach.
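The following sketch illustrates, again in PyTorch-like code, how the block-coordinate mechanism of ML-BCD can be coupled with this decomposition: the parameters of each subnetwork form one block, a block whose partial gradient is a sufficiently large fraction of the full gradient (in the sense \(\Vert \nabla^1_{x_i} f\Vert > \tau \Vert \nabla^1_x f\Vert\)) is selected at each major iteration, and a bounded number of inner steps is performed on it. The function pinn_loss, the use of Adam as inner solver and the specific constants are illustrative assumptions and do not reproduce the exact algorithmic choices of Sect. 5.

import torch


def block_grad_norm(params):
    # Euclidean norm of the gradient restricted to one block of parameters.
    return torch.sqrt(sum((p.grad ** 2).sum() for p in params if p.grad is not None))


def ml_bcd(model, pinn_loss, tau=0.5, eps=1e-4, max_inner=50, max_major=200):
    blocks = [list(net.parameters()) for net in model.nets]   # one block per subnetwork
    for _ in range(max_major):
        model.zero_grad()
        pinn_loss(model).backward()
        full = block_grad_norm([p for b in blocks for p in b])
        if full < eps:
            break                                             # approximate first-order stationarity
        # Select the block with the largest partial gradient norm; it satisfies
        # ||g_i|| >= ||g|| / sqrt(#blocks) >= tau * ||g|| whenever tau <= 1/sqrt(#blocks),
        # in the spirit of the block-selection rule used in the proofs above.
        i = max(range(len(blocks)), key=lambda j: float(block_grad_norm(blocks[j])))
        inner_opt = torch.optim.Adam(blocks[i], lr=1e-3)
        for _ in range(max_inner):                            # bounded number of inner iterations
            inner_opt.zero_grad()
            pinn_loss(model).backward()
            if block_grad_norm(blocks[i]) < eps / 2 ** 0.5:
                break                                         # inner test: ||g_i|| < eps / sqrt(2)
            inner_opt.step()
    return model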
1.2 Numerical results
We tested this approach using the same experimental setup and methodology as that of Sect. 5 and again compared its performance to that of the standard single-level training applied to the complete network. The results are reported in Table 5.
As was the case when using P-WWP networks, the Mscale FAML variant produced lower losses than the standard training in each case. The differences are particularly significant for a small computational budget, where the improvement amounts to several orders of magnitude; this is also almost always the case for the associated MSE. These results thus remain excellent, despite the fact that our complexity theory does not formally cover the Mscale FAML variant because of the non-smoothness of (31).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gratton, S., Mercier, V., Riccietti, E. et al. A block-coordinate approach of multi-level optimization with an application to physics-informed neural networks. Comput Optim Appl 89, 385–417 (2024). https://doi.org/10.1007/s10589-024-00597-1