
A block-coordinate approach of multi-level optimization with an application to physics-informed neural networks


Abstract

Multi-level methods are widely used for the solution of large-scale problems, because of their computational advantages and exploitation of the complementarity between the involved sub-problems. After a re-interpretation of multi-level methods from a block-coordinate point of view, we propose a multi-level algorithm for the solution of nonlinear optimization problems and analyze its evaluation complexity. We apply it to the solution of partial differential equations using physics-informed neural networks (PINNs) and consider two different types of neural architectures, a generic feedforward network and a frequency-aware network. We show that our approach is particularly effective if coupled with these specialized architectures and that this coupling results in better solutions and significant computational savings.




Data availability

The tests presented in this manuscript do not rely on any particular dataset and can be reproduced from the information provided in the numerical results sections.

Notes

  1. This assumption is standard in complexity analysis and prevents arbitrary changes in the gradient over short steps. In particular, it prevents the occurrence of infinite gradients at iterates, which would cause the algorithm to fail. It is more general than assuming bounded gradients, as can be seen by considering convex quadratics. See also the discussion at the end of the present section.

  2. Technically, the objective function must be bounded below, have a bounded directional subgradient map and a finite nonconvexity modulus.

References

  1. Briggs, W.L., Henson, V.E., McCormick, S.F.: A Multigrid Tutorial, 2nd edn. SIAM, Philadelphia, USA (2000)

  2. Trottenberg, U., Oosterlee, C.W., Schüller, A.: Multigrid. Elsevier, Amsterdam, The Netherlands (2001)

  3. Nash, S.G.: A multigrid approach to discretized optimization problems. Optim. Methods Softw. 14, 99–116 (2000)

  4. Gratton, S., Sartenaer, A., Toint, Ph.L.: On recursive multiscale trust-region algorithms for unconstrained minimization. In: Jarre, F., Lemaréchal, C., Zowe, J. (eds.) Oberwolfach Reports: Optimization and Applications (2005)

  5. Gratton, S., Sartenaer, A., Toint, Ph.L.: Second-order convergence properties of trust-region methods using incomplete curvature information, with an application to multigrid optimization. J. Comput. Appl. Math. 24(6), 676–692 (2006)

  6. Gratton, S., Sartenaer, A., Toint, Ph.L.: Recursive trust-region methods for multiscale nonlinear optimization. SIAM J. Optim. 19(1), 414–444 (2008)

  7. Calandra, H., Gratton, S., Riccietti, E., Vasseur, X.: On high-order multilevel optimization strategies. SIAM J. Optim. 31(1), 307–330 (2021)

  8. Kopaničáková, A., Krause, R.: Globally convergent multilevel training of deep residual networks. SIAM J. Sci. Comput. 45, S254–S280 (2022)

  9. Haber, E., Ruthotto, L., Holtham, E., Jun, S.-H.: Learning across scales—multiscale methods for convolution neural networks. In: Thirty-Second AAAI Conference on Artificial Intelligence, pp. 3142–3148 (2018)

  10. Wu, C.-Y., Girshick, R., He, K., Feichtenhofer, Ch., Krahenbuhl, Ph.: A multigrid method for efficiently training video models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 153–162 (2020)

  11. Von Planta, C., Kopaničáková, A., Krause, R.: Training of deep residual networks with stochastic MG/OPT. arXiv:2108.04052v1

  12. Haber, E., Ruthotto, L.: Stable architectures for deep neural networks. Inverse Prob. 34(1), 014004 (2017)

  13. Ke, T.-W., Maire, M., Yu, S.X.: Multigrid neural architectures. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6665–6673 (2017)

  14. Chung, J., Ahn, S., Bengio, Y.: Hierarchical multiscale recurrent neural networks. arXiv:1609.01704 (2016)

  15. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707 (2019)

  16. Mishra, S., Molinaro, R.: Estimates on the generalization error of physics-informed neural networks for approximating a class of inverse problems for PDEs. IMA J. Numer. Anal. 42, 981–1022 (2022)

  17. Shin, Y.: On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs. Commun. Comput. Phys. 28(5), 2042–2074 (2020)

  18. Cai, S., Mao, Z., Wang, Z., Yin, M., Karniadakis, G.E.: Physics-informed neural networks (PINNs) for fluid mechanics: a review. Acta Mech. Sin. 37(12), 1727–1738 (2021)

  19. Wang, S., Wang, H., Perdikaris, P.: On the eigenvector bias of Fourier feature networks: from regression to solving multi-scale PDEs with physics-informed neural networks. Comput. Methods Appl. Mech. Eng. 384, 113938 (2021)

  20. Xu, Z.-Q.J.: Frequency principle: Fourier analysis sheds light on deep neural networks. Commun. Comput. Phys. 28(5), 1746–1767 (2020)

  21. Rahaman, N., Baratin, A., Arpit, D., Draxler, F., Lin, M., Hamprecht, F., Bengio, Y., Courville, A.: On the spectral bias of neural networks. In: International Conference on Machine Learning, pp. 5301–5310 (2019)

  22. Ronen, B., Jacobs, D., Kasten, Y., Kritchman, S.: The convergence rate of neural networks for learned functions of different frequencies. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 4761–4771 (2019)

  23. Wang, S., Yu, X., Perdikaris, P.: When and why PINNs fail to train: a neural tangent kernel perspective. J. Comput. Phys. 449, 110768 (2022)

  24. Farhani, G., Kazachek, A., Wang, B.: Momentum diminishes the effect of spectral bias in physics-informed neural networks. arXiv:2206.14862 (2022)

  25. Liu, Z., Cai, W., Xu, Z.-Q.J.: Multi-scale deep neural network (MscaleDNN) for solving Poisson-Boltzmann equation in complex domains. Commun. Comput. Phys. 28(5), 1970–2001 (2020)

  26. Li, X.-A.: A multi-scale DNN algorithm for nonlinear elliptic equations with multiple scales. Commun. Comput. Phys. 28(5), 1886–1906 (2020)

  27. Hackbusch, W.: Multi-grid Methods and Applications. Number 4 in Springer Series in Computational Mathematics. Springer Verlag, Heidelberg, Berlin, New York (1995)

  28. Borzi, A., Kunisch, K.: A globalisation strategy for the multigrid solution of elliptic optimal control problems. Optim. Methods Softw. 21(3), 445–459 (2006)

  29. Gross, Ch., Krause, R.: On the globalization of ASPIN employing trust-region control strategies - convergence analysis and numerical examples. Technical Report 2011-03, Università della Svizzera italiana, Lugano, CH (2011)

  30. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  31. Nesterov, Y.: Efficiency of coordinate descent methods for huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)

  32. Powell, M.J.D.: On search directions for minimization algorithms. Math. Program. 4, 193–201 (1973)

  33. Wright, S.J.: Coordinate descent algorithms. Math. Program., Series A 151(1), 3–34 (2015)

  34. Amaral, V.S., Andreani, R., Birgin, E.G., Marcondes, D.S., Martínez, J.M.: On complexity and convergence of high-order coordinate descent algorithms for smooth nonconvex box-constrained minimization. J. Global Optim. 84(3), 527–561 (2022)

  35. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Evaluation Complexity of Algorithms for Nonconvex Optimization. Number 30 in MOS-SIAM Series on Optimization. SIAM, Philadelphia, USA (2022)

  36. Nesterov, Y.: Introductory Lectures on Convex Optimization. Applied Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands (2004)

  37. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)

  38. Bellavia, S., Della Santa, F., Papini, A.: Alternate training of shared and task-specific parameters for multi-task neural networks. arXiv:2312.16340 (2023)

  39. Jordan, M.I., Lin, T., Zampetakis, M.: On the complexity of deterministic nonsmooth and nonconvex optimization. arXiv:2209.12403 (2022)

  40. Tian, L., So, A.M.-C.: Computing Goldstein \((\epsilon ,\delta )\)-stationary points of Lipschitz functions in \(\cal{O}(\epsilon ^{-3}\delta ^{-1})\) iterations via random conic perturbation. arXiv:2112.09002 (2021)

  41. Zhang, J., Lin, H., Jegelka, S., Sra, S., Jadbabaie, A.: Complexity of finding stationary points of nonconvex nonsmooth functions. In: Proceedings of the 37th International Conference on Machine Learning, PMLR, vol. 119, pp. 11173–11182 (2020)

  42. Kong, S., Lewis, A.S.: The cost of nonconvexity in nonconvex nonsmooth optimization. arXiv:2210.00652 (2022)

  43. Goldstein, A.A.: Optimization of Lipschitz continuous functions. Math. Program., Series A 13(1), 14–22 (1977)

  44. Rapaka, N.R., Samtaney, R.: An efficient Poisson solver for complex embedded boundary domains using the multi-grid and fast multipole methods. J. Comput. Phys. 410, 109387 (2020)

  45. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)

  46. Tancik, M., Srinivasan, P., Mildenhall, B., Fridovich-Keil, S., Raghavan, N., Singhal, U., Ramamoorthi, R., Barron, J., Ng, R.: Fourier features let networks learn high frequency functions in low dimensional domains. Adv. Neural Inf. Process. Syst. 33, 7537–7547 (2020)

  47. Li, X.-A., Xu, Z.-Q.J., Zhang, L.: Subspace decomposition based DNN algorithm for elliptic type multi-scale PDEs. arXiv:2112.06660 (2021)

  48. Yu, B., et al.: The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6, 1–12 (2018)


Funding

Work partially supported by the 3IA Artificial and Natural Intelligence Toulouse Institute (ANITI), French "Investing for the Future - PIA3" program under Grant agreement ANR-19-PI3A-0004, and by the GDR ISIS project MOMIGS.

Author information


Corresponding author

Correspondence to Elisa Riccietti.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices


Proofs of convergence theorems

1.1 Proof of Theorem 3.1

Consider iteration k of ML-BCD and suppose that this iteration occurs in the minimization of subproblem \(\mathcal{A}_i\) on the variables given by \(x_i\). From the Lipschitz continuity of the gradient and (15), we obtain that

$$\begin{aligned} f(x_{k+1})&\le f(x_k) - \alpha \nabla _x f(x_k)^T \overline{\nabla ^1_{x_i}f(x_k)}+\frac{\alpha ^2L}{2}\Vert \overline{\nabla ^1_{x_i}f(x_k)}\Vert ^2\\&= f(x_k) - \alpha \Vert \nabla ^1_{x_i}f(x_k)\Vert ^2+\frac{\alpha ^2L}{2}\Vert \nabla ^1_{x_i}f(x_k)\Vert ^2 < f(x_k) - \frac{\alpha }{2} \Vert \nabla ^1_{x_i}f(x_k)\Vert ^2. \end{aligned}$$

Since the minimization of subproblem \(\mathcal{A}_i\) has not yet terminated, we must have that \(\Vert \nabla ^1_{x_i}f(x_k)\Vert \ge \frac{\epsilon }{\sqrt{2}}\), and thus

$$\begin{aligned} f(x_{k+1}) < f(x_k) - \frac{\alpha }{2}\epsilon ^2. \end{aligned}$$

Summing this inequality over all iterations, we deduce that, for all k before termination of the ML-BCD algorithm,

$$\begin{aligned} f(x_0)- f_\textrm{low} \ge f(x_0)-f(x_{k+1}) = \sum _{j=0}^k \Big (f(x_j)-f(x_{j+1})\Big ) \ge \frac{(k+1)\alpha }{2}\epsilon ^2. \end{aligned}$$

This implies that the algorithm cannot take more than

$$\begin{aligned} \frac{2(f(x_0)- f_\textrm{low})}{\alpha \epsilon ^2}-1 \end{aligned}$$

iterations before it terminates, proving the theorem with

$$\begin{aligned} \kappa _* = \frac{2(f(x_0)- f_\textrm{low})}{\alpha }> 2L(f(x_0)- f_\textrm{low}). \end{aligned}$$

\(\square \)
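To make the argument concrete, the following minimal Python sketch runs the deterministic block gradient step analyzed above on a toy convex quadratic and checks the per-iteration decrease \(f(x_{k+1})\le f(x_k)-\frac{\alpha }{2}\Vert \nabla ^1_{x_i}f(x_k)\Vert ^2\). The objective, the two-block splitting, the largest-partial-gradient selection rule and the tolerance are illustrative assumptions, not the ML-BCD implementation used in the paper.

```python
import numpy as np

# Toy smooth objective f(x) = 0.5 * x^T A x, whose gradient is L-Lipschitz with L = ||A||_2.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + np.eye(20)                 # symmetric positive definite
L = np.linalg.norm(A, 2)                 # gradient Lipschitz constant

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

blocks = [np.arange(0, 10), np.arange(10, 20)]   # two coordinate blocks (illustrative)
alpha = 1.0 / L                                  # step size with alpha <= 1/L
eps = 1e-3

x = rng.standard_normal(20)
for k in range(100_000):
    g = grad(x)
    if np.linalg.norm(g) <= eps:                 # overall stopping test
        break
    # simplified block selection: take the block with the largest partial gradient,
    # which in particular satisfies the relative condition used at major iterations
    i = max(range(len(blocks)), key=lambda j: np.linalg.norm(g[blocks[j]]))
    step = np.zeros_like(x)
    step[blocks[i]] = g[blocks[i]]               # zero-padded block gradient (the "overline" map)
    x_next = x - alpha * step
    # descent inequality established in the proof of Theorem 3.1
    assert f(x_next) <= f(x) - 0.5 * alpha * np.linalg.norm(g[blocks[i]]) ** 2 + 1e-9
    x = x_next
```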

1.2 Proof of Theorem 3.2

Consider iteration k of ML-BCD and suppose that this iteration occurs in the minimization of subproblem \(\mathcal{A}_i\) on the variables given by \(x_i\). From the Lipschitz continuity of the gradient and the SG update of the selected block (16), we obtain:

$$\begin{aligned} \mathbb {E}_{\xi _k}[ f(x_{k+1})]\le f(x_k)-\alpha \nabla ^1_x f(x_k)^T\mathbb {E}_{\xi _k}(\overline{g_{i}(x_k,\xi _k)})+\frac{L\alpha ^2}{2}\mathbb {E}_{\xi _k}(\Vert \overline{g_{i}(x_k,\xi _k)}\Vert ^2). \end{aligned}$$
(26)

Using assumption (17) and the bound on \(\alpha \) we deduce that

$$\begin{aligned} \mathbb {E}_{\xi _k}[ f(x_{k+1})]- f(x_k)&\le -\alpha \mu \Vert \nabla ^1_{x_i}f(x_k)\Vert ^2 +\frac{\alpha ^2L\sigma _1^2}{2} +\frac{\alpha ^2L\sigma _2^2}{2} \Vert \nabla ^1_{x_i}f(x_k)\Vert ^2\\&= \alpha \left( \frac{\alpha L\sigma _2^2}{2}-\mu \right) \Vert \nabla ^1_{x_i}f(x_k)\Vert ^2 + \frac{\alpha ^2L\sigma _1^2}{2}\\&\le -\frac{\alpha \mu }{2} \Vert \nabla ^1_{x_i}f(x_k)\Vert ^2 + \frac{\alpha ^2L\sigma _1^2}{2}, \end{aligned}$$

and, if k is a major iteration, we have that

$$\begin{aligned} \mathbb {E}_{\xi _k}[ f(x_{k+1})]- f(x_k) \le -\frac{\alpha \mu \tau }{2} \Vert \nabla ^1_x f(x_k)\Vert ^2 + \frac{\alpha ^2L\sigma _1^2}{2}. \end{aligned}$$
(27)

Taking the total expectation gives that

$$\begin{aligned} \mathbb {E}[ f(x_{k+1})- f(x_k)]&\le -\frac{\alpha \mu }{2} \mathbb {E}(\Vert \nabla ^1_{x_i}f(x_k)\Vert ^2) + \frac{\alpha ^2L\sigma _1^2}{2}. \end{aligned}$$
(28)

Consider now a major iteration k, denoting by \(x_{k+\ell }\) (for \(\ell \in \{1,\dots ,\ell _k\}\) and \(\ell _k\le {\bar{\ell }}\)) the iterations performed on the selected subproblem. We deduce from (28) that

$$\begin{aligned} \mathbb {E}[ f(x_{k+\ell _k})- f(x_k)]&=\sum _{\ell =1}^{ \ell _k-1}\mathbb {E}(f(x_{k+\ell +1})-f(x_{k+\ell }))+\mathbb {E}(f(x_{k+1})-f(x_k))\\&\le \sum _{\ell =1}^{ \ell _k-1} \left( -\frac{\alpha \mu }{2}\mathbb {E}(\Vert \nabla ^1_{x_i}f(x_{k+\ell })\Vert ^2)+\frac{\alpha ^2 L\sigma _1^2}{2}\right) \\&\quad -\frac{\alpha \mu }{2} \mathbb {E}(\Vert \nabla ^1_{x_i}f(x_k)\Vert ^2) + \frac{\alpha ^2L\sigma _1^2}{2}. \end{aligned}$$

If the minimization of the subproblem is not stopped, this means that \(\Vert g_i(x_{k+\ell },\xi _{k+\ell })\Vert \ge c_\epsilon \), so that \(\mathbb {E}_{\xi _{k+\ell }}(\Vert g_{i}(x_{k+\ell },\xi _{k+\ell })\Vert ^2)\ge c_\epsilon ^2\). From (17), we have that

$$\begin{aligned} \Vert \nabla _{x_i}^1 f(x_{k+\ell })\Vert ^2 \ge \frac{\mathbb {E}_{\xi _{k+\ell }}(\Vert g_{i}(x_{k+\ell },\xi _{k+\ell })\Vert ^2)- \sigma _1^2}{\sigma _2^2}\ge \frac{c_\epsilon ^2-\sigma _1^2}{\sigma _2^2} \end{aligned}$$

and, since k is a major iteration, from the condition for the choice of the blocks we have \( \Vert \nabla ^1_{x_i}f(x_k)\Vert >\tau \Vert \nabla ^1_x f(x_k)\Vert . \) Thus

$$\begin{aligned} \mathbb {E}[ f(x_{k+\ell _k})- f(x_k)]&\le -\frac{\alpha \mu {\bar{\ell }}}{2} \frac{c_\epsilon ^2-\sigma _1^2}{\sigma _2^2} -\frac{\alpha \mu \tau }{2} \mathbb {E}(\Vert \nabla ^1_x f(x_k)\Vert ^2) + \frac{\alpha ^2L\sigma _1^2({\bar{\ell }}+1)}{2}. \end{aligned}$$

Taking into account the fact that f is bounded from below, summing over K major iterations and using the assumption \(c_\epsilon ^2>2\sigma _1^2\), we obtain:

$$\begin{aligned} f_{\min }-f(x_1)&\le \sum _{k=1}^K\mathbb {E}[ f(x_{k+\ell _k})- f(x_k)]\\&\le -\frac{\alpha \mu \tau }{2} \sum _{k=1}^K\mathbb {E}(\Vert \nabla ^1_x f(x_k)\Vert ^2)+\frac{K({\bar{\ell }}+1)\alpha ^2 L\sigma _1^2}{2} -\frac{\alpha \mu {\bar{\ell }} K}{2} \frac{\sigma _1^2}{\sigma _2^2}. \end{aligned}$$

Thus

$$\begin{aligned} \mathbb {E}\left( \sum _{k=1}^K \frac{\Vert \nabla ^1_x f(x_k)\Vert ^2}{K}\right)&\le \frac{2(f(x_1)-f_{\min })}{\alpha \mu K\tau }+ \frac{\alpha L\sigma _1^2}{\mu \tau }+ \frac{{\bar{\ell }}\sigma _1^2}{\tau } \left( \frac{\alpha L}{\mu }- \frac{1}{\sigma _2^2}\right) \end{aligned}$$
(29)
$$\begin{aligned}&\le \frac{2(f(x_1)-f_{\min })}{\alpha \mu K\tau }+ \frac{\alpha L\sigma _1^2}{\mu \tau } \end{aligned}$$
(30)

where the last inequality follows from the fact that \({\bar{\ell }} \sigma _1^2 \left( \frac{\alpha L}{\mu }- \frac{1}{\sigma _2^2}\right) <0\) by the assumption on \(\alpha \). Taking \({\bar{\ell }}\) large we can thus improve the bound, as long as \(\epsilon \) remains larger than the variance. \(\square \)
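The stochastic case can be sketched in the same spirit: a mini-batch gradient of a least-squares loss plays the role of \(g_i(x_k,\xi _k)\), the selected block is updated with a fixed small step size, and the inner loop on a block stops once the sampled block gradient drops below a constant \(c_\epsilon \). The problem, the block splitting, the batch size and all constants below are illustrative assumptions rather than the setting of the numerical experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, dim = 500, 20
Phi = rng.standard_normal((n_samples, dim))
y = Phi @ rng.standard_normal(dim) + 0.01 * rng.standard_normal(n_samples)

def batch_grad(x, batch):
    """Mini-batch gradient of the least-squares loss (plays the role of g(x_k, xi_k))."""
    r = Phi[batch] @ x - y[batch]
    return Phi[batch].T @ r / len(batch)

blocks = [np.arange(0, 10), np.arange(10, 20)]
L = np.linalg.norm(Phi.T @ Phi / n_samples, 2)   # Lipschitz constant of the full gradient
alpha, batch_size, c_eps, max_inner = 0.1 / L, 32, 1e-2, 50

x = np.zeros(dim)
for major in range(200):
    full_g = Phi.T @ (Phi @ x - y) / n_samples
    # select the block with the largest partial gradient (relative condition of major iterations)
    i = max(range(len(blocks)), key=lambda j: np.linalg.norm(full_g[blocks[j]]))
    for inner in range(max_inner):               # at most a fixed number of inner iterations per block
        batch = rng.choice(n_samples, size=batch_size, replace=False)
        gi = batch_grad(x, batch)[blocks[i]]
        if np.linalg.norm(gi) < c_eps:           # stochastic stopping test for the selected block
            break
        x[blocks[i]] -= alpha * gi               # SG update of the selected block only
```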

A Mscale FAML variant and its performance

1.1 Another frequency-aware architecture

This appendix reports on tests conducted using a frequency-aware network architecture based on the MscaleDNN networks defined in [25], instead of the WWP networks used in Sect. 5. A large part of the success of Mscale networks is due to their wavelet-inspired activation functions. These compactly-supported functions have good scale separation properties and are constructed so that the bandwidth of their Fourier transform increases with their input scaling. Several activation functions have been proposed in [25], but the most efficient one from a practical point of view has been introduced in [26] and is given by

$$\begin{aligned} \text {s2ReLU}(z) = \sin (2\pi z) \text {ReLU}(-(z-1)) \text {ReLU}(z), \end{aligned}$$
(31)

where z denotes the scaled input. This is a continuous function which decays faster and has better localization properties in the frequency domain than the previously proposed sReLU. The amplitude peaks of the differently scaled s2ReLU functions are well separated in the Fourier domain; they are indicated by black stars in Fig. 11, where we observe their expected monotonic growth with the scaling.

Fig. 11: The s2ReLU activation function (left) and its Fourier transform for several scalings (right). The black stars highlight the amplitude peaks
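A direct transcription of (31) is immediate; the short PyTorch sketch below only illustrates that the activation is compactly supported on \([0,1]\) and is not taken from the authors' code.

```python
import math
import torch

def s2relu(z: torch.Tensor) -> torch.Tensor:
    """s2ReLU of (31): sin(2*pi*z) * ReLU(1 - z) * ReLU(z).

    The two ReLU factors confine the support to [0, 1], while the sine factor
    shifts energy towards higher frequencies as the input scaling grows.
    """
    return torch.sin(2 * math.pi * z) * torch.relu(1.0 - z) * torch.relu(z)

z = torch.linspace(-0.5, 1.5, 9)
print(s2relu(z))   # nonzero only for 0 < z < 1
```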

As in Sect. 5, we use these functions to create a neural network composed of a sum of subnetworks, each targeting a different frequency range (analogously to Fig. 4). In this modified architecture, the lower-frequency subnetworks still use \(\tanh \) activation functions while the higher-frequency subnetworks now use (31). For each subnetwork, the first activation function is an SFM function associated with a fixed input scaling, as in standard Mscale networks. The details of this architecture are given in Table 4; each subnetwork is composed of 3 hidden layers of 100 neurons each.

Table 4 Details of the parallel Mscale architecture
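To illustrate the structure summarized in Table 4, the following PyTorch sketch assembles a sum of subnetworks, each with a fixed input scaling, an SFM-style first activation, and three hidden layers of 100 neurons using \(\tanh \) in the low-frequency branches and s2ReLU in the high-frequency ones. The scaling factors, the number of branches, the sinusoidal form assumed for the SFM activation and the output dimension are illustrative guesses, not the exact configuration of Table 4.

```python
import math
import torch
import torch.nn as nn

# s2ReLU from (31), repeated so that the sketch is self-contained.
def s2relu(z: torch.Tensor) -> torch.Tensor:
    return torch.sin(2 * math.pi * z) * torch.relu(1.0 - z) * torch.relu(z)

class Subnet(nn.Module):
    """One frequency branch: fixed input scaling, SFM-style first activation,
    then three hidden layers of 100 neurons each."""

    def __init__(self, in_dim: int, scale: float, high_freq: bool, width: int = 100):
        super().__init__()
        self.scale = scale
        self.act = s2relu if high_freq else torch.tanh     # tanh for the low-frequency branches
        self.first = nn.Linear(in_dim, width)
        self.hidden = nn.ModuleList([nn.Linear(width, width) for _ in range(2)])
        self.out = nn.Linear(width, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.sin(self.first(self.scale * x))          # assumed sinusoidal form of the SFM activation
        for layer in self.hidden:
            h = self.act(layer(h))
        return self.out(h)

class ParallelMscale(nn.Module):
    """Sum of subnetworks, each targeting a different frequency range."""

    def __init__(self, in_dim: int = 2, scales=(1.0, 2.0, 4.0, 8.0)):
        super().__init__()
        # the first half of the branches is treated as low-frequency, the rest as high-frequency
        self.branches = nn.ModuleList(
            [Subnet(in_dim, s, high_freq=(k >= len(scales) // 2)) for k, s in enumerate(scales)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return sum(branch(x) for branch in self.branches)

model = ParallelMscale()
u = model(torch.rand(64, 2))   # e.g. the network output at 64 collocation points of a 2D problem
```

In the block-coordinate setting of Sect. 5, the parameters of each such branch can be taken as one block of the ML-BCD algorithm, so that a sub-problem corresponds to training a single frequency branch while the others are kept fixed.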

Combined with the ML-BCD algorithm (exactly as in Sect. 5), this modified architecture defines an Mscale variant of the FAML approach.

1.2 Numerical results

We tested this approach using the same experimental setup and methodology as in Sect. 5, and again compared its performance to that of the standard single-level training applied to the complete network. The results are reported in Table 5.

Table 5 Best value of the median loss and associated MSE for standard single-level and Mscale-FAML training (10 independent runs)

As was the case when using P-WWP networks, the Mscale FAML variant produced lower losses than the standard training in each case. The differences are particularly significant for a small computational budget, where the improvement amounts to several orders of magnitude; this is also almost always the case for the associated MSE. These results thus remain excellent, even though our complexity theory does not formally cover the Mscale FAML variant because of the non-smoothness of (31).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gratton, S., Mercier, V., Riccietti, E. et al. A block-coordinate approach of multi-level optimization with an application to physics-informed neural networks. Comput Optim Appl 89, 385–417 (2024). https://doi.org/10.1007/s10589-024-00597-1

