Abstract
We study the application of a tailored quasi-Monte Carlo (QMC) method to a class of optimal control problems subject to parabolic partial differential equation (PDE) constraints under uncertainty: the state in our setting is the solution of a parabolic PDE with a random thermal diffusion coefficient, steered by a control function. To account for the presence of uncertainty in the optimal control problem, the objective function is composed with a risk measure. We focus on two risk measures, both involving high-dimensional integrals over the stochastic variables: the expected value and the (nonlinear) entropic risk measure. The high-dimensional integrals are computed numerically using specially designed QMC methods and, under moderate assumptions on the input random field, the error rate is shown to be essentially linear, independently of the stochastic dimension of the problem—and thereby superior to ordinary Monte Carlo methods. Numerical results demonstrate the effectiveness of our method.
1 Introduction
Many problems in science and engineering, including optimal control problems governed by partial differential equations (PDEs), are subject to uncertainty. If not taken into account, the inherent uncertainty of such problems has the potential to render worthless any solutions obtained using state-of-the-art methods for deterministic problems. The careful analysis of the uncertainty in PDE-constrained optimization is hence indispensable and a growing field of research (see, e.g., [5,6,7, 18, 29, 30, 36, 37, 45, 47, 48]).
In this paper we consider the heat equation with an uncertain thermal diffusion coefficient, modelled by a series in which a countably infinite number of independent random variables enter affinely. By controlling the source term of the heat equation, we aim to steer its solution towards a desired target state. To study the effect of this randomness on the objective function, we consider two risk measures: the expected value and the entropic risk measure, both involving integrals with respect to the countably many random variables. These integrals are replaced by integrals over finitely many random variables by truncating the series representation of the input random field to finitely many terms, and are then approximated using quasi-Monte Carlo (QMC) methods.
QMC approximations are particularly well suited for optimization since they preserve convexity due to their nonnegative (equal) cubature weights. Moreover, for sufficiently smooth integrands it is possible to construct QMC rules with error bounds that do not depend on the number of stochastic variables while attaining faster convergence rates than Monte Carlo methods. For these reasons QMC methods have been very successful in applications to PDEs with random coefficients (see, e.g., [2, 11, 16, 19, 20, 24,25,26, 33,34,35, 39, 42, 43]) and especially in PDE-constrained optimization under uncertainty, see [22, 23]. In [32] the authors derive regularity results for the saddle point operator, which fall within the same framework as the QMC approximation of affine parametric operator equations considered in [43].
This paper builds upon our previous work [22]. The novelty lies in the use and analysis of parabolic PDE constraints in conjunction with the nonlinear entropic risk measure, which inspired the development of an error analysis that is applicable in separable Banach spaces and thus discretization invariant. Solely based on regularity assumptions, our novel error analysis covers a very general class of problems. Specifically, we extend QMC error bounds in the literature (see, e.g., [12, 33]) to separable Banach spaces. A crucial part of our new work is the regularity analysis of the entropic risk measure, which is used to prove our main theoretical results about error estimates and convergence rates for the dimension truncation and the QMC errors. We then apply these new bounds to assess the total errors in the optimal control problem under uncertainty.
The structure of this paper is as follows. The parametric weak formulation of the PDE problem is given in Sect. 2. The corresponding optimization problem is discussed in Sect. 3, with linear risk measures considered in Sect. 3.1, the entropic risk measures in Sect. 3.2, and optimality conditions in Sect. 3.3. While the regularity of the adjoint PDE problem is the topic of Sect. 4, the regularity analysis for the entropic risk measure is addressed in Sect. 5. Section 6 contains the main error analysis of this paper. Section 6.1 covers the truncation error and Sect. 6.2 analyzes the QMC integration error. Our approach differs from most studies of QMC in the literature insofar as we develop the QMC and dimension truncation error theory for the full PDE solutions (with respect to an appropriately chosen function space norm) instead of considering the composition of the PDE solution with a linear functional. In Sect. 7 we confirm our theoretical findings with supporting numerical experiments. Section 8 is a concluding summary of this paper.
2 Problem formulation
Let \(D\subset {\mathbb {R}}^d\), \(d\in \{1,2,3\}\), denote a bounded physical domain with Lipschitz boundary, let \(I:= [0,T]\) denote the time interval with finite time horizon \(0<T<\infty \), and let \(U:=[-\frac{1}{2},\frac{1}{2}]^{\mathbb {N}}\) denote a space of parameters. The components of the sequence \({\varvec{y}}\in U\) are realizations of independent and identically distributed uniform random variables in \([-\tfrac{1}{2},\tfrac{1}{2}]\), and the corresponding probability measure is the countable product measure \(\mu (\textrm{d}{\varvec{y}}) = \bigotimes _{j\ge 1}\textrm{d}y_j\).
Let
\[
a^{\varvec{y}}({\varvec{x}},t) = a_0({\varvec{x}},t) + \sum _{j\ge 1} y_j\,\psi _j({\varvec{x}},t), \qquad {\varvec{x}}\in D,\ t\in I,\ {\varvec{y}}\in U, \qquad\qquad (2.1)
\]
be an uncertain (thermal) diffusion coefficient, where we assume (i) for a.e. \(t\in I\) we have \(a_0(\cdot ,t) \in L^\infty (D)\), \(\psi _j(\cdot ,t)\in L^\infty (D)\) for all \(j\ge 1\), and that \((\sup _{t\in I}\Vert \psi _j(\cdot ,t)\Vert _{L^\infty (D)})_{j\ge 1}\in \ell ^1\); (ii) \(t \mapsto a^{\varvec{y}}({\varvec{x}},t)\) is measurable on I; (iii) uniform ellipticity: there exist positive constants \(a_{\min }\) and \(a_{\max }\) such that \(0<a_{\min }\le a^{\varvec{y}}({\varvec{x}},t)\le a_{\max }<\infty \) for all \({\varvec{x}}\in D\), \({\varvec{y}}\in U\) and a.e. \(t\in I\). Time-varying diffusion coefficients occur, e.g., in finance or cancer tomography. However, the presented setting clearly also includes time-constant diffusion coefficients, i.e., \(a^{\varvec{y}}({\varvec{x}},t) = a^{\varvec{y}}({\varvec{x}})\) for all \(t \in I\).
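In the time-constant case, the parametric structure of \(a^{\varvec{y}}\) can be illustrated by the following minimal Python sketch, which evaluates a dimensionally truncated version of (2.1) for the hypothetical choice \(a_0\equiv 1\) and \(\psi _j(x) = c\,j^{-2}\sin (j\pi x)\) on \(D=(0,1)\); these functions are chosen purely for illustration and are not the test case of Sect. 7. Since \(|y_j|\le \tfrac{1}{2}\) and \(\Vert \psi _j\Vert _{L^\infty (D)}\le c j^{-2}\), assumption (iii) holds with \(a_{\min } = 1 - \tfrac{c\pi ^2}{12}\) and \(a_{\max } = 1 + \tfrac{c\pi ^2}{12}\) whenever \(c<12/\pi ^2\).

```python
import numpy as np

def diffusion_coefficient(x, y, c=0.5):
    """Truncated affine coefficient a^y(x) = a_0(x) + sum_{j=1}^{s} y_j * psi_j(x).

    Illustrative (hypothetical) choice: a_0(x) = 1, psi_j(x) = c * j**(-2) * sin(j*pi*x).
    x : spatial points in (0, 1);  y : parameters in [-1/2, 1/2], len(y) = truncation dimension s.
    """
    s = len(y)
    j = np.arange(1, s + 1)
    psi = c * j[:, None] ** (-2.0) * np.sin(j[:, None] * np.pi * x[None, :])  # shape (s, len(x))
    return 1.0 + np.asarray(y) @ psi

c = 0.5
a_min, a_max = 1.0 - c * np.pi**2 / 12.0, 1.0 + c * np.pi**2 / 12.0  # uniform ellipticity bounds

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
y = rng.uniform(-0.5, 0.5, size=100)            # one realization, truncation dimension s = 100
a = diffusion_coefficient(x, y, c=c)
assert a_min - 1e-12 <= a.min() and a.max() <= a_max + 1e-12
```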
We consider the heat equation over the time interval \(I=[0,T]\), given by the partial differential equation (PDE)
for all \({\varvec{y}}\in U\). Here \(z({\varvec{x}},t)\) is the control and \(u_0 \in L^2(D)\) denotes the initial heat distribution. We denote the input functions collectively by \(f:= (z, u_0)\). We have imposed homogeneous Dirichlet boundary conditions.
Given a target state \({\widehat{u}}({\varvec{x}},t)\), we will study the problem of minimizing the following objective function:
subject to the PDE (2.2) and constraints on the control to be defined later in the manuscript. By \({{\mathcal {R}}}\) we denote a risk measure, which is a functional that maps a set of random variables into the extended real numbers. Specifically, \({{\mathcal {R}}}\) will later be either the expected value or the entropic risk measure, both involving high-dimensional integrals with respect to \({\varvec{y}}\). We will first introduce a function space setting to describe the problem properly, including the definition of the \(L^2(V;I)\) and \(L^2(V';I)\) norms.
2.1 Function space setting
We define \(V:= H_0^1(D)\) and its (topological) dual space \(V':= H^{-1}(D)\), and identify \(L^2(D)\) with its own dual. Let \(\langle \cdot ,\cdot \rangle _{V',V}\) denote the duality pairing between \(V'\) and V. The norm and inner product in V are defined as usual by
We shall make use of the Riesz operator \(R_V\!: V\rightarrow V'\) defined by
as well as its inverse \(R_V^{-1}\!: V'\rightarrow V\) satisfying \(R_V^{-1} w = v \Leftrightarrow w = R_V v\) for \(v\in V,\,w\in V'\). It follows from (2.4) that
In turn we define the inner product in \(V'\) by
The norm induced by this inner product is equal to the usual dual norm.
We use analogous notations for inner products and duality pairings between function spaces on the space-time cylinder \(D\times I\). The space \(L^2(V;I)\) consists of all measurable functions \(v: I \rightarrow V\) with finite norm
Note that \((L^2(V;I))'=L^2(V';I)\), with the duality pairing given by
We extend the Riesz operator \(R_V\) to \(R_V: L^2(V;I) \rightarrow L^2(V';I)\) so that
and we extend the inverse \(R_V^{-1}: L^2(V';I) \rightarrow L^2(V;I)\) analogously.
We define the space of solutions \(u^{\varvec{y}}\) for \({\varvec{y}}\in U\) by
which is the space of all functions v in \(L^2(V;I)\) with (distributional) derivative \(\tfrac{\partial }{\partial t}v\) in \(L^2(V';I)\), and which is equipped with the (graph) norm
Finally, because there are two inputs in Eq. (2.2), namely \(z \in L^2(V';I)\) and \(u_0 \in L^2(D)\), it is convenient to define the product space \({{\mathcal {Y}}}:= L^2(V;I) \times L^2(D)\), and its dual space by \({{\mathcal {Y}}}':= L^2(V';I) \times L^2(D)\), with the norms
In particular, we embed \({{\mathcal {X}}}\) into \({{\mathcal {Y}}}\) as follows. For all \(v \in {{\mathcal {X}}}\) we interpret v as an element of \({{\mathcal {Y}}}\) as \(v = (v({\varvec{x}},t),v({\varvec{x}},0))\). This gives \({{\mathcal {X}}} \subseteq {{\mathcal {Y}}}\). We further know from [13, Theorem 5.9.3] that \({{\mathcal {X}}} \hookrightarrow {{\mathcal {C}}}(L^2(D);I)\) and \(\max _{t \in I} \Vert v(\cdot ,t)\Vert _{L^2(D)} \le C_1(\Vert v\Vert _{L^2(V;I)} + \Vert \tfrac{\partial }{\partial t} v\Vert _{L^2(V';I)}) \le \sqrt{2}\, C_1 \Vert v\Vert _{{\mathcal {X}}}\) for \(v \in {{\mathcal {X}}}\), where \(C_1\) depends on T only. Hence we obtain for all \(v \in {{\mathcal {X}}}\) that
and thus we get that \({{\mathcal {X}}}\) is continuously embedded into \({{\mathcal {Y}}}\), i.e., \({{\mathcal {X}}} \hookrightarrow {{\mathcal {Y}}}\).
2.2 Variational formulation
Based on these spaces, using integration by parts with respect to \({\varvec{x}}\) we can write (2.2) as a variational problem as follows. Given the input functions \(f =(z,u_0)\in {{\mathcal {Y}}}'\) and \({\varvec{y}}\in U\), find a function \(u^{{\varvec{y}}} \in {{\mathcal {X}}}\) such that
where for all \(w\in {{\mathcal {X}}}\), \(v = (v_1,v_2) \in {{\mathcal {Y}}}\) and \({\varvec{y}}\in U\),
with operators \(B^{\varvec{y}}: {{\mathcal {X}}} \rightarrow {{\mathcal {Y}}}'\), \(B_1^{\varvec{y}}: {{\mathcal {X}}} \rightarrow L^2(V';I)\), \(B_2^{\varvec{y}}:{{\mathcal {X}}} \rightarrow L^2(D)\), and \(B^{\varvec{y}}w = (B_1^{\varvec{y}}w, B_2^{\varvec{y}}w)\). For better readability we have omitted the parameter dependence \(v = (v_1({\varvec{x}},t),v_2({\varvec{x}}))\), \(f = (z({\varvec{x}},t), u_0({\varvec{x}}))\), \(w = w({\varvec{x}},t)\) and \(a^{{\varvec{y}}} = a^{{\varvec{y}}}({\varvec{x}},t)\). Note that a solution of (2.6) automatically satisfies \(u^{\varvec{y}}(\cdot ,0) = u_0\), as can be seen by setting \(v_1 = 0\) and allowing arbitrary \(v_2\).
The parametric family of parabolic evolution operators \(\{B^{\varvec{y}},\, {\varvec{y}}\in U\}\) associated with this bilinear form is a family of isomorphisms from \({{\mathcal {X}}}\) to \({{\mathcal {Y}}}'\), see, e.g., [10]. In [44] a shorter proof based on the characterization of the bounded invertibility of linear operators between Hilbert spaces is presented, together with precise bounds on the norms of the operator and its inverse: there exist constants \(0< \beta _1 \le \beta _2 < \infty \) such that
where \(\beta _1 \ge \tfrac{\min \{a_{\min } a_{\max }^{-2},a_{\min }\}}{\sqrt{2\max \{a_{\min }^{-2},1\}+\varrho ^2}}\) and \(\beta _2 \le \sqrt{2\max \{1,a_{\max }^2\}+\varrho ^2}\) with \(\varrho := \sup \limits _{w \in {{\mathcal {X}}}} \tfrac{\Vert w(\cdot ,0)\Vert _{L^2(D)}}{\Vert w\Vert _{{{\mathcal {X}}}}}\), and hence for all \({\varvec{y}}\in U\) we have the a priori estimate
With \(\mathbb {N}_0:= \{0,1,2,\ldots \}\), let \({\varvec{\nu }}\in {\mathbb {N}}_0^\infty \) denote a multi-index, and define \({\textrm{supp}}({\varvec{\nu }}):=\{j\ge 1: \nu _j\ne 0\}\) and \(|{\varvec{\nu }}|:= \sum _{j\ge 1} \nu _j\). In the sequel, we shall consider the set \({\mathscr {F}}:=\{{\varvec{\nu }}\in \mathbb N_0^\infty :|\textrm{supp}({\varvec{\nu }})|<\infty \}\) of multi-indices with finite support. We use the notation \(\partial ^{\varvec{\nu }}_{\varvec{y}}:= \prod _{j\ge 1} (\partial /\partial y_j)^{\nu _j}\) to denote the mixed partial derivatives with respect to \({\varvec{y}}\). For any sequence of real numbers \({\varvec{b}}= (b_j)_{j\ge 1}\), we define \({\varvec{b}}^{\varvec{\nu }}:= \prod _{j\ge 1} b_j^{\nu _j}\).
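As a small illustration of this notation, for the multi-index \({\varvec{\nu }} = (2,1,0,0,\ldots )\) we have
\[
\textrm{supp}({\varvec{\nu }}) = \{1,2\}, \qquad |{\varvec{\nu }}| = 3, \qquad
\partial ^{\varvec{\nu }}_{\varvec{y}} = \frac{\partial ^3}{\partial y_1^2\,\partial y_2}, \qquad
{\varvec{b}}^{\varvec{\nu }} = b_1^2\, b_2 .
\]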
The following regularity result for the state \(u^{\varvec{y}}\) was proved in [32].
Lemma 2.1
Let \(f=(z,u_0)\in {{\mathcal {Y}}}'\). For all \({\varvec{\nu }}\in {\mathscr {F}}\) and all \({\varvec{y}}\in U\), we have
where \(\beta _1\) is as described in (2.8) and the sequence \({\varvec{b}}= (b_j)_{j\ge 1}\) is defined by
For our later derivation of the optimality conditions for the optimal control problem, it is helpful to write the variational form of the PDE (2.6) as an operator equation using (2.7):
with \(B_1^{\varvec{y}}: {{\mathcal {X}}} \rightarrow L^2(V';I)\) and \(B_2^{\varvec{y}}: {{\mathcal {X}}} \rightarrow L^2(D)\) given by
where \(\Lambda _1: {{\mathcal {Y}}}' \rightarrow L^2(V';I)\) and \(\Lambda _2: {{\mathcal {Y}}}' \rightarrow L^2(D)\) are the restriction operators defined, for any \(v = (v_1,v_2) \in {{\mathcal {Y}}}'\), by
For the definition of a meaningful inverse of the operators \(B_1^{\varvec{y}}\) and \(B_2^{\varvec{y}}\), we first define the trivial extension operators \(\Xi _1:L^2(V';I) \rightarrow {{\mathcal {Y}}}'\) and \(\Xi _2: L^2(D) \rightarrow {{\mathcal {Y}}}'\), for any \(v_1 \in L^2(V';I)\) and \(v_2\in L^2(D)\), by
We observe that \(P_1:= \Xi _1\Lambda _1\) is an orthogonal projection on the \(L^2(V';I)\)-component in \({{\mathcal {Y}}}'\) and analogously \(P_2:= \Xi _2\Lambda _2\) is an orthogonal projection on the \(L^2(D)\)-component in \({{\mathcal {Y}}}'\). This is verified as follows. For all \(v,u\in {{\mathcal {Y}}}'\) it is true that
where \({{\mathcal {I}}}_{{{\mathcal {Y}}}'}\) denotes the identity operator on \({{\mathcal {Y}}}'\). We clearly have \({{\mathcal {I}}}_{{{\mathcal {Y}}}'} = P_1 + P_2\). Therefore we can write any element v in \({{\mathcal {Y}}}'\) as \(v = P_1 v + P_2 v\) in \({{\mathcal {Y}}}'\), and by linearity of \((B^{\varvec{y}})^{-1}\) we get
Meaningful inverses of the operators \(B^{\varvec{y}}_1: {{\mathcal {X}}} \rightarrow L^2(V';I)\) and \(B^{\varvec{y}}_2: {{\mathcal {X}}} \rightarrow L^2(D)\) are then given by \((B_1^{\varvec{y}})^\dagger : L^2(V';I) \rightarrow {{\mathcal {X}}}\) and \((B_2^{\varvec{y}})^\dagger : L^2(D) \rightarrow {{\mathcal {X}}}\), defined as
We call the operator \((B_1^{\varvec{y}})^\dagger \) the pseudoinverse of \(B_1^{\varvec{y}}\) and the operator \((B_2^{\varvec{y}})^\dagger \) the pseudoinverse of \(B_2^{\varvec{y}}\). Clearly, the pseudoinverse operators are linear and bounded operators.
Lemma 2.2
The pseudoinverse operators \((B_1^{\varvec{y}})^\dagger \) and \((B_2^{\varvec{y}})^\dagger \) defined by (2.13) satisfy
which are the identity operators on \(L^2(V';I)\), \(L^2(D)\), and \({{\mathcal {X}}}\), respectively.
Proof
From the definition of various operators, we have
as required. \(\square \)
Lemma 2.3
For \({\varvec{y}}\in U\) and given \((z,u_0)\in {{\mathcal {Y}}}'\), the solution \(u^{\varvec{y}}\) of the operator equation (2.12) can be written as
Proof
From (2.14) we have \(u^{\varvec{y}}= (B_1^{\varvec{y}})^{\dagger } B_1^{\varvec{y}}u^{\varvec{y}}+ (B_2^{\varvec{y}})^{\dagger } B_2^{\varvec{y}}u^{\varvec{y}}= (B_1^{\varvec{y}})^{\dagger }z + (B_2^{\varvec{y}})^{\dagger } u_0\), as required. \(\square \)
2.3 Dual problem
In the following we will need the dual operators \((B^{\varvec{y}})'\), \((B^{\varvec{y}}_1)'\) and \((B^{\varvec{y}}_2)'\) of \(B^{\varvec{y}}\), \(B^{\varvec{y}}_1\) and \(B^{\varvec{y}}_2\), respectively, which are formally defined by
for all \(w\in {{\mathcal {X}}}\), \(v = (v_1,v_2) \in {{\mathcal {Y}}}\) and \({\varvec{y}}\in U\), with \((B^{\varvec{y}})'v = (B_1^{\varvec{y}})'v_1 + (B_2^{\varvec{y}})'v_2\).
The dual problem to (2.6) (or equivalently (2.12)) is as follows. Given the input function \(f_{\textrm{dual}} \in {{\mathcal {X}}}'\) and \({\varvec{y}}\in U\), find a function \(q^{\varvec{y}}= (q_1^{\varvec{y}},q_2^{\varvec{y}})\in {{\mathcal {Y}}}\) such that
or in operator form \( (B^{\varvec{y}})'q^{\varvec{y}}=f_{\textrm{dual}}, \) which has the unique solution \( q^{\varvec{y}}= \big ((B^{\varvec{y}})'\big )^{-1} f_{\textrm{dual}}\,. \)
Existence and uniqueness of the solution of the dual problem follow directly from the bounded invertibility of \(B^{\varvec{y}}\). We know that its inverse, \((B^{\varvec{y}})^{-1}\), is a bounded linear operator and thus the dual of \((B^{\varvec{y}})^{-1}\) is (uniquely) defined (see, e.g., [49, Theorem 1 and Definition 1, Chapter VII]). The operator \((B^{\varvec{y}})^{-1}\) and its dual operator \(((B^{\varvec{y}})^{-1})' =((B^{\varvec{y}})')^{-1}\) have equal operator norms (see, e.g., [49, Theorem 2, Chapter VII]), i.e., the operator norms of the dual operator \((B^{\varvec{y}})'\) and of its inverse are bounded by the constants \(\beta _2\) and \(\tfrac{1}{\beta _1}\) in (2.8), respectively.
Applying integration by parts with respect to the time variable in (2.7), the left-hand side of the dual problem (2.16) can be written as
We may express the solution \(q^{\varvec{y}}= (q_1^{\varvec{y}},q_2^{\varvec{y}})\in {{\mathcal {Y}}}\) of the dual problem (2.16) in terms of the dual operators of the pseudoinverse operators \((B_1^{\varvec{y}})^\dagger \) and \((B_2^{\varvec{y}})^\dagger \). This is true because we get an analogous result to Lemma 2.2 in the dual spaces.
Lemma 2.4
The dual operators \(((B_1^{\varvec{y}})^\dagger )'\) and \(((B_2^{\varvec{y}})^\dagger )'\) of the pseudoinverse operators defined in (2.13) satisfy
which are the identity operators on \(L^2(V;I)\), \(L^2(D)\) and \({{\mathcal {X}}}'\), respectively.
Proof
For all \(v_1 \in L^2(V';I)\), \(w_1 \in L^2(V;I)\), \(v_2,w_2\in L^2(D)\), it follows from (2.14) that
Similarly, for all \(v\in {{\mathcal {X}}}\) and \(w\in {{\mathcal {X}}}'\) we have
This completes the proof. \(\square \)
Lemma 2.5
Given the input function \(f_{\textrm{dual}} \in {{\mathcal {X}}}'\) and \({\varvec{y}}\in U\), the (unique) solution of the dual problem (2.16) is given by
Proof
Existence and uniqueness follow from the bounded invertibility of \((B^{\varvec{y}})^\prime \), see Sect. 2.2. Thus, we only need to verify that (2.19) solves the dual problem (2.16). It follows from (2.18) that
as required. \(\square \)
We will see in the next section that, with the correct choice of the right-hand side \(f_{\textrm{dual}}\), the gradient of the objective function (2.3) can be computed using the solution \(q^{\varvec{y}}\) of the dual problem.
3 Parabolic optimal control problems under uncertainty with control constraints
The presence of uncertainty in the optimization problem requires the introduction of a risk measure \({\mathcal {R}}\) that maps the random variable objective function (see (3.3) below) to the extended real numbers. Let \((\Omega , {\mathcal {A}}, {\mathbb {P}})\) be a complete probability space. A functional \({{\mathcal {R}}}: L^p(\Omega , {\mathcal {A}}, {\mathbb {P}}) \rightarrow {\mathbb {R}} \cup \{\infty \}\), for \(p \in [1,\infty )\), is said to be a coherent risk measure [1] if for \(X,{\tilde{X}} \in L^p(\Omega , {\mathcal {A}}, {\mathbb {P}})\) we have
(1) Convexity: \({\mathcal {R}}(\lambda X + (1-\lambda ) {\tilde{X}}) \le \lambda {\mathcal {R}}(X) + (1-\lambda ) {\mathcal {R}}({\tilde{X}})\) for all \(\lambda \in [0,1]\).
(2) Translation equivariance: \({\mathcal {R}}(X+c) = {\mathcal {R}}(X) + c\) for all \(c \in {\mathbb {R}}\).
(3) Monotonicity: If \(X \le {\tilde{X}}\) \({\mathbb {P}}\)-a.e. then \({\mathcal {R}}(X) \le {\mathcal {R}}({\tilde{X}})\).
(4) Positive homogeneity: \({\mathcal {R}}(tX) = t {\mathcal {R}}(X)\) for all \(t\ge 0\).
Coherent risk measures are popular as numerous desirable properties can be derived from the above conditions (see, e.g., [29] and the references therein). However, it can be shown (see [30, Theorem 1]) that the only coherent risk measures that are Fréchet differentiable are linear ones. The expected value has all of these properties, but is risk-neutral. In order to also address risk-averse problems we focus on the (nonlinear) entropic risk measures, which are risk-averse, Fréchet differentiable, and satisfy conditions (1)–(3) above but not (4), i.e., they are not positively homogeneous (and thus not coherent). Risk measures satisfying (2) and (3) are called monetary risk measures, and a monetary risk measure that also satisfies (1) is called a convex risk measure (see [14]).
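To make this classification concrete, recall the entropic risk measure \({{\mathcal {R}}}_{\textrm{e}}(X) = \tfrac{1}{\theta }\log {\mathbb {E}}[\exp (\theta X)]\), \(\theta >0\), introduced in Sect. 3.2 below. A direct computation confirms translation equivariance,
\[
{{\mathcal {R}}}_{\textrm{e}}(X+c) = \tfrac{1}{\theta }\log \bigl ({\mathbb {E}}[\exp (\theta X)]\,e^{\theta c}\bigr ) = {{\mathcal {R}}}_{\textrm{e}}(X) + c ,
\]
while positive homogeneity fails in general: for \(t>0\),
\[
{{\mathcal {R}}}_{\textrm{e}}(tX) = \tfrac{1}{\theta }\log {\mathbb {E}}[\exp (t\theta X)] = t\cdot \tfrac{1}{t\theta }\log {\mathbb {E}}[\exp (t\theta X)],
\]
so scaling the random variable effectively rescales the risk-aversion parameter from \(\theta \) to \(t\theta \), and in general \({{\mathcal {R}}}_{\textrm{e}}(tX) \ne t\,{{\mathcal {R}}}_{\textrm{e}}(X)\).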
In this section we will first discuss the required conditions on the risk measure \({\mathcal {R}}\) under which the optimal control problem has a unique solution. We will then present two classes of risk measures that satisfy these conditions, namely the linear risk measures that include the expected value, and the entropic risk measures. Finally we derive necessary and sufficient optimality conditions for the optimal control problem with these two risk measures. We assume that the target state \({\widehat{u}}\) belongs to \({{\mathcal {X}}}\) and that the constants \(\alpha _{1}, \alpha _{2}\) are nonnegative with \(\alpha _1+\alpha _2 >0\) and \(\alpha _{3}>0\). Then we consider the following problem: minimize \({\widetilde{J}}(u,z)\) defined in (2.3) subject to the parabolic PDE (2.2) and constraints on the control
with \({\mathcal {Z}}\) being nonempty, bounded, closed and convex.
We want to analyze the problem in its reduced form, i.e., expressing the state \(u^{\varvec{y}}= (B^{\varvec{y}})^{-1} (z,u_0)\) in (2.3) in terms of the control z. This reformulation is possible because of the bounded invertibility of the operator \(B^{\varvec{y}}\) for every \({\varvec{y}}\in U\), see Sect. 2.2 and the references therein. We therefore introduce an alternative notation \(u(z) = (u^{\varvec{y}}(z))({\varvec{x}},t) = u^{\varvec{y}}({\varvec{x}},t)\). (Of course \(u^{\varvec{y}}\) depends also on \(u_0\), but we can think of \(u_0\) as fixed, and therefore uninteresting.) The reduced problem is then to minimize
where \(E_T\!:{{\mathcal {X}}}\rightarrow L^2(D)\) is the bounded linear operator defined by \(v\mapsto v(\cdot ,T)\) for some fixed terminal time \(T>0\).
Defining
we can equivalently write the reduced problem as
With the uniformly boundedly invertible forward operator \(B^{\varvec{y}}\), our setting fits into the abstract framework of [29] where the authors derive existence and optimality conditions for PDE-constrained optimization under uncertainty. In particular, the forward operator \(B^{\varvec{y}}\), the regularization term \(\tfrac{\alpha _3}{2}\Vert z\Vert _{L^2(V';I)}^2\) and the random variable tracking-type objective function \(\Phi ^{\varvec{y}}\) satisfy the assumptions of [29, Proposition 3.12]. In order to present the result about the existence and uniqueness of the solution of (3.4), which is based on [29, Proposition 3.12], we recall some definitions from convex analysis (see, e.g., [29] and the references therein): A functional \({\mathcal {R}}: L^p(\Omega ,{\mathcal {A}},{\mathbb {P}}) \rightarrow {\mathbb {R}} \cup \{\infty \}\) is called proper if \({\mathcal {R}}(X) > -\infty \) for all \(X \in L^p(\Omega ,{\mathcal {A}},{\mathbb {P}})\) and \({\textrm{dom}}({\mathcal {R}}):= \{X \in L^p(\Omega ,{\mathcal {A}},{\mathbb {P}}): {\mathcal {R}}(X) < \infty \} \ne \emptyset \); it is called lower semicontinuous or closed if its epigraph \(\textrm{epi}({\mathcal {R}}):= \{(X,\alpha ) \in L^p(\Omega ,{\mathcal {A}},{\mathbb {P}}) \times {\mathbb {R}}: {\mathcal {R}}(X) \le \alpha \}\) is closed in the product topology \(L^p(\Omega ,{\mathcal {A}},{\mathbb {P}}) \times {\mathbb {R}}\).
Lemma 3.1
Let \(\alpha _1,\alpha _2 \ge 0\) and \(\alpha _3>0\) with \(\alpha _1+\alpha _2 >0\), and let \({\mathcal {R}}\) be proper, closed, convex and monotonic. Then there exists a unique solution of (3.4).
Proof
The existence of the solution follows directly from [29, Proposition 3.12]. We thus only prove the strong convexity of the objective function, which implies strict convexity and hence uniqueness of the solution. Clearly \(\frac{\alpha _3}{2}\Vert z\Vert ^2_{L^2(V';I)}\) is strongly convex. Since the sum of a convex and a strongly convex function is strongly convex it remains to show the convexity of \(\mathcal R(\Phi ^{\varvec{y}}(z))\). By the linearity and the bounded invertibility of the linear forward operator \(B^{\varvec{y}}\), the tracking-type objective functional \(\Phi ^{\varvec{y}}(z)\) is quadratic in z and hence convex, i.e., we have for \(z,{\tilde{z}} \in L^2(V';I)\) and \(\lambda \in [0,1]\) that \(\Phi ^{\varvec{y}}(\lambda z + (1-\lambda ) {\tilde{z}}) \le \lambda \Phi ^{\varvec{y}}(z) + (1-\lambda )\Phi ^{\varvec{y}}({\tilde{z}})\). Then, by the monotonicity and the convexity of the risk measure \({\mathcal {R}}\) we get \( {\mathcal {R}}(\Phi ^{\varvec{y}}(\lambda z + (1-\lambda ) {\tilde{z}})) \le {\mathcal {R}}(\lambda \Phi ^{\varvec{y}}(z) + (1-\lambda )\Phi ^{\varvec{y}}({\tilde{z}})) \le \lambda {\mathcal {R}}( \Phi ^{\varvec{y}}(z)) + (1-\lambda )\mathcal R(\Phi ^{\varvec{y}}({\tilde{z}}))\,, \) as required. \(\square \)
3.1 Linear risk measures, including the expected value
First we derive a formula for the Fréchet derivative of (3.2) when \({{\mathcal {R}}}\) is linear, which includes the special case \({{\mathcal {R}}}(\cdot ) = \int _U (\cdot )\,\textrm{d}{\varvec{y}}\).
Lemma 3.2
Let \({{\mathcal {R}}}\) be linear. Then the Fréchet derivative of (3.2) as an element of \((L^2(V';I))'=L^2(V;I)\) is given by
for \(z\in L^2(V';I)\).
Proof
For \(z,\delta \in L^2(V';I)\), we can write
where we used (2.15) to write \(u^{\varvec{y}}(z+\delta )-u^{\varvec{y}}(z) = [(B_1^{\varvec{y}})^{\dagger }(z+\delta ) + (B_2^{\varvec{y}})^{\dagger }u_0] - [(B_1^{\varvec{y}})^{\dagger }(z) + (B_2^{\varvec{y}})^{\dagger }u_0] = (B_1^{\varvec{y}})^{\dagger }\delta \). Expanding the squared norms using \(\Vert v+w\Vert ^2 = \langle v+w,v+w\rangle = \Vert v\Vert ^2 + 2\langle v,w\rangle + \Vert w\Vert ^2\), we obtain
with the Fréchet derivative \(\partial _z J(z): L^2(V';I)\rightarrow {\mathbb {R}}\) defined by
It remains to simplify the three terms. Using the extended Riesz operator \(R_V\!: L^2(V;I) \rightarrow L^2(V';I)\), we have
where the third equality follows since \((B_1^{\varvec{y}})^{\dagger }\delta \in {{\mathcal {X}}} \hookrightarrow L^2(V;I)\), and the fourth equality follows from the definition of the dual operator \(((B_1^{\varvec{y}})^{\dagger })': {{\mathcal {X}}}' \rightarrow L^2(V;I)\), noting that \((L^2(V';I))' = L^2(V;I)\).
Next, using the definition of the dual operator \((E_T)':L^2(D) \rightarrow {{\mathcal {X}}}'\), we can write
Finally, using the definition of the \(L^2(V',I)\) inner product and the extended inverse Riesz operator \(R_V^{-1}\!:L^2(V';I)\rightarrow L^2(V;I)\), we obtain
Writing \((\partial _z J(z))\, \delta = \langle J'(z),\delta \rangle _{L^2(V;I),L^2(V';I)}\) and collecting the terms above leads to the expression for \(J'(z)\) in (3.5).\(\square \)
We call \(J'(z)\) the gradient of J(z) and show next that \(J'(z)\) can be computed using the solution of the dual problem (2.16) with
We show this first for the special case when \({{\mathcal {R}}}\) is linear.
Lemma 3.3
Let \(\alpha _1,\alpha _2 \ge 0\) and \(\alpha _3 > 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and then let \(q^{\varvec{y}}\in {{\mathcal {Y}}}\) be the solution of (2.16) with \(f_{\textrm{dual}}\) given by (3.6). Then for \({{\mathcal {R}}}\) linear, the gradient of (3.2) is given as an element of \(L^2(V;I)\) by
for \(z \in L^2(V';I)\).
Proof
This follows immediately from (3.6), Lemma 3.2 and Lemma 2.5. \(\square \)
Proposition 3.4
Under the conditions of Lemma 3.3, with \(f_{\textrm{dual}}\) given by (3.6), the dual solution \(q^{\varvec{y}}= (q_1^{\varvec{y}},q_2^{\varvec{y}}) \in {{\mathcal {Y}}}\) satisfies
Consequently, the left-hand side of (2.16) reduces to
and hence \(q_1^{\varvec{y}}\) is the solution to
where the first equation holds for \({\varvec{x}}\in D\), \(t\in I\), the second equation holds for \({\varvec{x}}\in \partial D\), \(t\in I\), and the last equation holds for \({\varvec{x}}\in D\).
Proof
Since (2.16) holds for arbitrary \(w\in {{\mathcal {X}}}\), it holds in particular for the special case
with arbitrary \(v \in V\). For \(f_{\textrm{dual}}\) given by (3.6), the right-hand side of (2.16) becomes
From (2.17) the left-hand side of (2.16) is now
Equating the two sides, letting \(n\rightarrow \infty \), and noting that \(v\in V\) is arbitrary, we conclude that necessarily \(q_2^{\varvec{y}}= q_1^{\varvec{y}}(\cdot ,0)\).
Hence, the left-hand side of (2.16) reduces to (3.8). By analogy with the weak form of (2.2), using the transformation \(t \mapsto T-t\), we conclude that \(q_1^{\varvec{y}}\) is the solution to (3.9). \(\square \)
3.2 The entropic risk measure
The expected value is risk neutral. Next, we consider risk-averse measures such as the entropic risk measure
\[
{{\mathcal {R}}}_{\textrm{e}}(Y):= \frac{1}{\theta }\,\log \Bigl (\int _U \exp \bigl (\theta \,Y({\varvec{y}})\bigr )\,\textrm{d}{\varvec{y}}\Bigr )
\]
for an essentially bounded random variable \(Y({\varvec{y}})\) and some \(\theta \in (0,\infty )\). Using \({{\mathcal {R}}} = {{\mathcal {R}}}_{\textrm{e}}\) in (3.2), the optimal control problem becomes \(\min _{z \in {\mathcal {Z}}} J(z)\), with
for some \(\theta \in (0,\infty )\) and \(\Phi ^{\varvec{y}}\) defined in (3.3).
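The parameter \(\theta \) quantifies the degree of risk aversion. A formal expansion for small \(\theta \) (valid for essentially bounded Y) makes this explicit:
\[
{{\mathcal {R}}}_{\textrm{e}}(Y) = \tfrac{1}{\theta }\log {\mathbb {E}}\bigl [e^{\theta Y}\bigr ]
= {\mathbb {E}}[Y] + \tfrac{\theta }{2}\,\textrm{Var}[Y] + {{\mathcal {O}}}(\theta ^2),
\]
so that, in contrast to the risk-neutral expected value (which is recovered in the limit \(\theta \rightarrow 0\)), the variability of \(\Phi ^{\varvec{y}}(z)\) over the realizations of the random coefficient is penalized in addition to its mean.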
In the following we want to compute the Fréchet derivative of J(z) with respect to \(z \in L^2(V';I)\). To this end, we verify that \(\Phi ^{\varvec{y}}(z) \le C < \infty \) is uniformly bounded in \({\varvec{y}}\in U\) for any \(z\in L^2(V';I)\), i.e. the constant \(C>0\) is independent of \({\varvec{y}}\in U\).
Lemma 3.5
Let \(f = (z,u_0)\in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\), and let \(\alpha _1,\alpha _2 \ge 0\) with \(\alpha _1+\alpha _2 >0\). Then for all \({\varvec{y}}\in U\), the function \(\Phi ^{\varvec{y}}\) defined by (3.3) satisfies
Thus for all \(\theta >0\) we have
Proof
We have from (3.3) that
which yields (3.11) after applying (2.9). \(\square \)
Using the preceding lemma, we compute the gradient of (3.10).
Lemma 3.6
Let \(\alpha _1,\alpha _2 \ge 0\) and \(\alpha _3 > 0\), with \(\alpha _1 + \alpha _2 > 0\), and let \(0<\theta <\infty \). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and then let \(q^{\varvec{y}}= (q_1^{\varvec{y}},q_2^{\varvec{y}}) \in {{\mathcal {Y}}}\) be the solution of (2.16) with \(f_{\textrm{dual}}\) given by (3.6). Then the gradient of (3.10) is given as an element of \(L^2(V;I)\) for \(z \in L^2(V';I)\) by
where \(\Phi ^{\varvec{y}}(z)\) is defined in (3.3).
Proof
The application of the chain rule gives
Lemma 3.5 implies that \(1 \le \int _U \exp \big (\theta \,\Phi ^{\varvec{y}}(z)\big )\,\textrm{d}{\varvec{y}}<\infty \). Then the integral is a bounded and linear operator and hence its Fréchet derivative is the operator itself. Exploiting this fact, we obtain that \(\partial _z \left( \int _U \exp \big (\theta \,\Phi ^{\varvec{y}}(z)\big )\,\mathrm d{\varvec{y}}\right) = \int _U \left( \partial _z \exp \big (\theta \,\Phi ^{\varvec{y}}(z)\big )\right) \mathrm d{\varvec{y}}\). By the chain rule it follows for each \({\varvec{y}}\in U\) that \( \partial _z \exp \big (\theta \,\Phi ^{\varvec{y}}(z)\big ) = \theta \, \exp \big (\theta \,\Phi ^{\varvec{y}}(z)\big )\, \partial _z \Phi ^{\varvec{y}}(z)\,. \) Recalling from the previous subsection that \(\partial _z (\frac{\alpha _3}{2}\Vert z\Vert _{L^2(V';I)}^2) = \alpha _3 R_V^{-1} z\) and \( \partial _z \Phi ^{\varvec{y}}(z) = \big ((B_1^{\varvec{y}})^{\dagger }\big )'(\alpha _1 R_V +\alpha _2 E_T'E_T)(u^{\varvec{y}}(z)-{\widehat{u}}) = q_1^{\varvec{y}}\,, \) and collecting terms gives (3.14). \(\square \)
3.3 Optimality conditions
In the case when the feasible set of controls \({\mathcal {Z}}\) is a nonempty and convex set, we know (see, e.g., [46, Lemma 2.21]) that the optimal control \(z^*\) satisfies the variational inequality
For convex objective functionals J(z), like the ones considered in this work, the variational inequality is a necessary and sufficient condition for optimality. The complete optimality conditions are then given by the following result.
Theorem 3.7
Let \({\mathcal {R}}\) be the expected value or the entropic risk measure. A control \(z^* \in L^2(V';I)\) is the unique minimizer of (2.3) subject to (2.2) and (3.1) if and only if it satisfies the optimality system:
which holds for all \({\varvec{y}}\in U\), and \(J'(z)\) is given by (3.7) for the expected value, or (3.14) for the entropic risk measure.
Observe that the optimality system in Theorem 3.7 contains the variational formulations of the state PDE (2.6) and the dual PDE (2.16) in the first and second equation, respectively.
It is convenient to reformulate the variational inequality (3.15) in terms of an orthogonal projection onto \(\mathcal Z\). The orthogonal projection onto a nonempty, closed and convex subset \({\mathcal {Z}} \subset H\) of a Hilbert space H, denoted by \(P_{{\mathcal {Z}}}: H \rightarrow {\mathcal {Z}}\), is defined as
Then, see, e.g., [27, Lemma 1.11], for all \(h \in H\) and \(\gamma >0\) the condition \(z \in {\mathcal {Z}}\), \(\langle h, v - z\rangle _{H} \ge 0\ \forall v \in {\mathcal {Z}}\), is equivalent to \(z - P_{{\mathcal {Z}}}(z - \gamma h) = 0\). Using the definition of the Riesz operator and \(H = L^2(V';I)\), we conclude that (3.15) is equivalent to
This equivalence can then be used to develop projected descent methods to solve the optimal control problem, see, e.g., [27, Chapter 2.2.2].
Remark 3.8
If \({\mathcal {Z}}\) is the closed ball with radius \(r > 0\) in a Hilbert space H, then the orthogonal projection \(P_{{\mathcal {Z}}}\) is given by the radial scaling \(P_{{\mathcal {Z}}}(v) = v\) if \(\Vert v\Vert _H \le r\), and \(P_{{\mathcal {Z}}}(v) = \frac{r}{\Vert v\Vert _H}\,v\) otherwise.
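The following Python sketch illustrates the resulting projected descent iteration in a discretized, finite-dimensional surrogate setting: the control is a coefficient vector, the Riesz maps are replaced by the identity, and `gradient` is a placeholder for a routine returning a discrete representative of \(J'(z)\) (for instance the QMC-based gradient approximations of Sect. 6). All of these simplifications are assumptions made purely for illustration; the projection onto the closed ball of radius r is the radial scaling of Remark 3.8.

```python
import numpy as np

def project_ball(v, r):
    """Orthogonal projection onto the closed ball {v : ||v|| <= r} (cf. Remark 3.8)."""
    norm = np.linalg.norm(v)
    return v if norm <= r else (r / norm) * v

def projected_gradient_descent(gradient, z0, r, gamma=1.0, tol=1e-8, max_iter=500):
    """Projected gradient descent for min J(z) subject to ||z|| <= r.

    Stops when the fixed-point residual z - P_Z(z - gamma * gradient(z)) is small,
    the discrete analogue of the projected form of the optimality condition above.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        z_new = project_ball(z - gamma * gradient(z), r)
        if np.linalg.norm(z - z_new) <= tol:
            return z_new
        z = z_new
    return z

# toy usage: J(z) = 0.5 * ||z - target||^2 subject to ||z|| <= 1 (illustration only)
target = np.array([2.0, -1.0, 0.5])
z_star = projected_gradient_descent(lambda z: z - target, np.zeros(3), r=1.0, gamma=0.5)
```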
4 Parametric regularity of the adjoint state
In this section we derive an a priori bound for the adjoint state and the partial derivatives of the adjoint state with respect to the parametric variables. Existing results, e.g., [32, Theorem 4], do not directly apply to our case, since the right-hand side of the affine linear, parametric operator equation depends on the parametric variable, more specifically
Lemma 4.1
Let \(\alpha _1,\alpha _2 \ge 0\) and \(\alpha _3 > 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and then let \(q^{\varvec{y}}\in {{\mathcal {Y}}}\) be the solution of (2.16) with \(f_\textrm{dual}\) given by (3.6). Then we have
where \(\beta _1\) is described in (2.8).
Proof
By the bounded invertibility of \(B^{\varvec{y}}\) and its dual operator, we have
with \(\Vert ((B^{\varvec{y}})')^{-1}\Vert _{{{\mathcal {X}}}'\rightarrow {{\mathcal {Y}}}} \le 1/\beta _1\),
where we used (2.9). Combining the estimates gives the desired result. \(\square \)
Theorem 4.2
Let \(\alpha _1,\alpha _2 \ge 0\) and \(\alpha _3 > 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and then let \(q^{\varvec{y}}\in {{\mathcal {Y}}}\) be the solution of (2.16) with \(f_{\textrm{dual}}\) given by (3.6). Then for every \({\varvec{\nu }}\in {\mathscr {F}}\) we have
where \(\beta _1\) is described in (2.8) and the sequence \({\varvec{b}}= (b_j)_{j\ge 1}\) is defined in (2.11).
Proof
For \({\varvec{\nu }}= {\varvec{0}}\) the assertion follows from the previous lemma. For \({\varvec{\nu }}\ne {\varvec{0}}\) we take derivatives \(\partial _{\varvec{y}}^{\varvec{\nu }}((B^{\varvec{y}})' q^{\varvec{y}}) = \partial _{\varvec{y}}^{\varvec{\nu }}((\alpha _1 R_V + \alpha _2 E_T'E_T)(u^{\varvec{y}}- {\widehat{u}}))\) and use the Leibniz product rule to get
Separating out the \({\varvec{m}}= {\varvec{0}}\) term, we obtain
By the bounded invertibility of \((B^{\varvec{y}})'\), we have \(\Vert ((B^{\varvec{y}})')^{-1}\Vert _{{{\mathcal {X}}}'\rightarrow {{\mathcal {Y}}}} \le \tfrac{1}{\beta _1}\) and
Recall that
For \({\varvec{m}}\ne {\varvec{0}}\), we conclude with (2.1) that \(\langle v,\partial ^{{\varvec{m}}} (B^{\varvec{y}})' w\rangle _{{{\mathcal {X}}},{{\mathcal {X}}}'} = \int _I \int _D \psi _j\, \nabla v \cdot \nabla w \,\textrm{d}x \,\textrm{d}t\) if \({\varvec{m}}=\varvec{e}_j\), and otherwise it is zero. Hence for \({\varvec{m}}={\varvec{e}}_j\) we obtain for all \(v\in {{\mathcal {Y}}}\) that
Hence
By Lemma 4.1 this recursion is true for \({\varvec{\nu }}= {\varvec{0}}\) and we may apply [33, Lemma 9.1] to get
We finally arrive at
where the equality follows from [33, Formula (9.4)]. \(\square \)
5 Regularity analysis for the entropic risk measure
Our goal is to use QMC to approximate the following high-dimensional integrals appearing in the denominator and numerator of the gradient (3.14). To this end, we develop regularity bounds for the integrands.
Lemma 5.1
Let \(\theta >0\), \(\alpha _1,\alpha _2 \ge 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and let \(\Phi ^{\varvec{y}}\) be as in (3.3). Then for all \({\varvec{\nu }}\in {\mathscr {F}}\) we have
where the sequence \({\varvec{b}}= (b_j)_{j\ge 1}\) is defined by (2.11).
Proof
The case \({\varvec{\nu }}={\varvec{0}}\) is precisely (3.11). Consider now \({\varvec{\nu }}\ne {\varvec{0}}\). We estimate the partial derivatives of \(\Phi ^{\varvec{y}}\) by differentiating under the integral sign and using the Leibniz product rule in conjunction with the Cauchy–Schwarz inequality to obtain
Separating out the \({\varvec{m}}={\varvec{0}}\) and \({\varvec{m}}={\varvec{\nu }}\) terms and utilizing (2.10), we obtain
where the sum over \({\varvec{m}}\) can be rewritten as
where we used the identity
which is a simple consequence of the Vandermonde convolution [38, Eq. (5.1)]. Combining the estimates yields the required result. \(\square \)
For future reference, we state a recursive form of Faà di Bruno’s formula [41] for the exponential function.
Theorem 5.2
Let \(G: U\rightarrow \mathbb {R}\). For all \({\varvec{y}}\in U\) and \({\varvec{\nu }}\in {\mathscr {F}}\setminus \{{\varvec{0}}\}\), we have
where the sequence \((\alpha _{{\varvec{\nu }},\lambda }({\varvec{y}}))_{{\varvec{\nu }}\in \mathscr {F},\lambda \in {\mathbb {N}}_0}\) is defined recursively by \(\alpha _{{\varvec{\nu }}, 0}({\varvec{y}})=\delta _{{\varvec{\nu }},{\varvec{0}}}\), \(\alpha _{{\varvec{\nu }},\lambda }({\varvec{y}})=0\) for \(\lambda >|{\varvec{\nu }}|\), and otherwise
Proof
This is a special case of [41, Formulas (3.1) and (3.5)] in which f is the exponential function and \(m=1\) so that \(\lambda \) is an integer. \(\square \)
Lemma 5.3
Let the sequence \((\mathbb {A}_{{\varvec{\nu }},\lambda })_{{\varvec{\nu }}\in {\mathscr {F}},\,\lambda \in \mathbb {N}_0}\) satisfy \(\mathbb {A}_{{\varvec{\nu }},0}=\delta _{{\varvec{\nu }},{\varvec{0}}}\), \(\mathbb {A}_{{\varvec{\nu }},\lambda }=0\) for \(\lambda >|{\varvec{\nu }}|\), and otherwise satisfy the recursion
for some \(c>0\) and a nonnegative sequence \({\varvec{\rho }}\). Then for all \({\varvec{\nu }}\ne {\varvec{0}}\) and \(1\le \lambda \le |{\varvec{\nu }}|\) we have
The result is sharp in the sense that both inequalities can be replaced by equalities.
Proof
We prove (5.3) for all \({\varvec{\nu }}\ne {\varvec{0}}\) and \(1\le \lambda \le |{\varvec{\nu }}|\) by induction on \(|{\varvec{\nu }}|\). The base case \(\mathbb {A}_{{\varvec{e}}_j,1}\) is easy to verify. Let \({\varvec{\nu }}\ne {\varvec{0}}\) and suppose that (5.3) holds for all multi-indices \({\varvec{m}}\) of order \(\le |{\varvec{\nu }}|\) and all \(1\le \lambda \le |{\varvec{m}}|\). The case \(\mathbb {A}_{{\varvec{\nu }}+{\varvec{e}}_j,1}\) is also straightforward to verify. We consider therefore \(2\le \lambda \le |{\varvec{\nu }}|+1\). Using (5.2) and the induction hypothesis, we have
where we used (5.1) and then regrouped the factors as binomial coefficients. Next we take the binomial identity [38, Eq. (5.6)]
separate out the \(\ell =0\) term, and use \(\sum _{k=1}^{\lambda -1} (-1)^k \left( {\begin{array}{c}\lambda -1\\ k\end{array}}\right) = \sum _{k=0}^{\lambda -1} (-1)^k \left( {\begin{array}{c}\lambda -1\\ k\end{array}}\right) -1 = -1\), to rewrite T as
Substituting this back into (5.4) and simplifying the factors, we obtain
as required. \(\square \)
Theorem 5.4
Let \(\theta >0\), \(\alpha _1,\alpha _2 \ge 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and let \(\Phi ^{\varvec{y}}\) be as in (3.3). Then for all \({\varvec{\nu }}\in {\mathscr {F}}\) we have
where the sequence \({\varvec{b}}= (b_j)_{j\ge 1}\) is defined by (2.11) and \(\sigma \) is defined by (3.13).
Proof
For \({\varvec{\nu }}={\varvec{0}}\) we have from (3.12) that \(|\exp (\theta \,\Phi ^{\varvec{y}})| \le e^\sigma \), which satisfies the required bound. For \({\varvec{\nu }}\ne {\varvec{0}}\), from Faà di Bruno’s formula (Theorem 5.2) we have
with \(\alpha _{{\varvec{\nu }}, 0}({\varvec{y}})=\delta _{{\varvec{\nu }},{\varvec{0}}}\), \(\alpha _{{\varvec{\nu }},\lambda }({\varvec{y}})=0\) for \(\lambda >|{\varvec{\nu }}|\), and
where we used Lemma 5.1. Applying Lemma 5.3 we conclude that
We have
where we used \(\left( {\begin{array}{c}n\\ m\end{array}}\right) \le n^m/m! \le e^n\). Combining (5.5), (5.6), (5.7) and (3.11) gives
as required. \(\square \)
Remark 5.5
In the proof of Theorem 5.4, a different manipulation of (5.7) can yield a different bound \(2c\, e^{|{\varvec{\nu }}|+\sigma e^2+\sigma +1}(|{\varvec{\nu }}|-1)!\) for \({\varvec{\nu }}\ne {\varvec{0}}\), leading to a tighter upper bound for large \(|{\varvec{\nu }}|\) at the expense of a bigger constant,
This then leads to a more complicated bound for Theorem 5.6 below. We have chosen to present the current form of Theorem 5.4 to simplify our subsequent analysis.
Interestingly, the sum in (5.6) can also be rewritten as a sum with only positive terms: denoting \(v = |{\varvec{\nu }}|\),
which is identical to the sequence [3, Proposition 7] and the sequence A181289 in the OEIS (written in slightly different form). However, we were unable to find a closed form expression for the sum; neither [21] nor [38] shed any light. The hope is to obtain an alternative bound for (5.7) that does not involve the factor \(e^{|{\varvec{\nu }}|}\). This is open for future research.
As an alternative approach to the presented bootstrapping method, holomorphy arguments can be used to derive similar regularity bounds, see, e.g., [8].
Theorem 5.6
Let \(\theta >0\), \(\alpha _1,\alpha _2 \ge 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and \(\Phi ^{\varvec{y}}\) be as in (3.3), and then let \(q^{\varvec{y}}= (q_1^{\varvec{y}},q_2^{\varvec{y}}) \in {{\mathcal {Y}}}\) be the solution of (2.16) with \(f_{\textrm{dual}}\) given by (3.6). Then for all \({\varvec{\nu }}\in {\mathscr {F}}\) we have
where the sequence \({\varvec{b}}= (b_j)_{j\ge 1}\) is defined by (2.11), \(\sigma \) is defined by (3.13) and
Proof
Using the Leibniz product rule and Theorems 5.4 with 4.2, we obtain
with the last equality due to [33, Formula (9.5)]. \(\square \)
6 Error analysis
Let \(z^*\) denote the solution of (3.4) and let \(z_{s,n}^*\) be the minimizer of
where \(\Phi _s^{{\varvec{y}}}(z) = \Phi ^{(y_1,y_2,\ldots ,y_s,0,0,\ldots )}(z)\) is the truncated version of \(\Phi ^{\varvec{y}}(z)\) defined in (3.3), and \({\mathcal {R}}_{s,n}\) is an approximation of the risk measure \({\mathcal {R}}\), for which the integrals over the parameter domain \(U= [-\frac{1}{2},\frac{1}{2}]^{{\mathbb {N}}}\) are replaced by s-dimensional integrals over \(U_s= [-\frac{1}{2},\frac{1}{2}]^{s}\) and then approximated by an n-point randomly-shifted QMC rule:
for \(\theta \in (0,\infty )\), for carefully chosen QMC points \({\varvec{y}}^{(i)}\), \(i=1,\ldots ,n\), involving a uniformly sampled random shift \({\varvec{\Delta }}\in [0,1]^s\), see Sect. 6.2.
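Given the values \(\Phi _s^{{\varvec{y}}^{(i)}}(z)\), \(i=1,\ldots ,n\), at the QMC points, the two approximations \({\mathcal {R}}_{s,n}\) can be evaluated as in the following Python sketch. The array `phi` of integrand values is assumed to be supplied by a PDE solver, which is outside the scope of this snippet; a log-sum-exp formulation is used for the entropic risk measure to avoid overflow of \(\exp (\theta \,\Phi )\).

```python
import numpy as np

def qmc_expected_value(phi):
    """Equal-weight QMC approximation of the expected value: the sample mean over the n points."""
    return float(np.mean(phi))

def qmc_entropic_risk(phi, theta):
    """Equal-weight QMC approximation of the entropic risk measure,
    (1/theta) * log( (1/n) * sum_i exp(theta * phi_i) ),
    evaluated in a numerically stable log-sum-exp form."""
    t = theta * np.asarray(phi, dtype=float)
    m = t.max()
    return float(m + np.log(np.mean(np.exp(t - m)))) / theta

# illustration with synthetic integrand values (a PDE solver would supply these)
rng = np.random.default_rng(1)
phi = rng.uniform(0.0, 2.0, size=1024)
print(qmc_expected_value(phi), qmc_entropic_risk(phi, theta=1.0))
```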
We have seen in the proof of Lemma 3.1 that the risk measures considered in this manuscript are convex and the objective function J, see (3.2), is thus strongly convex. It is important to note that the n-point QMC rule preserves the convexity of the risk measure, so \(J_{s,n}\) is a strongly convex function, because it is a sum of a convex and a strongly convex function. Therefore we have the optimality conditions \(\langle J_{s,n}'(z_{s,n}^*), z - z^*_{s,n}\rangle _{L^2(V;I),L^2(V';I)} \ge 0\) for all \(z \in {\mathcal {Z}}\) and thus in particular \(\langle J_{s,n}'(z_{s,n}^*), z^* - z^*_{s,n}\rangle _{L^2(V;I),L^2(V';I)} \ge 0\). Similarly, we have \(\langle J'(z^*), z - z^* \rangle _{L^2(V;I),L^2(V';I)} \ge 0\), and in particular \(\langle -J'(z^*), z^* - z^*_{s,n} \rangle _{L^2(V;I),L^2(V';I)} \ge 0\). Adding these inequalities gives
Hence
where we used the \(\alpha _3\)-strong convexity of \(J_{s,n}\) (equivalently, the \(\alpha _3\)-strong monotonicity of its gradient \(J_{s,n}'\)) in the fourth step, i.e.,
Thus we have with (3.4)
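In compact form (with all duality pairings taken between \(L^2(V;I)\) and \(L^2(V';I)\)), the chain of estimates above reads
\[
\alpha _3\,\Vert z^* - z^*_{s,n}\Vert _{L^2(V';I)}^2
\le \langle J_{s,n}'(z^*) - J_{s,n}'(z^*_{s,n}),\, z^* - z^*_{s,n}\rangle
\le \langle J_{s,n}'(z^*) - J'(z^*),\, z^* - z^*_{s,n}\rangle
\le \Vert J_{s,n}'(z^*) - J'(z^*)\Vert _{L^2(V;I)}\,\Vert z^* - z^*_{s,n}\Vert _{L^2(V';I)},
\]
and hence
\[
\Vert z^* - z^*_{s,n}\Vert _{L^2(V';I)} \le \frac{1}{\alpha _3}\,\Vert J'(z^*) - J_{s,n}'(z^*)\Vert _{L^2(V;I)} .
\]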
We will next expand this upper bound in order to split it into the different error contributions: dimension truncation error and QMC error. The different error contributions are then analyzed separately in the following subsections for both risk measures.
In the case of the expected value, it follows from (3.7) that
where \(q^{\varvec{y}}_{1,s}:= q^{(y_1,y_2,\ldots ,y_s,0,0,\ldots )}_{1}\) denotes the truncated version of \(q_1^{\varvec{y}}\), and \(\mathbb {E}_{\varvec{\Delta }}\) denotes the expected value with respect to the random shift \({\varvec{\Delta }}\in [0,1]^s\).
In the case of the entropic risk measure, we recall that \(J'(z)\) is given by (3.14). Let
then we have
where we used \(T\ge 1\) and \(T_{s,n}\ge 1\) and, using the abbreviation \(g^{\varvec{y}}({\varvec{x}},t):= \exp (\theta \,\Phi ^{{\varvec{y}}}(z))\,q_1^{{\varvec{y}}}({\varvec{x}},t)\) we get
where we used Theorem 5.6 with \({\varvec{\nu }}={\varvec{0}}\).
We can write
For the first term on the right-hand side of (6.2) we obtain
and for the second term we have
Remark 6.1
Since we have \(\Vert v_1\Vert _{L^2(V;I)} \le \Vert v\Vert _{{{\mathcal {Y}}}}\) for all \(v=(v_1,v_2)\in {{\mathcal {Y}}}\) by definition, and thus in particular \(\Vert \int _U (q_1^{\varvec{y}}- q_{1,s}^{\varvec{y}})\,\mathrm d{\varvec{y}}\Vert _{L^2(V;I)} \le \Vert \int _U (q^{\varvec{y}}- q_s^{\varvec{y}}) \,\mathrm d{\varvec{y}}\Vert _{{{\mathcal {Y}}}}\), we can replace \(q_1^{\varvec{y}},q_{1,s}^{\varvec{y}}\in L^2(V;I)\) in (6.1) and (6.4) by \(q^{\varvec{y}},q_s^{\varvec{y}}\in {{\mathcal {Y}}}\). In order to obtain error bounds and convergence rates for (6.1) and (6.4), it is then sufficient to derive the results in the \({{\mathcal {Y}}}\)-norm, which is slightly stronger than the \(L^2(V;I)\)-norm.
6.1 Truncation error
In this section we derive bounds and convergence rates for the errors that occur by truncating the dimension, i.e., for the first terms in (6.1), (6.3) and (6.4).
We prove a new and very general theorem for the truncation error based on knowledge of regularity. The idea of the proof is based on a Taylor series expansion and is similar to the approach in [19, Theorem 4.1]. The use of Taylor series for dimension truncation error analysis has also been considered, for instance, in [4, 17].
Theorem 6.2
Let Z be a separable Banach space and let \(g({\varvec{y}}): U \rightarrow Z\) be analytically dependent on the sequence of parameters \({\varvec{y}}\in U = [-\frac{1}{2},\frac{1}{2}]^{{\mathbb {N}}}\). Suppose there exist constants \(C_0>0\), \(r_1\ge 0\), \(r_2>0\), and a sequence \({\varvec{\rho }}= (\rho _j)_{j\ge 1} \in \ell ^p({\mathbb {N}})\) for \(0<p<1\), with \(\rho _1 \ge \rho _2 \ge \cdots \), such that for all \({\varvec{y}}\in U\) and \({\varvec{\nu }}\in {\mathscr {F}}\) we have
Then, denoting \(({\varvec{y}}_{\le s};\varvec{0}) = (y_1,y_2,\ldots ,y_s,0,0,\ldots )\), we have for all \(s \in {\mathbb {N}}\)
for \(C>0\) independent of s.
Proof
Let \({\varvec{y}}\in U\) and \(G\in Z'\) with \(\Vert G\Vert _{Z'}\le 1\) and define
Evidently, \(\partial _{{\varvec{y}}}^{{\varvec{\nu }}}F({\varvec{y}})=\langle G,\partial _{{\varvec{y}}}^{{\varvec{\nu }}}g({\varvec{y}})\rangle _{Z',Z}\) for all \({\varvec{\nu }}\in \mathscr {F}\). Moreover,
For arbitrary \(k\ge 1\) we consider the Taylor expansion of F about \(({\varvec{y}}_{\le s};{\textbf{0}}) = (y_1,y_2,\ldots ,y_s,0,0,\ldots )\):
Rearranging this equation and integrating over \({\varvec{y}}\in U\) yields
If there is any component \(\nu _j=1\) with \(j>s\), then the summand in the first term vanishes, since (for all \({\varvec{\nu }}\in {\mathscr {F}}\) with \(\nu _j = 0\) \(\forall j\le s\))
where we used Fubini’s theorem. Taking the absolute value on both sides in (6.5) and using \(|y_j|\le \frac{1}{2}\), we obtain
Furthermore, we have
which is also bounded by the last expression in (6.6).
For s sufficiently large, we obtain in complete analogy to [15] that the first term in (6.6) satisfies
since \({\varvec{\rho }}\in \ell ^p\) with \(0<p<1\) and \(\rho _1\ge \rho _2\ge \cdots \) by assumption.
On the other hand, we can use the multinomial theorem to bound the second term in (6.6)
where we used \(\sum _{j>s}\rho _j\le s^{-1/p+1} (\sum _{j=1}^\infty \rho _j^p )^{1/p}\). (The last inequality follows directly from [9, Lemma 5.5], which is often attributed to Stechkin. For an elementary proof we refer to [31, Lemma 3.3].)
Taking now \(k=\lceil \frac{1}{1-p}\rceil \) yields that (6.6) is of order \({{\mathcal {O}}}(s^{-2/p+1})\). Note that \(k\ge 2\) for \(0<p<1\). The result can be extended to all s by a trivial adjustment of the constants. \(\square \)
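The exponent bookkeeping behind this choice of k is as follows: the second term in (6.6) is of order \(\bigl (\sum _{j>s}\rho _j\bigr )^{k+1} = {{\mathcal {O}}}\bigl (s^{(-1/p+1)(k+1)}\bigr )\), and
\[
\Bigl (\tfrac{1}{p}-1\Bigr )(k+1) \ge \tfrac{2}{p}-1
\;\Longleftrightarrow \;
k+1 \ge \frac{2-p}{1-p}
\;\Longleftrightarrow \;
k \ge \frac{1}{1-p},
\]
so \(k=\lceil \frac{1}{1-p}\rceil \) is the smallest integer for which the second term decays at least as fast as the rate \(s^{-2/p+1}\) of the first term.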
Remark 6.3
The assumption of analyticity of the integrand can be relaxed: the Taylor expansion with remainder used in the proof remains valid under the weaker assumption that only the \({\varvec{\nu }}\)-th partial derivatives with \(|{\varvec{\nu }}|\le k+1\), \(k=\lceil \frac{1}{1-p}\rceil \), \(0<p<1\), exist.
We now apply this general result to the first terms in (6.1), (6.3) and (6.4).
Theorem 6.4
Let \(\theta >0\), \(\alpha _1,\alpha _2 \ge 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\), let \(u^{\varvec{y}}\in {{\mathcal {X}}}\) be the solution of (2.2) and \(\Phi ^{\varvec{y}}\) be as in (3.3), and then let \(q^{\varvec{y}}\in {{\mathcal {Y}}}\) be the solution of (2.16) with \(f_\textrm{dual}\) given by (3.6). Suppose the sequence \({\varvec{b}}= (b_j)_{j\ge 1}\) defined by (2.11) satisfies
Then for every \(s \in {\mathbb {N}}\), the truncated solutions \(u_s^{\varvec{y}}\), \(q_s^{\varvec{y}}\) and \(\Phi _s^{\varvec{y}}\) satisfy
In each case we have a generic constant \(C>0\) independent of s, but depending on z, \(u_0\), \({\widehat{u}}\) and other constants as appropriate.
Proof
The result is a corollary of Theorem 6.2 by applying the regularity bounds in Lemma 2.1, Theorems 4.2, 5.6 and 5.4. \(\square \)
6.2 Quasi-Monte Carlo error
We are interested in computing s-dimensional Bochner integrals of the form
where \(g({\varvec{y}})\) is an element of a separable Banach space Z for each \({\varvec{y}}\in U_s\). As our estimator of \(I_s(g)\), we use a cubature rule of the form
with weights \(\alpha _i \in {\mathbb {R}}\) and cubature points \({\varvec{y}}^{(i)} \in U_s\). In particular, we are interested in QMC rules (see, e.g., [12, 33]), which are cubature rules characterized by equal weights \(\alpha _i = 1/n\) and carefully chosen points \({\varvec{y}}^{(i)}\) for \(i=1,\ldots ,n\).
We shall see that for sufficiently smooth integrands, randomly shifted lattice rules lead to convergence rates that do not depend on the dimension and are faster than those of Monte Carlo methods. Randomly shifted lattice rules are QMC rules with cubature points given by
where \(\varvec{z}\in {\mathbb {N}}^s\) is known as the generating vector, \(\varvec{\Delta } \in [0,1]^s\) is the random shift and \(\textrm{frac}(\cdot )\) means to take the fractional part of each component in the vector. In order to get an unbiased estimator, in practice we take the mean over R uniformly drawn random shifts, i.e., we estimate \(I_s(g)\) using
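A minimal Python sketch of this construction and the shift-averaged estimator is given below. The Korobov-form generating vector used in the toy example is an illustrative placeholder rather than a CBC-optimized vector, and the integrand is a stand-in for quantities such as \(\exp (\theta \,\Phi _s^{\varvec{y}}(z))\) that would be supplied by a PDE solver.

```python
import numpy as np

def shifted_lattice_points(n, gen, shift):
    """Points y^(i) = frac(i * gen / n + shift) - 1/2, i = 1,...,n, of a randomly
    shifted rank-1 lattice rule, mapped to the parameter box [-1/2, 1/2]^s."""
    i = np.arange(1, n + 1)[:, None]                       # shape (n, 1)
    return np.mod(i * gen[None, :] / n + shift[None, :], 1.0) - 0.5

def shift_averaged_qmc(integrand, n, gen, n_shifts=8, rng=None):
    """Average the equal-weight QMC estimate over R = n_shifts i.i.d. uniform random
    shifts, giving an unbiased estimate of the s-dimensional integral of `integrand`
    together with a standard-error estimate over the shifts."""
    rng = np.random.default_rng() if rng is None else rng
    s = len(gen)
    estimates = []
    for _ in range(n_shifts):
        y = shifted_lattice_points(n, gen, rng.uniform(size=s))
        estimates.append(np.mean([integrand(yi) for yi in y]))
    return np.mean(estimates), np.std(estimates) / np.sqrt(n_shifts)

# toy usage with a Korobov-type generating vector (illustrative choice, not CBC-optimized)
n, s, a = 1021, 8, 76
gen = np.array([pow(a, j, n) for j in range(s)], dtype=float)
mean, stderr = shift_averaged_qmc(lambda y: np.exp(np.sum(y) / s), n=n, gen=gen, n_shifts=8)
```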
In this section we derive bounds and convergence rates for the errors that occur by applying a QMC method for the approximation of the integrals in the second terms in (6.1), (6.3) and (6.4). We first prove a new general result which holds for any cubature rule in a separable Banach space setting.
Theorem 6.5
Let \(U_s = [-\frac{1}{2},\frac{1}{2}]^s\) and let \({{\mathcal {W}}}_s\) be a Banach space of functions \(F: U_s\rightarrow \mathbb {R}\), which is continuously embedded in the space of continuous functions. Consider an n-point cubature rule with weights \(\alpha _i\in \mathbb {R}\) and points \({\varvec{y}}^{(i)}\in U_s\), given by
and define the worst case error of \(Q_{s,n}\) in \({{\mathcal {W}}}_s\) by
Let Z be a separable Banach space and let \(Z'\) denote its dual space. Let \(g: {\varvec{y}}\mapsto g({\varvec{y}})\) be continuous and \(g({\varvec{y}}) \in Z\) for all \({\varvec{y}}\in U_s\). Then
Proof
From the separability of Z and the continuity of \(g({\varvec{y}})\) we get strong measurability of \(g({\varvec{y}})\). Moreover, from the compactness of \(U_s\) and the continuity of \({\varvec{y}}\mapsto g({\varvec{y}})\) we conclude that \(\sup _{{\varvec{y}}\in U_s}\Vert g({\varvec{y}})\Vert _Z < \infty \) and hence \(\int _{U_s} \Vert g({\varvec{y}})\Vert _Z\, \textrm{d}{\varvec{y}}< \infty \), which in turn implies \(\Vert \int _{U_s} g({\varvec{y}})\, \textrm{d}{\varvec{y}}\Vert _Z <\infty \). Thus \(g({\varvec{y}})\) is Bochner integrable.
Furthermore, for every normed space Z, its dual space \(Z'\) is a Banach space equipped with the norm \(\Vert G\Vert _{Z'}:= \sup _{g \in Z,\, \Vert g\Vert _{Z} \le 1} |\langle G,g\rangle _{Z',Z}|\). Then it holds for every \(g\in Z\) that \(\Vert g\Vert _Z = \sup _{G \in Z',\, \Vert G\Vert _{Z'} \le 1} |\langle G, g\rangle _{Z',Z}|\). This follows from the Hahn–Banach Theorem, see, e.g., [40, Theorem 4.3].
Thus we have
where we used the linearity of G and the fact that for Bochner integrals we can swap the integral with the linear functional, see, e.g., [49, Corollary V.2].
From the definition of the worst case error of \(Q_{s,n}\) in \({{\mathcal {W}}}_s\) it follows that for any \(F\in {{\mathcal {W}}}_s\) we have
Applying this to the special case \(F({\varvec{y}}) = G(g({\varvec{y}})) = \langle G, g({\varvec{y}})\rangle _{Z',Z}\) in (6.12) yields (6.11). \(\square \)
Theorem 6.6
Let the assumptions of the preceding Theorem hold. In addition, suppose there exist constants \(C_0>0\), \(r_1\ge 0\), \(r_2>0\) and a positive sequence \({\varvec{\rho }}= (\rho _j)_{j\ge 1}\) such that for all \(\mathrm {\mathfrak {u}}\subseteq \{1:s\}\) and for all \({\varvec{y}}\in U_s\) we have
Then, a randomly shifted lattice rule can be constructed using a component-by-component (CBC) algorithm such that
where \(\phi _{\textrm{tot}}(n)\) is the Euler totient function, with \(1/\phi _{\textrm{tot}}(n)\le 2/n\) when n is a prime power, and
Proof
We consider randomly shifted lattice rules in the unanchored weighted Sobolev space \({{\mathcal {W}}}_{s,{\varvec{\gamma }}}\) with norm
It is known that CBC construction yields a lattice generating vector satisfying
We have from (6.11) that
Using the definition of the \({{\mathcal {W}}}_{s,{\varvec{\gamma }}}\)-norm, we have
We can now use the assumption (6.13) and combine all of the estimates to arrive at the required bound. \(\square \)
Remark 6.7
Theorem 6.5 holds for arbitrary cubature rules; thus results similar to Theorem 6.6 can be stated for other cubature rules. In particular, the regularity bounds obtained in Sects. 4 and 5 can also be used for the worst case error analysis of higher-order QMC quadrature as well as sparse grid integration.
We now apply Theorem 6.6 to the second terms in (6.1), (6.3) and (6.4).
Theorem 6.8
Let \(\theta >0\), \(\alpha _1,\alpha _2 \ge 0\), with \(\alpha _1 + \alpha _2 > 0\). Let \(f = (z,u_0) \in {{\mathcal {Y}}}'\) and \({\widehat{u}} \in {{\mathcal {X}}}\). For every \({\varvec{y}}\in U\) and \(s\in \mathbb {N}\), let \(u^{\varvec{y}}_s \in {{\mathcal {X}}}\) be the truncated solution of (2.2) and \(\Phi ^{\varvec{y}}_s\) be as in (3.3), and then let \(q^{\varvec{y}}_s \in {{\mathcal {Y}}}\) be the truncated solution of (2.16) with \(f_\textrm{dual}\) given by (3.6). Then a randomly shifted lattice rule can be constructed using a CBC algorithm such that for all \(\lambda \in (\tfrac{1}{2},1]\) we have
where \(\phi _{\textrm{tot}}(n)\) is the Euler totient function, with \(1/\phi _{\textrm{tot}}(n)\le 2/n\) when n is a prime power. Here \(C_{s,{\varvec{\gamma }},\lambda }\) is given by (6.14), with \(r_1 = 2\), \(r_2 = e\), \(\rho _j = b_j\) defined in (2.11), and \(C_0>0\) is independent of s, n, \(\lambda \) and the weights \({\varvec{\gamma }}\) but depends on \(z\), \(u_0\), \({\widehat{u}}\) and other constants.
With the choices
we have that \(C_{s,{\varvec{\gamma }}^*,\lambda }\) is bounded independently of s. (However, \(C_{s,{\varvec{\gamma }}^*,\frac{1}{2-2\delta }} \rightarrow \infty \) as \(\delta \rightarrow 0\) and \(C_{s,{\varvec{\gamma }}^*,\frac{p}{2-p}} \rightarrow \infty \) as \(p \rightarrow (2/3)^+\).) Consequently, under the assumption (6.7), the above three mean-square errors are of order
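The displayed order is not reproduced above; a reading consistent with the parenthetical remark and with the rate \(\kappa (n)\) of Theorem 6.9, stated here for the corresponding root-mean-square errors and to be understood as an assumption, is

```latex
\text{root-mean-square error} \;=\; O\bigl(\kappa(n)\bigr),
\qquad
\kappa(n) \;:=\;
\begin{cases}
  n^{-(1-\delta)} & \text{for arbitrary } \delta \in (0,\tfrac{1}{2}),\ \text{when } p \in (0,\tfrac{2}{3}],\\[4pt]
  n^{-(1/p - 1/2)} & \text{when } p \in (\tfrac{2}{3},1),
\end{cases}
```

obtained by taking \(\lambda = \frac{1}{2-2\delta }\) in the first case and \(\lambda = \frac{p}{2-p}\) in the second, combined with \(1/\phi _{\textrm{tot}}(n)\le 2/n\).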
Proof
The mean-square error bounds follow from Theorem 6.6 by applying the regularity bounds in Lemma 2.1 and Theorems 4.2, 5.4 and 5.6. For simplicity we set \(C_0\), \(r_1\) and \(r_2\) to be the largest values arising from the four results.
We know from [35, Lemma 6.2] that for any \(\lambda \), \(C_{s,{\varvec{\gamma }},\lambda }\) defined in (6.14) is minimized by \(\gamma _\mathrm {\mathfrak {u}}= \gamma _\mathrm {\mathfrak {u}}^*\). By inserting \({\varvec{\gamma }}^*\) into \(C_{s,{\varvec{\gamma }},\lambda }\) we can then derive the condition \(p< \frac{2\lambda }{1+\lambda } <1\) for which \(C_{s,{\varvec{\gamma }}^*,\lambda }\) is bounded independently of s. This condition on \(\lambda \), together with \(\lambda \in (\frac{1}{2},1]\) and \(p \in (0,1)\) yields the result. \(\square \)
6.3 Combined error bound
Combining the results in this section gives the final theorem.
Theorem 6.9
Let \(\alpha _1,\alpha _2 \ge 0\) and \(\alpha _3>0\) with \(\alpha _1+\alpha _2 >0\), and let the risk measure \({{\mathcal {R}}}\) be either the expected value or the entropic risk measure with \(\theta >0\). Denote by \(z^*\) the unique solution of (3.4) and by \(z^*_{s,n}\) the unique solution of the truncated problem using a randomly shifted lattice rule as approximation of the risk measure. Then, if (6.7) and (6.8) hold, we have
where \(\kappa (n)\) is given in (6.19), and the constant \(C>0\) is independent of s and n but depends on z, \(u_0\), \({\widehat{u}}\) and other constants.
Proof
This follows from (6.1)–(6.4), Remark 6.1, Theorems 6.4 and 6.8. \(\square \)
7 Numerical experiments
We consider the coupled PDE system
and
where the first equations in (7.1) and (7.2) hold for \({\varvec{x}}\in D\), \(t\in I\), \({\varvec{y}}\in U\), the second equations hold for \({\varvec{x}}\in \partial D\), \(t\in I\), \({\varvec{y}}\in U\), and the last equations hold for \({\varvec{x}}\in D\) and \({\varvec{y}}\in U\). We fix the physical domain \(D=(0,1)^2\) and the terminal time \(T=1\). The uncertain diffusion coefficient, defined as in (2.1), is independent of \(t\) and is parameterized in all experiments by the mean field \(a_0({\varvec{x}})\equiv 1\) and the fluctuations
We use the implicit Euler finite difference scheme with step size \(\Delta t=\frac{T}{500}=2\cdot 10^{-3}\) to discretize the PDE system with respect to the temporal variable. The spatial part of the PDE system is discretized using a first order finite element method with mesh size \(h=2^{-5}\), and the Riesz operator in the loading term corresponding to (7.2) is evaluated using (2.4). In all experiments, the lattice rules are generated using the fast CBC algorithm with weights chosen as in Theorem 6.8, where we use the parameter value \(\beta _1=1\) in (2.11).
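For concreteness, the time stepping just described can be summarized by the following Python sketch of a single parametric solve; the assembled mass matrix `M`, the stiffness matrix `A_y` for one realization of the diffusion coefficient, and the load vector returned by `load(t)` are placeholders for the finite element quantities described above, so their names and assembly are assumptions.

```python
import numpy as np
from scipy.sparse.linalg import splu

def implicit_euler_solve(M, A_y, load, u0, T=1.0, nt=500):
    """Implicit Euler time stepping for  M u'(t) + A(y) u(t) = b(t)  on [0, T].

    M    : sparse mass matrix
    A_y  : sparse stiffness matrix for one realization y of the diffusion coefficient
    load : callable t -> load vector b(t) (right-hand side at time t)
    u0   : coefficient vector of the initial condition
    """
    dt = T / nt                                   # dt = 1/500 = 2e-3 as in the experiments
    solver = splu((M + dt * A_y).tocsc())         # system matrix is time-independent: factorize once
    u = np.asarray(u0, dtype=float)
    trajectory = [u.copy()]
    for k in range(1, nt + 1):
        rhs = M @ u + dt * load(k * dt)           # (M + dt*A) u^k = M u^{k-1} + dt*b(t_k)
        u = solver.solve(rhs)
        trajectory.append(u.copy())
    return np.asarray(trajectory)
```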
In the numerical experiments presented in Sects. 7.1–7.3, we choose
where
As the parameters appearing in the objective functional (2.3) and adjoint equation (7.2), we use \(\alpha _1=10^{-3}\), \(\alpha _2=10^{-2}\), and \(\alpha _3=10^{-7}\). Moreover, we always use
In Sects. 7.1 and 7.2, we fix the source term
to assess the dimension truncation and QMC errors.
All computations were carried out on the computational cluster Katana supported by Research Technology Services at UNSW Sydney [28].
7.1 Dimension truncation error
The dimension truncation errors in Theorem 6.4 are estimated by approximating the quantities
as well as
for \(s'\gg s\), by using a tailored lattice cubature rule, generated by the fast CBC algorithm with \(n=2^{15}\) nodes and a single fixed random shift, to compute the high-dimensional parametric integrals. The obtained results are displayed in Figs. 1 and 2 for the fluctuations \((\psi _j)_{j\ge 1}\) corresponding to decay rates \(\vartheta \in \{1.3,2.6\}\) and dimensions \(s\in \{2^k\mid k\in \{1,\ldots ,9\}\}\). We use \(\theta =10\) in the computations corresponding to \(S_s\) and \(T_s\). As the reference solution, we use the solutions corresponding to dimension \(s'=2^{11}=2048\).
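Schematically, this dimension truncation study can be organized as in the following Python sketch; the callable `risk_functional`, assumed to evaluate the quantity of interest from an \(s\)-dimensional parametric solve by using only the first \(s\) coordinates of each lattice point (with the remaining coordinates set to zero), is a placeholder and not part of the original implementation.

```python
import numpy as np

def dimension_truncation_errors(risk_functional, lattice_points, s_values, s_ref=2**11):
    """Estimate the truncation errors |Q_{s'} - Q_s| against a reference dimension s' = s_ref.

    risk_functional : callable (s, points) -> float, QMC estimate of the quantity of interest
                      computed from the first s coordinates of each cubature point
    lattice_points  : array of shape (n, s_ref) from a CBC lattice rule with one fixed random shift
    s_values        : truncation dimensions to test, e.g. [2**k for k in range(1, 10)]
    """
    reference = risk_functional(s_ref, lattice_points)    # high-dimensional reference value
    return {s: abs(risk_functional(s, lattice_points) - reference) for s in s_values}
```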
The theoretical dimension truncation rate is readily observed in the case \(\vartheta =1.3\). We note in the case \(\vartheta =2.6\) that the dimension truncation convergence rates degenerate for large values of s. This may be explained by the fact that the QMC cubature with \(n=2^{15}\) nodes has an error around \(10^{-8}\) (see Fig. 3 in Sect. 7.2), but the finite element discretization error may also be a contributing factor. For smaller values of s, the higher order convergence is also apparent in the case \(\vartheta =2.6\).
7.2 QMC error
We investigate the QMC error rate by computing the root-mean-square approximations
corresponding to (6.15)–(6.18), where \({\overline{Q}}\) and \(Q^{(r)}\) are as in (6.10) for a randomly shifted lattice rule with cubature nodes (6.9), with the random shifts \(\varvec{\Delta }\) drawn independently from \(U([0,1]^s)\). The generating vectors are constructed using the fast CBC algorithm; we use \(n=2^m\) lattice points with \(m\in \{4,\ldots ,15\}\), \(R=16\) random shifts, and dimension \(s=100\). We carry out the experiments using two different decay rates \(\vartheta \in \{1.3,2.6\}\) for the input random field. The results are displayed in Fig. 3. The root-mean-square error converges at an essentially linear rate in all experiments, which is consistent with the theory.
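The root-mean-square approximation described above can be computed from the \(R\) per-shift estimates \(Q^{(r)}\); a minimal Python sketch using the standard sample-variance form (an assumption, since the exact expressions (6.15)–(6.18) are not reproduced here) is given below.

```python
import numpy as np

def rms_error_estimate(per_shift_estimates):
    """Root-mean-square error estimate of the shift-averaged lattice rule estimator.

    per_shift_estimates : the R values Q^{(r)}, r = 1, ..., R, from independent random shifts
    """
    Q = np.asarray(per_shift_estimates, dtype=float)
    R = Q.size
    Q_bar = Q.mean()                                        # averaged estimator over the shifts
    # Unbiased estimate of the variance of the mean over R shifts, then take the square root.
    return np.sqrt(np.sum((Q - Q_bar) ** 2) / (R * (R - 1)))
```

With \(R=16\) shifts this yields the kind of error estimate displayed in Fig. 3.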
7.3 Optimal control problem
We consider the problem of finding the optimal control \(z\in {\mathcal {Z}}\) that minimizes (2.3) subject to the PDE constraint (2.2). We consider constrained optimization over \({\mathcal {Z}}=\{z\in L^2(V';I):\Vert z\Vert _{L^2(V';I)}\le 2\}\) and compare our results with the reconstruction obtained by carrying out unconstrained optimization over \({\mathcal {Z}}=L^2(V';I)\). To this end, we define the projection operator
which is used in the constrained setting, while in the unconstrained setting we use \({\mathcal {P}}:={\mathcal {I}}_{L^2(V;I)}\). The operator \({\mathcal {P}}\) acts on \(L^2(V;I)\) and hence it is different from the operator \(P_{{\mathcal {Z}}}\) introduced in Sect. 3.3, which projects onto \({\mathcal {Z}}\).
To be able to handle elements of \({\mathcal {Z}}\) numerically, we apply the projected gradient method (see, e.g., [27]) as described in Algorithm 1 together with the projected Armijo rule stated in Algorithm 2. Note that evaluating \(J(R_Vw)\) and \(J'(R_Vw)\) in Algorithms 1 and 2 requires solving the state PDE with the source term \(R_Vw\). In particular, the Riesz operator appears in the loading term after finite element discretization and can thus be evaluated using (2.4). We use the initial guess \(w_0=0\). The parameters of the gradient descent method were chosen to be \(\eta _0=100\), \(\gamma =10^{-4}\), and \(\beta =0.1\).
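A minimal Python sketch of the projected gradient iteration with a projected Armijo backtracking rule, in the spirit of Algorithms 1 and 2, is given below; the objective and gradient callables, the projection, the Euclidean stand-in for the discretized norm, and the stopping rule are assumptions, while the parameter values are those quoted above.

```python
import numpy as np

def projected_gradient(J, gradJ, project, w0, eta0=100.0, gamma=1e-4, beta=0.1,
                       max_iter=200, tol=1e-8):
    """Projected gradient descent with a projected Armijo backtracking rule.

    J, gradJ : objective functional and its gradient on the discretized control space
    project  : projection onto the feasible set (the identity in the unconstrained case)
    """
    w = project(np.asarray(w0, dtype=float))
    for _ in range(max_iter):
        g = gradJ(w)
        Jw = J(w)
        eta = eta0
        w_trial = w
        for _ in range(60):                                 # safeguarded backtracking
            w_trial = project(w - eta * g)
            # Projected Armijo condition: sufficient decrease along the projected step.
            if J(w_trial) <= Jw - gamma / eta * np.linalg.norm(w_trial - w) ** 2:
                break
            eta *= beta
        if np.linalg.norm(w_trial - w) < tol:               # assumed stationarity-based stopping rule
            return w_trial
        w = w_trial
    return w
```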
We consider the entropic risk measure with \(\theta =10\) and set \(\vartheta = 1.3\). The reconstructed optimal control obtained using the bounded set of feasible controls \({\mathcal {Z}}\) is displayed in Fig. 4. The reconstructed optimal control at the terminal time \(T=1\) and its pointwise difference to the control obtained without imposing control constraints are displayed in Fig. 5. Finally, the evolution of the objective functional as the number of gradient descent iterations increases is plotted in Fig. 6 for the constrained and unconstrained optimization problems.
8 Conclusion
We developed a specially designed QMC method for an optimal control problem subject to a parabolic PDE with an uncertain diffusion coefficient. To account for the uncertainty, we considered as measures of risk the expected value and the more conservative (nonlinear) entropic risk measure. For the high-dimensional integrals originating from the risk measures, we provided error bounds and convergence rates with respect to both the dimension truncation and the QMC approximation. In particular, after dimension truncation, the QMC error bounds do not depend on the number of uncertain variables, while attaining faster convergence rates than Monte Carlo methods. In addition, we extended QMC error bounds from the literature to separable Banach spaces, so that the presented error analysis is discretization invariant.
References
Artzner, P., Delbaen, F., Eber, J.M., Heath, D.: Coherent measures of risk. Math. Finance 9, 203–228 (1999). https://doi.org/10.1111/1467-9965.00068
Blondeel, P., Robbe, P., Van hoorickx, C., François, S., Lombaert, G., Vandewalle, S.: \(p\)-refined multilevel quasi-Monte Carlo for Galerkin finite element methods with applications in civil engineering. Algorithms 13(5), 110 (2020). https://doi.org/10.3390/a13050110
Castiglione, G., Frosini, A., Munarini, E., Restivo, A., Rinaldi, S.: Combinatorial aspects of \(L\)-convex polyominoes. Eur. J. Combin. 28(6), 1724–1741 (2007). https://doi.org/10.1016/j.ejc.2006.06.020
Charrier, J.: Strong and weak error estimates for elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50(1), 216–246 (2012). https://doi.org/10.1137/100800531
Chen, P., Ghattas, O.: Taylor approximation for chance constrained optimization problems governed by partial differential equations with high-dimensional random parameters. SIAM/ASA J. Uncertain. Quantif. 9(4), 1381–1410 (2021). https://doi.org/10.1137/20M1381381
Chen, P., Royset, J.O.: Performance bounds for PDE-constrained optimization under uncertainty. SIAM J. Optim. (2023). https://doi.org/10.1137/21M1457916
Chen, P., Villa, U., Ghattas, O.: Taylor approximation and variance reduction for PDE-constrained optimal control under uncertainty. J. Comput. Phys. 385, 163–186 (2019). https://doi.org/10.1016/j.jcp.2019.01.047
Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015). https://doi.org/10.1017/S0962492915000033
Cohen, A., DeVore, R., Schwab, C.: Convergence rates of best \(N\)-term Galerkin approximations for a class of elliptic sPDEs. Found. Comput. Math. 10(6), 615–646 (2010). https://doi.org/10.1007/s10208-010-9072-2
Dautray, R., Lions, J.L.: Mathematical Analysis and Numerical Methods for Science and Technology: Volume 5 Evolution Problems I. Springer, Heidelberg (2012)
Dick, J., Gantner, R.N., Le Gia, Q.T., Schwab, C.: Higher order quasi-Monte Carlo integration for Bayesian PDE inversion. Comput. Math. Appl. 77(1), 144–172 (2019). https://doi.org/10.1016/j.camwa.2018.09.019
Dick, J., Kuo, F.Y., Sloan, I.H.: High-dimensional integration: the quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013). https://doi.org/10.1017/S0962492913000044
Evans, L.C.: Partial Differential Equations. American Mathematical Society, Providence (2010)
Föllmer, H., Schied, A.: Convex measures of risk and trading constraints. Finance Stoch. 6, 429–447 (2002). https://doi.org/10.1007/s007800200072
Gantner, R.N.: Dimension truncation in QMC for affine-parametric operator equations. In: Owen, A.B., Glynn, P.W. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2016, pp. 249–264. Springer, Stanford (2018)
Gantner, R.N., Herrmann, L., Schwab, C.: Quasi-Monte Carlo integration for affine-parametric, elliptic PDEs: local supports and product weights. SIAM J. Numer. Anal. 56(1), 111–135 (2018). https://doi.org/10.1137/16M1082597
Gantner, R.N., Peters, M.D.: Higher-order quasi-Monte Carlo for Bayesian shape inversion. SIAM/ASA J. Uncertain. Quantif. 6(2), 707–736 (2018). https://doi.org/10.1137/16M1096116
Garreis, S., Surowiec, T.M., Ulbrich, M.: An interior-point approach for solving risk-averse PDE-constrained optimization problems with coherent risk measures. SIAM J. Optim. 31(1), 1–29 (2021). https://doi.org/10.1137/19M125039X
Gilbert, A.D., Graham, I.G., Kuo, F.Y., Scheichl, R., Sloan, I.H.: Analysis of quasi-Monte Carlo methods for elliptic eigenvalue problems with stochastic coefficients. Numer. Math. 142(4), 863–915 (2019). https://doi.org/10.1007/s00211-019-01046-6
Gilbert, A.D., Scheichl, R.: Multilevel quasi-Monte Carlo for random elliptic eigenvalue problems I: regularity and error analysis. IMA J. Numer. Anal. (2023). https://doi.org/10.1093/imanum/drad011
Gould, H.W.: Combinatorial Identities: A Standardized Set of Tables Listing 500 Binomial Coefficient Summations. Morgantown (1972)
Guth, P.A., Kaarnioja, V., Kuo, F.Y., Schillings, C., Sloan, I.H.: A quasi-Monte Carlo method for optimal control under uncertainty. SIAM/ASA J. Uncertain. Quantif. 9(2), 354–383 (2021). https://doi.org/10.1137/19M1294952
Guth, P.A., Van Barel, A.: Multilevel quasi-Monte Carlo for optimization under uncertainty. Numer. Math. 154, 443–484 (2023). https://doi.org/10.1007/s00211-023-01364-w
Harbrecht, H., Peters, M., Siebenmorgen, M.: On the quasi-Monte Carlo method with Halton points for elliptic PDEs with log-normal diffusion. Math. Comput. 86, 771–797 (2017). https://doi.org/10.1090/mcom/3107
Herrmann, L., Keller, M., Schwab, C.: Quasi-Monte Carlo Bayesian estimation under Besov priors in elliptic inverse problems. Math. Comput. 90, 1831–1860 (2021). https://doi.org/10.1090/mcom/3615
Herrmann, L., Schwab, C.: QMC integration for lognormal-parametric, elliptic PDEs: local supports and product weights. Numer. Math. 141, 63–102 (2019). https://doi.org/10.1007/s00211-018-0991-1
Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE Constraints. Springer, Heidelberg (2009)
Katana (2010). https://doi.org/10.26190/669X-A286
Kouri, D.P., Surowiec, T.M.: Existence and optimality conditions for risk-averse PDE-constrained optimization. SIAM/ASA J. Uncertain. Quantif. 6(2), 787–815 (2018). https://doi.org/10.1137/16M1086613
Kouri, D.P., Surowiec, T.M.: Epi-regularization of risk measures. Math. Oper. Res. 45(2), 774–795 (2020). https://doi.org/10.1287/moor.2019.1013
Kressner, D., Tobler, C.: Low-rank tensor Krylov subspace methods for parametrized linear systems. SIAM J. Matrix Anal. Appl. 32(4), 1288–1316 (2011). https://doi.org/10.1137/100799010
Kunoth, A., Schwab, C.: Analytic regularity and GPC approximation for control problems constrained by linear parametric elliptic and parabolic PDEs. SIAM J. Control. Optim. 51(3), 2442–2471 (2013). https://doi.org/10.1137/110847597
Kuo, F.Y., Nuyens, D.: Application of quasi-Monte Carlo methods to elliptic PDEs with random diffusion coefficients: a survey of analysis and implementation. Found. Comput. Math. 16, 1631–1696 (2016). https://doi.org/10.1007/s10208-016-9329-5
Kuo, F.Y., Scheichl, R., Schwab, C., Sloan, I.H., Ullmann, E.: Multilevel quasi-Monte Carlo methods for lognormal diffusion problems. Math. Comput. 86, 2827–2860 (2017). https://doi.org/10.1090/mcom/3207
Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50(6), 3351–3374 (2012). https://doi.org/10.1137/110845537
Martin, M., Krumscheid, S., Nobile, F.: Complexity analysis of stochastic gradient methods for PDE-constrained optimal control problems with uncertain parameters. ESAIM: Math. Model. Numer. Anal. 55(4), 1599–1633 (2021). https://doi.org/10.1051/m2an/2021025
Martin, M., Nobile, F.: PDE-constrained optimal control problems with uncertain parameters using SAGA. SIAM/ASA J. Uncertain. Quantif. 9(3), 979–1012 (2021). https://doi.org/10.1137/18M1224076
Quaintance, J., Gould, H.W.: Combinatorial Identities for Stirling Numbers: The Unpublished Notes of H. W. Gould. World Scientific Publishing Company, River Edge (2015)
Robbe, P., Nuyens, D., Vandewalle, S.: Recycling samples in the multigrid multilevel (quasi-)Monte Carlo method. SIAM J. Sci. Comput. 41(5), S37–S60 (2019). https://doi.org/10.1137/18M1194031
Rudin, W.: Functional Analysis. McGraw-Hill, Singapore (1991)
Savits, T.H.: Some statistical applications of Faa di Bruno. J. Multivariate Anal. 97(10), 2131–2140 (2006). https://doi.org/10.1016/j.jmva.2006.03.001
Scheichl, R., Stuart, A.M., Teckentrup, A.L.: Quasi-Monte Carlo and multilevel Monte Carlo methods for computing posterior expectations in elliptic inverse problems. SIAM/ASA J. Uncertain. Quantif. 5(1), 493–518 (2017). https://doi.org/10.1137/16M1061692
Schwab, C.: QMC Galerkin discretization of parametric operator equations. In: Dick, J., Kuo, F.Y., Peters, G.W., Sloan, I.H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2012, pp. 613–629. Springer, Heidelberg (2013)
Schwab, C., Stevenson, R.: Space-time adaptive wavelet methods for parabolic evolution problems. Math. Comput. 78, 1293–1318 (2009). https://doi.org/10.1090/S0025-5718-08-02205-9
Tong, S., Vanden-Eijnden, E., Stadler, G.: Extreme event probability estimation using PDE-constrained optimization and large deviation theory, with application to tsunamis. Commun. Appl. Math. Comput. Sci. 16(2), 181–225 (2021). https://doi.org/10.2140/camcos.2021.16.181
Tröltzsch, F.: Optimal Control of Partial Differential Equations: Theory, Methods and Applications. American Mathematical Society, Providence (2010)
Van Barel, A., Vandewalle, S.: Robust optimization of PDEs with random coefficients using a multilevel Monte Carlo method. SIAM/ASA J. Uncertain. Quantif. 7(1), 174–202 (2019). https://doi.org/10.1137/17M1155892
Van Barel, A., Vandewalle, S.: MG/OPT and multilevel Monte Carlo for robust optimization of PDEs. SIAM J. Optim. 31(3), 1850–1876 (2021). https://doi.org/10.1137/20M1347164
Yosida, K.: Functional Analysis. Springer, Heidelberg (1980)
Acknowledgements
P. A. Guth is grateful to the DFG RTG1953 “Statistical Modeling of Complex Systems and Processes” for funding of this research. F. Y. Kuo and I. H. Sloan acknowledge the support from the Australian Research Council (DP210100831).
Funding
Open Access funding enabled and organized by Projekt DEAL.