1 Introduction

Consider the following variational inequality problem \({\text {VI}}({\mathcal {X}},F)\): find \(x^* \in {\mathcal {X}}\) such that

$$\begin{aligned} (x - x^*)^\top F(x^*) \ge 0, \quad \text { for all } x \in {\mathcal {X}}, \end{aligned}$$
(1)

where \({\mathcal {X}}\subset {\mathbb {R}}^n\) is a compact set and each component \(i \in \{1,2,\dots , n\}\) of the map \(F:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\), denoted \(F_i:{\mathbb {R}}^n \rightarrow {\mathbb {R}}\), is given by

$$\begin{aligned} F_i (x) = {\text {CVaR}}_\alpha ^{\mathbb {P}}[f_i(x,u)]. \end{aligned}$$
(2)

In the above equation, \(f_i:{\mathcal {X}}\times {\mathcal {U}} \rightarrow {\mathbb {R}}\) is referred to as the random function, the set \({\mathcal {U}}\subset {\mathbb {R}}^m\) is compact, and \({\mathbb {P}}\) is the distribution of u supported on the set \({\mathcal {U}}\). We assume \(f_i\) is continuous. The map \(F_i\) gives the conditional value-at-risk (CVaR) at level \(\alpha \in (0,1)\) of the random function \(f_i\). The CVaR computes the tail expectation of the underlying random variable [18] and can be determined by solving the following optimization problem

$$\begin{aligned} {\text {CVaR}}_\alpha ^{{\mathbb {P}}} [f_i(x,u)] = \inf _{t \in {\mathbb {R}}} \, \Bigl \{ t + \frac{1}{\alpha } {\mathbb {E}}_{\mathbb {P}}[f_i(x,u) - t]_+ \Bigr \}, \end{aligned}$$
(3)

where \({\mathbb {E}}_{\mathbb {P}}\) is the expectation under the distribution \({\mathbb {P}}\) and the operator \([\,\cdot \,]_+\) gives the positive part, i.e., \([v]_+ = \max \{0,v\}\). The parameter \(\alpha\) characterizes the risk-averseness. When \(\alpha\) is close to unity, the decision-maker is risk-neutral, whereas \(\alpha\) close to zero implies high risk-averseness. The main purpose of the paper is to analyze the statistical properties of a sample average approximation (SAA) scheme for solving the variational inequality \({\text {VI}}({\mathcal {X}},F)\) given in (1). The set of solutions of this problem is denoted by \({\text {SOL}}({\mathcal {X}},F)\).

Variational inequality problems defined using a set of random functions are surveyed in [17]. The most widely studied VI problem in this context, termed stochastic variational inequalities (SVIs), is the one where the map defining the VI is the expectation of a random function. Risk-based VIs, where the VI map is given as the risk of a random function, naturally generalize the setup of SVIs and find application in computing Wardrop equilibria in network routing problems where users are risk-averse. While several works explore sample average schemes for SVIs, there is no such study for risk-averse VIs. This paper aims to fill this gap.

Early investigations into statistical aspects of SAA for generalized equations and SVIs appeared in [8, 9], respectively. These works focused on asymptotic properties of the SAA schemes, that is, consistency of estimators and their asymptotic distributions. The former is concerned with showing the convergence with probability one of solutions of the SAA to solutions of the original problem as the sample size tends to infinity. The latter determines the distribution of the approximate solutions in the asymptotic limit. While these properties show the limiting behavior, they do not illustrate the guarantees in the finite-sample regime. This feature was explored in [15, 19, 27, 28], where it was shown that for generalized equilibrium problems, under various sets of assumptions, one can demonstrate exponential convergence of the approximate solutions; that is, the probability that the SAA solution lies a fixed distance away from the original solution decays exponentially as the sample size tends to infinity. Technically, establishing such a property relies on conducting sensitivity analysis for the VI and then combining it with uniform large deviation bounds on random functions. All these studies share the common property that the underlying map is the expectation of the random function, while in this paper we look at \({\text {CVaR}}\)-based maps.

The works [1, 11, 21] study SAA of \({\text {CVaR}}\) in the context of stochastic optimization problems, where \({\text {CVaR}}\) is either being minimized or used to define the constraints. In [11, 21], asymptotic consistency and exponential convergence of Karush-Kuhn-Tucker (KKT) points of the sample average optimization problem to those of the true one were established. In [1], the SAA of \({\text {CVaR}}\) is used to approximate the solution of a risk-constrained optimization problem. Since \({\text {CVaR}}\) is used to define a VI problem in our case, the analysis does not follow directly from these existing results. Moreover, as opposed to the general large deviation bounds provided in these works, the exponential bounds derived here are explicit and do not involve unspecified constants. In another data-based approach [16], the \({\text {CVaR}}\) is perceived as the expected shortfall and desirable statistical guarantees are obtained for the optimizers of its sample average.

One of the motivations for our work is to approximate the Wardrop equilibrium for a network routing problem where agents choose paths that have minimum risk. Such a setting was extensively studied in [13], where various notions of equilibrium and related computational aspects of finding them were discussed. Among other works that consider risk, [12, 14] assume the cost of each path to be the weighted sum of the mean and the variance of the uncertain cost. However, none of these works focus on CVaR-based routing. In the transportation literature, the CVaR-based equilibrium is also known as the mean excess traffic equilibrium, see e.g., [2, 29] and references therein. While these works have explored numerous algorithms for computing the equilibrium, they lack theoretical performance guarantees for sample-based solutions.

For analyzing the SAA of (1), we assume that a certain number of independent and identically distributed samples of the random variable u are available, and we replace the expectation operator in the definition of the \({\text {CVaR}}\) with the corresponding sample average. The resulting empirical \({\text {CVaR}}\) gives rise to a set of functions that are sample average versions of F. Using these, we define a sample average variational inequality. Our contributions are as follows:

  (i) We establish asymptotic consistency of the sample average scheme, that is, the set of solutions of the sample average VI converges almost surely, in a set-valued sense, to the set \({\text {SOL}}({\mathcal {X}},F)\).

  (ii) Under the assumption that the random functions are uniformly Lipschitz continuous in x, we show exponential convergence of the solution set of the sample average VI to the set \({\text {SOL}}({\mathcal {X}},F)\). That is, given any constant, the probability that the distance of a solution of the sample average problem from the set \({\text {SOL}}({\mathcal {X}},F)\) is less than that constant approaches unity exponentially with the number of samples.

  (iii) We give tighter sample guarantees, with an explicit expression for the coefficient in the exponential bound, for a particular class of separable random functions.

  (iv) We illustrate the application of the derived approximations in computing a CVaR-based Wardrop equilibrium for a network routing problem that is defined using uncertain costs.

A preliminary version of the paper appeared as [3], where the focus was on finding the Wardrop equilibrium for a network routing problem. Compared to it, the present article has a more general problem setup, focusing not just on a Wardrop equilibrium problem but on a general VI. In addition, the tighter sample guarantees for separable functions given in Sect. 4 are new here and the simulation example is much more elaborate.

Notation Let \({\mathbb {R}}\), \({{\mathbb {R}}}_{\ge 0}\), \({\mathbb {R}}_{>0}\), and \({\mathbb {N}}\) denote the set of real, nonnegative real, positive real, and natural numbers, respectively. Let \(\Vert \cdot \Vert\) denote the Euclidean 2-norm. We use \([N]:=\{1, \dots , N\}\) for a positive integer N. For \(x \in {\mathbb {R}}\), we let \([x]_+ = \max (x,0)\) and \(\lceil x \rceil\) be the smallest integer greater than or equal to x. The cardinality of a set \({\mathcal {S}}\) is denoted by \(|{\mathcal {S}}|\). The distance of a point \(x \in {\mathbb {R}}^m\) to a set \({\mathcal {S}}\subset {\mathbb {R}}^m\) is denoted by \({\text {dist}}(x,{\mathcal {S}}):= \inf _{y \in {\mathcal {S}}} \Vert x-y \Vert\). The deviation of a set \({\mathcal {A}}\subset {\mathbb {R}}^m\) from \({\mathcal {S}}\) is \({\mathbb {D}}({\mathcal {A}},{\mathcal {S}}):= \sup _{y \in {\mathcal {A}}} {\text {dist}}(y,{\mathcal {S}})\).

2 Preliminaries

Here we collect relevant mathematical background used throughout the paper.

2.1 Variational inequality

Given a map \(F:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) and a closed set \({\mathcal {X}}\subset {\mathbb {R}}^n\), the variational inequality (VI) problem, denoted \({\text {VI}}({\mathcal {X}},F)\), involves finding \(x^* \in {\mathcal {X}}\) such that \((x-x^*)^\top F(x^*) \ge 0\) for all \(x \in {\mathcal {X}}\). Such a point is called a solution of the VI problem. The set of solutions of \({\text {VI}}({\mathcal {X}},F)\) is denoted by \({\text {SOL}}({\mathcal {X}},F)\). The map F is monotone on the set \({\mathcal {X}}\) if \((F(x) - F(x'))^\top (x - x') \ge 0\) for all \(x, x' \in {\mathcal {X}}\). The map F is strictly monotone on \({\mathcal {X}}\) if this inequality is strict for \(x \not = x'\). Finally, F is strongly monotone on \({\mathcal {X}}\) with modulus \(\sigma > 0\) if \((F(x) - F(x'))^\top (x - x') \ge \sigma \Vert x - x' \Vert ^2\) for all \(x, x' \in {\mathcal {X}}\). If F is either strictly or strongly monotone, then \({\text {SOL}}({\mathcal {X}},F)\) is a singleton.
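When F is Lipschitz and strongly monotone, the unique solution of \({\text {VI}}({\mathcal {X}},F)\) can be computed by the classical projection iteration \(x \leftarrow \varPi _{\mathcal {X}}(x - \tau F(x))\). A minimal numerical sketch follows; the affine map, step size, and box constraint are illustrative choices, not taken from the paper.

```python
import numpy as np

def solve_vi_projection(F, project, x0, tau=0.1, tol=1e-10, max_iter=10_000):
    """Basic projection method for VI(X, F): iterate x <- Proj_X(x - tau*F(x)).
    For a strongly monotone, Lipschitz F and a small enough step size tau,
    this map is a contraction and converges to the unique VI solution."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = project(x - tau * F(x))
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: F(x) = Ax + b is strongly monotone since A is positive definite;
# X = [0,1]^2, so the projection is a componentwise clip.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([-1.0, 3.0])
x_star = solve_vi_projection(lambda x: A @ x + b,
                             lambda y: np.clip(y, 0.0, 1.0),
                             np.zeros(2))
# x_star is approximately (0.5, 0): the first component solves 2x - 1 = 0 in
# the interior of X, while the second is pinned at the boundary, where F_2 > 0.
```

The iteration is a contraction whenever \(\tau < 2\sigma /{\text {Lip}}(F)^2\), where \(\sigma\) is the strong monotonicity modulus; the step size above satisfies this for the chosen A.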

2.2 Uniform convergence

A sequence of functions \(\{f_N:{\mathcal {X}} \rightarrow {\mathcal {Y}}\}_{N=1}^\infty\), where \({\mathcal {X}}\) and \({\mathcal {Y}}\) are Euclidean spaces, is said to converge uniformly on a set \(X \subset {\mathcal {X}}\) to \(f:{\mathcal {X}} \rightarrow {\mathcal {Y}}\) if for any \(\epsilon > 0\), there exists \(N_\epsilon \in {\mathbb {N}}\) such that \(\sup _{x \in X} \Vert f_N(x) - f(x) \Vert \le \epsilon\), for all \(N \ge N_\epsilon\). A similar definition applies in the almost sure sense for random functions. That is, consider a random sequence of functions \(\{f_N^\omega :{\mathcal {X}} \rightarrow {\mathcal {Y}}\}_{N=1}^\infty\) defined on a probability space \((\varOmega , {\mathcal {F}}, P)\). The sequence is said to converge uniformly to \(f:{\mathcal {X}} \rightarrow {\mathcal {Y}}\) on X almost surely (shorthand, a.s.) if \(f_N^\omega \rightarrow f\) uniformly on X for almost all \(\omega \in \varOmega\).

2.3 Risk measures

Next we review notions of value-at-risk and CVaR from [18]. Given a real-valued random variable Z with probability distribution \({\mathbb {P}}\), we denote the cumulative distribution function by \(H_Z(\zeta ):={\mathbb {P}}(Z \le \zeta )\). The left-side \(\alpha\)-quantile of Z is defined as \(H_Z^{-1}(\alpha ):= \inf \{\zeta \; | \; H_Z(\zeta ) \ge \alpha \}\). Given a probability level \(\alpha \in (0,1)\), the value-at-risk of Z at level \(\alpha\), denoted \({\text {VaR}}_\alpha ^{{\mathbb {P}}}[Z]\), is the left-side \((1-\alpha )\)-quantile of Z. Formally,

$$\begin{aligned} {\text {VaR}}_\alpha ^{{\mathbb {P}}}[Z] := H_Z^{-1}(1-\alpha ) = \inf \{\zeta \; | \; {\mathbb {P}}(Z \le \zeta ) \ge 1-\alpha \}. \end{aligned}$$

The CVaR, also referred to as the average value-at-risk in [18], of Z at level \(\alpha\), denoted \({\text {CVaR}}_\alpha ^{{\mathbb {P}}}[Z]\), is given as

$$\begin{aligned} {\text {CVaR}}_\alpha ^{{\mathbb {P}}} [Z] = \inf _{t \in {\mathbb {R}}} \Bigl \{ t + \frac{1}{\alpha } {\mathbb {E}}[Z - t]_+ \Bigr \}. \end{aligned}$$
(4)

Under the continuity of the cumulative distribution function at \({\text {VaR}}_\alpha ^{{\mathbb {P}}}[Z]\), we have that \({\text {CVaR}}^{{\mathbb {P}}}_\alpha [Z]\) is the expectation of Z conditional on Z taking values no smaller than \({\text {VaR}}_\alpha ^{{\mathbb {P}}}[Z]\). That is, \({\text {CVaR}}_\alpha ^{{\mathbb {P}}} [Z] = {\mathbb {E}}[Z \mid Z \ge {\text {VaR}}_\alpha ^{{\mathbb {P}}}[Z]]\). The minimum in (4) is attained at a point in the interval \([t^m,t^M]\), where \(t^m := \inf \{\zeta \; | \; H_Z(\zeta ) \ge 1-\alpha \}\) and \(t^M := \sup \{\zeta \; | \; H_Z(\zeta ) \le 1-\alpha \}\).
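These definitions can be checked numerically by comparing the minimization formula (4) with the tail expectation on a large sample; the uniform distribution below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
z = rng.uniform(0.0, 1.0, size=200_000)   # Z ~ Uniform[0,1]

# VaR: the left-side (1 - alpha)-quantile; equals 1 - alpha for Uniform[0,1]
var = np.quantile(z, 1.0 - alpha)

# CVaR via the minimization formula (4); the objective is convex in t, and
# here we simply scan a grid of t values around the quantile
objective = [t + np.maximum(z - t, 0.0).mean() / alpha
             for t in np.linspace(0.8, 1.0, 201)]
cvar_inf = min(objective)

# CVaR as the expectation of Z beyond VaR (continuous-distribution case)
cvar_tail = z[z >= var].mean()
# for Uniform[0,1] at alpha = 0.1: VaR is about 0.9 and CVaR about 0.95
```

Both estimates of the CVaR agree up to sampling error, as the identity above predicts for a continuous distribution.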

3 Sample average approximation of \({\text {VI}}({\mathcal {X}},F)\)

The approach in the sample average framework is to replace the expectation operator in any problem with the average over the obtained samples [18]. This is one of the main Monte Carlo methods for problems with expectations; see [5] for a detailed survey of other sample-based techniques. In our setup, for each component \(F_i\), we replace the expectation operator in the definition of the \({\text {CVaR}}\) in (3) with the sample average. The resulting set of functions defines a \({\text {VI}}\) problem that approximates \({\text {VI}}({\mathcal {X}},F)\).

Note that the map F is continuous since the functions \(f_i\), \(i \in [n]\), are continuous and the sets \({\mathcal {X}}\) and \({\mathcal {U}}\) are compact. One can reason this fact using arguments similar to those of the proof of Lemma 4. As a consequence of the continuity of F, the set of solutions \({\text {SOL}}({\mathcal {X}},F)\) of the problem \({\text {VI}}({\mathcal {X}},F)\) is nonempty and compact [7, Corollary 2.2.5]. For convenience, we use the notation \({\mathcal {S}}= {\text {SOL}}({\mathcal {X}},F)\).

Let \(\widehat{{\mathcal {U}}}^N:= \{\widehat{u}^1, \widehat{u}^2, \dots , \widehat{u}^N\}\) be the set of \(N \in {\mathbb {N}}\) independent and identically distributed samples of u drawn from \({\mathbb {P}}\). Then, the sample average approximation of the \({\text {CVaR}}\) associated to component \(i \in [n]\) is

$$\begin{aligned} \widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] := \inf _{t \in {\mathbb {R}}} \Bigl \{t + \frac{1}{N \alpha } \sum _{j=1}^N [f_i (x,\widehat{u}^j) - t]_+ \Bigr \}. \end{aligned}$$
(5)

The above expression is also known as the empirical estimate of the \({\text {CVaR}}\), or empirical \({\text {CVaR}}\) in short. The expression is also the \({\text {CVaR}}\) of the random function at level \(\alpha\) under the empirical distribution \(\frac{1}{N} \sum _{j=1}^N \delta _{\widehat{u}^j}\), where \(\delta _{\widehat{u}^j}\) is the unit point mass at \(\widehat{u}^j\). Note that \(\widehat{{\text {CVaR}}}^N_\alpha\) is random as it depends on the realization \(\widehat{{\mathcal {U}}}^N\) of the random variable. To emphasize this dependency, we represent with \(\widehat{\, \cdot \,}^N\) entities that are random. Using (5) as the approximate function, define the sample average VI problem as \({\text {VI}}({\mathcal {X}},\widehat{F}^N)\), where

$$\begin{aligned} \widehat{F}^N_i (x) := \widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)], \end{aligned}$$

for all \(i \in [n]\). We denote the set of solutions of \({\text {VI}}({\mathcal {X}},\widehat{F}^N)\) by \(\widehat{{\mathcal {S}}}^N \subset {\mathcal {X}}\). This serves as a reminder that it approximates \({\mathcal {S}}\). The notion of approximation is made precise next. Note that \(\widehat{{\mathcal {S}}}^N\) is nonempty as \({\mathcal {X}}\) is compact and \(\widehat{F}^N\) is continuous.
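In code, forming \(\widehat{F}^N\) amounts to evaluating the empirical \({\text {CVaR}}\) (5) componentwise over the drawn scenarios. The sketch below does this exactly, exploiting that the objective in (5) is piecewise-linear convex in t with breakpoints at the sample values; the random functions and distribution are hypothetical choices for illustration.

```python
import numpy as np

def empirical_cvar(z, alpha):
    """inf_t { t + 1/(N*alpha) * sum_j [z_j - t]_+ } over a sample z: the
    objective is piecewise-linear convex with breakpoints at the sample
    values, so it suffices to evaluate it at each sorted sample point."""
    z = np.sort(np.asarray(z, dtype=float))[::-1]        # descending order
    larger = np.cumsum(z) - z                            # sum of entries above z[k]
    obj = z + (larger - np.arange(z.size) * z) / (z.size * alpha)
    return obj.min()

def F_hat(x, f_list, u_samples, alpha):
    """Component i of the sample average map: the empirical CVaR of
    f_i(x, u) over the drawn scenarios."""
    return np.array([empirical_cvar(np.array([f(x, u) for u in u_samples]), alpha)
                     for f in f_list])

# Hypothetical random functions: f_i(x, u) = x_i * (1 + u_i), u ~ Uniform[0,1]^2
rng = np.random.default_rng(0)
u_samples = rng.uniform(0.0, 1.0, size=(2000, 2))
f_list = [lambda x, u, i=i: x[i] * (1.0 + u[i]) for i in range(2)]
Fx = F_hat(np.array([1.0, 2.0]), f_list, u_samples, alpha=0.2)
# each component approaches x_i * 1.9 as N grows, since the CVaR at
# level 0.2 of 1 + Uniform[0,1] equals 1.9
```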

Definition 1

(Asymptotic consistency and exponential convergence) The set \(\widehat{{\mathcal {S}}}^N\) is an asymptotically consistent approximation of \({\mathcal {S}}\), or in short, \(\widehat{{\mathcal {S}}}^N\) is asymptotically consistent, if any sequence of solutions \(\{\widehat{x}^N \in \widehat{{\mathcal {S}}}^N\}_{N=1}^\infty\) has almost surely (a.s.) all accumulation points in \({\mathcal {S}}\). The set \(\widehat{{\mathcal {S}}}^N\) is said to converge exponentially to \({\mathcal {S}}\) if for any \(\epsilon > 0\), there exist positive constants \(c_\epsilon\) and \(\delta _\epsilon\) such that for any sequence \(\{ \widehat{x}^N \in \widehat{{\mathcal {S}}}^N\}_{N=1}^\infty\), the following holds

$$\begin{aligned} {\mathbb {P}}^N \Bigl ({\text {dist}}(\widehat{x}^N, {\mathcal {S}}) \le \epsilon \Bigr ) \ge 1-c_\epsilon e^{-\delta _\epsilon N} \end{aligned}$$
(6)

for all \(N \in {\mathbb {N}}\). \(\bullet\)

The asymptotic consistency of \(\widehat{{\mathcal {S}}}^N\) is equivalent to saying \({\mathbb {D}}(\widehat{{\mathcal {S}}}^N,{\mathcal {S}}) \rightarrow 0\) a.s. as \(N \rightarrow \infty\). The expression (6) gives a precise rate for this convergence. In our work, all convergence results are for \(N \rightarrow \infty\), so for convenience we refrain from restating this fact. In the following sections, we will establish the asymptotic consistency and the exponential convergence of \(\widehat{{\mathcal {S}}}^N\).

3.1 Asymptotic consistency of \(\widehat{{\mathcal {S}}}^N\)

We begin by stating bounds on the optimizers of the problems defining the \({\text {CVaR}}\) (3) and the empirical \({\text {CVaR}}\) (5). This restricts our attention to compact domains for the variables \((x,t,u)\), a property useful in showing consistency. For each \(i \in [n]\), define the functions

$$\begin{aligned} \psi _i(x,t)&:= t + \frac{1}{\alpha } {\mathbb {E}}_{\mathbb {P}}[f_i(x,u)-t]_+, \end{aligned}$$
(7a)
$$\begin{aligned} \widehat{\psi }^N_i(x,t)&:= t + \frac{1}{N \alpha } \sum _{j=1}^N {[}f_i(x,\widehat{u}^j) - t]_+. \end{aligned}$$
(7b)

The map \(\widehat{\psi }^N_i\) is the sample average counterpart of \(\psi _i\). Given our assumption that \({\mathcal {U}}\) is compact, we have that the expected value of \(f_i\) is bounded for any \(x \in {\mathcal {X}}\). Using this fact, one can deduce by the strong law of large numbers [6] that for any fixed \((x,t) \in {\mathcal {X}}\times {\mathbb {R}}\), \(\widehat{\psi }^N_i(x,t) \rightarrow \psi _i(x,t)\) a.s. However, we require uniform convergence of these maps to conclude consistency, which will be established in Theorem 1 below. Observe that, by definition, \({\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)] = \inf _{t \in {\mathbb {R}}} \psi _i(x,t)\) and \(\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] = \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x,t)\). The following result gives explicit bounds on the optimizers of these problems.

Lemma 1

(Bounds on optimizers of problems defining (empirical) \({\text {CVaR}}\)) For any \(x \in {\mathcal {X}}\) and \(i \in [n]\), the optimizers of the problems in (3) and (5) exist and belong to the compact set \({\mathcal {T}}= [\ell ,L]\), where \(\ell := \min \{f_i(x,u) \; | \; x \in {\mathcal {X}}, u \in {\mathcal {U}}, i \in [n]\}\) and \(L:= \max \{f_i(x,u) \; | \; x \in {\mathcal {X}}, u \in {\mathcal {U}}, i \in [n]\}\). Furthermore, the set of functions

$$\begin{aligned} \phi _i (x,t,u) := t + \frac{1}{\alpha } [f_i(x,u)-t]_{+}, \end{aligned}$$
(8)

for \(i \in [n]\), satisfy for all \((x,t,u) \in {\mathcal {X}}\times {\mathcal {T}}\times {\mathcal {U}}\),

$$\begin{aligned} \phi _i(x,t,u) \in \Bigl [ \ell , \ell + \frac{L-\ell }{\alpha } \Bigr ]. \end{aligned}$$
(9)

Proof

From [18, Chapter 6], optimizers of (3) and (5) exist and they lie in the closed interval defined by the left- and the right-side \((1-\alpha )\)-quantile of the respective random variables. Hence, they belong to \({\mathcal {T}}\). To conclude (9), notice

$$\begin{aligned} \phi _i (x,t,u)&= t + \frac{1}{\alpha } [f_i(x,u) -t]_+ \le t + \frac{1}{\alpha } [L-t]_+ \\&= t + \frac{1}{\alpha } (L-t) = (1-\frac{1}{\alpha }) t + \frac{1}{\alpha } L \\&\le (1-\frac{1}{\alpha }) \ell + \frac{1}{\alpha } L. \end{aligned}$$

Here, the first inequality follows from the bound \(f_i(x,u) \le L\), the first equality holds because \(t \le L\), and the second inequality follows since \(1-\frac{1}{\alpha } \le 0\) and \(t \ge \ell\). Similarly, for the lower bound, \(\phi _i(x,t,u) \ge t + \frac{1}{\alpha }[\ell -t]_+ = t \ge \ell\). This completes the proof. \(\square\)

We make a note here that optimizers of problems defining the \({\text {CVaR}}\) in (3) and (5) exist and are bounded for more general cases, even when the support of the random variable is unbounded, see e.g., [18, Chapter 6]. Nevertheless, the above result provides an explicit bound which is used later in deriving precise exponential convergence guarantees.

As a consequence of Lemma 1, one can show uniform convergence of \(\widehat{\psi }^N_i\) to \(\psi _i\). Our next step is to analyze the sensitivity of F as one perturbs the underlying map \(\psi\). In combination with the uniform convergence of \(\widehat{\psi }^N_i\), this result leads to the uniform convergence of \(\widehat{F}^N\) to F.

Lemma 2

(Sensitivity of F with respect to \(\psi\)) For any \(\epsilon > 0\), if we have \(\sup _{i \in [n], (x,t) \in {\mathcal {X}}\times {\mathcal {T}}} |\widehat{\psi }^N_i (x,t) - \psi _i(x,t)| \le \epsilon\), where \({\mathcal {T}}\) is defined in Lemma 1, then \(\sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert \le \sqrt{n} \epsilon\).

Proof

The first step is to show the sensitivity of the map \({\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(\cdot ,u)]\) with respect to \(\psi _i\). To this end, fix \(i \in [n]\) and \(x \in {\mathcal {X}}\), and let \(\widehat{t}^N_i(x) \in \underset{t \in {\mathbb {R}}}{{\text {argmin}}}\ \, \widehat{\psi }^N_i(x,t)\) and \(t_i(x) \in \underset{t \in {\mathbb {R}}}{{\text {argmin}}}\ \, \psi _i(x,t)\). These optimizers exist due to Lemma 1. We now have

$$\begin{aligned} \psi _i \Bigl (x, t_i(x)\Bigr ) - \epsilon \le \psi _i \Bigl (x, \widehat{t}^N_i(x) \Bigr ) -\epsilon \le \widehat{\psi }^N_i \Bigl (x, \widehat{t}^N_i(x)\Bigr ). \end{aligned}$$

The first inequality is due to optimality and the second inequality holds by assumption. Similarly, one can show that

$$\begin{aligned} \widehat{\psi }^N_i\Bigl (x, \widehat{t}^N_i(x)\Bigr ) -\epsilon \le \psi _i\Bigl (x, t_i(x)\Bigr ). \end{aligned}$$

The above two sets of inequalities along with the fact that \(\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] = \widehat{\psi }^N_i \Bigl (x, \widehat{t}^N_i(x)\Bigr )\) and \({\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)] = \psi _i \Bigl (x, t_i(x)\Bigr )\) lead to the conclusion

$$\begin{aligned} \sup _{x \in {\mathcal {X}}} \Big | \widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)] \Big | \le \epsilon . \end{aligned}$$
(10)

Finally, the conclusion follows from the inequality \(\Vert \widehat{F}^N(x) - F(x) \Vert \le \sqrt{n} \sup _{i \in [n]} |\widehat{F}^N_i(x) - F_i(x)|\) that holds for all \(x \in {\mathcal {X}}\). \(\square\)

The final preliminary result states proximity of \(\widehat{{\mathcal {S}}}^N\) to \({\mathcal {S}}\) given that the difference between \(\widehat{F}^N\) and F is bounded. The proof is a consequence of [27, Lemma 2.1] that studies sensitivity of generalized equations.

Lemma 3

(Sensitivity of \({\mathcal {S}}\) with respect to F) For any \(\epsilon > 0\), there exists \(\delta (\epsilon ) > 0\) such that \({\mathbb {D}}(\widehat{{\mathcal {S}}}^N, {\mathcal {S}}) \le \epsilon\) whenever \(\sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert \le \delta (\epsilon )\).

Next is the main result of this section, establishing the asymptotic consistency of \(\widehat{{\mathcal {S}}}^N\).

Theorem 1

(Asymptotic consistency of \(\widehat{{\mathcal {S}}}^N\)) We have \({\mathbb {D}}(\widehat{{\mathcal {S}}}^N,{\mathcal {S}}) \rightarrow 0\) almost surely.

Proof

Consider first the a.s. uniform convergence \(\widehat{\psi }^N_i \rightarrow \psi _i\) over the compact set \({\mathcal {X}}\times {\mathcal {T}}\). Note that \(\psi _i(x,t) = {\mathbb {E}}_{\mathbb {P}}[\phi _i(x,t,u)]\), where \(\phi _i\) is given in (8), and \(\widehat{\psi }^N_i\) is the corresponding sample average of \(\phi _i\). For any fixed \(u \in {\mathcal {U}}\), the map \(\phi _i(\cdot , \cdot , u)\) is continuous and, for any \((x,t) \in {\mathcal {X}}\times {\mathcal {T}}\), due to Lemma 1, the map \(\phi _i(x,t,\cdot )\) is dominated by the integrable function (a constant in this case) \(\ell + \frac{L-\ell }{\alpha }\). Hence, by the uniform law of large numbers result [18, Theorem 7.48], we conclude that \(\widehat{\psi }^N_i \rightarrow \psi _i\) uniformly a.s. on \({\mathcal {X}}\times {\mathcal {T}}\). Using this fact in the sensitivity result of Lemma 2 implies that \(\widehat{F}^N \rightarrow F\) uniformly a.s. on the set \({\mathcal {X}}\). Finally, we arrive at the conclusion using Lemma 3. \(\square\)
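The theorem can be illustrated with a scalar Monte Carlo experiment in which the true solution is known in closed form; the random function, distribution, and constraint set below are illustrative choices, not from the paper.

```python
import numpy as np

def empirical_cvar(z, alpha):
    # exact minimization in (5): evaluate the piecewise-linear convex
    # objective at its breakpoints, i.e., the sorted sample values
    z = np.sort(np.asarray(z, dtype=float))[::-1]
    larger = np.cumsum(z) - z
    obj = z + (larger - np.arange(z.size) * z) / (z.size * alpha)
    return obj.min()

# Hypothetical scalar problem: f(x, u) = x - 1 + u with u ~ Uniform[-1/2, 1/2]
# and X = [0, 2].  Then F(x) = x - 1 + CVaR_a(u) is strongly monotone, the
# unique solution is x* = 1 - CVaR_a(u) = 0.6 at a = 0.2, and the SAA
# solution is obtained by swapping in the empirical CVaR of u.
rng = np.random.default_rng(3)
alpha, x_star = 0.2, 0.6
errors = []
for N in (50, 500, 5000):
    u = rng.uniform(-0.5, 0.5, size=N)
    x_hat = np.clip(1.0 - empirical_cvar(u, alpha), 0.0, 2.0)
    errors.append(abs(x_hat - x_star))
# dist(x_hat, S) shrinks toward zero as the sample size N grows
```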

3.2 Exponential convergence of \(\widehat{{\mathcal {S}}}^N\)

Here, our strategy will be to use the concentration inequality for the empirical \({\text {CVaR}}\) given in [24] and derive the uniform exponential convergence of \(\widehat{F}^N\) to F. Later, we will use Lemma 3 to infer exponential convergence of \(\widehat{{\mathcal {S}}}^N\). Note that the inequality given in [24] requires compact support of the random variable and it is tight when it comes to the dependency on the risk parameter \(\alpha\). For unbounded support, one can use deviation inequalities from [10].

For \(i \in [n]\) and \(x \in {\mathcal {X}}\), the deviation between the \({\text {CVaR}}\) and its empirical counterpart can be bounded using the results in [24, Theorem 3.1] as

$$\begin{aligned} {\mathbb {P}}^N \Bigl (\Bigl |\widehat{{\text {CVaR}}}_\alpha ^N&[f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\Bigr | \ge \epsilon \Bigr ) \le 6 \exp \Bigl (-\frac{\alpha \epsilon ^2}{11 (L-\ell )^2} N \Bigr ). \end{aligned}$$
(11)

In the above bound, the denominator in the exponent uses the fact that any realization of \(f_i(x,u)\) for any x is supported on the compact set \([\ell ,L]\). Similar to the narrative of the previous section, while the above inequality holds pointwise, what we need is a uniform exponential bound on the proximity of \(\widehat{F}^N\) to F. Below, we derive such a bound under the following condition.

Assumption 2

(Uniform Lipschitz continuity of \(f_i\)) There exists a constant \(M > 0\) such that \(|f_i(x,u) - f_i(x',u)| \le M \Vert x - x' \Vert\), for all \(x, x' \in {\mathcal {X}}\), \(u \in {\mathcal {U}}\), and \(i \in [n]\). \(\bullet\)

Under the above Lipschitz condition on the random functions, one can show the following.

Lemma 4

(Lipschitz continuity of (empirical) \({\text {CVaR}}\)) Under Assumption 2, for any \(i \in [n]\), functions \(x \mapsto \widehat{{\text {CVaR}}}_\alpha ^N[f_i(x,u)]\) and \(x \mapsto {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\) are Lipschitz continuous over the set \({\mathcal {X}}\) with constant \(\frac{M}{\alpha }\).

Proof

We will show the property for the function \(x \mapsto \widehat{{\text {CVaR}}}_\alpha ^N[f_i(x,u)]\). The reasoning for \(x \mapsto {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\) follows analogously. Consider any \(x, x' \in {\mathcal {X}}\). Recall from (7) that

$$\begin{aligned} \Bigl |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)]&- \widehat{{\text {CVaR}}}^N_\alpha [f_i(x',u)]\Bigr | = \Bigl |\inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x,t) - \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x',t)\Bigr |. \end{aligned}$$
(12)

Assumption 2 yields the Lipschitz continuity property for the map \(\widehat{\psi }^N_i\). To establish this, fix any \(i \in [n]\) and \(t \in {\mathbb {R}}\) and notice that

$$\begin{aligned} \Bigl |\widehat{\psi }^N_i (x,t) - \widehat{\psi }^N_i(x',t)\Bigr |&= \Bigl |t + \frac{1}{N \alpha } \sum _{j=1}^N [f_i(x,\widehat{u}^j) - t]_+ \\&\qquad \qquad - \Bigl ( t + \frac{1}{N \alpha } \sum _{j=1}^N {[}f_i(x',\widehat{u}^j) - t]_+ \Bigr )\Bigr | \\&\le \frac{1}{N \alpha } \sum _{j=1}^N \Bigl |[f_i(x,\widehat{u}^j) - t]_+ - {[}f_i(x',\widehat{u}^j) - t]_+\Bigr | \\&\le \frac{1}{N \alpha } \sum _{j=1}^N \Bigl |f_i(x,\widehat{u}^j) - f_i(x',\widehat{u}^j)\Bigr | \le \frac{M}{\alpha } \Vert x - x' \Vert . \end{aligned}$$

In the above relations, the first inequality is a consequence of the triangle inequality, the second inequality follows from the fact that the map \([ \, \cdot \, ]_+\) is Lipschitz continuous with constant one, and the last inequality uses the Lipschitz continuity property of \(f_i\). Now let \(\bar{t}, \bar{t}' \in {\mathbb {R}}\) be such that \(\widehat{\psi }^N_i(x,\bar{t}) = \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x,t)\) and \(\widehat{\psi }^N_i(x',\bar{t}') = \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x',t)\). Existence of such optimizers follows from the discussion in [18, Section 6.2.4]. Next, note the following sequence of inequalities, which can be inferred from the optimality condition and the Lipschitz continuity property of \(\widehat{\psi }^N_i\) shown above,

$$\begin{aligned} \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x,t)&= \widehat{\psi }^N_i(x,\bar{t}) \le \widehat{\psi }^N_i(x,\bar{t}') \le \widehat{\psi }^N_i(x', \bar{t}') + \frac{M}{\alpha } \Vert x - x' \Vert \nonumber \nonumber \\&= \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x',t) + \frac{M}{\alpha } \Vert x - x' \Vert . \end{aligned}$$
(13)

One can exchange x with \(x'\) in the above reasoning and obtain

$$\begin{aligned} \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x',t) \le \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x,t) + \frac{M}{\alpha } \Vert x - x' \Vert . \end{aligned}$$
(14)

Inequalities (13) and (14) imply that \(\Bigl |\inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x,t) - \inf _{t \in {\mathbb {R}}} \widehat{\psi }^N_i(x',t)\Bigr | \le \frac{M}{\alpha } \Vert x - x' \Vert\). The proof concludes by using this fact in (12). \(\square\)
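Lemma 4 can be sanity-checked numerically. In the sketch below, \(f(x,u) = \sin (x) + u\) is uniformly Lipschitz in x with \(M = 1\) (an illustrative choice), so the empirical \({\text {CVaR}}\) should be Lipschitz with constant at most \(M/\alpha\); since u enters additively here, the observed constant is in fact close to M, illustrating that the lemma's bound can be conservative.

```python
import numpy as np

def empirical_cvar(z, alpha):
    # minimization formula (5) evaluated exactly at its breakpoints
    z = np.sort(np.asarray(z, dtype=float))[::-1]
    larger = np.cumsum(z) - z
    obj = z + (larger - np.arange(z.size) * z) / (z.size * alpha)
    return obj.min()

# f(x, u) = sin(x) + u is 1-Lipschitz in x, so by Lemma 4 the map
# x -> empirical CVaR of f(x, u) is Lipschitz with constant M/alpha = 10
rng = np.random.default_rng(4)
alpha, M = 0.1, 1.0
u = rng.uniform(-1.0, 1.0, size=300)
xs = np.linspace(0.0, 3.0, 31)
g = np.array([empirical_cvar(np.sin(x) + u, alpha) for x in xs])

# worst pairwise slope of the empirical CVaR over the grid
diffs = np.abs(g[:, None] - g[None, :])
gaps = np.abs(xs[:, None] - xs[None, :])
mask = gaps > 0
worst_slope = (diffs[mask] / gaps[mask]).max()   # never exceeds M / alpha
```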

Next, we establish the exponential convergence of \(\widehat{F}^N\). The proof is largely inspired by the steps given in [19, Theorem 5.1] and follows a standard argument for this type of result. We note that the obtained bound is quite crude; in practice, the achieved performance is much better.

Proposition 1

(Uniform exponential convergence of \(\widehat{F}^N\) to F) Under Assumption 2, for any \(0< \epsilon < {\text {diam}}({\mathcal {X}})/2\), the following holds for all \(N \in {\mathbb {N}}\),

$$\begin{aligned} {\mathbb {P}}^N \Bigl ( \sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert > \epsilon \Bigr ) \le \gamma (\epsilon ) \exp (-\beta (\epsilon )N), \end{aligned}$$

where

$$\begin{aligned} \gamma (\epsilon )&:= 6 n \Bigl ( \frac{12 M {\text {diam}}({\mathcal {X}})}{\epsilon \alpha } \Bigr )^n \frac{\lceil n/2 \rceil !}{2 \pi ^{n/2}}, \end{aligned}$$
(15a)
$$\begin{aligned} \beta (\epsilon )&:= \frac{\alpha \epsilon ^2}{44 n (L-\ell )^2} , \end{aligned}$$
(15b)

and \({\text {diam}}({\mathcal {X}}) = \sup _{x, x' \in {\mathcal {X}}} \Vert x-x' \Vert\) is the diameter of \({\mathcal {X}}\).
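Before turning to the proof, note that the constants in (15) yield an explicit, if very conservative, sample-size estimate: requiring \(\gamma (\epsilon ) e^{-\beta (\epsilon ) N} \le \delta\) gives \(N \ge \log (\gamma (\epsilon )/\delta )/\beta (\epsilon )\). A small sketch with illustrative problem data:

```python
import math

def sample_size(eps, delta, n, M, diam_X, L, ell, alpha):
    """Smallest N with gamma(eps) * exp(-beta(eps) * N) <= delta, using
    the constants gamma and beta from (15a)-(15b)."""
    gamma = 6 * n * (12 * M * diam_X / (eps * alpha)) ** n \
            * math.factorial(math.ceil(n / 2)) / (2 * math.pi ** (n / 2))
    beta = alpha * eps ** 2 / (44 * n * (L - ell) ** 2)
    return math.ceil(math.log(gamma / delta) / beta)

# Illustrative data: n = 2, M = 1, diam(X) = 1, L - ell = 1, alpha = 0.1
N_needed = sample_size(eps=0.1, delta=1e-3, n=2, M=1.0, diam_X=1.0,
                       L=1.0, ell=0.0, alpha=0.1)
# N_needed is on the order of 10^6, reflecting how crude the bound is
```

Consistent with the remark preceding the proposition, the prescribed sample sizes are far larger than what is needed in practice.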

Proof

The idea for moving from the pointwise exponential bound (11) to a uniform bound is to impose the pointwise bound jointly on a finite number of points and use the Lipschitz continuity property (Lemma 4) to bound the deviation of the rest of the set from this finite set. To make the details precise, note that one can cover the set \({\mathcal {X}}\) with

$$\begin{aligned} K := \Bigl ( \frac{12 M {\text {diam}}({\mathcal {X}})}{\epsilon \alpha } \Bigr )^n \frac{\lceil n/2 \rceil !}{2 \pi ^{n/2}} \end{aligned}$$
(16)

number of points, labeled \({\mathcal {C}}:= \{\tilde{x}^1, \dots , \tilde{x}^K\} \subset {\mathcal {X}}\), such that for any \(x \in {\mathcal {X}}\), there exists a point \(\tilde{x}^{i(x)} \in {\mathcal {C}}\) with

$$\begin{aligned} (M/\alpha ) \Vert x-\tilde{x}^{i(x)} \Vert \le \epsilon /4 . \end{aligned}$$
(17)

The number K can be computed as follows. From (17), we require \(\Vert x - \tilde{x}^{i(x)} \Vert \le \frac{\epsilon \alpha }{4M}\). Thus, from Definition 3 in the Appendix, the number of points in \({\mathcal {C}}\) need only exceed the \(\frac{\epsilon \alpha }{4 M}\)-covering number of \({\mathcal {X}}\), so any upper bound on this covering number suffices. From Lemma 7, one such upper bound is \(\Bigl ( \frac{3 {\text {diam}}({\mathcal {X}})}{(\epsilon \alpha / 4M)} \Bigr )^{n} \frac{1}{{\text {vol}}(B)}\), where \({\text {vol}}(B)\) is the volume of the unit norm ball B in \({\mathbb {R}}^{n}\). Since \({\text {vol}}(B) \ge \frac{2 \pi ^{n/2}}{\lceil n/2 \rceil !}\), we get the desired value for K given in (16). Having identified the set of points \({\mathcal {C}}\), we next combine the Lipschitz bound given in Lemma 4 and the inequality (17) to get, for all \(i \in [n]\) and \(x \in {\mathcal {X}}\),

$$\begin{aligned} \Bigl | \widehat{{\text {CVaR}}}_\alpha ^N[f_i(x,u)] - \widehat{{\text {CVaR}}}_\alpha ^N[f_i(\tilde{x}^{i(x)},u)]\Bigr | \le \epsilon /4 , \end{aligned}$$
(18a)
$$\begin{aligned} \Bigl |{\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(\tilde{x}^{i(x)},u)]\Bigr | \le \epsilon /4. \end{aligned}$$
(18b)
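The covering count (16) is explicit and can be evaluated numerically. A minimal sketch (the parameter values below are illustrative assumptions, not data from the paper):

```python
import math

def covering_count(n, M, diam, alpha, eps):
    """Upper bound (16) on the number of points needed to cover X so that
    every x in X is within eps*alpha/(4M) of the cover, in Euclidean norm."""
    radius_term = (12 * M * diam / (eps * alpha)) ** n
    ball_term = math.factorial(math.ceil(n / 2)) / (2 * math.pi ** (n / 2))
    return math.ceil(radius_term * ball_term)

# Illustrative values: the count grows geometrically in the dimension n.
print(covering_count(2, 1.0, 1.0, 0.1, 0.5))
```

Note how the count deteriorates quickly as the dimension n grows or the tolerance \(\epsilon\) shrinks, which is the source of the unfavorable constants in (15a).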

The above inequalities control the deviation of the functions \(\widehat{{\text {CVaR}}}_\alpha ^N[f_i(\cdot ,u)]\) and \({\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(\cdot ,u)]\) over the set \({\mathcal {X}}\) from the values these functions take on the set \({\mathcal {C}}\). The next step entails bounding the deviation between the \({\text {CVaR}}\) and the empirical \({\text {CVaR}}\) on the set \({\mathcal {C}}\). Employing (11) and the union bound, we have

$$\begin{aligned} {\mathbb {P}}^N \Bigl ( \sup _{i \in [n], x \in {\mathcal {C}}}&\Bigl |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\Bigr | \ge \frac{\epsilon }{2} \Bigr ) \nonumber \\&\le \sum _{i \in [n]} \sum _{x \in {\mathcal {C}}} {\mathbb {P}}^N \Bigl ( \Bigl |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\Bigr | \ge \frac{\epsilon }{2} \Bigr ) \nonumber \\&\le 6 n K \exp \Bigl (-\frac{\alpha \epsilon ^2}{44 (L-\ell )^2} N \Bigr ). \end{aligned}$$
(19)

The next set of inequalities characterizes the difference between the \({\text {CVaR}}\) and the empirical \({\text {CVaR}}\) over the set \({\mathcal {X}}\) using the Lipschitz continuity property (18). Fix \(i \in [n]\) and let \(x \in {\mathcal {X}}\). Note that using (18),

$$\begin{aligned} |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)]&- {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]| \\&\le |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - \widehat{{\text {CVaR}}}^N_\alpha [f_i(\tilde{x}^{i(x)},u)]| \\&\qquad \quad + |\widehat{{\text {CVaR}}}^N_\alpha [f_i(\tilde{x}^{i(x)},u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(\tilde{x}^{i(x)},u)]| \\&\qquad \quad + |{\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(\tilde{x}^{i(x)},u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]| \\&\le \frac{\epsilon }{2} + |\widehat{{\text {CVaR}}}^N_\alpha [f_i(\tilde{x}^{i(x)},u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(\tilde{x}^{i(x)},u)]|. \end{aligned}$$

Next, the deviation between the \({\text {CVaR}}\) and its empirical counterpart is bounded using (19) and the above characterization as

$$\begin{aligned} {\mathbb {P}}^N \Bigl ( \sup _{i \in [n], x \in {\mathcal {X}}}&\Bigl |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\Bigr | \ge \epsilon \Bigr ) \nonumber \\&\le {\mathbb {P}}^N \Bigl ( \sup _{i \in [n], x \in {\mathcal {C}}} \Bigl | \widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\Bigr | \ge \frac{\epsilon }{2} \Bigr ) \nonumber \\&\le 6 n K \exp \Bigl (-\frac{\alpha \epsilon ^2}{44 (L-\ell )^2} N \Bigr ). \end{aligned}$$
(20)

The final step is to connect the above inequality to the difference between \(\widehat{F}^N\) and F. From the proof of Lemma 2, one can deduce that if \(\sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert > \epsilon\), then

$$\begin{aligned} \sup _{i \in [n], x \in {\mathcal {X}}} \Bigl |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)]\Bigr | > \frac{\epsilon }{\sqrt{n }}. \end{aligned}$$

Therefore, using (20) we obtain

$$\begin{aligned} {\mathbb {P}}^N (\sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert> \epsilon )&\le {\mathbb {P}}^N\Bigl ( \sup _{i \in [n], x \in {\mathcal {X}}} \Bigl |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] \\&\qquad \qquad - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)] \Bigr | > \frac{\epsilon }{ \sqrt{n}} \Bigr ) \\&\le 6 n K \exp \Bigl (-\frac{\alpha \epsilon ^2}{44 n (L-\ell )^2} N \Bigr ). \end{aligned}$$

This concludes the proof. \(\square\)

The main result is given below. The proof follows from the uniform exponential convergence of \(\widehat{F}^N\).

Theorem 3

(Exponential convergence of \(\widehat{{\mathcal {S}}}^N\) to \({\mathcal {S}}\)) Let Assumption 2 hold. Then, for any \(0< \epsilon < {\text {diam}}({\mathcal {X}})/2\), and \(N \in {\mathbb {N}}\), the following inequality holds

$$\begin{aligned} {\mathbb {P}}^N \bigl ({\mathbb {D}}(\widehat{{\mathcal {S}}}^N, {\mathcal {S}}) \le \epsilon \bigr ) \ge 1- \gamma (\delta (\epsilon )) e^{-\beta (\delta (\epsilon ))N}, \end{aligned}$$

where \(\gamma\) and \(\beta\) are given in (15) and \(\delta :{\mathbb {R}}_{>0} \rightarrow {\mathbb {R}}_{>0}\) is a map such that the pair \((\epsilon ,\delta (\epsilon ))\) satisfies the condition of Lemma 3.

Proof

Consider any \(\epsilon > 0\). By Lemma 3, if \(\sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert \le \delta (\epsilon )\), then \({\mathbb {D}}(\widehat{{\mathcal {S}}}^N, {\mathcal {S}}) \le \epsilon\). From Proposition 1, for any \(\delta (\epsilon ) > 0\), there exist \(\gamma (\delta (\epsilon ))\) and \(\beta (\delta (\epsilon ))\), given in (15a) and (15b), respectively, such that

$$\begin{aligned} {\mathbb {P}}^N \Bigl ( \sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert > \delta (\epsilon ) \Bigr ) \le \gamma (\delta (\epsilon )) e^{-\beta (\delta (\epsilon )) N} \end{aligned}$$

for all N. The proof follows by using the above facts and the set of inequalities:

$$\begin{aligned} {\mathbb {P}}^N ({\mathbb {D}}(\widehat{{\mathcal {S}}}^N,&{\mathcal {S}}) \le \epsilon ) \ge {\mathbb {P}}^N \Bigl ( \sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert \le \delta (\epsilon ) \Bigr ) \\&= 1 - {\mathbb {P}}^N \Bigl ( \sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert > \delta (\epsilon ) \Bigr ). \end{aligned}$$

\(\square\)

Remark 1

(Sample guarantees for approximating \({\mathcal {S}}\) with \(\widehat{{\mathcal {S}}}^N\)) Theorem 3 implies that if one wants \({\mathbb {D}}(\widehat{{\mathcal {S}}}^N,{\mathcal {S}}) \le \epsilon\) with confidence \(1-\zeta\), where \(\zeta \in (0,1)\) is a small positive number, then one would require at most

$$\begin{aligned} N(\zeta ,\epsilon )&= \frac{1}{\beta (\delta (\epsilon ))} \log \Bigl ( \frac{\gamma (\delta (\epsilon ))}{\zeta } \Bigr ) \\&= \frac{44 n (L-\ell )^2}{\alpha \delta (\epsilon )^2} \Bigl ( \log \Bigl ( \frac{ 6 n \lceil n/2 \rceil ! }{ 2 \pi ^{n/2} \zeta } \Bigr ) + n \log \Bigl ( \frac{12 M {\text {diam}}({\mathcal {X}})}{\delta (\epsilon ) \alpha } \Bigr ) \Bigr ) \end{aligned}$$

number of samples of the random variable. Due to the exponential rate, a desirable feature of this sample guarantee is that N depends only logarithmically on the confidence parameter \(\zeta\). That is, one can obtain high-confidence bounds with comparatively few samples. However, the sample size scales poorly with the other parameters, especially \(\delta (\epsilon )\) and the dimension n. Further, note that obtaining an explicit sample guarantee requires estimating \(\delta (\cdot )\), which depends on the regularity of the random functions. Improving the sample complexity for specific random functions is discussed in the following section. \(\bullet\)
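The sample-size bound of Remark 1 is easy to evaluate once the problem constants are fixed; a sketch, where `delta` stands for the problem-dependent estimate \(\delta (\epsilon )\) and all numeric values are illustrative assumptions:

```python
import math

def sample_size(n, L, ell, M, diam, alpha, delta, zeta):
    """Sample-size bound N(zeta, eps) from Remark 1; `delta` is the
    (problem-dependent) value delta(eps), estimated separately."""
    lead = 44 * n * (L - ell) ** 2 / (alpha * delta ** 2)
    log_term = math.log(6 * n * math.factorial(math.ceil(n / 2))
                        / (2 * math.pi ** (n / 2) * zeta))
    log_term += n * math.log(12 * M * diam / (delta * alpha))
    return math.ceil(lead * log_term)

# Higher confidence (smaller zeta) is cheap: N grows only logarithmically.
n1 = sample_size(n=2, L=1.0, ell=0.0, M=1.0, diam=1.0,
                 alpha=0.1, delta=0.05, zeta=1e-2)
n2 = sample_size(n=2, L=1.0, ell=0.0, M=1.0, diam=1.0,
                 alpha=0.1, delta=0.05, zeta=1e-6)
```

Tightening \(\zeta\) by four orders of magnitude increases N only modestly, whereas halving `delta` quadruples the leading factor.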

4 Separable uncertain functions

Here we illustrate how specific structure of the random functions yields tighter sample guarantees. Further, we discuss the tractability of solving the sample average VI problem.

Proposition 2

(Exponential convergence for separable functions) Assume that the random functions have the form

$$\begin{aligned} f_i(x,u) = \tilde{f}_i(x)g_i(u)+ \check{f}_i(x) \text { for all } i \in [n], \end{aligned}$$
(21)

where \(\tilde{f}_i\), \(g_i\), and \(\check{f}_i\) are non-negative real-valued continuous functions. Then, for any \(\epsilon > 0\) and \(N \in {\mathbb {N}}\), the following holds

$$\begin{aligned} {\mathbb {P}}^N \bigl ({\mathbb {D}}(\widehat{{\mathcal {S}}}^N, {\mathcal {S}}) \le \epsilon \bigr ) \ge 1- 6 n e^{-\beta (\delta (\epsilon ))N}, \end{aligned}$$
(22)

where \(\begin{aligned} \beta (\epsilon ):= \frac{\alpha \epsilon ^2}{11 n (f^{\textrm{max}} g^{\textrm{rge}})^2}, \end{aligned}\) with \(f^{\textrm{max}}:= \sup _{i \in [n], x \in {\mathcal {X}}} \tilde{f}_i(x)\) and \(\begin{aligned}g^{\textrm{rge}}:= \sup _{i \in [n]} \bigl ( \sup _{u \in {\mathcal {U}}} g_i(u) - \inf _{u \in {\mathcal {U}}} g_i(u) \bigr ).\end{aligned}\)

Proof

Since \({\text {CVaR}}\) is positive-homogeneous and shift-invariant [18, Chapter 6], one gets

$$\begin{aligned} \widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] = \tilde{f}_i(x) \widehat{{\text {CVaR}}}^N_\alpha [g_i(u)] + \check{f}_i(x), \end{aligned}$$
(23)

for all \(i \in [n]\). Using this fact, for any \(\epsilon > 0\), we reason as follows:

$$\begin{aligned}&{\mathbb {P}}^N \Bigl ( \sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert> \epsilon \Bigr ) \nonumber \\&\quad \le {\mathbb {P}}^N \Bigl ( \sup _{i \in [n], x \in {\mathcal {X}}} \Bigl |\widehat{{\text {CVaR}}}^N_\alpha [f_i(x,u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [f_i(x,u)] \Bigr |> \frac{\epsilon }{ \sqrt{n}} \Bigr ) \nonumber \\&\quad \overset{(a)}{\le }\ {\mathbb {P}}^N \Bigl ( \sup _{i \in [n]} \, \, \Bigl | \widehat{{\text {CVaR}}}^N_\alpha [g_i(u)] - {\text {CVaR}}^{\mathbb {P}}_\alpha [g_i(u)]\Bigr | > \frac{\epsilon }{\sqrt{n} f^{\textrm{max}}} \Bigr ) \nonumber \\&\quad \overset{(b)}{\le }\ 6 n \exp \Bigl ( - \frac{\alpha \epsilon ^2}{11 n (f^{\textrm{max}} g^{\textrm{rge}})^2} N \Bigr ), \end{aligned}$$
(24)

where (a) follows from (23) (note that an equality similar to (23) holds for \({\text {CVaR}}^{\mathbb {P}}_\alpha\)) and the fact that \(\tilde{f}\) takes non-negative values, and (b) results from the deviation inequality (11) applied in combination with the union bound. Using (24) and proceeding along the lines of Theorem 3, we obtain (22). \(\square\)
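The identity (23) rests on positive homogeneity and shift-invariance of \({\text {CVaR}}\), and it can be checked numerically on the sample-based formula. The helper below exploits the fact that the Rockafellar–Uryasev objective is piecewise linear in t, so an optimum is attained at one of the sample values (the constants `f_tilde` and `f_check` are illustrative stand-ins for \(\tilde{f}_i(x)\) and \(\check{f}_i(x)\)):

```python
import numpy as np

def empirical_cvar(samples, alpha):
    """Empirical CVaR: min over t of t + (1/(N*alpha)) * sum(max(g_j - t, 0)).
    The objective is convex piecewise linear in t with breakpoints at the
    samples, so we evaluate it at each sample and take the minimum."""
    g = np.asarray(samples, dtype=float)
    obj = lambda t: t + np.maximum(g - t, 0.0).sum() / (len(g) * alpha)
    return min(obj(t) for t in g)

rng = np.random.default_rng(1)
g = rng.uniform(size=200)
f_tilde, f_check, alpha = 2.5, 0.7, 0.1   # illustrative x-dependent values
lhs = empirical_cvar(f_tilde * g + f_check, alpha)
rhs = f_tilde * empirical_cvar(g, alpha) + f_check
assert abs(lhs - rhs) < 1e-9              # identity (23)
```

The equality is exact (up to floating point) because scaling by \(\tilde{f}_i(x) \ge 0\) and shifting by \(\check{f}_i(x)\) map the breakpoints of the objective onto each other.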

Recounting how we obtained exponential convergence in the previous section, the key step was Proposition 1, where we moved from the deviation inequality for a finite number of points in \({\mathcal {X}}\) to the uniform exponential convergence of \(\widehat{F}\). This step was unavoidable because of the possible interdependence of x and u in the random function, and it is also the reason the bound scales poorly with several parameters, such as the dimension n and the size of \({\mathcal {X}}\). However, when the random function takes the form (21), one need not construct a cover of \({\mathcal {X}}\) to derive uniform exponential convergence of \(\widehat{F}\), and we obtain the tighter bound (22).

Remark 2

(Sample guarantees and tractability for separable functions) As in Remark 1, we deduce from Proposition 2 that for separable functions (21), the accuracy \({\mathbb {D}}(\widehat{{\mathcal {S}}}^N,{\mathcal {S}}) \le \epsilon\) holds with confidence \(1-\zeta\) provided we use

$$\begin{aligned} N(\epsilon ,\zeta ) = \frac{ 11 n (f^{\textrm{max}} g^{\textrm{rge}})^2}{\alpha \delta (\epsilon )^2} \log \Bigl ( \frac{6 n}{\zeta } \Bigr ) \end{aligned}$$

number of samples. As expected, this sample size does not depend on the size of \({\mathcal {X}}\). Further, the derivation does not require the random functions to be Lipschitz continuous.
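For comparison with Remark 1, the separable sample size can also be evaluated directly; note the absence of \(M\) and \({\text {diam}}({\mathcal {X}})\) (numeric values below are illustrative):

```python
import math

def sample_size_separable(n, f_max, g_rge, alpha, delta, zeta):
    """Sample-size bound of Remark 2 for separable random functions (21);
    `delta` is an estimate of delta(eps)."""
    return math.ceil(11 * n * (f_max * g_rge) ** 2
                     / (alpha * delta ** 2) * math.log(6 * n / zeta))

N = sample_size_separable(n=10, f_max=1.0, g_rge=1.0,
                          alpha=0.1, delta=0.05, zeta=1e-3)
```

Unlike the general bound, the dimension n now enters only linearly (times a logarithmic factor), rather than through an n-th power of the covering radius.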

We next comment on solving \({\text {VI}}({\mathcal {X}},\widehat{F}^N)\) for separable functions. For convenience, denote the concatenations of \(\tilde{f}_i\) and \(\check{f}_i\) for \(i \in [n]\) by the functions \(\tilde{f}\) and \(\check{f}\), respectively. Further, let the vector \(\widehat{g}^N:= (\widehat{{\text {CVaR}}}^N_\alpha [g_i(u)])_{i \in [n]}\) collect the empirical \({\text {CVaR}}\) of the random part of each component. Then, the aim is to solve \({\text {VI}}({\mathcal {X}}, \tilde{f} \odot \widehat{g}^N + \check{f})\), where \(\odot\) denotes the component-wise product. Given samples, the approach is to first compute \(\widehat{g}^N\) and then solve the VI. Note that computing each component \(\widehat{g}^N_i\) amounts to solving a linear program:

$$\begin{aligned} \widehat{g}^N_i = \min \Biggl \{t + \frac{1}{N \alpha } \sum _{j=1}^N y_j \, \Bigg | \begin{array}{l l} y_j \ge g_i(\widehat{u}^j) - t, &{} \, \, \forall j \in [N], \\ t \in {\mathbb {R}}, y_j \ge 0, &{} \, \, \forall j \in [N] \end{array} \Biggr \}. \end{aligned}$$

The appealing part of this process is the decomposition into two steps: computing the empirical \({\text {CVaR}}\) independently of x, and then solving the VI without worrying about samples. Further, one can derive conditions on the underlying functions that guarantee monotonicity of \(\tilde{f} \odot \widehat{g}^N + \check{f}\), which in turn leads to efficient algorithms for solving the VI; see, e.g., the methods given in [7]. \(\bullet\)
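The linear program above can be assembled directly; a sketch using `scipy.optimize.linprog`, where stacking the variables as \((t, y_1, \dots , y_N)\) is an implementation choice:

```python
import numpy as np
from scipy.optimize import linprog

def cvar_lp(samples, alpha):
    """Empirical CVaR of g_i via the LP: minimize t + (1/(N*alpha)) sum y_j
    subject to y_j >= samples[j] - t and y_j >= 0, with t free."""
    g = np.asarray(samples, dtype=float)
    N = len(g)
    c = np.concatenate(([1.0], np.full(N, 1.0 / (N * alpha))))
    # y_j >= g_j - t  <=>  -t - y_j <= -g_j
    A_ub = np.hstack((-np.ones((N, 1)), -np.eye(N)))
    b_ub = -g
    bounds = [(None, None)] + [(0.0, None)] * N
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.fun

# Cross-check against direct minimization over the sample breakpoints.
g = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
direct = min(t + np.maximum(g - t, 0.0).sum() / (len(g) * 0.4) for t in g)
assert abs(cvar_lp(g, 0.4) - direct) < 1e-6
```

For a one-dimensional minimization like this, sorting the samples is cheaper than an LP solver; the LP form matters when the \({\text {CVaR}}\) computation is embedded in a larger optimization.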

Note that if F is strictly monotone, then the solution set \({\mathcal {S}}\) is a singleton. In that case, asymptotic consistency implies that every sequence \(\{\widehat{x}^N\}\) converges to the unique solution. If, in addition, F satisfies a stronger assumption, namely being strongly monotone and having a separable form, then one can estimate the map \(\delta\) used in the exponential convergence bound. The next result formalizes this implication.

Lemma 5

(Estimating \(\delta\) for strongly monotone F) Assume that the random functions are of the form \(f_i(x,u) = \tilde{f}_i(x) + g_i(u)\) for all \(i \in [n]\), where \(\tilde{f}_i\) and \(g_i\) are real-valued continuous functions. Suppose the concatenated function \(\tilde{f}:=(\tilde{f}_i)_{i \in [n]}\) is strongly monotone over \({\mathcal {X}}\) with modulus \(\sigma > 0\). Then, \(\sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert \le \epsilon\) implies \({\mathbb {D}}(\widehat{{\mathcal {S}}}^N,{\mathcal {S}}) \le \sigma ^{-1}\epsilon\).

Proof

Note that \(F(x) = \tilde{f}(x) + \kappa\), where components of the vector \(\kappa\) are given by \(\kappa _i:= {\text {CVaR}}^{\mathbb {P}}_\alpha [g_i(u)]\) for all i. Similarly, we have \(\widehat{F}^N(x) = \tilde{f}(x) + \widehat{\kappa }^N\) where \(\widehat{\kappa }_i^N:= \widehat{{\text {CVaR}}}_\alpha ^N[g_i(u)]\). By assumption,

$$\begin{aligned} \sup _{x \in {\mathcal {X}}} \Vert \widehat{F}^N(x) - F(x) \Vert = \Vert \widehat{\kappa }^N - \kappa \Vert \le \epsilon . \end{aligned}$$
(25)

By the definition of the solution of a VI, for any \(x^* \in {\text {SOL}}({\mathcal {X}},F)\) and \(\widehat{x}^N \in {\text {SOL}}({\mathcal {X}},\widehat{F}^N)\), we have \((\widehat{x}^N - x^*)^\top F(x^*) \ge 0\) and \((x^* - \widehat{x}^N)^\top \widehat{F}^N(\widehat{x}^N) \ge 0\). Combining these inequalities gives us \((x^* - \widehat{x}^N)^\top (F(x^*) - \widehat{F}^N(\widehat{x}^N)) \le 0\). Using the separable forms of the functions, we get

$$\begin{aligned} (x^* - \widehat{x}^N)^\top (\tilde{f}(x^*) - \tilde{f}(\widehat{x}^N)) \le (x^* - \widehat{x}^N)^\top (\widehat{\kappa }^N - \kappa ). \end{aligned}$$

Using the strong monotonicity condition \((x^* - \widehat{x}^N)^\top (\tilde{f}(x^*) - \tilde{f}(\widehat{x}^N)) \ge \sigma \Vert x^* - \widehat{x}^N \Vert ^2\) and the Cauchy–Schwarz inequality in the above expression, we get

$$\begin{aligned} \sigma \Vert x^* - \widehat{x}^N \Vert ^2 \le \Vert x^* - \widehat{x}^N \Vert \Vert \widehat{\kappa }^N - \kappa \Vert . \end{aligned}$$

The proof now follows from (25). \(\square\)
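Lemma 5 can be sanity-checked in one dimension, where the VI solution for an affine strongly monotone map is simply a projection onto the interval (all numbers below are illustrative):

```python
import numpy as np

def solve_vi_interval(sigma, x0, kappa, lo=0.0, hi=1.0):
    """Solution of VI([lo, hi], F) with F(x) = sigma*(x - x0) + kappa:
    project the unconstrained root x0 - kappa/sigma onto the interval."""
    return float(np.clip(x0 - kappa / sigma, lo, hi))

sigma, x0 = 2.0, 0.6
kappa, kappa_hat = 0.3, 0.45       # true vs. empirical CVaR offsets
x_star = solve_vi_interval(sigma, x0, kappa)
x_hat = solve_vi_interval(sigma, x0, kappa_hat)
# Lemma 5: the solution error is at most the CVaR estimation error over sigma.
assert abs(x_hat - x_star) <= abs(kappa_hat - kappa) / sigma + 1e-12
```

Here the bound is tight when both solutions lie in the interior of the interval, since the projection is then the identity.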

The above result can be used to further refine the convergence rate given in Proposition 2. In the following section, we apply our results to the uncertain network routing problem.

5 Application: computing CVaR-based Wardrop equilibrium

Consider a network given by a directed graph \({\mathcal {G}}:= ({\mathcal {V}}, {\mathcal {E}})\), where \({\mathcal {V}}\) and \({\mathcal {E}}\subseteq {\mathcal {V}}\times {\mathcal {V}}\) stand for the set of nodes and edges, respectively. The sets of origin and destination nodes are the sets of sources and sinks in the network, denoted by \({\mathcal {O}}\subset {\mathcal {V}}\) and \({\mathcal {D}}\subset {\mathcal {V}}\), respectively. The set of origin–destination (OD) pairs is \({\mathcal {W}}\subseteq {\mathcal {O}}\times {\mathcal {D}}\). Let \({\mathcal {P}}_w\) denote the set of available paths for the OD pair \(w \in {\mathcal {W}}\) and let \({\mathcal {P}}= \cup _{w \in {\mathcal {W}}} {\mathcal {P}}_w\) be the set of all paths. Consider the setting of nonatomic routing, where numerous agents traverse the network and each individual agent’s action has an infinitesimal impact on the aggregate traffic flow. As a consequence, flow is modeled as a continuous variable. Each agent is associated with an OD pair \(w \in {\mathcal {W}}\) and is allowed to select any path \(p \in {\mathcal {P}}_w\). The route choices give rise to the aggregate traffic, modeled as a flow vector \(h \in {{\mathbb {R}}}_{\ge 0}^{|{\mathcal {P}}|}\) with \(h_p\) being the flow on a path \(p \in {\mathcal {P}}\). The flow between each OD pair must satisfy the travel demand. We denote the demand for the OD pair \(w \in {\mathcal {W}}\) by \(d_w \in {{\mathbb {R}}}_{\ge 0}\) and the set of feasible flows by

$$\begin{aligned} {\mathcal {H}}:= \textstyle \Bigl \{h \in {{\mathbb {R}}}_{\ge 0}^{|{\mathcal {P}}|} \Big | \sum _{p \in {\mathcal {P}}_w} h_p = d_w \text { for all } w \in {\mathcal {W}}\Bigr \}. \end{aligned}$$
(26)

Agents who choose path \(p \in {\mathcal {P}}\) experience a non-negative uncertain cost denoted by \(C_p:{{\mathbb {R}}}_{\ge 0}^{|{\mathcal {P}}|} \times {\mathbb {R}}^m \rightarrow {{\mathbb {R}}}_{\ge 0}\), \((h,u) \mapsto C_p(h,u)\), where \(u \in {\mathbb {R}}^m\) models the uncertainty. Let \({\mathbb {P}}\) and \({\mathcal {U}}\subset {\mathbb {R}}^m\) be the distribution and support of u, respectively. Assume that \({\mathcal {U}}\) is compact. For the cost function, assume that for every \(p \in {\mathcal {P}}\) and \(u \in {\mathcal {U}}\), the function \(h \mapsto C_p(h,u)\) is continuous, and for every \(p \in {\mathcal {P}}\) and \(h \in {\mathcal {H}}\), the function \(u \mapsto C_p(h,u)\) is measurable. In addition, for all \(p \in {\mathcal {P}}\), assume that \(C_p\) takes finite values over \({\mathcal {H}}\times {\mathcal {U}}\). The elements described above collectively define an uncertain routing game. To assign an appropriate objective for the agents, we assume that they are risk-averse and look for paths with the least CVaR. We assume that all agents have the same risk-aversion, characterized by the parameter \(\alpha \in (0,1)\). The \({\text {CVaR}}\) associated with path p as a function of the flow is

$$\begin{aligned} {\text {CVaR}}^{\mathbb {P}}_\alpha [C_p(h,u)] = \inf _{t \in {\mathbb {R}}} \Bigl \{ t + \frac{1}{\alpha } {\mathbb {E}}_{\mathbb {P}}\left[ C_p(h,u)-t \right] _+ \Bigr \}. \end{aligned}$$
(27)

The notion of equilibrium then is that of Wardrop [4], given below.

Definition 2

(Conditional value-at-risk based Wardrop equilibrium (CWE)) A flow vector \(h^* \in {{\mathbb {R}}}_{\ge 0}^{|{\mathcal {P}}|}\) is called a CVaR-based Wardrop equilibrium (CWE) for the uncertain routing game if: (i) \(h^*\) satisfies the demand for all OD pairs and (ii) for any OD pair w, a path \(p \in {\mathcal {P}}_w\) has nonzero flow only if the CVaR of path p is minimal among all paths in \({\mathcal {P}}_w\). Formally, \(h^*\) is a CWE if \(h^* \in {\mathcal {H}}\) and \(h_p^* > 0\) for \(p \in {\mathcal {P}}_w\) only if

$$\begin{aligned} {\text {CVaR}}^{\mathbb {P}}_\alpha [C_p(h^*,u)] \le {\text {CVaR}}^{\mathbb {P}}_\alpha [C_q(h^*,u)], \quad \forall q \in {\mathcal {P}}_w. \end{aligned}$$
(28)

We denote the set of CWE by \({\mathcal {S}}_{\texttt{CWE}}\subset {\mathcal {H}}\). \(\bullet\)
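Definition 2 is straightforward to check for a candidate flow, given the vector of path CVaRs. A hypothetical helper (the representation of OD pairs as lists of path indices is an assumption of this sketch):

```python
def is_cwe(h, cvar, od_paths, demands, tol=1e-8):
    """Check Definition 2: demand satisfied for every OD pair, and every
    used path (h_p > 0) attains the minimal CVaR within its OD pair."""
    for paths, dw in zip(od_paths, demands):
        if abs(sum(h[p] for p in paths) - dw) > tol:
            return False                    # condition (i) fails
        best = min(cvar[p] for p in paths)
        if any(h[p] > tol and cvar[p] > best + tol for p in paths):
            return False                    # condition (ii) fails
    return True

# One OD pair with two paths and unit demand.
assert is_cwe([1.0, 0.0], [1.0, 2.0], [[0, 1]], [1.0])
assert not is_cwe([0.0, 1.0], [1.0, 2.0], [[0, 1]], [1.0])
```

In the second call, the used path has a strictly larger CVaR than the unused one, violating condition (ii).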

The set \({\mathcal {S}}_{\texttt{CWE}}\) is equivalent to the set of solutions to the variational inequality (VI) problem \({\text {VI}}({\mathcal {H}},G)\) (see Sect. 2 for relevant notions) [20], where

$$\begin{aligned} G_p(h) := {\text {CVaR}}^{\mathbb {P}}_\alpha [C_p(h,u)], \end{aligned}$$

for all \(p \in {\mathcal {P}}\). Note that the set \({\mathcal {H}}\) is compact and convex. Further, the map \(h \mapsto G(h)\) is continuous since the \(C_p\), \(p \in {\mathcal {P}}\), are continuous and \({\mathcal {H}}\) and \({\mathcal {U}}\) are compact. Therefore, the set of solutions \({\text {SOL}}({\mathcal {H}},G)\) is nonempty and compact [7, Corollary 2.2.5]. Consequently, the set \({\mathcal {S}}_{\texttt{CWE}}\) is nonempty and compact. Due to this connection between the CWE and the solution of the risk-based VI, we can apply the results developed in the previous section to study the sample average approximation of the CWE. Given N i.i.d. samples of u, the sample average approximation of the map G is defined component-wise as \(\widehat{G}^N_p (h):= \widehat{{\text {CVaR}}}^N_\alpha [C_p(h,u)]\), where

$$\begin{aligned} \widehat{{\text {CVaR}}}^N_\alpha [C_p(h,u)] := \inf _{t \in {\mathbb {R}}} \Bigl \{t + \frac{1}{N \alpha } \sum _{i=1}^N [C_p (h,\widehat{u}^i) - t]_+ \Bigr \}. \end{aligned}$$

Denote the set of solutions of the \({\text {VI}}({\mathcal {H}},\widehat{G}^N)\) by \(\widehat{{\mathcal {S}}}_{\texttt{CWE}}^N\). We have the guarantee:

Theorem 4

(Convergence of \(\widehat{{\mathcal {S}}}_{\texttt{CWE}}^N\) to \({\mathcal {S}}_{\texttt{CWE}}\)) The following hold:

  1. (i)

    For any \(\epsilon > 0\), there exists \(\delta (\epsilon ) > 0\) such that \({\mathbb {D}}(\widehat{{\mathcal {S}}}_{\texttt{CWE}}^N, {\mathcal {S}}_{\texttt{CWE}}) \le \epsilon\) whenever \(\sup _{h \in {\mathcal {H}}} \Vert \widehat{G}^N(h) - G(h) \Vert \le \delta (\epsilon )\).

  2. (ii)

    Almost surely \({\mathbb {D}}(\widehat{{\mathcal {S}}}_{\texttt{CWE}}^N,{\mathcal {S}}_{\texttt{CWE}}) \rightarrow 0\).

Furthermore, assume there exists a constant \(M > 0\) such that \(|C_p(h,u) - C_p(h',u)| \le M \Vert h - h' \Vert\) for all \(h, h' \in {\mathcal {H}}\), \(u \in {\mathcal {U}}\), and \(p \in {\mathcal {P}}\). Let \(\ell := \min \{C_p(h,u) \; | \; h \in {\mathcal {H}}, u \in {\mathcal {U}}, p \in {\mathcal {P}}\}\) and \(L:= \max \{C_p(h,u) \; | \; h \in {\mathcal {H}}, u \in {\mathcal {U}}, p \in {\mathcal {P}}\}\). Then, for any \(\epsilon > 0\), and \(N \in {\mathbb {N}}\), the following inequality holds

$$\begin{aligned} {\mathbb {P}}^N \bigl ({\mathbb {D}}(\widehat{{\mathcal {S}}}_{\texttt{CWE}}^N, {\mathcal {S}}_{\texttt{CWE}}) \le \epsilon \bigr ) \ge 1- \gamma (\delta (\epsilon )) e^{-\beta (\delta (\epsilon ))N}, \end{aligned}$$

where \(\delta (\epsilon )\) satisfies the condition given in (i), and

$$\begin{aligned} \gamma (\epsilon ) := 6 |{\mathcal {P}}| \prod _{w \in {\mathcal {W}}} \lceil {\frac{4 M |{\mathcal {W}}| \sqrt{|{\mathcal {P}}_w|}}{\epsilon \alpha }} \rceil , \quad \beta (\epsilon ) := \frac{\alpha \epsilon ^2}{44 |{\mathcal {P}}| (L-\ell )^2} . \end{aligned}$$
(29)

The proof follows in the same way as those of Theorems 1 and 3, using the fact that the covering number of \({\mathcal {H}}\) can be bounded as given in Lemma 9 in the appendix. This sharper bound on the covering number brings out a notable difference in the constants of the exponential bound (29) as compared to those derived in Theorem 3.

6 Numerical example: Sioux Falls

Here we illustrate the method of sample average approximation for the computation of the CWE through an example. We consider the Sioux Falls traffic network, which consists of 24 nodes and 76 edges [23]. The (deterministic) cost associated with each edge \(e \in {\mathcal {E}}\) of the network is the travel time, given as an affine function of the flow on the edge, \(f_e(\ell _e):= t_e \Bigl ( 1 + b_e \frac{\ell _e}{c_e} \Bigr )\), where \(\ell _e\) is the flow on edge \(e \in {\mathcal {E}}\), \(t_e\) is the free-flow travel time, and \(c_e\) is the capacity of the edge. The values of the constants \(t_e\) and \(c_e\) are taken from the repository [23]. The constant \(b_e\) is set to 100 for all edges. For simplicity, we consider three OD pairs \({\mathcal {W}}= \{(1,19),(13,8),(12,18)\}\) and, for each pair, we choose the 10 paths with the shortest free-flow travel time. The demand is given as \(d_{(1,19)} = 300\), \(d_{(13,8)} = 600\), and \(d_{(12,18)} = 200\). The cost of each path is the sum of the costs of the edges contained in it. That is, for a path \(p \in {\mathcal {P}}\), \(C_p(h) = \sum _{e \in p} f_e(\ell _e)\), where the summation is over all edges that constitute the path. Note that the flow on any edge is the sum of the flows of the paths that use that edge. That is, \(\ell _e = \sum _{\{p \in {\mathcal {P}} \; | \; e \in p\}} h_p\). We assume that the cost associated with each edge that contains either of the nodes 10, 16, or 17 is uncertain. Specifically, for such an edge e, the uncertain cost is given as \(J_e(\ell _e,u_e) = f_e(\ell _e) + u_e\), where \(u_e\) has a uniform distribution over the set \([0,0.5 t_e]\). For all other edges, we set \(u_e\) to be zero with probability one. All uncertainties are assumed to be mutually independent. The uncertain cost of a path \(p \in {\mathcal {P}}\) is given as \(C_p(h,u) = \sum _{e \in p} J_e(\ell _e,u_e)\). This completely specifies the routing game with uncertain costs.

6.1 Affine separable costs and LCP

In the example explained above, the cost \(C_p\) is linear in the flow and the uncertainty is additive. Therefore, solving the sample average VI is equivalent to solving a linear complementarity problem (LCP), which in turn is a convex optimization problem with quadratic cost and affine constraints. We next derive this optimization problem. Let \(Q \in \{0,1\}^{|{\mathcal {E}}|\times |{\mathcal {P}}|}\) be the (edge, path)-incidence matrix, whose entry \(Q_{ep}\) is 1 if and only if edge e belongs to path p. Then, using the linear relationship between edge and path flows, the vector of all edge flows can be written as \(\ell = Q h\), where h consists of the flows on paths. Stacking all uncertain edge costs \(J_e\) in a vector J and using its affine separable form, we obtain

$$\begin{aligned} J(h,u) = R Q h + t + u, \end{aligned}$$

where u and t are the vectors of uncertainties and free-flow travel times of all edges, respectively, and R is a diagonal matrix whose diagonal entry corresponding to edge e is \(b_e t_e/ c_e\). Using the above relation, the vector of costs incurred on paths takes the form

$$\begin{aligned} C(h,u) = Q^\top R Q h + Q^\top t + Q^\top u. \end{aligned}$$

Further, since \({\text {CVaR}}\) is shift-invariant, we obtain

$$\begin{aligned} G(h) = Q^\top R Q h + Q^\top t + {\text {CVaR}}^{\mathbb {P}}_\alpha [Q^\top u], \end{aligned}$$

where the last term in the above relation is the vector of element-wise \({\text {CVaR}}\)s. Similarly, the sample average approximation of G is given as

$$\begin{aligned} \widehat{G}^N(h) = Q^\top R Q h + Q^\top t + \widehat{{\text {CVaR}}}_\alpha ^N[Q^\top u]. \end{aligned}$$
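The construction of \(\widehat{G}^N\) can be sketched on a toy network; this is not the Sioux Falls data, and all values below are illustrative:

```python
import numpy as np

# Toy network: 3 edges, one OD pair with 2 paths;
# path 0 uses edges {0, 1}, path 1 uses edges {0, 2}.
Q = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)          # (edge, path) incidence
t = np.array([1.0, 2.0, 1.5])                # free-flow travel times
b = np.full(3, 100.0)                        # congestion coefficients
c = np.array([10.0, 5.0, 8.0])               # edge capacities
R = np.diag(b * t / c)

def empirical_cvar(samples, alpha):
    g = np.asarray(samples, dtype=float)
    obj = lambda tt: tt + np.maximum(g - tt, 0.0).sum() / (len(g) * alpha)
    return min(obj(tt) for tt in g)

alpha, N = 0.1, 1000
rng = np.random.default_rng(0)
U = rng.uniform(0.0, 0.5 * t, size=(N, 3))   # additive edge noise u
path_noise = U @ Q                           # samples of Q^T u, one per row
cvar_vec = np.array([empirical_cvar(path_noise[:, p], alpha)
                     for p in range(Q.shape[1])])

def G_hat(h):
    """Sample average map G-hat^N(h) = Q^T R Q h + Q^T t + CVaR-hat[Q^T u]."""
    return Q.T @ R @ Q @ h + Q.T @ t + cvar_vec
```

The \({\text {CVaR}}\) vector is computed once, after which \(\widehat{G}^N\) is a deterministic affine map of the flow h.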

Our aim is to find the solution of \({\text {VI}}({\mathcal {H}},\widehat{G}^N)\). To represent the set of feasible flows compactly, let \(B \in \{0,1\}^{|{\mathcal {W}}| \times |{\mathcal {P}}|}\) be the (OD pair, path)-incidence matrix, where \(B_{wp}\) is 1 if and only if \(p \in {\mathcal {P}}_w\). Denoting the vector of demands by \(d = (d_w)_{w \in {\mathcal {W}}}\), the set of feasible flows consists of the vectors \(h \ge 0\) satisfying \(B h = d\). Using this notation and the explanation given in [26, Section 2.2], finding the solution of \({\text {VI}}({\mathcal {H}},\widehat{G}^N)\) is equivalent to solving the LCP: find \(x = (h;v) \in {\mathbb {R}}^{|{\mathcal {P}}|+|{\mathcal {W}}|}\) such that

$$\begin{aligned} x \ge 0, \quad \widehat{M}^N(x) \ge 0, \quad \text {and } \, \, x^\top \widehat{M}^N(x) = 0, \end{aligned}$$

where

$$\begin{aligned} \widehat{M}^N(x) := \begin{bmatrix} Q^\top R Q &{} -B^\top \\ B &{} 0 \end{bmatrix} x + \begin{bmatrix} Q^\top t + \widehat{{\text {CVaR}}}_\alpha ^N[Q^\top u] \\ -d \end{bmatrix}. \end{aligned}$$

Solving the LCP is equivalent to finding an optimizer of the following problem

$$\begin{aligned} \begin{aligned} {\text {minimize}}&\quad x^\top \widehat{M}^N (x) \\ \text {subject to}&\quad \widehat{M}^N (x) \ge 0, \\&\quad x \ge 0. \end{aligned} \end{aligned}$$
(30)

Since \(\widehat{M}^N\) is affine in x, the above problem is quadratic, and one can show, using the properties of the matrices Q, R, and B, that it is convex. To summarize, \({\text {VI}}({\mathcal {H}},G)\) can be approximated by \({\text {VI}}({\mathcal {H}},\widehat{G}^N)\), and the latter can be solved by first computing \(\widehat{{\text {CVaR}}}_\alpha ^N[Q^\top u]\) and then solving (30).
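As an alternative to the QP (30), the monotone VI can also be solved directly by a projection-type method. A sketch of the extragradient iteration on a toy instance with one OD pair and two paths (the simplex projection follows the standard sort-based algorithm; all data are illustrative):

```python
import numpy as np

def proj_simplex(v, z):
    """Euclidean projection of v onto {x >= 0, sum(x) = z}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - z
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

# Toy instance: one OD pair, two paths, demand d, affine map G(h) = A h + q.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
q = np.zeros(2)
d = 3.0
G = lambda h: A @ h + q

h = proj_simplex(np.ones(2), d)
tau = 0.2                                     # step size below 1/||A||
for _ in range(2000):
    h_mid = proj_simplex(h - tau * G(h), d)   # extragradient: predictor
    h = proj_simplex(h - tau * G(h_mid), d)   # corrector step
# Equilibrium equalizes path costs: 2*h_1 = h_2 with h_1 + h_2 = 3.
assert np.allclose(h, [1.0, 2.0], atol=1e-4)
```

Since A is positive definite, the map is strongly monotone and the iteration converges geometrically; for merely monotone maps, the extragradient method still converges, unlike plain projected gradient.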

6.2 Computing CWE

For the Sioux Falls example explained above, we set \(\alpha = 0.05\). This uniquely defines the CWE \(h^*\). For the sample average approximation, we consider three scenarios with different numbers of samples, \(N \in \{50, 500, 5000\}\), and perform 500 runs for each scenario. Each run collects N i.i.d. samples of the uncertainty u, constructs the empirical \(\widehat{{\text {CVaR}}}_\alpha ^N[Q^\top u]\), and computes the approximation \(\widehat{h}^N\) of the CWE by solving (30). Figure 1 illustrates our results. It plots the cumulative distribution function of the random variable \(\Vert \widehat{h}^N - h^* \Vert\), estimated using the 500 runs. Note that the entire distribution moves to the left as the number of samples increases. This is consistent with our theoretical finding that, as N increases, the approximate solution \(\widehat{h}^N\) approaches the CWE almost surely.

Fig. 1

Plot illustrating the convergence of the approximate solution \(\widehat{h}^N\) to the CVaR-based Wardrop equilibrium \(h^*\) for the Sioux Falls network; see Sect. 6 for details. Each line is the cumulative distribution of \(\Vert \widehat{h}^N-h^* \Vert\) for a different sample size used in the approximation. The distributions are estimated from 500 runs

7 Conclusions

We considered a risk-based variational inequality and studied the sample average approximation method for solving it. In particular, we derived asymptotic consistency and exponential convergence under suitable assumptions. For the case of separable random functions, we derived sharper convergence bounds. Lastly, we demonstrated the application of our results to the uncertain network routing problem, where the CVaR-based Wardrop equilibrium can be determined using the sample average scheme. Future work will explore the tractability of the resulting sample average VI under various conditions. We also wish to investigate other efficient sampling techniques, especially when the dimension of the problem is large. Finally, we plan to investigate decentralized learning methods for finding solutions of risk-averse VIs.