1 Introduction and Motivation

Approximation by the geometric distribution and its generalisations has continued to receive particular attention in recent years, especially in conjunction with Stein’s method, a powerful technique for probability approximation that has been used successfully in a variety of theoretical and applied settings. Stein’s method was first introduced in Stein (1972); see also Ross (2011) for a more recent introduction to the area. Stein’s method for geometric approximation was developed by Barbour and Grübel (1995) and Peköz (1996). We refer the reader to Peköz et al. (2013) and references therein for more recent developments.

Our goal in this paper is twofold. Our first aim is to consider how we can improve upon geometric approximation by incorporating a convolution with another random variable. Though a geometric approximation can yield effective error bounds in some applications, extra precision can be gained, and the range of useful applications increased, by using an approximating distribution that incorporates some additional parameters. Examples of this approach include negative binomial approximation (see Ross (2013); Yu et al. (2023) and references therein) and approximation by a geometric sum (Daly, 2010, 2016, 2019). Such approaches naturally lead to additional technical complexity in the evaluation of the approximating distribution or the corresponding error bounds. Our approach of convolving the approximating geometric distribution with an independent translation is an alternative to these ideas. The mass function of this convolution must remain simple to work with, otherwise the benefits of the approach are lost. We are motivated here by the work of Röllin (2007) on translated Poisson approximation, where better error bounds are obtained than with a traditional Poisson approximation. Our translation will allow us to introduce additional parameters in the approximating distribution which can then be chosen to gain precision, at the cost of some additional complexity in the approximation. Unlike the usual approach to translated Poisson approximation, where the translation is by a constant, we will here allow translation by an independent random variable.

Our second goal is to consider geometric approximation of random sums (i.e., of sums of a random number of random variables) and related applications to risk processes. The case in which the number of terms in the sum is itself geometrically distributed has previously been treated by Peköz et al. (2013); here we consider the more general case. Our results are applied to give a simple approximation, with explicit error bounds, of the infinite-horizon ruin probability in the compound binomial risk process (Gerber, 1988). As further applications of our general results, we will also give explicit error bounds in the approximation of the number of points in a Poisson process over a random time horizon, and of Markov chain hitting times.

In these applications, and in particular in the approximation of random sums, there are a number of other approximating distributions available, including compound Poisson approximations (see, for example, Vellaisamy and Chaudhuri (1996)) which offer a flexible approximation framework. A geometric distribution is a simple special case of a compound Poisson distribution. This simplicity leads to more simply stated and easily calculated error bounds compared to a more general compound Poisson setting. As we shall see in the examples and applications of Sections 2 and 3 below, there are distributions of interest which are close to geometric, so this approximation is an appropriate and useful one to consider here.

Recall that \(Y\sim \text {Geom}(p)\) has a geometric distribution if \(\mathbb {P}(Y=k)=p(1-p)^k\) for \(k\in \mathbb {Z}^+=\{0,1,\ldots \}\). In evaluating the approximation error in the work that follows, we will employ the total variation distance between two non-negative, integer-valued random variables W and Y, which can be defined in any of the following equivalent ways:

$$\begin{aligned} d_{TV}(\mathcal {L}(W),\mathcal {L}(Y)) = \sup _{A\subseteq \mathbb {Z}^+}|\mathbb {P}(W\in A) - \mathbb {P}(Y\in A)|\\ = \frac{1}{2}\sum _{j\in \mathbb {Z}^+}|\mathbb {P}(W=j)-\mathbb {P}(Y=j)| = \inf \mathbb {P}(W\not =Y)\,, \end{aligned}$$
(1)

where \(\mathcal {L}(W)\) denotes the law of W, and the infimum is taken over all couplings of W and Y with the required marginals.
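For concreteness, the second expression in Eq. 1 is straightforward to evaluate numerically for laws supported on a (truncated) range of integers. The following short Python sketch does this (numpy and scipy are assumed to be available; the comparison of a geometric law with a Poisson law below is purely illustrative and is not used elsewhere).

```python
import numpy as np
from scipy.stats import poisson

def total_variation(pmf_w, pmf_y):
    """Half the l1 distance between two pmfs given as arrays over {0,...,K}."""
    n = max(len(pmf_w), len(pmf_y))
    w = np.pad(np.asarray(pmf_w, dtype=float), (0, n - len(pmf_w)))
    y = np.pad(np.asarray(pmf_y, dtype=float), (0, n - len(pmf_y)))
    return 0.5 * np.abs(w - y).sum()

p, ks = 0.4, np.arange(200)                  # truncation at 200 leaves negligible mass
geom_pmf = p * (1 - p) ** ks                 # Geom(p) on {0, 1, ...}
print(total_variation(geom_pmf, poisson.pmf(ks, 1.5)))
```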

The remainder of this paper is organised as follows. In Section 2 we introduce Stein’s method for geometric approximation, and use these techniques to derive results in the setting where we approximate by the convolution of a geometric distribution and an independent translation. This is illustrated by applications to Poisson processes and Markov chain hitting times. We then consider geometric approximation of random sums in Section 3, and apply these results to derive simple approximations for the ruin probability in the compound binomial risk process.

2 Translated Geometric Approximation

Following Stein’s method for geometric approximation (Barbour and Grübel, 1995; Peköz, 1996), for each \(A\subseteq \mathbb {Z}^+\) we let \(g_A:\mathbb {Z}^+\rightarrow \mathbb {R}\) be such that \(g_A(k)=0\) for \(k\le 0\) and

$$\begin{aligned} I(k\in A)-\mathbb {P}(Y\in A) = (1-p)g_A(k+1) - g_A(k)\,, \end{aligned}$$
(2)

for some \(p\in (0,1]\), where \(Y\sim \text{ Geom }(p)\). Recall that Stein’s method proceeds by replacing k in Eq. 2 by the random variable W to be approximated, taking expectations, and bounding the expression thus obtained from the right-hand side of Eq. 2. Essential ingredients in the final part of this procedure include bounds on the behaviour of the functions \(g_A\). For example,

$$\begin{aligned} \sup _{A\subseteq \mathbb {Z}^+}\sup _{j,k\in \mathbb {Z}}|g_A(j)-g_A(j+k)|\le \min \left\{ |k|,\frac{1}{p}\right\} \,, \end{aligned}$$
(3)

by Lemma 4.2 of Daly (2016) and Lemma 1 of Peköz (1996).
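The bound Eq. 3 is easily explored numerically. The sketch below (Python with numpy; the value of p, the finite random set A and the truncation horizon are illustrative assumptions) uses the tail-sum form \(g_A(k)=-\sum _{j\ge k}(1-p)^{j-k}\left( I(j\in A)-\mathbb {P}(Y\in A)\right) \) for \(k\ge 1\), with \(g_A(k)=0\) for \(k\le 0\), which may be checked directly to satisfy Eq. 2, and then verifies Eq. 3 on a finite window.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: Geom(p) target, a random finite set A, truncated tail sums.
p, K, extra = 0.35, 150, 400
A = {int(j) for j in np.flatnonzero(rng.random(K) < 0.5)}   # A within {0, ..., K-1}
PYA = sum(p * (1 - p) ** j for j in A)                       # P(Y in A) for Y ~ Geom(p)

h = np.array([(j in A) - PYA for j in range(K + extra)])     # I(j in A) - P(Y in A)
g = np.zeros(K + 1)                                          # g[0] stands for g_A(k), k <= 0
for k in range(1, K + 1):
    g[k] = -np.dot((1 - p) ** np.arange(K + extra - k), h[k:])

# Eq. (2) should hold (up to the truncation error of the tail sums)
residual = np.max(np.abs(h[:K] - ((1 - p) * g[1:] - g[:-1])))

# Eq. (3): |g_A(j) - g_A(j + k)| <= min(|k|, 1/p) on the computed window
violation = max(abs(g[j] - g[j + k]) - min(k, 1 / p)
                for k in range(K + 1) for j in range(K + 1 - k))

print(f"max residual in (2): {residual:.2e}, max violation of (3): {violation:.2e}")
```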

In the work that follows, we let W denote the non-negative, integer-valued random variable we wish to approximate. We consider the approximation of W by \(Y+T\), where \(Y\sim \text{ Geom }(p)\) for some p, and T is our random translation independent of W and Y. Our results will employ a coupling construction based on that used by Peköz (1996) and Daly (2010). To that end, we define the random variable V to be such that

$$\begin{aligned} \mathcal {L}(V+T+1) = \mathcal {L}(W|W>T)\,, \end{aligned}$$
(4)

emphasising that V will depend on both T and W.

With this setup, we may obtain the following

Theorem 1

Let W and T be independent integer-valued random variables with \(W\ge 0\). Suppose V is as defined by Eq. 4 and let \(Y\sim \text{ Geom }(p)\) be independent of T, where \(p=\mathbb {P}(W\le T)\). Then

$$\begin{aligned} d_{TV}(\mathcal {L}(W),\mathcal {L}(Y+T))&\le \mathbb {P}(W<T) + (1-p)\mathbb {E}|W-T-V|\,,\end{aligned}$$
(5)
$$\begin{aligned} d_{TV}(\mathcal {L}(W),\mathcal {L}(Y+T))&\le \mathbb {P}(W<T) + \frac{1-p}{p}d_{TV}(\mathcal {L}(W-T),\mathcal {L}(V))\,. \end{aligned}$$
(6)

Proof

Since T is independent of W and Y,

$$\begin{aligned} d_{TV}(\mathcal {L}(W),\mathcal {L}(Y+T))=d_{TV}(\mathcal {L}(W-T),\mathcal {L}(Y))\,. \end{aligned}$$

Motivated by the argument of (3.11) of Röllin (2007), we now write

$$\begin{aligned} d_{TV}(\mathcal {L}(W-T),\mathcal {L}(Y))&=\sup _{A\subseteq \mathbb {Z}}\left| \mathbb {P}(W-T\in A)-\mathbb {P}(Y\in A)\right| \\&\le \mathbb {P}(W<T)\\&\hspace{20pt}+ \sup _{A\subseteq \mathbb {Z}^+}\left| \mathbb {E}[(1-p)g_A(W-T+1)-g_A(W-T)]\right| \,, \end{aligned}$$

where we recall that \(g_A(k)=0\) for \(k\le 0\) and use Eq. 2 with \(k=W-T\) for the final expression. Now, conditioning on whether \(W>T\) or \(W\le T\),

$$\begin{aligned} \mathbb {E}[g_A(W-T)]&= (1-p)\mathbb {E}[g_A(W-T)|W>T] +p\mathbb {E}[g_A(W-T)|W\le T]\\&= (1-p)\mathbb {E}[g_A(V+1)]\,, \end{aligned}$$

using the definitions of \(g_A\) and V. Our bound Eq. 5 then follows from Eq. 3 when writing

$$\begin{aligned} |\mathbb {E}[g_A(W-T+1)-g_A(V+1)]|\le \mathbb {E}|W-T-V|\,. \end{aligned}$$

Similarly, for Eq. 6 we let \(I(\cdot )\) denote an indicator function and write

$$\begin{aligned} |\mathbb {E}[g_A(W\!-\!T\!+\!1)-g_A(V+1)]|&=|\mathbb {E}[\left\{ g_A(W-T+1)-g_A(V+1)\right\} I(W-T\not =V)]|\\&\le \frac{1}{p}d_{TV}(\mathcal {L}(W-T),\mathcal {L}(V))\,, \end{aligned}$$

using the definition Eq. 1 of total variation distance and the bound Eq. 3. \(\square \)

Remark 1

Depending on the situation, it is sometimes more natural to define a geometric random variable on \(\{1,2,\ldots \}\) rather than \(\{0,1,\ldots \}\). In our framework it is straightforward to switch between the two definitions by simply altering the definition of T.

In the remainder of this section we consider two applications of Theorem 1, to approximation of the number of points observed in a Poisson process over a random time horizon and of Markov chain hitting times. For later use, we recall that a random variable U is said to be stochastically larger than a random variable V if \(\mathbb {P}(U>t)\ge \mathbb {P}(V>t)\) for all t, and that this is equivalent to the existence of a probability space on which we may construct the pair \((U^\prime ,V^\prime )\) such that \(\mathcal {L}(U^\prime )=\mathcal {L}(U)\), \(\mathcal {L}(V^\prime )=\mathcal {L}(V)\), and \(U^\prime \ge V^\prime \) almost surely; see Theorem 1.A.1 of Shaked and Shanthikumar (2007). We denote this by \(U\ge _{st}V\).

2.1 Application: Poisson Processes with Random Time Horizon

Let \(\{N(t):t\ge 0\}\) be a homogeneous Poisson process of rate \(\lambda \), and \(\tau \) be a non-negative random variable independent of this process. We consider the approximation of \(W=N(\tau )\) by \(Y+T\), where Y has a geometric distribution and T is a non-negative, integer-valued random variable independent of W and Y. In this we are motivated by Daly (2016), who previously considered geometric-type approximations in this setting, and by applications considered by Asmussen et al. (2002) and others in which Poisson processes with random time horizons play a role. We expect a geometric-type approximation to be relevant in cases of practical importance, since if \(\tau \) has an exponential distribution then \(N(\tau )\) is precisely geometrically distributed.

Let \(\xi _0=0\) and, for \(i\ge 1\), let \(\xi _i\) be the time of the ith event in our Poisson process, so that \(N(t)=\max \{i:\xi _i\le t\}\). We define \(V=N(\sigma _T)\), where

$$\begin{aligned} \mathcal {L}(\sigma _T)=\mathcal {L}(\tau -\xi _{T+1}|\tau >\xi _{T+1}) \end{aligned}$$

and \(\sigma _T\) depends on the random variables \(\{N(t):t\ge 0\}\), through \(\xi _{T+1}\), and \(\tau \). This V satisfies Eq. 4. To see this, note that conditioning on \(N(\tau )>T\) is equivalent to conditioning on the \((T+1)\)th event occurring before time \(\tau \), i.e., that \(\xi _{T+1}\le \tau \). Conditioning on this, the number of events we have seen up to time \(\tau \) is then \(T+1+V\).

We thus obtain from Eq. 5 that

$$\begin{aligned} d_{TV}(\mathcal {L}(N(\tau )),\mathcal {L}(Y+T))\le \mathbb {P}(\tau <\xi _T)+(1-p)\mathbb {E}|N(\tau )-T-N(\sigma _T)|\,, \end{aligned}$$
(7)

where \(Y\sim \text{ Geom }(p)\) and \(p=\mathbb {P}(\tau <\xi _{T+1})\).

Consider, for simplicity, the case where \(T=0\) almost surely, and write \(\sigma =\sigma _0\). Note that

$$\begin{aligned} \mathbb {P}(\sigma>u)=\frac{\mathbb {P}(\tau>u+\xi _1)}{\mathbb {P}(\tau >\xi _1)}\,, \end{aligned}$$
(8)

and

$$\begin{aligned} \mathbb {E}[\sigma ]&=\int _0^\infty \mathbb {P}(\sigma>u)\,\text {d}u=\frac{1}{1-p}\mathbb {E}\int _0^\infty \mathbb {P}(\tau>\xi _1+u|\xi _1)\,\text {d}u\\&=\frac{\mathbb {E}[\tau ]}{1-p}-\frac{1}{1-p}\int _0^\infty \int _0^{x}\mathbb {P}(\tau >u)\lambda e^{-\lambda x}\,\text {d}u\,\text {d}x\\&=\frac{\mathbb {E}[\tau ]}{1-p}-\frac{1}{\lambda }\,, \end{aligned}$$

where \(p=\mathbb {P}(\tau <\xi _1)=\mathbb {E}[e^{-\lambda \tau }]\).
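This expression for \(\mathbb {E}[\sigma ]\) is easily checked by simulation; the following Monte Carlo sketch (Python with numpy; the gamma law for \(\tau \) and the value of \(\lambda \) are illustrative choices) compares an empirical mean of \(\sigma \) with the formula above.

```python
import numpy as np

rng = np.random.default_rng(2)

lam, n = 1.5, 10**6
tau = rng.gamma(shape=2.0, scale=1.0, size=n)      # illustrative choice of tau
xi1 = rng.exponential(scale=1 / lam, size=n)       # first arrival time of the Poisson process
sigma = (tau - xi1)[tau > xi1]                     # L(sigma) = L(tau - xi_1 | tau > xi_1)

p_hat = np.mean(np.exp(-lam * tau))                # p = P(tau < xi_1) = E[exp(-lam * tau)]
print("Monte Carlo E[sigma]:", sigma.mean())
print("E[tau]/(1 - p) - 1/lam:", tau.mean() / (1 - p_hat) - 1 / lam)
```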

In the case where \(\sigma \ge _{st}\tau \) the bound Eq. 7 simplifies considerably, since we then have

$$\begin{aligned} \mathbb {E}|N(\tau )-N(\sigma )|=\mathbb {E}N(\sigma -\tau )=\lambda \mathbb {E}[\sigma -\tau ] =\frac{\lambda p\mathbb {E}[\tau ]-(1-p)}{1-p}=\frac{p(1+\lambda \mathbb {E}[\tau ])-1}{1-p}\,. \end{aligned}$$
(9)

Similarly, if \(\tau \ge _{st}\sigma \) we have

$$\begin{aligned} \mathbb {E}|N(\tau )-N(\sigma )|=\frac{1-p(1+\lambda \mathbb {E}[\tau ])}{1-p}\,. \end{aligned}$$
(10)

Observe that if \(\tau \) satisfies the new worse than used (NWU) property that \(\mathbb {P}(\tau>u)\mathbb {P}(\tau>v)\le \mathbb {P}(\tau >u+v)\) for all \(u,v\ge 0\), the independence of \(\tau \) and \(\xi _1\) combined with Eq. 8 implies that \(\sigma \ge _{st}\tau \). Similarly, if \(\tau \) satisfies the new better than used (NBU) property that \(\mathbb {P}(\tau>u)\mathbb {P}(\tau>v)\ge \mathbb {P}(\tau >u+v)\) for all \(u,v\ge 0\), we have that \(\tau \ge _{st}\sigma \). By combining the bound Eq. 7 with the choice \(p=\mathbb {E}[e^{-\lambda \tau }]\) and the expectations calculated in Eqs. 9 and 10 we thus have the following

Theorem 2

Let \(\{N(t):t\ge 0\}\) be a homogeneous Poisson process of rate \(\lambda \). Let \(\tau \) be a non-negative random variable independent of this process.

(a) If \(\tau \) is NBU,

$$\begin{aligned} d_{TV}(\mathcal {L}(N(\tau )),\text {Geom}(\mathbb {E}[e^{-\lambda \tau }]))\le 1-\mathbb {E}[e^{-\lambda \tau }](1+\lambda \mathbb {E}[\tau ])\,. \end{aligned}$$

(b) If \(\tau \) is NWU,

$$\begin{aligned} d_{TV}(\mathcal {L}(N(\tau )),\text {Geom}(\mathbb {E}[e^{-\lambda \tau }]))\le \mathbb {E}[e^{-\lambda \tau }](1+\lambda \mathbb {E}[\tau ])-1\,. \end{aligned}$$

If \(\tau \) is exponentially distributed (which is trivially both NBU and NWU) then \(\tau \) and \(\sigma =\sigma _0\) as above have the same distribution, and the upper bounds of Theorem 2 are each zero, reflecting the fact that \(N(\tau )\) has a geometric distribution.

The upper bound of Theorem 2(a) was previously established in Section 2.2 of Daly (2016), but only under the stronger condition that \(\tau \) has increasing failure rate.

We conclude this section with two examples illustrating Theorem 2.

Example 1

Let \(\tau \) have a gamma distribution, and note that \(N(\tau )\) is thus negative binomial. For simplicity of presentation we normalise \(\tau \) to have unit mean, with density function \(x\mapsto \frac{\beta ^\beta }{\Gamma (\beta )}x^{\beta -1}e^{-\beta x}\) for \(x>0\), where \(\beta >0\) is a parameter and \(\Gamma (\cdot )\) is the usual gamma function. We have that \(\mathbb {E}[e^{-\lambda \tau }]=\left( 1+\frac{\lambda }{\beta }\right) ^{-\beta }\) in this case. Note that \(\tau \) has decreasing failure rate when \(\beta <1\) (and is thus NWU), constant failure rate when \(\beta =1\) (the exponential case), and increasing failure rate when \(\beta >1\) (and is thus NBU). Applying both parts of Theorem 2 we have

$$\begin{aligned} d_{TV}\left( \mathcal {L}\left( N(\tau )\right) ,\text{ Geom }\left( \left( 1+\frac{\lambda }{\beta }\right) ^{-\beta }\right) \right) \le \left| 1-(1+\lambda )\left( 1+\frac{\lambda }{\beta }\right) ^{-\beta }\right| \,. \end{aligned}$$
(11)

The upper bound Eq. 11 is illustrated in Fig. 1 for several values of \(\lambda \). As expected, this upper bound is zero when \(\beta =1\), and is small when \(\beta \) is close to 1.

Fig. 1: The upper bound Eq. 11 as a function of \(\beta \) for \(\lambda =0.5,1,2\)
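Since \(N(\tau )\) is negative binomial in this example, the exact total variation distance is easy to compute and to compare with the bound Eq. 11. A minimal Python sketch (scipy is assumed; the values of \(\lambda \) and \(\beta \) are illustrative):

```python
import numpy as np
from scipy.stats import nbinom

lam, beta = 1.0, 2.0
p = (1 + lam / beta) ** (-beta)                       # E[exp(-lam * tau)]
ks = np.arange(2000)                                  # truncation leaves negligible mass

ntau_pmf = nbinom.pmf(ks, beta, beta / (beta + lam))  # pmf of N(tau) for gamma tau
geom_pmf = p * (1 - p) ** ks                          # Geom(p) on {0, 1, ...}

dtv = 0.5 * np.abs(ntau_pmf - geom_pmf).sum()
bound = abs(1 - (1 + lam) * (1 + lam / beta) ** (-beta))   # Eq. (11)
print(f"exact d_TV = {dtv:.4f}, bound (11) = {bound:.4f}")
```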

Example 2

Let \(\tau \) be uniformly distributed on the interval [a, b] for some \(0\le a<b\), so that \(\tau \) has increasing failure rate (and is thus NBU) and satisfies \(\mathbb {E}[e^{-\lambda \tau }]=\frac{e^{-\lambda a}-e^{-\lambda b}}{\lambda (b-a)}\) for all \(\lambda >0\). Hence, by Theorem 2(a),

$$\begin{aligned} d_{TV}\left( \mathcal {L}\left( N(\tau )\right) ,\text{ Geom }\left( \frac{e^{-\lambda a}-e^{-\lambda b}}{\lambda (b-a)}\right) \right) \le 1-\frac{e^{-\lambda a}-e^{-\lambda b}}{b-a}\left( \frac{1}{\lambda }+\frac{a+b}{2}\right) \,. \end{aligned}$$

If \(\lambda a\) and \(\lambda b\) are small, expanding the exponentials to second order shows that this upper bound is approximately \(\frac{\lambda ^2(a^2+4ab+b^2)}{12}\), and is therefore small whenever \(\lambda (a+b)\) is small.

2.2 Application: Markov Chain Hitting Times

We now let W be the hitting time of an ergodic Markov chain to a set of states A. That is, we let \((\xi _0,\xi _1,\ldots )\) be a Markov chain and \(W=\min \{i\ge 0:\xi _i\in A\}\) for some fixed subset A of the state space. We will write \(P_{ij}^{(n)}=\mathbb {P}(\xi _n=j|\xi _0=i)\) for the n-step transition probabilities and \(\pi \) for the stationary distribution of our Markov chain, with \(\pi (A)\) denoting the measure of a set A according to \(\pi \); abusing notation, we will write \(\pi _i\) for \(\pi (\{i\})\).

For some distribution F on this state space, we let \(\xi _0\sim F\), and define \(T=T(F)\) to be a stationary time for this initial distribution, independent of W. That is, T is such that \(\mathcal {L}(\xi _T|\xi _0\sim F)\) is \(\pi \). In line with Theorem 1, we consider the approximation of W by \(Y+T\), where \(Y\sim \text{ Geom }(p)\) is a geometric random variable with parameter \(p=\mathbb {P}(W\le T)\) which is independent of T. We will use Eq. 6 to give a bound. To that end, we construct the random variables \(W-T\) and V as follows: Consider the Markov chain \((X,\xi _0^\prime ,\xi _1^\prime ,\ldots )\), where the initial state X is sampled according to \(\pi \). Since T is a stationary time, we may construct \(W-T\) as the number of time steps required to hit A from an initial stationary distribution. That is, we write \(W-T=\min \{i\ge 0:\xi _i^\prime \in A\}\). Now consider another Markov chain \((X^\prime ,\xi _0^{\prime \prime },\xi _1^{\prime \prime },\ldots )\) whose initial state \(X^\prime \) is sampled according to \(\pi \) restricted to \(A^\text {c}\), the complement of A. We may then write \(V=\min \{i\ge 0:\xi _i^{\prime \prime }\in A\}\) since, in conditioning on the event that \(W>T\), we consider the number of steps from such an initial state until we hit A in addition to the \(T+1\) steps already taken to reach the state \(\xi _0^{\prime \prime }\).

With these random variables, the maximal coupling argument of Theorem 2 of Peköz (1996) gives

$$\begin{aligned} d_{TV}(\mathcal {L}(W-T),\mathcal {L}(V))\le \frac{1}{\pi (A^{\text {c}})}\sum _{i,j\in A}\pi _i\sum _{n=1}^\infty |P_{ij}^{(n)}-\pi _j|\,. \end{aligned}$$

The following bound is then immediate from Eq. 6.

Theorem 3

Let \((\xi _0,\xi _1,\ldots )\) be the Markov chain above, \(W=\min \{i\ge 0:\xi _i\in A\}\), and T be a stationary time as above. With \(Y\sim \text{ Geom }(p)\) for \(p=\mathbb {P}(W\le T)\),

$$\begin{aligned} d_{TV}(\mathcal {L}(W),\mathcal {L}(Y+T))\le \mathbb {P}(W<T)+\frac{1-p}{p\pi (A^{\text {c}})}\sum _{i,j\in A}\pi _i\sum _{n=1}^\infty |P_{ij}^{(n)}-\pi _j|\,. \end{aligned}$$

This result generalises Theorem 2 of Peköz (1996), where the hitting time from an initial stationary distribution only is considered, in which case our T is equal to zero almost surely and \(p=\pi (A)\).
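For a numerical illustration of Theorem 3 in the stationary-start case \(T=0\) (so that \(p=\pi (A)\) and \(\mathbb {P}(W<T)=0\)), the following sketch (Python with numpy; the three-state chain is an arbitrary illustrative example, not taken from the text) computes the exact law of the hitting time and compares the resulting total variation distance with the stated bound.

```python
import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.1, 0.3, 0.6]])            # illustrative ergodic transition matrix
A = [0]                                    # hit the set A = {0}
Ac = [s for s in range(3) if s not in A]

# stationary distribution: left eigenvector of P for eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi = pi / pi.sum()

p = pi[A].sum()                            # p = P(W <= T) = pi(A) when T = 0

# exact law of W = min{i >= 0 : xi_i in A} started from pi
Q = P[np.ix_(Ac, Ac)]                      # sub-stochastic transitions within A^c
r = P[np.ix_(Ac, A)].sum(axis=1)           # one-step probabilities of entering A
K = 200                                    # truncation; the tail is geometrically small
pmf_W = np.zeros(K)
pmf_W[0] = p
v = pi[Ac]
for k in range(1, K):
    pmf_W[k] = v @ r
    v = v @ Q

pmf_Y = p * (1 - p) ** np.arange(K)        # Geom(p) on {0, 1, ...}
dtv_exact = 0.5 * np.abs(pmf_W - pmf_Y).sum()

# Theorem 3 bound with T = 0
bound = 0.0
Pn = np.eye(3)
for n in range(1, 500):
    Pn = Pn @ P
    bound += sum(pi[i] * abs(Pn[i, j] - pi[j]) for i in A for j in A)
bound *= (1 - p) / (p * pi[Ac].sum())

print(f"exact d_TV = {dtv_exact:.4f}, Theorem 3 bound (T = 0) = {bound:.4f}")
```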

3 Geometric Approximation of Random Sums and the Compound Binomial Risk Process

Throughout this section we let \(W=\sum _{i=1}^NX_i\), where \(X_1,X_2,\ldots \) are independent and identically distributed, positive integer-valued random variables with \(\mathbb {E}[X_1]=\mu \ge 1\). We let N be a non-negative, integer-valued random variable independent of the \(X_i\). For simplicity we restrict ourselves to the case where the \(X_i\) are identically distributed, but this condition can easily be relaxed.

We will consider the approximation of W by a geometric random variable Y in Section 3.1, and then apply our results in Section 3.2 to derive simple and explicit bounds for the infinite-horizon ruin probability in the compound binomial risk process.

In this section we focus on geometric approximation (i.e., \(T=0\) almost surely in the setting of Theorem 1), since construction of the random variable V defined in Eq. 4 seems to be natural here only when \(T=0\). This case is also sufficient to give simple and explicit bounds on our ruin probability of interest, which is the principal application of this section.

3.1 Geometric Approximation of Random Sums

With W the random sum defined above, we now derive geometric approximation bounds for W. For later use, we define \(N^{(0)}\) such that \(\mathcal {L}(N^{(0)})=\mathcal {L}(N|N>0)\).

We consider the approximation of W by \(Y\sim \text{ Geom }(p)\). In line with the discussion in Section 2 in the case where \(T=0\) almost surely, we choose the parameter p to be

$$\begin{aligned} p = \mathbb {P}(W=0) = \mathbb {P}(N=0)\,, \end{aligned}$$
(12)

and V to be such that

$$\begin{aligned} \mathcal {L}(V+1) = \mathcal {L}(W|N>0) = \mathcal {L}\left( \sum _{j=1}^{N^{(0)}}X_j\right) \,, \end{aligned}$$

so that we may choose \(V=\sum _{j=1}^{N^{(0)}}X_j-1\). With this choice we have that

$$\begin{aligned} \mathbb {E}|W-V|=\mathbb {E}\left| \sum _{i=1}^NX_i-\sum _{j=1}^{N^{(0)}}X_j+1\right| =\mathbb {E}\left| \sum _{i=N+1}^{N^{(0)}}X_i-1\right| \,, \end{aligned}$$

since \(N^{(0)}\ge _{st}N\), and so we may construct these on the same probability space in such a way that \(N^{(0)}\ge N\) almost surely.

Furthermore, if \(N^{(0)}\ge _{st}N+1\) then we may construct our random variables such that \(\sum _{i=N+1}^{N^{(0)}}X_i\) is at least 1 almost surely, since each \(X_i\) is at least 1 almost surely. In this case we thus have that

$$\begin{aligned} \mathbb {E}|W-V|=\mu \mathbb {E}[N^{(0)}-N]-1=\frac{\mu \mathbb {P}(N=0)\mathbb {E}[N]}{\mathbb {P}(N>0)}-1=\frac{\mu p\mathbb {E}[N]}{1-p}-1\,. \end{aligned}$$

From the definition, we note that \(N^{(0)}\ge _{st}N+1\) if

$$\begin{aligned} \frac{\mathbb {P}(N=j)}{\mathbb {P}(N>j)}\le \frac{p}{1-p}\,, \end{aligned}$$
(13)

for all \(j\in \mathbb {Z}^+\). That is, \(N^{(0)}\ge _{st}N+1\) if N is larger than Y in the hazard rate ordering; see Section 1.B of Shaked and Shanthikumar (2007).

Combining all these ingredients, we may apply Eq. 5 with \(T=0\) to give

Theorem 4

Let \(W=\sum _{i=1}^NX_i\) be as above and p be given by Eq. 12. Then

$$\begin{aligned} d_{TV}(\mathcal {L}(W),\text{ Geom }(p)) \le (1-p)\mathbb {E}\left| \sum _{i=N+1}^{N^{(0)}}X_i-1\right| \,. \end{aligned}$$

Moreover, if Eq. 13 holds for all \(j\in \mathbb {Z}^+\), then

$$\begin{aligned} d_{TV}(\mathcal {L}(W),\text{ Geom }(p)) \le p(\mu \mathbb {E}[N]+1)-1\,. \end{aligned}$$
(14)
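As a quick numerical illustration of Theorem 4 (the choices \(N\sim \text{ Geom }(r)\) and \(X_i\) supported on \(\{1,2\}\) below are illustrative; Eq. 13 holds here since N is geometric), one may compare the exact total variation distance with the bound Eq. 14.

```python
import numpy as np

r = 0.3                                   # N ~ Geom(r) on {0, 1, ...}
x_pmf = np.array([0.0, 0.7, 0.3])         # P(X = 1) = 0.7, P(X = 2) = 0.3
mu = 0.7 * 1 + 0.3 * 2                    # E[X]
EN = (1 - r) / r                          # E[N]
p = r                                     # p = P(N = 0), Eq. (12)

# exact pmf of W = X_1 + ... + X_N by mixing convolution powers of x_pmf
K, support = 400, 2000
w_pmf = np.zeros(support)
conv = np.array([1.0])                    # pmf of the empty sum
for k in range(K):
    w_pmf[:len(conv)] += r * (1 - r) ** k * conv
    conv = np.convolve(conv, x_pmf)

y_pmf = p * (1 - p) ** np.arange(support) # Geom(p) pmf
dtv_exact = 0.5 * np.abs(w_pmf - y_pmf).sum()
bound = p * (mu * EN + 1) - 1             # Eq. (14)
print(f"exact d_TV = {dtv_exact:.4f}, bound (14) = {bound:.4f}")
```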

One case which has received particular attention in the literature is that in which \(N\sim \text{ Geom }(r)\) has a geometric distribution, so that W is a geometric sum. Note that in this case \(\mathcal {L}(N^{(0)})=\mathcal {L}(N+1)\), so that the bound Eq. 14 applies. Geometric approximation for geometric sums has also been considered by Peköz et al. (2013), who show in their Theorem 3.2 that

$$\begin{aligned} d_{TV}(\mathcal {L}(W),\text{ Geom }(p^\prime ))\le \frac{1}{2}\min \left\{ 1,r\left[ 1+\sqrt{\frac{-2}{u\log (1-r)}}\right] \right\} \left( \frac{\mathbb {E}[X_1^2]}{\mu }-1\right) \,, \end{aligned}$$
(15)

where \(p^\prime =\frac{r}{r+\mu (1-r)}\) and u is such that \(d_{TV}(\mathcal {L}(X_1),\mathcal {L}(X_1+1))\le 1-u\). Note that this parameter \(p^\prime \) is chosen such that the mean of the approximating geometric distribution matches that of W; on the other hand, our choice is made such that each has the same probability of taking the value zero. In the setting where we have no information on the smoothness of the \(X_i\) in the form of a bound on u, our geometric approximation bound improves upon this for sufficiently large \(\mathbb {E}[X_1^2]\), and also applies in the case where N is not geometric. We note also that both geometric approximation results have the desirable property that the upper bound is zero whenever \(\mu =1\). Finally, we refer to Peköz and Röllin (2011) for related results on the exponential approximation of random sums in the case where the \(X_i\) are no longer assumed to be integer-valued.

3.2 Application: Compound Binomial Risk Process

We now consider how our Theorem 4 may be applied to the compound binomial risk process in discrete time (Gerber, 1988), defined as follows: Let \(m\in \mathbb {Z}^+\) and let \(\eta ,\eta _1,\eta _2,\ldots \) be independent and identically distributed random variables supported on \(\mathbb {Z}^+\). The risk process \(\{U_t:t=0,1,\ldots \}\) is defined by

$$\begin{aligned} U_t=m+t-\sum _{i=1}^t\eta _i\,, \end{aligned}$$

where we write \(q=\mathbb {E}[\eta ]\) and make the assumption that \(q<1\) to ensure a positive drift of this process. We will also assume the existence of second and third moments of \(\eta \) as required. Letting \(\tau =\min \{t\ge 1:U_t\le 0\}\), our interest is in the infinite-horizon ruin probability \(\psi (m)\) defined by

$$\begin{aligned} \psi (m)=\mathbb {P}(\tau <\infty |U_0=m). \end{aligned}$$
(16)

This ruin probability is given by a discrete Pollaczek–Khinchine formula: We have \(\psi (0)=q\) and

$$\begin{aligned} \psi (m)=\mathbb {P}\left( \sum _{i=1}^MY_i\ge m\right) \end{aligned}$$
(17)

for \(m=1,2,\ldots \), where \(M\sim \text{ Geom }(1-q)\) and \(Y,Y_1,Y_2,\ldots \) are independent and identically distributed with \(\mathbb {P}(Y=j)=q^{-1}\mathbb {P}(\eta >j)\) for \(j=0,1,\ldots \); see Proposition 2.6 of Santana and Rincón (2020). We note that Santana and Rincón (2020) gives some numerical approximations for \(\psi (m)\), but without any explicit error bounds. Our goal in this section is to give explicit approximations for \(\psi (m)\), complete with error bounds, that are significantly easier to work with than the tail probabilities in Eq. 17.

The geometric sum in Eq. 17 does not quite fit into the framework of Theorem 4, since here the summands are allowed to take the value zero with positive probability. We therefore rewrite this by letting \(X,X_1,X_2,\ldots \) be independent and identically distributed with

$$\begin{aligned} \mathbb {P}(X=j)=\mathbb {P}(Y=j|Y>0)=\frac{\mathbb {P}(\eta>j)}{q-\mathbb {P}(\eta >0)} \end{aligned}$$

for \(j=1,2,\ldots \), and letting \(N\sim \text{ Geom }(r)\) with

$$\begin{aligned} r=\frac{1-q}{\mathbb {P}(\eta =0)}\,. \end{aligned}$$
(18)

The following lemma is well known (it follows, for example, from a stronger observation on page 165 of Michel (1987)), but we include its short proof to keep our exposition self-contained.

Lemma 1

Letting \(W=\sum _{i=1}^NX_i\), with N and the \(X_i\) as above, we have \(\mathcal {L}(W)=\mathcal {L}\left( \sum _{i=1}^MY_i\right) \).

Proof

Let \(K_1,K_2,\ldots \) be independent and identically distributed, with \(K_j\sim \text{ Bin }(j,\mathbb {P}(Y>0))\) having a binomial distribution for each \(j=1,2,\ldots \). For \(m\ge 0\) we write

$$\begin{aligned} \mathbb {P}\left( \sum _{i=1}^MY_i>m\right) =(1-q)\sum _{j=1}^\infty q^j\mathbb {P}(Y_1+\cdots +Y_j>m)\,. \end{aligned}$$

Of the j summands within the latter probability, we have \(K_j\) that are non-zero, with the remaining \(j-K_j\) taking the value zero. Conditioning on these \(K_j\) and noting that those \(Y_i\) that are conditioned to be non-zero have the same distribution as X, we obtain the following, where we write \(a=\mathbb {P}(Y=0)\):

$$\begin{aligned} \mathbb {P}\left( \sum _{i=1}^MY_i>m\right)&=(1-q)\sum _{j=1}^\infty q^j\sum _{k=0}^j\left( {\begin{array}{c}j\\ k\end{array}}\right) (1-a)^ka^{j-k}\mathbb {P}(X_1+\cdots +X_k>m)\\&=(1-q)\sum _{k=0}^\infty \sum _{j=k}^\infty \left( {\begin{array}{c}j\\ k\end{array}}\right) (aq)^j\left( \frac{1-a}{a}\right) ^k\mathbb {P}(X_1+\cdots +X_k>m)\\&=\frac{(1-q)}{1-aq}\sum _{k=0}^\infty \left( \frac{(1-a)q}{1-aq}\right) ^k\mathbb {P}(X_1+\cdots +X_k>m)\\&=r\sum _{k=1}^\infty (1-r)^k\mathbb {P}(X_1+\cdots +X_k>m)=\mathbb {P}(W>m)\,, \end{aligned}$$

as required. \(\square \)
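Lemma 1 and the representation Eq. 17 are easily checked numerically. The sketch below (Python with numpy; the law of \(\eta \), the truncation levels and the finite simulation horizon, which is adequate here because the process drifts upwards, are all illustrative assumptions) compares \(\psi (m)\) computed from W with a Monte Carlo estimate taken directly from the risk process.

```python
import numpy as np

rng = np.random.default_rng(1)

eta_pmf = np.array([0.55, 0.25, 0.15, 0.05])               # P(eta = 0, 1, 2, 3)
q = float(np.dot(np.arange(4), eta_pmf))                   # q = E[eta] < 1
tail = np.array([eta_pmf[j + 1:].sum() for j in range(4)]) # P(eta > j)

x_pmf = np.zeros(4)
x_pmf[1:] = tail[1:] / (q - tail[0])                       # P(X = j), j = 1, 2, 3
r = (1 - q) / eta_pmf[0]                                   # Eq. (18)

# pmf of W = X_1 + ... + X_N with N ~ Geom(r); then psi(m) = P(W >= m)
support = 2000
w_pmf = np.zeros(support)
conv = np.array([1.0])
for k in range(600):
    w_pmf[:len(conv)] += r * (1 - r) ** k * conv
    conv = np.convolve(conv, x_pmf)

def ruin_prob_mc(m, n_paths=20_000, horizon=400):
    """Monte Carlo ruin probability for the risk process U_t = m + t - sum eta_i."""
    jumps = rng.choice(4, p=eta_pmf, size=(n_paths, horizon))
    levels = m + np.cumsum(1 - jumps, axis=1)
    return float(np.mean(levels.min(axis=1) <= 0))

m = 3
print(f"psi({m}) via Lemma 1: {w_pmf[m:].sum():.4f}, via simulation: {ruin_prob_mc(m):.4f}")
```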

In order to state a bound on \(\psi (m)\) we will need some further properties of the random variable X, which we collect in the following lemma.

Lemma 2

With X as above,

$$\begin{aligned} \mathbb {E}[X]&=\frac{\mathbb {E}[\eta (\eta -1)]}{2(q-\mathbb {P}(\eta>0))}\,,\qquad \mathbb {E}[X^2]=\frac{\mathbb {E}[\eta (\eta -1)(2\eta -1)]}{6(q-\mathbb {P}(\eta >0))}\,, \end{aligned}$$

and

$$\begin{aligned} d_{TV}(\mathcal {L}(X),\mathcal {L}(X+1))&=\frac{\mathbb {P}(\eta>0)}{2(q-\mathbb {P}(\eta >0))}\,. \end{aligned}$$

Proof

For \(l=1,2\) we write

$$\begin{aligned} \mathbb {E}[X^l]=\frac{1}{q-\mathbb {P}(\eta>0)}\sum _{j=1}^\infty j^l\mathbb {P}(\eta>j)=\frac{1}{q-\mathbb {P}(\eta >0)}\sum _{i=1}^\infty \mathbb {P}(\eta =i)\sum _{j=1}^{i-1}j^l\,, \end{aligned}$$

from which the first two expressions in the lemma follow. For the total variation distance, the definition Eq. 1 gives

$$\begin{aligned} d_{TV}(\mathcal {L}(X),\mathcal {L}(X+1))&=\frac{1}{2(q-\mathbb {P}(\eta>0))}\sum _{j=1}^\infty |\mathbb {P}(\eta>j-1)-\mathbb {P}(\eta>j)|\\&=\frac{1}{2(q-\mathbb {P}(\eta >0))}\sum _{j=1}^\infty \mathbb {P}(\eta =j)\,, \end{aligned}$$

as required. \(\square \)

We are now in a position to state the main result of this section.

Theorem 5

With \(\psi (m)\) the ruin probability defined in Eq. 16 above,

$$\begin{aligned} \left( \frac{q-\mathbb {P}(\eta>0)}{\mathbb {P}(\eta =0)}\right) ^m\le \psi (m)\le \left( \frac{q-\mathbb {P}(\eta>0)}{\mathbb {P}(\eta =0)}\right) ^m\\ +\frac{1}{\mathbb {P}(\eta =0)}\left( \frac{1}{2}\mathbb {E}[\eta (\eta -1)]-q+\mathbb {P}(\eta >0)\right) \,, \end{aligned}$$
(19)

and

$$\begin{aligned} \left| \psi (m)-\left( \frac{\mathbb {E}[\eta (\eta -1)]}{2(1-q)+\mathbb {E}[\eta (\eta -1)]}\right) ^m\right| \\ \le \frac{1}{2}\min \left\{ 1,\frac{(1-q)(1+v)}{\mathbb {P}(\eta =0)}\right\} \left( \frac{\mathbb {E}[\eta (\eta -1)(2\eta -1)]}{3\mathbb {E}[\eta (\eta -1)]}-1\right) \,, \end{aligned}$$
(20)

for \(m=1,2,\ldots \), where

$$\begin{aligned} v=\sqrt{\frac{-4(q-\mathbb {P}(\eta>0))}{(2q-3\mathbb {P}(\eta>0))\log \left( \frac{q-\mathbb {P}(\eta >0)}{\mathbb {P}(\eta =0)}\right) }}\,. \end{aligned}$$

Proof

From Lemma 1 and the Pollaczek–Khinchine formula Eq. 17 we may write \(\psi (m)=\mathbb {P}(W\ge m)\) for \(m=1,2,\ldots \). With this representation, we employ the bound Eq. 14 of Theorem 4 in conjunction with the formula for \(\mathbb {E}[X]\) in Lemma 2 to obtain that

$$\begin{aligned} \left| \psi (m)-\left( \frac{q-\mathbb {P}(\eta>0)}{\mathbb {P}(\eta =0)}\right) ^m\right| \le \frac{1}{\mathbb {P}(\eta =0)}\left( \frac{1}{2}\mathbb {E}[\eta (\eta -1)]-q+\mathbb {P}(\eta >0)\right) \,. \end{aligned}$$

The proof of Eq. 19 is completed by noting that W is stochastically larger than our approximating geometric random variable, and so \(\mathbb {P}(W\ge m)\ge \left( \frac{q-\mathbb {P}(\eta >0)}{\mathbb {P}(\eta =0)}\right) ^m\).

For Eq. 20 we will use Eq. 15. In line with the notation there, and using the total variation distance given in Lemma 2, we write \(1-u=\frac{\mathbb {P}(\eta>0)}{2(q-\mathbb {P}(\eta >0))}\), so that

$$\begin{aligned} u=\frac{2q-3\mathbb {P}(\eta>0)}{2(q-\mathbb {P}(\eta >0))}\,, \end{aligned}$$

and so, with r given by Eq. 18,

$$\begin{aligned} v=\sqrt{\frac{-2}{u\log (1-r)}}\,. \end{aligned}$$

The result then follows from Eq. 15 and Lemma 2. \(\square \)

Note that, since they are based on geometric approximation of a geometric sum, we would expect the bounds in Theorem 5 to perform well when the summands X are typically not much larger than one, that is, when \(\mathbb {P}(\eta >2)\) is small. To conclude this section, we illustrate this point and the bounds of Theorem 5 with three examples.

Example 3

Suppose that \(\eta \) is supported on \(\{0,1,2\}\), with \(\mathbb {P}(\eta =0)=\alpha _0\), \(\mathbb {P}(\eta =2)=\alpha _2\), and \(\mathbb {P}(\eta =1)=1-\alpha _0-\alpha _2\). We will assume that \(\alpha _2<\alpha _0\) to ensure that \(q<1\). In this case, both the bounds of Theorem 5 identify that \(\psi (m)=(\alpha _2/\alpha _0)^m\). We note, though, that Eq. 19 applies for all \(\alpha _2\in [0,\alpha _0)\), while the upper bound of Eq. 20 requires \(\alpha _2>0\) to be well-defined. In this example the expression for \(\psi (m)\) also follows more directly from the observation that \((U_0,U_1,\ldots )\) is a random walk on the integers with initial value \(U_0=m\), and independent increments supported on \(\{-1,0,1\}\) and distributed as \(1-\eta \).

Fig. 2: The approximation error Eq. 21 as a function of \(\lambda \)

Fig. 3: The approximation error Eq. 22 as a function of \(\beta \) for \(\alpha =0.5, 1, 1.5\)

Example 4

In the setting where \(\eta \sim \text{ Geom }(\alpha )\) has a geometric distribution, with \(\alpha >1/2\) to ensure that \(q<1\), it is known that

$$\begin{aligned} \psi (m)=\left( \frac{1-\alpha }{\alpha }\right) ^{m+1}\,. \end{aligned}$$

See Section 2 of Santana and Rincón (2020). Neither of the estimates in Theorem 5 yields exactly this value; the approximations given by Eqs. 19 and 20 are

$$\begin{aligned} \psi (m)\approx \left( \frac{1-\alpha }{\alpha }\right) ^{2m}\,,\;\text { and }\;\psi (m)\approx \left( \frac{(1-\alpha )^2}{3\alpha ^2-3\alpha +1}\right) ^m\,, \end{aligned}$$

respectively. The first of these gives the correct value for \(m=1\), but the second is a better approximation for larger m.
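This comparison is easily carried out numerically; for instance, with the illustrative choice \(\alpha =0.6\):

```python
# Exact ruin probability of Example 4 against the approximations from Eqs. (19) and (20)
alpha = 0.6
ratio = (1 - alpha) / alpha
for m in range(1, 6):
    exact = ratio ** (m + 1)
    approx_19 = ratio ** (2 * m)
    approx_20 = ((1 - alpha) ** 2 / (3 * alpha ** 2 - 3 * alpha + 1)) ** m
    print(f"m={m}: exact={exact:.4f}, Eq.19 approx={approx_19:.4f}, Eq.20 approx={approx_20:.4f}")
```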

Example 5

Suppose \(\eta \) has a mixed Poisson distribution. That is, given some non-negative random variable \(\xi \), suppose that \(\eta |\xi \sim \text{ Pois }(\xi )\) has a Poisson distribution. We will illustrate our bound Eq. 19 in this case; the bound Eq. 20 may be evaluated similarly, but we omit that to keep the discussion concise. We have that \(q=\mathbb {E}[\xi ]\), which we assume is less than 1. We also have \(\mathbb {P}(\eta =0)=\mathbb {E}[e^{-\xi }]\), and \(\mathbb {E}[\eta (\eta -1)]=\mathbb {E}[\xi ^2]\). Hence, the approximation error in Eq. 19, i.e., the final term on the right-hand side, is

$$\begin{aligned} \frac{1}{\mathbb {P}(\eta =0)}\left( \frac{1}{2}\mathbb {E}[\eta (\eta -1)]-q+\mathbb {P}(\eta >0)\right) =\frac{1}{\mathbb {E}[e^{-\xi }]}\left( \frac{1}{2}\mathbb {E}[\xi ^2]-\mathbb {E}[\xi ]+1-\mathbb {E}[e^{-\xi }]\right) \,. \end{aligned}$$
(21)

For example, if \(\eta \sim \text{ Pois }(\lambda )\) has a Poisson distribution for some small \(\lambda >0\), expanding \(e^{-\lambda }\) to third order shows that this error is approximately \(\frac{\lambda ^3}{6}\). The approximation error Eq. 21 in this case is illustrated in Fig. 2.
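The following snippet evaluates the error term Eq. 21 exactly for \(\eta \sim \text{ Pois }(\lambda )\), alongside the small-\(\lambda \) approximation \(\lambda ^3/6\) (Python; the values of \(\lambda \) are illustrative).

```python
import numpy as np

# Eq. (21) with xi = lambda deterministic, compared with the cubic approximation
for lam in (0.05, 0.1, 0.2, 0.5):
    err = np.exp(lam) * (0.5 * lam ** 2 - lam + 1 - np.exp(-lam))
    print(f"lambda={lam}: error={err:.6f}, lambda^3/6={lam ** 3 / 6:.6f}")
```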

As a second example, suppose instead that \(\xi \) has a gamma distribution with density \(x\mapsto \frac{\beta ^\alpha }{\Gamma (\alpha )}x^{\alpha -1}e^{-\beta x}\) for \(x>0\), where \(\alpha >0\) and \(\beta >0\) are parameters, so that \(\eta \) has a negative binomial distribution. Then our approximation error is

$$\begin{aligned} \left( 1+\frac{1}{\beta }\right) ^\alpha \left( \frac{\alpha (\alpha +1)}{2\beta ^2}-\frac{\alpha }{\beta }+1\right) -1\,. \end{aligned}$$
(22)

This converges to zero as \(\alpha \rightarrow 0\) for fixed \(\beta \), or as \(\beta \rightarrow \infty \) for fixed \(\alpha \), for example. This can be seen in Fig. 3, which shows the approximation error Eq. 22 as a function of \(\beta \) for various values of \(\alpha \).