1 Introduction

Despite a long history in the literature analyzing continuous time series variables, it is only in recent years that much attention has been given to time series variables that are integer-valued (see Winkelmann 2008; Fahrmeir et al. 2013; Davis et al. 2016; Weiß 2018, and references therein for reviews). Generally, integer-valued time series models can be classified into two categories: “thinning” operator based models (Scotto et al. 2015) and regression based models (Fokianos 2012; Tjøstheim 2016).

To allow for dependence between time series observations, two classes of models were proposed in Cox et al. (1981): observation-driven models and parameter-driven models. In observation-driven models, the mean of the conditional distribution of the current observation \(y_{t}\) is directly specified as a function of past observations \(y_{t-1},\ldots ,y_{1}\). In parameter-driven models, dependence among observations is introduced via latent factors that follow a stochastic process, such as a hidden Markov chain (Leroux and Puterman 1992) or a latent stationary auto-regressive process (Zeger 1988; Chan and Ledolter 1995). Compared with parameter-driven models, observation-driven models are easier to fit in practical contexts with numerous covariates and long time series. See Zeger and Qaqish (1988) for a review of various observation-driven models for count time series data. A reference for the substantial development of observation-driven models can be found in Kedem and Fokianos (2005). A variety of observation-driven models for count responses have been developed. Davis et al. (2003, 2005) presented a class of observation-driven models for time series of Poisson counts and provided properties of the maximum likelihood estimators. Ahmad and Francq (2016) derived regularity conditions for the consistency and asymptotic normality (CAN) of the Poisson quasi-maximum likelihood estimator (QMLE) for time series of counts. However, the equidispersion assumption of the Poisson distribution makes it too restrictive for many empirical settings. Given this, Drescher (2005) considered various generalized count distributions for observation-driven models and explored their maximum likelihood estimation.
Regarding existing R packages (R Core Team 2021), the glarma package (Dunsmuir and Scott 2015) provides functions for estimation, testing, diagnostic checking and forecasting based on the generalized linear autoregressive moving average (GLARMA) class of observation-driven models for discrete-valued time series with regression variables.

The benchmark parameter-driven count data model introduced in Zeger (1988) has been widely extended. It can be viewed as a class of state-space model that extends the generalized linear model by introducing a latent auto-regressive process into the conditional mean function. Parameter-driven models allow the distribution of \(y_{t}\) to depend on this latent process and can deal with auto-correlation as well as over-dispersion. However, parameter estimation in parameter-driven models requires considerable computational effort. The main issue lies in the need to evaluate very high-dimensional integrals when using maximum likelihood techniques, so estimation methods based on Monte Carlo (MC) integration are typically considered. To estimate the parameters of parameter-driven models, Chan and Ledolter (1995) employed a Monte Carlo EM algorithm, and Kuk and Cheng (1997) considered the MC Newton-Raphson method. However, such estimation approaches are not yet routinely available and therefore not ready for general use (Davis et al. 2003).

Bayesian estimation for time series of counts turns out to be a feasible and more elaborate alternative. Applications of the Bayesian paradigm to count time series have mainly focused on parameter-driven models. Dynamic latent factor models within Bayesian count time series contexts have been actively studied (see, e.g., West and Harrison 1989; Durbin and Koopman 2000; Chib and Winkelmann 2001). Hay and Pettitt (2001) presented a fully Bayesian analysis of count time series for a parameter-driven model with the form of a generalized linear mixed model, and investigated its application to the control of an infectious disease. Unlike the MC EM estimation approach for parameter-driven models, the Markov chain Monte Carlo (MCMC) procedure provides information on the posterior distributions of both regression and time series parameters. In maximum likelihood-based estimation, uncertainty is quantified by constructing confidence intervals around the point forecasts. However, such confidence intervals can only be justified asymptotically. When counts are small, this approximation is less accurate and the Bayesian technique arises as a prime candidate. When forecasting counts from the Bayesian perspective, not only the parameter uncertainty, but also the uncertainty caused by model specification, can be directly incorporated into the predictive probability mass function, which is a natural outcome of Bayes’ theorem.

Although parameter-driven models are very flexible, the existence of unobservable latent factors brings a heavy computational burden, even if manageable via MCMC sampling techniques. On the other hand, Bayesian analysis of the more parsimonious observation-driven models has received growing attention recently. The generalized autoregressive moving average (GARMA) model extends the univariate Gaussian ARMA model to a flexible non-Gaussian observation-driven model. Silveira de Andrade et al. (2015) investigated the Bayesian approach for GARMA models with Poisson, binomial and negative binomial distributions, utilizing Bayesian model selection criteria to choose the most appropriate model. Another advantage of the Bayesian methodology over the corresponding frequentist procedure for forecasting discrete time series data is that the Bayesian approach produces only integer forecasts of the count variable, while traditional forecasting often yields non-coherent (i.e. non-integer) estimates. For example, the Autoregressive Integrated Moving Average (ARIMA) model is one of the most prominent methods in financial time series forecasting. It has shown robust and efficient capability for short-term predictions and has been extensively applied in economics and finance (Contreras et al. 2003; Khashei et al. 2009; Lee and Ko 2011, among others). However, forecasts from ARIMA models can take negative values. Techniques such as log-scale transformation or constrained forecasting may guarantee non-negative predictions, but at the cost of elaborate post-processing and back-transformed forecasts that can behave abnormally. Given that many actual data in socioeconomic and business areas cannot take negative values, classical ARIMA forecasting methods are inappropriate when applied to non-negative count data series.
From another point of view, the Bayesian approach combines the likelihood with the prior to generate forecasts from posterior predictive distributions through iterative MCMC procedures. The resulting probabilistic predictive distributions therefore naturally respect the non-negativity of count data, providing new perspectives for time series forecasting research. Nariswari and Pudjihastuti (2019) implemented Bayesian time series estimation on an ARIMA model for monthly medicine demand count data, showing the validity of the Bayesian time series approach in avoiding negative-valued predictions, consistent with the characteristics of the actual medicine data, where the stock cannot be negative. McCabe and Martin (2005) developed Bayesian predictions of low count time series within the context of the integer-valued first-order autoregressive (INAR(1)) class of models, and showed that the Bayesian method is feasible for producing coherent forecasts. Estimation uncertainty associated with both parameters and model specification is fully incorporated in their proposed methodology.

A commonly used model for count time series data is the Poisson integer-valued generalized autoregressive conditional heteroscedastic (Poisson INGARCH) model proposed in Ferland et al. (2006). Since then, this model has been widely explored (see Neumann 2011; Doukhan et al. 2012, among others). However, the Poisson INGARCH model cannot accommodate potential extreme observations due to its equidispersion assumption. To this end, Zhu (2011) developed a negative binomial (NB) INGARCH model via the maximum likelihood approach. The NB-INGARCH model is flexible and allows for both overdispersion and extreme observations simultaneously. Later, Christou and Fokianos (2014) explored probabilistic properties and quasi-likelihood estimation for the NB-INGARCH(1,1) process, and Xiong and Zhu (2019) considered a robust quasi-likelihood estimation for this process with an application to transaction counts. From a Bayesian perspective, Truong et al. (2017) proposed a hysteretic Poisson INGARCH model within an MCMC sampling scheme to estimate model parameters and adopted the Bayesian information criterion for model comparison, highlighting the better performance of hysteresis in modelling integer-valued time series. Chen et al. (2019) developed a Markov switching Poisson INGARCH model within a Bayesian framework to cope with the lagged dependence, overdispersion, consecutive zeros, non-linear dynamics and time-varying coefficients of meteorological variables. Other studies have considered natural alternatives to the Poisson model. Chen and Lee (2017) proposed a Bayesian causality test based on the Poisson, negative binomial and log-linear Poisson INGARCH models with applications to climate and crime data.
Recently, Chen and Khamthong (2020) introduced two nonlinear negative binomial INGARCH models (Markov switching and threshold specifications), along with exogenous covariates in the conditional mean, to describe time series of counts. They conducted parameter estimation and one-step-ahead forecasting via Bayesian MCMC methods.

When modelling time series with outlying and atypical data, a commonly used approach is to develop models based on heavy-tailed distributions. The literature coping with continuous-valued time series with extreme observations is well explored via the Student’s t-distribution (Harvey and Luati 2014). However, the literature on modelling discrete time series with heavy tails remains sparse. To fill this gap, very recently, Qian et al. (2020) proposed a new integer-valued autoregressive process with generalized Poisson-inverse Gaussian (GPIG) innovations to model heavy-tailed count time series data. Gorgi (2020) explored a heavy-tailed mixture of negative binomial distributions, known as the Beta-negative binomial (BNB) distribution, and developed a BNB auto-regression process for modelling integer-valued time series with outliers, where a linear observation-driven dynamic equation for the conditional mean is specified. Both Qian et al. (2020) and Gorgi (2020) employed maximum likelihood approaches to estimate the model parameters. This paper develops Bayesian inference for log-linear BNB-INGARCH models and conducts parameter estimation within an adaptive Markov chain Monte Carlo framework. Moreover, conditions under which the posterior distribution of the full parameter vector is proper, given some general priors, are derived and presented.

2 The log-linear BNB-INGARCH model

The Beta-negative binomial distribution can be represented as a beta mixture of the negative binomial distribution. Let \(Y\) be a discrete random variable; then \(Y\sim BNB(\beta ,r,\alpha )\) if its probability mass function (PMF) is given by

$$\begin{aligned} \begin{aligned} f(Y=y)=\frac{\Gamma (y+r)}{\Gamma (y+1)\Gamma (r)}\frac{B(\alpha +r,\beta +y)}{B(\alpha ,\beta )}, y\in \mathcal {N} \end{aligned} \end{aligned}$$
(1)

where \(\Gamma (\cdot )\) is the gamma function and \(B(\cdot , \cdot )\) is the beta function. \(r>0\) is the dispersion parameter, \(\alpha >0\) is the tail parameter and \(\beta >0\). We follow the parameterization of the BNB distribution in terms of its mean parameter \(\lambda \) presented in Gorgi (2020)

$$\begin{aligned} \begin{aligned} f(Y=y)=\frac{\Gamma (y+r)}{\Gamma (y+1)\Gamma (r)}\frac{B(\alpha +r,\frac{(\alpha -1)\lambda }{r}+y)}{B(\alpha ,\frac{(\alpha -1)\lambda }{r})}, y\in \mathcal {N} \end{aligned} \end{aligned}$$
(2)

with mean \(\lambda >0\), dispersion \(r>0\) and tail \(\alpha >1\). Let \(\{y_t\}, t=1,\ldots ,n\), denote a univariate time series whose conditional distribution at time t follows \(BNB(\lambda _t,r,\alpha )\). We model the log-intensity process \(\mu _{t+1}=\log (\lambda _{t+1})\) in terms of a linear auto-regressive process depending on its own past \(\mu _{t}\) and the past observation \(y_{t}\). The log-linear BNB integer-valued generalized auto-regressive conditional heteroscedastic model of order (1,1) is defined by

$$\begin{aligned} \begin{aligned} y_{t}|\mathcal {F}_{t-1} \sim BNB(\lambda _{t}, r, \alpha ), \\ \mu _{t+1}=\log (\lambda _{t+1}), \\ \mu _{t+1}=\omega +\phi \log (y_{t}+1)+\tau \mu _{t} \end{aligned} \end{aligned}$$
(3)

where \(\mathcal {F}_{t-1}\) denotes the \(\sigma \)-field generated by \(\{y_{t-1}, y_{t-2}, \ldots \}\). Here we follow Fokianos and Tjøstheim (2011) in choosing \(\log (y_{t}+c)\) with constant \(c=1\) in model (3) to map zeros of \(y_{t}\) into zeros of \(\log (y_{t}+1)\). Other reasonable choices for c may be considered. Note that model (3) accommodates both negative and positive serial correlation by allowing the parameters \(\omega ,\phi ,\tau \) to take values in \(\mathbb {R}\), whereas the linear BNB-INGARCH model introduced by Gorgi (2020) accommodates positive serial correlation only, since its parameters are restricted to be positive to guarantee positivity of the conditional mean. Moreover, model (3) permits faster increases or decreases in \(\lambda _{t}\), according to the values of \(\omega ,\phi \) and \(\tau \), than the linear model. Extensions to higher-order log-linear BNB-INGARCH (p, q) models can be given as follows:

$$\begin{aligned} \begin{aligned} y_{t}|\mathcal {F}_{t-1} \sim BNB(\lambda _{t}, r, \alpha ), \\ \mu _{t+1}=\log (\lambda _{t+1}), \\ \mu _{t+1}=\omega +\sum _{i=1}^{p}\phi _{i} \log (y_{t+1-i}+1) + \sum _{j=1}^{q} \tau _{j} \mu _{t+1-j} \end{aligned} \end{aligned}$$
(4)

We note that the BNB distribution belongs to the class of mixed Poisson distributions and can approximate arbitrarily well the negative binomial distribution as well as the Poisson distribution. Specifically, as \(\alpha \rightarrow \infty \), the parameterized distribution \(\text {BNB}(\lambda ,r,\alpha )\) converges to an NB distribution with dispersion r and success probability \(\lambda /(\lambda +r)\). Furthermore, as \(r \rightarrow \infty \), the BNB converges to a Poisson distribution with mean \(\lambda \) (Gorgi 2020). Given this Poisson approximation to the BNB distribution, we follow Douc et al. (2013, Lemma 14) and Liboschik et al. (2017) in imposing the conditions \(\{|\phi |, |\tau |<1, |\phi +\tau |<1\}\) and \(\{|\phi _{i}|, |\tau _{j}|<1, |\sum _{i=1}^{p}\phi _{i}+\sum _{j=1}^{q}\tau _{j}|<1\}\) to guarantee stationarity of the proposed processes (3) and (4), respectively. We further follow Wang et al. (2014) and Gorgi (2020) in preassigning the initial point \(\lambda _{1}\) to a fixed positive value for both models (3) and (4). As noted in Gorgi (2020), this approach is quite standard in the literature on observation-driven time series models. In fact, Gorgi (2020) showed that the filtered parameter \(\{\hat{\lambda }_{t}(\varvec{\theta })\}_{t\in \mathbb {N}}\) converges exponentially almost surely and uniformly over compact parameter sets \(\Theta \) to a unique stationary and ergodic sequence \(\{\tilde{\lambda }_{t}(\varvec{\theta })\}_{t\in \mathbb {Z}}\) for any initialization \(\hat{\lambda }_{1}(\varvec{\theta }) \in \mathbb {R}^{+}\).
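To illustrate, the process (3) can be simulated through the beta-mixture representation of the BNB distribution noted above: conditionally on \(\lambda_t\), draw \(p_t \sim \text{Beta}(\alpha, (\alpha-1)\lambda_t/r)\) and then \(y_t \sim \text{NB}(r, p_t)\). The following is a minimal sketch in Python; the function name and the default initialization \(\lambda_1=1\) are illustrative choices, not part of the model specification.

```python
import numpy as np

def simulate_bnb_ingarch(n, omega, phi, tau, r, alpha, lam1=1.0, seed=None):
    """Simulate a log-linear BNB-INGARCH(1,1) path via the beta mixture:
    p_t ~ Beta(alpha, (alpha - 1) * lam_t / r),  y_t | p_t ~ NB(r, p_t)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=np.int64)
    mu = np.log(lam1)                      # mu_t = log(lambda_t), fixed start
    for t in range(n):
        lam = np.exp(mu)
        p = rng.beta(alpha, (alpha - 1.0) * lam / r)
        y[t] = rng.negative_binomial(r, p)
        # log-linear recursion of model (3)
        mu = omega + phi * np.log(y[t] + 1.0) + tau * mu
    return y
```

Under the stationarity conditions above (e.g. \(\phi=0.7\), \(\tau=-0.2\)), the simulated paths remain stable while still exhibiting occasional large counts from the heavy BNB tail.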

3 Bayesian inference

To ease the computational demand, we resort to Bayesian analysis for parameter estimation and inference of the log-linear BNB-INGARCH processes. Without loss of generality, we focus on the first-order specification \((p=q=1)\) as presented in model (3) for simplicity of illustration. We denote the time series of interest by \(\textbf{y}=(y_{1}, y_{2}, \ldots , y_{n})^{T}\) and the entire unknown parameter vector by \(\varvec{\theta }=(\omega , \phi ,\tau ,r,\alpha )^{T}\). By Bayes’ theorem, the posterior distribution \(p(\varvec{\theta |\textbf{y}})\) is given by

$$\begin{aligned} \begin{aligned} p(\varvec{\theta |\textbf{y}}) \propto L(\textbf{y}|\varvec{\theta })\pi (\varvec{\theta }) \end{aligned} \end{aligned}$$
(5)

where \(\pi (\varvec{\theta })\) denotes the prior distribution and \(L(\textbf{y}|\varvec{\theta })\) represents the likelihood function

$$\begin{aligned} \begin{aligned} L(\varvec{y}|\varvec{\theta })=\prod _{t=2}^{n} \frac{\Gamma \left( y_{t}+r\right) }{\Gamma (r)} \frac{B\left( \alpha +r, \frac{(\alpha -1) e^{\omega +\phi \log \left( y_{t-1}+1\right) +\tau \log \left( \lambda _{t-1}\right) }}{r}+y_{t}\right) }{B\left( \alpha , \frac{(\alpha -1) e^{\omega +\phi \log \left( y_{t-1}+1\right) +\tau \log \left( \lambda _{t-1}\right) }}{r}\right) } \end{aligned} \end{aligned}$$
(6)

with \(\omega \in \mathbb {R}\), \(|\phi |<1\), \(|\tau |<1\), \(|\phi +\tau |<1\), \(r>0\) and \(\alpha >1\). We implement Bayesian inference for the parameter groups (i) \(\omega \); (ii) \(\{\phi ,\tau \}\); (iii) r; and (iv) \(\alpha \) under the assumption that they are a priori independent. The conditional posterior distributions for each parameter group are presented as follows. Hereafter, let \(\theta _{j}\) denote the j-th element of the parameter vector \(\varvec{\theta }\), with \(j=1,2,3,4\) referring to \(\omega \), \(\{\phi ,\tau \}\), r and \(\alpha \) respectively, and let \(\varvec{\theta }_{-j}\) denote the vector of all parameters excluding the component \(\theta _{j}\).

  (i)

    For the intercept \(\omega \), we consider a normal prior \(\pi (\omega )=N(\mu _{\omega },\sigma _{\omega }^2)\). The full conditional posterior distribution of \(\omega \) is then as follows:

    $$\begin{aligned} \begin{aligned} p(\omega |\varvec{\theta }_{-1}, \textbf{y})&\propto L(\textbf{y}|\varvec{\theta })\pi (\omega ) \\&\propto L(\textbf{y}|\varvec{\theta }) e^{-\frac{(\omega -\mu _\omega )^2}{2\sigma _{\omega }^2}} \end{aligned} \end{aligned}$$
    (7)
  (ii)

    For the prior \(\pi (\phi ,\tau )\) of the parameter block \(\{\phi ,\tau \}\), we employ constrained normals \(\pi (\phi )= N(\mu _{\phi },\sigma _{\phi }^2)\) and \(\pi (\tau )= N(\mu _{\tau },\sigma _{\tau }^2)\) with \(\{\phi ,\tau \}\) satisfying the set A

    $$\begin{aligned} |\phi |<1, |\tau |<1, |\phi +\tau |<1 \end{aligned}$$
    (8)

    Then the full conditional posterior distribution of \(\{\phi ,\tau \}\) is given by

    $$\begin{aligned} \begin{aligned} p(\phi ,\tau |\varvec{\theta }_{-2}, \textbf{y})&\propto L(\textbf{y}|\varvec{\theta })\pi (\phi ,\tau ) \\&\propto L(\textbf{y}|\varvec{\theta })e^{-\frac{(\phi -\mu _{\phi })^2}{2\sigma _{\phi }^2}} e^{-\frac{(\tau -\mu _{\tau })^2}{2\sigma _{\tau }^2}}\mathcal {I}(A) \end{aligned} \end{aligned}$$
    (9)

    where \(\mathcal {I}(\cdot )\) denotes the indicator function.

  (iii)

    For the dispersion parameter r, we impose a gamma prior \(\pi (r)=\text {Ga}(\eta _{1r},\eta _{2r})\) with shape parameter \(\eta _{1r}\) and rate parameter \(\eta _{2r}\), then the full conditional posterior distribution of r is given by

    $$\begin{aligned} \begin{aligned} p(r|\varvec{\theta }_{-3}, \textbf{y})&\propto L(\textbf{y}|\varvec{\theta })\pi (r) \\&\propto L(\textbf{y}|\varvec{\theta })r^{\eta _{1r}-1}e^{-\eta _{2r} \cdot r} \end{aligned} \end{aligned}$$
    (10)
  (iv)

    For the tail parameter \(\alpha \), we impose a truncated gamma prior \(\pi (\alpha )=\text {Ga}(\eta _{1\alpha },\eta _{2\alpha })\mathcal {I}(\alpha >1)\) with shape \(\eta _{1\alpha }\) and rate \(\eta _{2\alpha }\), then the full conditional posterior distribution of \(\alpha \) is given by

    $$\begin{aligned} \begin{aligned} p(\alpha |\varvec{\theta }_{-4}, \textbf{y})&\propto L(\textbf{y}|\varvec{\theta })\pi (\alpha ) \\&\propto L(\textbf{y}|\varvec{\theta })\alpha ^{\eta _{1\alpha }-1}e^{-\eta _{2\alpha } \cdot \alpha } \mathcal {I}(\alpha >1) \end{aligned} \end{aligned}$$
    (11)

Theorem 1 below presents sufficient conditions for the posterior distribution of \(\varvec{\theta }\) to be proper given some general priors.

Theorem 1

Let \(\{y_{t}|\mathcal {F}_{t-1}\}_{t \in \mathbb {Z}^{+}}\) denote the target count time series with the conditional Beta-negative binomial distribution and \(\varvec{\theta }=(\omega , \phi , \tau , r, \alpha )^{T}\) be the full parameter vector. For ease of notation, we write \(\kappa {(\varvec{\theta })}=(\alpha -1)/r \cdot e^{\omega +\phi \log \left( y_{t-1}+1\right) +\tau \log \left( \lambda _{t-1}\right) }\) in the remainder of this section, suppressing its dependence on t. Then under model (3) and proper prior specifications, the posterior distribution of \(\varvec{\theta }\) given \(\textbf{y}=(y_{1}, y_{2}, \ldots , y_{n})^{T}\) is obtained by

$$\begin{aligned} \begin{aligned} p(\varvec{\theta }|\varvec{y}) \propto \prod _{t=2}^{n} \frac{\Gamma \left( y_{t}+r\right) }{\Gamma (r)} \frac{B\left( \alpha +r, \kappa (\varvec{\theta })+y_{t}\right) }{B\left( \alpha , \kappa (\varvec{\theta }) \right) }\,\pi (\omega ) \pi (\phi ,\tau ) \pi (\alpha ) \pi (r) \end{aligned} \end{aligned}$$

and is well defined.

Proof

Under proper prior specifications, we have

$$\begin{aligned}&\int p(\varvec{\theta }|\varvec{y})d\varvec{\theta } \nonumber \\&\propto \int \Biggl \{ \prod _{t=2}^{n} \frac{\Gamma \left( y_{t}+r\right) }{\Gamma (r)} \frac{B\left( \alpha +r, \kappa (\varvec{\theta })+y_{t}\right) }{B\left( \alpha , \kappa (\varvec{\theta })\right) } \times \pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r) \Biggr \} d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&= \int \Biggl \{ \prod _{t=2}^{n} \frac{\Gamma \left( y_{t}+r\right) }{\Gamma (r)} \frac{\Gamma (\alpha +r)}{\Gamma (\alpha )} \frac{\Gamma \left( \kappa (\varvec{\theta })+y_{t}\right) }{\Gamma \left( \kappa (\varvec{\theta })\right) } \frac{\Gamma \left( \alpha +\kappa (\varvec{\theta })\right) }{\Gamma \left( \alpha +\kappa (\varvec{\theta })+y_{t}+r\right) } \pi (\omega )\pi (\phi ,\tau )\nonumber \\&\pi (\alpha )\pi (r) \Biggl \} d \omega \, d \phi \, d \tau \, d \alpha \, d r \end{aligned}$$
(12)

where the equation follows from the relation \(B(a,b)=\frac{\Gamma (a)\Gamma (b)}{\Gamma (a+b)}\). From Stirling’s approximation of the gamma function \(\Gamma (\cdot )\), the following approximation is obtained for large r and \(\alpha \):

$$\begin{aligned} (12)&\approx \int r^{\sum _{t=2}^{n}y_{t}} \Bigg \{\prod _{t=2}^{n} \frac{\Gamma (\alpha +r)}{\Gamma (\alpha )} \frac{\Gamma \left( \kappa {(\varvec{\theta })}+y_{t}\right) }{\Gamma \left( \kappa {(\varvec{\theta })}\right) } \frac{\Gamma \left( \alpha +\kappa {(\varvec{\theta })}\right) }{\Gamma \left( \alpha +\kappa {(\varvec{\theta })}+y_{t}+r\right) } \nonumber \\ {}&\pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r) \Bigg \}\, d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&\approx \int r^{\sum _{t=2}^{n}y_{t}} \Bigg \{\prod _{t=2}^{n} \alpha ^{r} \Bigl (\kappa {(\varvec{\theta })}\Bigl )^{y_t} \Bigl (\alpha +\kappa {(\varvec{\theta })}\Bigl )^{-(y_t+r)} \pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r) \Bigg \} \nonumber \\ {}&d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&=\int r^{\sum _{t=2}^{n}y_{t}} \Bigg \{\prod _{t=2}^{n}\left( \frac{\alpha }{\alpha +\kappa {(\varvec{\theta })}} \right) ^{r} \left( \frac{\kappa {(\varvec{\theta })}}{\alpha +\kappa {(\varvec{\theta })}}\right) ^{y_t} \pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\,\Bigg \} \nonumber \\ {}&d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&=\int r^{\sum _{t=2}^{n}y_{t}} \Bigg \{ \prod _{t=2}^{n}\left( \frac{1}{1+\frac{\kappa {(\varvec{\theta })}}{\alpha }}\right) ^{r} \left( \frac{1}{1+\frac{\alpha }{\kappa {(\varvec{\theta })}}}\right) ^{y_{t}} \pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\,\Bigg \} \nonumber \\ {}&d \omega \, d \phi \, d \tau \, d \alpha \, d r \end{aligned}$$
(13)

For \(r\ge 1\), we obtain

$$\begin{aligned} (13)&\le \int r^{\sum _{t=2}^{n}y_{t}} \Bigg \{ \prod _{t=2}^{n} \frac{1}{1+\frac{r \kappa {(\varvec{\theta })}}{\alpha }}\,\left( \frac{1}{1+\frac{\alpha }{\kappa {(\varvec{\theta })}}}\right) ^{y_{t}} \pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\,\Bigg \} \nonumber \\&d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&\le \int r^{\sum _{t=2}^{n}y_{t}} \Bigg \{ \prod _{t=2}^{n} \frac{1}{1+\frac{r\kappa {(\varvec{\theta })}}{\alpha }}\,\, \frac{1}{1+\frac{\alpha y_{t}}{\kappa {(\varvec{\theta })}}} \pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\,\Bigg \} \nonumber \\ {}&d \omega \, d \phi \, d \tau \, d \alpha \, d r \end{aligned}$$
(14)

where the first inequality follows by Bernoulli’s inequality \((1+x)^d \ge 1+dx\) for real numbers \(d \ge 1, x \ge -1\) and the second inequality follows by Bernoulli’s inequality \((1+x)^d \ge 1+dx\) for integer \(d \ge 0\) and real number \(x \ge -1\). Without loss of generality, we assume that the first \(n_{1}\) \(y_{t}\)’s have zero-valued observations and the remaining \((n-n_{1})\) observations have positive \(y_{t}\)’s. Then we have

$$\begin{aligned} (14)&= \int r^{\sum _{t=n_1+1}^{n} y_t} \Bigg \{ \prod _{t=2}^{n_1} \frac{1}{1+\left( \frac{\alpha -1}{\alpha }\right) e^{\omega +\tau \log \left( \lambda _{t-1}\right) }} \prod _{t=n_1+1}^{n} \frac{1}{1+\frac{r\kappa {(\varvec{\theta })}}{\alpha }}\, \frac{1}{1+\frac{\alpha y_{t}}{\kappa {(\varvec{\theta })}}} \nonumber \\&\pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\,\Bigg \}\, d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&\le \int r^{\sum _{t=n_1+1}^{n}y_{t}} \Bigg \{ \prod _{t=2}^{n_1} \frac{1}{\left( \frac{\alpha -1}{\alpha }\right) e^{\omega +\tau \log \left( \lambda _{t-1}\right) }} \prod _{t=n_1+1}^{n} \frac{\alpha }{r\kappa {(\varvec{\theta })}}\, \frac{\kappa {(\varvec{\theta })}}{\alpha y_{t}} \pi (\omega )\pi (\phi ,\tau )\nonumber \\ {}&\pi (\alpha )\pi (r)\,\Bigg \}\, d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&= \int r^{\sum _{t=n_1+1}^{n}y_{t}} \Bigg \{ \prod _{t=2}^{n_1} \frac{\alpha \,e^{-\{\omega +\tau \log \left( \lambda _{t-1}\right) \}}}{\alpha -1} \prod _{t=n_1+1}^{n} \frac{\alpha \,e^{-\{\omega +\phi \log \left( y_{t-1}+1\right) +\tau \log \left( \lambda _{t-1}\right) \}}}{\alpha -1}\nonumber \\&\frac{(\alpha -1)\,e^{\omega +\phi \log \left( y_{t-1}+1\right) +\tau \log \left( \lambda _{t-1}\right) }}{\alpha \,r\, y_{t}} \pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\,\Bigg \}\, d \omega \, d \phi \, d \tau \, d \alpha \, d r \nonumber \\&= C_1 \int r^{\sum _{t=n_1+1}^{n}(y_{t}-1)} \left( \frac{\alpha }{\alpha -1}\right) ^{n_1-1} e^{-(n_1-1)\omega } e^{-(\sum _{t=2}^{n_1}\log (\lambda _{t-1}))\tau } \nonumber \\&\pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\, d \omega \, d \phi \, d \tau \, d \alpha \, d r \end{aligned}$$
(15)

where \(C_1=\prod _{t=n_1+1}^{n}\frac{1}{y_t}\). By the binomial approximation \(\left( \frac{\alpha }{\alpha -1}\right) ^{n_1-1}=\left( 1+\frac{1}{\alpha -1}\right) ^{n_1-1}\) \(\approx 1+\frac{n_1-1}{\alpha -1}\) for large \(\alpha \), and further assuming \(\alpha \ge M\) for some large positive number M, we obtain

$$\begin{aligned} (15)&\approx C_1 \int r^{\sum _{t=n_1+1}^{n}(y_{t}-1)} \left( 1+\frac{n_1-1}{\alpha -1}\right) e^{-(n_1-1)\omega } e^{-(\sum _{t=2}^{n_1}\log (\lambda _{t-1}))\tau } \\&\pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\, d \omega \, d \phi \, d \tau \, d \alpha \, d r \\&\le C_1 \int r^{\sum _{t=n_1+1}^{n}(y_{t}-1)} \left( 1+\frac{n_1-1}{M-1}\right) e^{-(n_1-1)\omega } e^{-(\sum _{t=2}^{n_1}\log (\lambda _{t-1}))\tau } \\&\pi (\omega )\pi (\phi ,\tau )\pi (\alpha )\pi (r)\, d \omega \, d \phi \, d \tau \, d \alpha \, d r \\&= C_2 \int r^{\sum _{t=n_1+1}^{n}(y_{t}-1)} e^{-(n_1-1)\omega } \pi (\omega )\pi (r)\, d \omega \, d r < \infty \end{aligned}$$

given that the prior density functions \(\pi (\omega ), \pi (\phi ,\tau ), \pi (\alpha )\) and \(\pi (r)\) integrate to a finite quantity, the \(\sum _{t=n_1+1}^{n}(y_{t}-1)\)-th moment about the origin of r exists, and the moment generating function \(M_{\omega }(1-n_1)\) of \(\omega \) exists. Here \(C_2\) is a constant that does not depend on the parameters of interest.

It follows from the proof of Theorem 1 that the priors for \((\phi ,\tau )\) and \(\alpha \) can be chosen very flexibly. In fact, any proper distributions can be considered, because the propriety of the posterior will not be affected. On the other hand, the choice of the prior \(\pi (r)\) requires existence of the \(\sum _{t=n_1+1}^{n}(y_{t}-1)\)-th moment about the origin of r, and the choice of the prior \(\pi (\omega )\) requires existence of the moment generating function \(M_{\omega }(1-n_1)\).
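For instance, the normal prior \(\pi (\omega )=N(\mu _{\omega },\sigma _{\omega }^2)\) and the gamma prior \(\pi (r)=\text {Ga}(\eta _{1r},\eta _{2r})\) adopted above satisfy both requirements, since the normal distribution has a moment generating function defined on all of \(\mathbb {R}\) and the gamma distribution possesses moments of every non-negative order:

$$\begin{aligned} M_{\omega }(1-n_1)=\exp \Big \{(1-n_1)\mu _{\omega }+\tfrac{1}{2}(1-n_1)^2\sigma _{\omega }^2\Big \}<\infty , \qquad E\big [r^{m}\big ]=\frac{\Gamma (\eta _{1r}+m)}{\Gamma (\eta _{1r})\,\eta _{2r}^{m}}<\infty \end{aligned}$$

for \(m=\sum _{t=n_1+1}^{n}(y_{t}-1)\ge 0\).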

Since the obtained conditional posterior distributions do not correspond to closed-form distributions, we resort to MCMC sampling methods. Specifically, for faster convergence and better mixing of the chain, we follow the adaptive MCMC method of Chen and So (2006). We employ the random-walk Metropolis-Hastings algorithm in the first H iterations (the burn-in period) and the independent-kernel Metropolis-Hastings algorithm in the following \((N-H)\) iterations to draw samples of the parameter vector \(\varvec{\theta }\). The adaptive MCMC procedure is as follows:

Step 1:

Set initial values for \(\varvec{\theta }^{(0)}=(\omega ^{(0)},\phi ^{(0)},\tau ^{(0)},r^{(0)},\alpha ^{(0)})^{T}\).

Step 2:

When \(1\le k \le H\), we adopt the random-walk Metropolis-Hastings algorithm for sampling \(\varvec{\theta }^{(k)}\):

Step 2.1:

Generate candidate values \(\varvec{\theta }^*=\{\theta _{l}^*\}\), \(l=1,2,\ldots ,5\), where \(\theta _{l}^*=\theta _{l}^{(k-1)}+\epsilon \), \(\epsilon \sim N(0,\sigma _{l}^2)\), and the tuning parameter \(\sigma _{l}^2\) is selected to achieve an acceptance rate of around 23%.

Step 2.2:

Keep the candidate values if \(\varvec{\theta }^*\) satisfies the following: \(\{\theta _2,\theta _3\}\) do not violate the stationarity conditions of the model, \(\theta _4>0\) and \(\theta _5>1\). Otherwise, go back to Step 2.1.

Step 2.3:

Calculate the acceptance probability

$$\begin{aligned} \text {Prob}(\varvec{\theta }^*,\varvec{\theta }^{(k-1)}) = \min \biggl \{1,\frac{p(\varvec{\theta }^*)}{p(\varvec{\theta }^{(k-1)})} \biggl \}, \end{aligned}$$

where \(p(\cdot )\) is the target posterior distribution given in (5). Then generate a random uniform number \(u\in [0,1]\). If \(u<\text {Prob}(\varvec{\theta }^*,\varvec{\theta }^{(k-1)})\), accept the new candidate and set \(\varvec{\theta }^{(k)}=\varvec{\theta }^*\). Otherwise, set \(\varvec{\theta }^{(k)}=\varvec{\theta }^{(k-1)}\).

Step 3:

When \(k \ge H+1\), we adopt the independent-kernel Metropolis-Hastings algorithm for sampling \(\varvec{\theta }^{(k)}\):

Step 3.1:

Generate candidate values \(\varvec{\theta }^* \sim N(\varvec{\mu }_{\theta },\varvec{\Omega }_{\theta })\), where the sample mean \(\varvec{\mu }_{\theta }\) and sample covariance matrix \(\varvec{\Omega }_{\theta }\) are calculated from the H burn-in samples.

Step 3.2:

Keep the candidate values if \(\varvec{\theta }^*\) satisfies the following: \(\{\theta _2,\theta _3\}\) do not violate the stationarity conditions of the model, \(\theta _4>0\) and \(\theta _5>1\). Otherwise, go back to Step 3.1.

Step 3.3:

Calculate the acceptance probability

$$\begin{aligned} \text {Prob}(\varvec{\theta }^*,\varvec{\theta }^{(k-1)}) = \min \biggl \{1,\frac{p(\varvec{\theta }^*)g(\varvec{\theta }^{(k-1)})}{p(\varvec{\theta }^{(k-1)})g(\varvec{\theta }^*)} \biggr \}, \end{aligned}$$

where \(g(\cdot )\) is the Gaussian proposal density with mean \(\varvec{\mu }_{\theta }\) and covariance matrix \(\varvec{\Omega }_{\theta }\). Then draw \(u\) from the uniform distribution on \([0,1]\). If \(u<\text {Prob}(\varvec{\theta }^*,\varvec{\theta }^{(k-1)})\), accept the new candidate and set \(\varvec{\theta }^{(k)}=\varvec{\theta }^*\). Otherwise, set \(\varvec{\theta }^{(k)}=\varvec{\theta }^{(k-1)}\).

Step 4:

Go to the next iteration or stop if the chain has converged.

Given the N iteration samples \(\varvec{\theta }^{(1)},\varvec{\theta }^{(2)},\ldots , \varvec{\theta }^{(N)}\), we discard the first H as the burn-in period and estimate the model parameters using the remaining \((N-H)\) iterations.
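The scheme above can be sketched as follows. This is a minimal illustration only: a toy Gaussian log-posterior stands in for the BNB-INGARCH posterior \(p(\cdot )\) in (5), and the function `valid` stands in for the parameter constraints of Steps 2.2 and 3.2 (here trivially satisfied).

```python
import numpy as np

def log_post(theta):
    # Stand-in for the target log posterior p(.) in (5): a toy N(1, I) density.
    return -0.5 * np.sum((theta - 1.0) ** 2)

def valid(theta):
    # Stand-in for the parameter constraints in Steps 2.2 / 3.2.
    return bool(np.all(np.isfinite(theta)))

def adaptive_mh(theta0, n_iter=4000, burn_in=2000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = len(theta0)
    draws = np.empty((n_iter, d))
    theta = np.asarray(theta0, dtype=float)
    mu = cov = cov_inv = None
    for k in range(n_iter):
        if k < burn_in:
            # Step 2: random-walk proposal theta* = theta^(k-1) + eps.
            prop = theta + rng.normal(0.0, step, size=d)
            if valid(prop) and np.log(rng.uniform()) < log_post(prop) - log_post(theta):
                theta = prop
        else:
            if mu is None:
                # Step 3.1: fit the independence kernel to the burn-in draws.
                mu = draws[:burn_in].mean(axis=0)
                cov = np.cov(draws[:burn_in].T) + 1e-9 * np.eye(d)
                cov_inv = np.linalg.inv(cov)
            prop = rng.multivariate_normal(mu, cov)

            def log_g(t):
                # Log of the Gaussian proposal density g(.) (up to a constant).
                diff = t - mu
                return -0.5 * diff @ cov_inv @ diff

            # Step 3.3: acceptance ratio p(theta*)g(theta) / (p(theta)g(theta*)).
            if valid(prop) and np.log(rng.uniform()) < (
                log_post(prop) + log_g(theta) - log_post(theta) - log_g(prop)
            ):
                theta = prop
        draws[k] = theta
    return draws[burn_in:]  # discard the H burn-in draws

post = adaptive_mh(np.zeros(5))
```

With the toy target, the retained draws should have a posterior mean near 1. In the actual model, `log_post` would evaluate the BNB-INGARCH likelihood times the priors, and invalid proposals would be redrawn as in Steps 2.2 and 3.2 rather than treated as rejections.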

4 Simulation analysis

Table 1 Simulation results for the Bayesian log-linear BNB-INGARCH model obtained from 1000 replications
Table 2 Simulation results for the ML estimators of the log-linear BNB-INGARCH model obtained from 1000 Monte Carlo replications. The parameters r and \(\alpha \) are reparameterized in terms of their inverse

To examine the performance of the adaptive MCMC algorithm for the log-linear BNB-INGARCH model, we conduct a simulation analysis under the following data generating process (DGP) with sample sizes \(n=100\) and \(n=250\). The count series \(y_t\) is sampled from the log-linear BNB-INGARCH model (3) with \((\omega ,\phi ,\tau ,r,\alpha )^{T}\) set to \((0.65,0.7,-0.2,5,3)^{T}\). To investigate sensitivity to the prior and hyperparameter selections, we consider three prior calibrations for the parameters of interest, as follows:

Prior 1: \(\omega \sim N(0.1,0.3^2)\), \(\{\phi ,\tau \}\sim N(0.1,0.25^2) \cdot N(0.1,0.75^2)\mathcal {I}(A)\), \(r\sim \text {Ga}(10,0.5)\) and \(\alpha \sim \text {Ga}(10,0.5)\mathcal {I}(\alpha >1)\)

Prior 2: \(\omega \sim N(0.1,0.3^2)\), \(\pi (\phi ,\tau )\propto \mathcal {I}(A)\), \(r\sim \text {Ga}(10,0.5)\) and \(\alpha \sim \text {Exp}(0.01) \mathcal {I}(\alpha >1)\)

Prior 3: \(\omega \sim N(0.1,0.3^2)\), \(\{\phi ,\tau \}\sim N(0.1,0.25^2) \cdot N(0.1,0.75^2)\mathcal {I}(A)\), \(r\sim \text {Ga}(10,1)\) and \(\pi (\alpha )\propto \frac{1}{(1+\alpha )^2}\mathcal {I}(\alpha >1)\)

where \(\text {Exp}(\eta )\) denotes the exponential distribution with rate parameter \(\eta \) and the set A is given in (8). Note that in Prior 2 we employ a constrained uniform prior on \(\{\phi ,\tau \}\), which places a flat prior on the parameters restricted by the indicator \(\mathcal {I}(A)\). We perform 10,000 MCMC iterations, discarding the first 5000 as a burn-in sample, for inference in each scenario. Table 1 presents the averages of the posterior means, medians, standard deviations, and 95% credible intervals over 1000 replications for all model parameters. For comparison with the frequentist approach, we report the simulation results for the maximum likelihood (ML) estimator of the considered model in Table 2; the averages of the mean and the standard deviation for the different parameter values and sample sizes are obtained from 1000 Monte Carlo replications. We follow Gorgi (2020) in reparameterizing r and \(\alpha \) in terms of their inverses to facilitate the ML estimation. As noted by Gorgi (2020), this is done because, especially for small sample sizes, the estimate of \(\alpha \) may become arbitrarily large since the likelihood function is flat for large \(\alpha \).
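The DGP can be sketched as follows. Two pieces are assumptions made for the sketch, since model (3) is defined elsewhere: the log-linear INGARCH(1,1) recursion \(\nu _t=\omega +\phi \nu _{t-1}+\tau \log (y_{t-1}+1)\) with \(\lambda _t=\exp (\nu _t)\), and the beta-mixture representation of the BNB draw with conditional mean \(\lambda _t\).

```python
import numpy as np

def sample_bnb(lam, r, alpha, rng):
    # Assumed beta-mixture representation of a BNB draw with mean lam:
    # p ~ Beta(alpha, beta) with beta = lam*(alpha-1)/r, then y ~ NB(r, p),
    # so that E[y] = r*beta/(alpha-1) = lam (requires alpha > 1).
    beta = lam * (alpha - 1.0) / r
    p = rng.beta(alpha, beta)
    return rng.negative_binomial(r, p)

def simulate_dgp(n, omega=0.65, phi=0.7, tau=-0.2, r=5.0, alpha=3.0, seed=1):
    # Assumed log-linear BNB-INGARCH(1,1) recursion:
    #   nu_t = omega + phi*nu_{t-1} + tau*log(y_{t-1} + 1),  lambda_t = exp(nu_t).
    rng = np.random.default_rng(seed)
    y = np.empty(n, dtype=int)
    nu = omega / (1.0 - phi)  # assumed initialization of the intensity
    for t in range(n):
        lam = np.exp(nu)
        y[t] = sample_bnb(lam, r, alpha, rng)
        nu = omega + phi * nu + tau * np.log(y[t] + 1.0)
    return y

y = simulate_dgp(250)
```

The parameter values match the simulation design \((0.65,0.7,-0.2,5,3)^{T}\), and the constraint \(\alpha >1\) guarantees a finite conditional mean.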

Given Tables 1 and 2, we summarize the simulation results as follows:

(1) We observe that the employed adaptive MCMC approach gives reasonably accurate estimates of all the parameters of interest for small sample sizes. The standard deviation decreases as the sample size increases from 100 to 250 under both Bayesian and ML estimation. Furthermore, all the reported standard deviations for the parameters \(\omega \), \(\phi \) and \(\tau \) under Bayesian estimation are smaller than those of the ML estimator in each simulation scheme.

(2) The simulation results demonstrate that both positive and negative serial correlation can be captured by the proposed model. Moreover, the average posterior means and posterior medians are overall reasonably close to the true parameter values, supporting the validity of the considered adaptive MCMC method. We therefore suggest using posterior medians as the model parameter estimates, since the median is a more robust measure of central tendency than the mean.

(3) We observe that the adaptive MCMC procedure is robust to the selection of priors and hyperparameters, delivering similarly accurate estimation results across the different settings.

(4) We extensively examine the sensitivity to starting values by specifying different starting points for the adaptive MCMC sampler. We also investigate the sensitivity to the choice of \(\lambda _{1}\) by randomly setting different initial values for the intensity process. We observe that the employed MCMC sampler is robust to the selection of the starting values and the initial intensity, as different calibrations deliver similarly accurate estimation results for the considered small sample sizes n. Owing to space limitations, we do not present these results, but they are available from the authors upon request.

5 Empirical application on futures tick count data

This section illustrates the proposed methodology with an empirical application to minute-bar VIX futures tick count data. The historical intraday market data are provided by Tick Data and are available from https://www.tickdata.com/. The sample we consider consists of 920 available observations between 01:00 a.m. and 11:00 p.m. on 2nd January 2020. The empirical mean and variance are 25.789 and 1669.625 respectively (a variance-to-mean ratio of about 65), indicating considerable over-dispersion in the data. Figure 1 depicts the series and the empirical auto-correlation functions of the tick counts. The auto-correlation function (ACF) and partial auto-correlation function (PACF) plots suggest the existence of significant auto-correlations in the data. The series exhibits several extreme observations; specifically, the number of minute-bar VIX futures ticks is exceedingly high at 08:31, 14:58 and 15:15. These attributes indicate the desirability of BNB auto-regressions, which can capture the auto-correlation structure and account for the extreme observations through the heavy-tailedness of the BNB distribution.

Fig. 1
figure 1

Numbers of minute-bar VIX futures tick count data on 02nd January, 2020 from 01:00 a.m. to 11:00 p.m. (top panel) and sample auto-correlation function (bottom left) and partial auto-correlation function of the series (bottom right)

We compare the performance of Bayesian log-linear BNB-INGARCH models and their log-linear NB-INGARCH counterparts with order specifications \(\{(p,q)\}=\{(1,1),(1,2),(2,1),(2,2)\}\). The prior calibrations are as follows. For both BNB-INGARCH and NB-INGARCH models, we adopt the normal prior \(N(0.5,0.25^2)\) for \(\omega \), constrained normal priors \(N(0.2,0.15^2)\) for \(\phi _{i}\) and \(\tau _{j}\) with \(\{\phi _{i},\tau _{j}\}\) satisfying \(|\phi _{i}|<1\), \(|\tau _{j}|<1\), \(|\sum _{i=1}^{p}\phi _{i}+\sum _{j=1}^{q}\tau _{j}|<1\), and the gamma prior \(\text {Ga}(5,0.5)\) for r. For the tail parameter \(\alpha \) under the BNB model specifications, we consider the truncated gamma prior \(\text {Ga}(5,0.5)\mathcal {I}(\alpha >1)\). We also consider prior specifications analogous to those in the simulation analysis section and observe that the parameter estimates are robust to the selection of priors and hyperparameters. We perform 30,000 MCMC iterations and discard the first 10,000 as a burn-in sample in each model set-up. Table 3 summarizes the estimation results, including the posterior mean, standard deviation, the posterior 2.5 and 97.5 percentiles, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), for each specification separately. The convergence of each Markov chain is investigated through trace plots for each parameter of the fitted models; we conclude that proper mixing is achieved by each MCMC sampler. Based on the AIC and BIC values in Table 3, the BNB-INGARCH models under all considered order specifications are superior to their NB-INGARCH counterparts. The preference for the BNB distribution is further reflected in the low estimates of the tail parameter \(\alpha \), which is around 2.5 with a standard deviation of about 0.15 in most order specifications.
This implies that the estimated conditional distribution of the data is heavy-tailed, with finite moments only up to the second order. Moreover, the BNB-INGARCH(1,2) is found to be a competitive model for this data set among all the considered candidates. Owing to space limitations, we provide the diagnostic trace plots of the Markov chains for each parameter only for the favoured BNB-INGARCH(1,2) model in Fig. 2.
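For reference, the AIC and BIC used for the model comparison follow the standard definitions based on the maximized (or posterior-evaluated) log-likelihood; a minimal helper, with a purely hypothetical log-likelihood value and k = 6 parameters as in the BNB-INGARCH(1,2) model:

```python
import math

def aic_bic(loglik, k, n):
    # AIC = 2k - 2 logL;  BIC = k log(n) - 2 logL, with n observations.
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik

# Hypothetical log-likelihood for illustration; n = 920 observations.
aic, bic = aic_bic(loglik=-3000.0, k=6, n=920)
```

Lower values of either criterion favour a model, which is how the BNB and NB specifications are ranked in Table 3.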

Table 3 Summary of estimation results for VIX futures tick count data
Fig. 2
figure 2

Trace plots of the Markov chains for the marginal posterior distributions of the parameters \(\omega ,\phi ,\tau _1,\tau _2,r\) and \(\alpha \) of the log-linear BNB-INGARCH(1, 2) model for VIX futures tick count data. The grey rectangles represent the burn-in periods

To check the adequacy of the best-fitting log-linear BNB-INGARCH(1,2) model (that with the lowest AIC and BIC values), we examine the standardized Pearson residuals \(z_t=\frac{y_t-\text {E}(y_t|\mathcal {F}_{t-1})}{\sqrt{\text {Var}(y_t|\mathcal {F}_{t-1})}}\) proposed by Jung et al. (2006). For a correctly specified model, the Pearson residuals should have mean zero and variance one, with no significant auto-correlation. Figure 3 presents the series plot and the auto-correlation functions of the standardized residuals for the fitted BNB-INGARCH(1,2) model. The plots indicate no significant serial correlation in the residuals. On the basis of these diagnostic checks, the auto-correlation structure in the futures tick count data is captured by the dynamics of the log-linear BNB-INGARCH model, and we conclude that the fitted model is adequate.
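The residual check can be sketched as follows, assuming arrays of fitted conditional means and variances are available from the estimated model; the iid Poisson data in the illustration are purely hypothetical and serve only to show the expected behaviour under correct specification.

```python
import numpy as np

def pearson_residuals(y, cond_mean, cond_var):
    # z_t = (y_t - E(y_t | F_{t-1})) / sqrt(Var(y_t | F_{t-1}))
    return (np.asarray(y, dtype=float) - cond_mean) / np.sqrt(cond_var)

def residual_acf(z, nlags=10):
    # Sample auto-correlations of the residuals at lags 1..nlags.
    z = np.asarray(z, dtype=float) - np.mean(z)
    denom = np.sum(z ** 2)
    return np.array([np.sum(z[:-k] * z[k:]) / denom for k in range(1, nlags + 1)])

# Illustrative check with iid Poisson(5) data, for which the conditional mean
# and variance both equal 5: z should have mean ~0, variance ~1, and no
# significant auto-correlation at any lag.
rng = np.random.default_rng(7)
y = rng.poisson(5.0, size=5000)
z = pearson_residuals(y, 5.0, 5.0)
```

In practice `cond_mean` and `cond_var` would be the fitted one-step-ahead moments \(\text {E}(y_t|\mathcal {F}_{t-1})\) and \(\text {Var}(y_t|\mathcal {F}_{t-1})\) of the BNB-INGARCH(1,2) model, and the resulting ACF would be compared against the usual \(\pm 1.96/\sqrt{n}\) bounds.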

Fig. 3
figure 3

Diagnostic checking plots for standardized residuals. The top panel reports the standardized Pearson residuals of the log-linear BNB-INGARCH(1, 2) model for VIX futures tick count data. The bottom left and bottom right plots present the auto-correlation and partial auto-correlation functions of the residuals respectively

6 Conclusion

The Bayesian estimation approach and model selection for the proposed log-linear beta-negative binomial integer-valued GARCH model have been presented. Parameter estimation for the proposed process is performed using adaptive Markov chain Monte Carlo methods. The empirical application to high-frequency intraday VIX futures tick count data indicates that the proposed model is adequate and outperforms the INGARCH model under the negative binomial distribution assumption. By extending the log-linear BNB-INGARCH model to a Bayesian framework, this research complements the current literature on modelling discrete time series with heavy-tailed characteristics.