Detecting conflicting summary statistics in likelihood-free inference

Abstract

Bayesian likelihood-free methods implement Bayesian inference using simulation of data from the model to substitute for intractable likelihood evaluations. Most likelihood-free inference methods replace the full data set with a summary statistic before performing Bayesian inference, and the choice of this statistic is often difficult. The summary statistic should be low-dimensional for computational reasons, while retaining as much information as possible about the parameter. Using a recent idea from the interpretable machine learning literature, we develop some regression-based diagnostic methods which are useful for detecting when different parts of a summary statistic vector contain conflicting information about the model parameters. Conflicts of this kind complicate summary statistic choice, and detecting them can be insightful about model deficiencies and guide model improvement. The diagnostic methods developed are based on regression approaches to likelihood-free inference, in which the regression model estimates the posterior density using summary statistics as features. Deletion and imputation of part of the summary statistic vector within the regression model can remove conflicts and approximate posterior distributions for summary statistic subsets. A larger than expected change in the estimated posterior density following deletion and imputation can indicate a conflict in which inferences of interest are affected. The usefulness of the new methods is demonstrated in a number of real examples.


References

  • Anderson, C.W., Coles, S.G.: The largest inclusions in a piece of steel. Extremes 5(3), 237–252 (2002)

  • Bayarri, M.J., Castellanos, M.E.: Bayesian checking of the second levels of hierarchical models. Stat. Sci. 22, 322–343 (2007)

  • Beaumont, M.A., Zhang, W., Balding, D.J.: Approximate Bayesian computation in population genetics. Genetics 162, 2025–2035 (2002)

  • Bissiri, P.G., Holmes, C.C., Walker, S.G.: A general framework for updating belief distributions. J. R. Stat. Soc. Ser. B (Statistical Methodology) 78(5), 1103–1130 (2016)

  • Blum, M.G.B., François, O.: Non-linear regression models for approximate Bayesian computation. Stat. Comput. 20, 63–75 (2010)

  • Blum, M.G.B., Nunes, M.A., Prangle, D., Sisson, S.A.: A comparative review of dimension reduction methods in approximate Bayesian computation. Stat. Sci. 28(2), 189–208 (2013)

  • Bortot, P., Coles, S., Sisson, S.: Inference for stereological extremes. J. Am. Stat. Assoc. 102(477), 84–92 (2007)

  • Box, G.E.P.: Sampling and Bayes inference in scientific modelling and robustness (with discussion). J. R. Stat. Soc. Ser. A 143, 383–430 (1980)

  • Csilléry, K., François, O., Blum, M.G.B.: abc: an R package for approximate Bayesian computation (ABC). Methods Ecol. Evol. 3(3), 475–479 (2012)

  • Dinev, T., Gutmann, M.: Dynamic likelihood-free inference via ratio estimation (DIRE). arXiv:1810.09899 (2018)

  • Drovandi, C.C., Pettitt, A.N., Lee, A.: Bayesian indirect inference using a parametric auxiliary model. Stat. Sci. 30(1), 72–95 (2015)

  • Erhardt, R., Sisson, S.A.: Modelling extremes using approximate Bayesian computation. In: Dey, D., Yan, J. (eds.) Extreme Value Modelling and Risk Analysis, pp. 281–306. Chapman and Hall/CRC (2016)

  • Evans, M.: Measuring Statistical Evidence Using Relative Belief. Taylor & Francis (2015)

  • Evans, M., Moshonov, H.: Checking for prior-data conflict. Bayesian Anal. 1, 893–914 (2006)

  • Fan, J., Ma, C., Zhong, Y.: A selective overview of deep learning. Stat. Sci. 36(2), 264–290 (2021)

  • Fan, Y., Nott, D.J., Sisson, S.A.: Approximate Bayesian computation via regression density estimation. Stat 2(1), 34–48 (2013). https://doi.org/10.1002/sta4.15

  • Fasiolo, M., Pya, N., Wood, S.N.: A comparison of inferential methods for highly nonlinear state space models in ecology and epidemiology. Stat. Sci. 31, 96–118 (2016)

  • Frazier, D.T., Drovandi, C.: Robust approximate Bayesian inference with synthetic likelihood. J. Comput. Gr. Stat. (2021). (to appear)

  • Frazier, D.T., Drovandi, C., Loaiza-Maya, R.: Robust approximate Bayesian computation: an adjustment approach. arXiv preprint arXiv:2008.04099 (2020a)

  • Frazier, D.T., Robert, C.P., Rousseau, J.: Model misspecification in approximate Bayesian computation: consequences and diagnostics. J. R. Stat. Soc. Ser. B (Statistical Methodology) 82(2), 421–444 (2020b)

  • Gelman, A., Meng, X.L., Stern, H.: Posterior predictive assessment of model fitness via realized discrepancies. Stat. Sin. 6, 733–807 (1996)

  • Gelman, A., Vehtari, A., Simpson, D., Margossian, C.C., Carpenter, B., Yao, Y., Kennedy, L., Gabry, J., Bürkner, P.C., Modrák, M.: Bayesian workflow. arXiv:2011.01808 (2020)

  • Gurney, W., Blythe, S., Nisbet, R.: Nicholson's blowflies revisited. Nature 287, 17–21 (1980)

  • Izbicki, R., Lee, A.B.: Converting high-dimensional regression to high-dimensional conditional density estimation. Electron. J. Stat. 11, 2800–2831 (2017)

  • Izbicki, R., Lee, A.B., Pospisil, T.: ABC–CDE: toward approximate Bayesian computation with complex high-dimensional data and limited simulations. J. Comput. Gr. Stat. (2019). (to appear)

  • Joyce, P., Marjoram, P.: Approximately sufficient statistics and Bayesian computation. Stat. Appl. Genet. Mol. Biol. 7, 26 (2008). https://doi.org/10.2202/1544-6115.1389

  • Klein, N., Nott, D.J., Smith, M.S.: Marginally calibrated deep distributional regression. J. Comput. Gr. Stat. 30(2), 467–483 (2021)

  • Li, J., Nott, D.J., Fan, Y., Sisson, S.A.: Extending approximate Bayesian computation methods to high dimensions via Gaussian copula. Comput. Stat. Data Anal. 106, 77–89 (2017)

  • Li, W., Fearnhead, P.: Convergence of regression-adjusted approximate Bayesian computation. Biometrika 105(2), 301–318 (2018)

  • Marin, J.M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22(6), 1167–1180 (2012)

  • Mayer, M.: missRanger: Fast Imputation of Missing Values. R package version 2.1.0 (2019). https://CRAN.R-project.org/package=missRanger

  • Meinshausen, N.: Quantile regression forests. J. Mach. Learn. Res. 7, 983–999 (2006)

  • Moritz, S., Bartz-Beielstein, T.: imputeTS: time series missing value imputation in R. R J. 9(1), 207–218 (2017). https://doi.org/10.32614/RJ-2017-009

  • Nicholson, A.: An outline of the dynamics of animal populations. Aust. J. Zool. 2(1), 9–65 (1954)

  • Nott, D.J., Wang, X., Evans, M., Englert, B.G.: Checking for prior-data conflict using prior-to-posterior divergences. Stat. Sci. 35(2), 234–253 (2020)

  • Papamakarios, G., Murray, I.: Fast \(\epsilon \)-free inference of simulation models with Bayesian conditional density estimation. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29, pp. 1028–1036. Curran Associates Inc (2016)

  • Papamakarios, G., Sterratt, D., Murray, I.: Sequential neural likelihood: fast likelihood-free inference with autoregressive flows. In: Chaudhuri, K., Sugiyama, M. (eds.) Proceedings of Machine Learning Research, vol. 89, pp. 837–848 (2019)

  • Polson, N.G., Sokolov, V.: Deep learning: a Bayesian perspective. Bayesian Anal. 12(4), 1275–1304 (2017)

  • Presanis, A.M., Ohlssen, D., Spiegelhalter, D.J., Angelis, D.D.: Conflict diagnostics in directed acyclic graphs, with applications in Bayesian evidence synthesis. Stat. Sci. 28, 376–397 (2013)

  • Price, L.F., Drovandi, C.C., Lee, A.C., Nott, D.J.: Bayesian synthetic likelihood. J. Comput. Gr. Stat. 27(1), 1–11 (2018)

  • Probst, P., Wright, M.N., Boulesteix, A.L.: Hyperparameters and tuning strategies for random forest. WIREs Data Min. Knowl. Discov. 9(3), e1301 (2019)

  • Ratmann, O., Andrieu, C., Wiuf, C., Richardson, S.: Model criticism based on likelihood-free inference, with an application to protein network evolution. Proc. Natl. Acad. Sci. 106(26), 10576–10581 (2009)

  • Ratmann, O., Pudlo, P., Richardson, S., Robert, C.: Monte Carlo algorithms for model assessment via conflicting summaries. arXiv preprint arXiv:1106.5919 (2011)

  • Raynal, L., Marin, J.M., Pudlo, P., Ribatet, M., Robert, C.P., Estoup, A.: ABC random forests for Bayesian parameter inference. Bioinformatics 35(10), 1720–1728 (2018)

  • Rényi, A.: On measures of entropy and information. In: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1: Contributions to the Theory of Statistics, pp. 547–561. University of California Press, Berkeley (1961)

  • Ricker, W.: Stock and recruitment. J. Fish. Res. Board Can. 11(5), 559–623 (1954)

  • Ridgway, J.: Probably approximate Bayesian computation: nonasymptotic convergence of ABC under misspecification. arXiv preprint arXiv:1707.05987 (2017)

  • Robnik-Šikonja, M., Kononenko, I.: Explaining classifications for individual instances. IEEE Trans. Knowl. Data Eng. 20(5), 589–600 (2008)

  • Rubin, D.: Multiple Imputation for Nonresponse in Surveys. Wiley, New York (1987)

  • Sisson, S., Fan, Y., Beaumont, M. (eds.): Handbook of Approximate Bayesian Computation. Chapman & Hall/CRC Handbooks of Modern Statistical Methods, CRC Press, Taylor & Francis Group, Boca Raton (2018a)

  • Sisson, S., Fan, Y., Beaumont, M.: Overview of approximate Bayesian computation. In: Sisson, S., Fan, Y., Beaumont, M. (eds.) Handbook of Approximate Bayesian Computation, Chapman & Hall/CRC Handbooks of Modern Statistical Methods, CRC Press, Taylor & Francis Group, Boca Raton (2018b)

  • Thomas, O., Pesonen, H., Sá-Leão, R., de Lencastre, H., Kaski, S., Corander, J.: Split-BOLFI for misspecification-robust likelihood free inference in high dimensions. arXiv preprint arXiv:2002.09377 (2020)

  • Tong, H.: Threshold models in time series analysis—30 years on. Stat. Interface 4(2), 107–118 (2011)

  • van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011)

  • Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12(2), 129–141 (2013)

  • Wood, S.N.: Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466, 1102–1107 (2010)

  • Zhang, H., Nieto, F.H.: TAR: Bayesian Modeling of Autoregressive Threshold Time Series Models. R package version 1.0 (2017). https://CRAN.R-project.org/package=TAR

  • Zintgraf, L.M., Cohen, T.S., Adel, T., Welling, M.: Visualizing deep neural network decisions: prediction difference analysis. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings, OpenReview.net (2017). https://openreview.net/forum?id=BJ5UeU9xx

Acknowledgements

We thank the Editor, Associate Editor, and two referees for their comments which greatly improved the manuscript.

Corresponding author

Correspondence to Xueou Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Michael Evans was supported by a grant from the Natural Sciences and Engineering Research Council of Canada.

Appendix

1.1 Details of the SETAR auxiliary model for Sect. 5.3

For a time series \(X_t\), \(t=0,1,\dots ,T\), the SETAR model used to obtain summary statistics takes the form

$$\begin{aligned} X_t = \left\{ \begin{array}{ll} a_0+a_1X_{t-1}+a_2X_{t-2}+\epsilon _t, \text{ if } X_{t-1}<c \\ b_0+b_1X_{t-1}+b_2X_{t-2}+\epsilon _t, \text{ if } X_{t-1}\ge c \end{array}\right. , \end{aligned}$$

where \(\epsilon _t\sim N(0,\rho ^2)\) if \(X_{t-1}<c\) and \(\epsilon _t\sim N(0,\zeta ^2)\) if \(X_{t-1}\ge c\), with the noise sequence \(\epsilon _t\) assumed independent over time. The parameter c is a threshold, and the dynamics of the process switch between two autoregressive components of order 2 depending on whether the threshold is exceeded. To obtain our summary statistics, we fit this model to the observed data and fix c based on the observed data fit. With this fixed c, the SETAR model is then fitted to any simulated data series d to obtain maximum likelihood estimates of the SETAR model parameters, which are the summary statistics denoted by \(S=S(d)\). We write \(S=(S_L^\top ,S_U^\top )^\top \), where \(S_L=(\widehat{a}_0,\widehat{a}_1,\widehat{a}_2,\widehat{\rho })^\top \) and \(S_U=(\widehat{b}_0,\widehat{b}_1,\widehat{b}_2,\widehat{\zeta })^\top \) are the maximum likelihood estimates of the autoregressive component parameters for the low and high regimes, respectively. The SETAR models are fitted using the TAR package (Zhang and Nieto 2017) in R. When simulating data from the model, there were some cases in which no values of the series exceeded the threshold c; since the number of such cases was small, we simply discarded these simulations.
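As a concrete illustration, the following is a minimal sketch of how such summary statistics could be computed by hand rather than via the TAR package: with c held fixed, ordinary least squares within each regime coincides with the Gaussian maximum likelihood fit, and the variance MLE is the residual sum of squares divided by the regime sample size. The function name setar_summaries and its interface are ours, for illustration only.

```r
# Minimal sketch: Gaussian ML fit of the two-regime SETAR model with the
# threshold held fixed. Returns S = (S_L, S_U) as described above.
setar_summaries <- function(x, thresh) {
  T_len <- length(x)
  y    <- x[3:T_len]          # response X_t
  lag1 <- x[2:(T_len - 1)]    # X_{t-1}
  lag2 <- x[1:(T_len - 2)]    # X_{t-2}
  low  <- lag1 < thresh       # regime indicator based on X_{t-1}

  fit_regime <- function(idx) {
    fit <- lm(y[idx] ~ lag1[idx] + lag2[idx])
    # OLS coefficients are the Gaussian MLEs; the noise s.d. MLE is sqrt(RSS/n)
    c(coef(fit), sqrt(mean(residuals(fit)^2)))
  }

  S_L <- fit_regime(low)    # lower-regime estimates (a0, a1, a2, rho)
  S_U <- fit_regime(!low)   # upper-regime estimates (b0, b1, b2, zeta)
  c(S_L, S_U)               # summary statistic S = (S_L^T, S_U^T)^T
}
```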

1.2 Details of diagnostic for the example of Sect. 5.4

We describe how we implement our diagnostic for the example of Sect. 5.4. Roughly speaking, for a given time t we consider all windows B of width k containing t, and average \(R_\infty (S_B|S_A)\) over B, where \(S_B\) consists of the series values in B and \(S_A\) of the remaining values.

To make the method precise we need some further notation. Let \(d=\{d_i:1\le i\le T\}\) denote a time series of length T. For a subset C of the times, \(C\subseteq \{1,\dots , T\}\), we write \(d(C)=\{d_i:i\in C\}\) and \(d(-C)=\{d_i:i\notin C\}\). Let \(t\in \{1,\dots , T\}\) be a fixed time, and let \(W_t^k=\{C_1^{t,k},\dots , C_{n_{t,k}}^{t,k}\}\) denote the set of all windows of width k containing t, i.e. sets of the form \(\{l,\dots , l+k-1\}\) for some l. For each \(j=1,\dots , n_{t,k}\) let \(d^{t,k}_j\) be a time series of length T, and write \(d_.^{t,k}=(d_1^{t,k},\dots , d_{n_{t,k}}^{t,k})\). Write \(d_{\mathrm{obs}}^{t,k}\) for the value of \(d_.^{t,k}\) in which \(d_j^{t,k}=d_{\mathrm{obs}}\) for all \(j=1,\dots , n_{t,k}\), where \(d_{\mathrm{obs}}\) is the observed series. Let \(d_.^{t,k,*}\) denote the value of \(d_.^{t,k}\) in which \(d_j^{t,k,*}(-C_j^{t,k})=d_{\mathrm{obs}}(-C_j^{t,k})\) and \(d_j^{t,k,*}(C_j^{t,k})\sim p(d(C_j^{t,k})|d_{\mathrm{obs}} (-C_j^{t,k}))\), i.e., the values of \(d_j^{t,k,*}\) in \(C_j^{t,k}\) are generated from the conditional prior predictive given the observed values for the remainder of the series, with the draws \(d_j^{t,k,*}(C_j^{t,k})\) independent for different j. Let

$$\begin{aligned} R^{t,k}(d_.^{t,k}) = \frac{1}{n_{t,k}} \sum _{j=1}^{n_{t,k}} R_\infty (d_j^{t,k}(C_j^{t,k})|d_j^{t,k}(-C_j^{t,k})). \end{aligned}$$

Fig. 10 Imputation and conditional sampling of features for Example 5.4. The blue-shaded area is the window of length k which we impute in the series. The red-shaded area is the larger window to which a multivariate normal model is fitted based on the training set observations. Normal noise is added to a conditional mean imputation using the covariance matrix for the observations in the blue window, conditional on the remaining observations in the red patch. (Color figure online)

Fig. 11 Left-hand side: estimated posterior density of \(\eta \) using ABC random forests with different sets of summary statistics in Example 5.1, with imputation using missRanger. Right-hand side: observed maximum log relative belief statistic (vertical lines) and histogram of reference distribution values for imputations when (a) \(s^2\) is imputed from \(\bar{y}\) and (b) \(\bar{y}\) is imputed from \(s^2\) in Example 5.1, using missRanger imputation

We base our diagnostic on \(R^{t,k}(d_{\mathrm{obs}}^{t,k})\), calibrated by

$$\begin{aligned} p_t = P(R^{t,k}(d_.^{t,k,*})\ge R^{t,k}(d^{t,k}_{\mathrm{obs}})), \end{aligned}$$
(13)

and estimate (13) by

$$\begin{aligned} \widetilde{p}_t = \frac{1}{M^*} \sum _{i=1}^{M^*} I(R^{t,k}(d_.^{t,k,i})\ge R^{t,k}(d_{\mathrm{obs}}^{t,k})), \end{aligned}$$
(14)

where \(d_.^{t,k,i}\), \(i=1,\dots , M^*\) are approximations of draws of \(d_.^{t,k,*}\) based on imputation, i.e., we have imputed \(d_j^{t,k,*}(C_j^{t,k})\) from \(d_{\mathrm{obs}}(-C_j^{t,k})\) independently for each j and i.
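To connect the pieces, here is a minimal sketch of how the estimate (14) could be assembled. Both helper functions are hypothetical placeholders, not part of the paper's code: R_inf(dB, dA) is assumed to return an estimate of \(R_\infty (S_B|S_A)\) for window values dB given the remaining values dA, and impute_window(d_obs, C) is assumed to return one imputed draw of the series values in the window C given the rest of the series (for example, using the method of Sect. 1.3 below).

```r
# Minimal sketch of the Monte Carlo estimate (14) of p_t.
p_t_estimate <- function(d_obs, t, k, R_inf, impute_window, M_star = 100) {
  T_len <- length(d_obs)
  # all windows {l, ..., l+k-1} of width k that contain time t
  starts  <- max(1, t - k + 1):min(t, T_len - k + 1)
  windows <- lapply(starts, function(l) l:(l + k - 1))

  # R^{t,k}: average of R_infinity over the windows, for a list of series
  R_tk <- function(series_list) {
    mean(mapply(function(d, C) R_inf(d[C], d[-C]), series_list, windows))
  }

  # observed statistic R^{t,k}(d_obs^{t,k}): every component is d_obs
  obs_stat <- R_tk(rep(list(d_obs), length(windows)))

  # reference draws: impute each window independently, for each replicate i
  ref_stats <- replicate(M_star, {
    drawn <- lapply(windows, function(C) {
      d <- d_obs
      d[C] <- impute_window(d_obs, C)
      d
    })
    R_tk(drawn)
  })

  mean(ref_stats >= obs_stat)  # tilde-p_t in (14)
}
```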

1.3 Details of the imputation method for the example of Sect. 5.4

Figure 10 illustrates the idea behind the window-based imputation used in Sect. 5.4. We impute values of the series for a window of width k which has been deleted (blue region in the figure), considering a larger window around it (red patch in the figure). A conditional mean imputation is obtained using the na_interpolation function in the imputeTS R package (Moritz and Bartz-Beielstein 2017) with spline interpolation and default settings for the tuning parameters. For multiple imputation, we add noise to the conditional mean: a stationary Gaussian autoregressive model of order one is fitted to the observed series, and zero-mean Gaussian noise is added whose covariance matrix is the conditional covariance matrix of the autoregressive process over the blue region given the remaining observations in the red patch. Although the series values are counts, these counts are generally large and we treat them as continuous quantities in the imputation procedure.
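A minimal sketch of this imputation step follows, assuming blue and red are index vectors for the two windows with blue contained in red; the function name impute_window_ar1 and the choice of patch indices are ours for illustration, not settings from the paper. The na_interpolation call and the AR(1) fit follow the description above, and the conditional covariance uses the standard Gaussian conditioning formula applied to the stationary AR(1) covariance \(\sigma ^2\phi ^{|i-j|}/(1-\phi ^2)\).

```r
library(imputeTS)  # for na_interpolation()

impute_window_ar1 <- function(x_obs, blue, red) {
  # 1. Conditional mean: spline interpolation over the deleted (blue) window
  x_na <- x_obs
  x_na[blue] <- NA
  x_mean <- na_interpolation(x_na, option = "spline")

  # 2. Stationary Gaussian AR(1) fitted to the observed series
  fit <- arima(x_obs, order = c(1, 0, 0))
  phi <- coef(fit)["ar1"]
  s2  <- fit$sigma2

  # Stationary AR(1) covariance over the red patch: s2/(1-phi^2) * phi^|i-j|
  Sigma <- s2 / (1 - phi^2) * phi^abs(outer(red, red, "-"))

  # 3. Covariance of the blue window conditional on the rest of the red patch
  b <- match(blue, red)
  r <- setdiff(seq_along(red), b)
  Sigma_cond <- Sigma[b, b] - Sigma[b, r] %*% solve(Sigma[r, r], Sigma[r, b])

  # 4. Add zero-mean Gaussian noise with this conditional covariance
  noise <- drop(crossprod(chol(Sigma_cond), rnorm(length(blue))))
  x_mean[blue] + noise
}
```

A draw from impute_window_ar1 can then play the role of impute_window in the diagnostic sketch of Sect. 1.2.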

1.4 Results for examples using different imputation methods

See Figs. 11, 12, and 13.

Fig. 12 Top row: estimated marginal posterior densities using quantile regression forests with different summary statistics and imputations of summary statistic subsets in Example 5.2. Panels (a), (b) and (c) are for the parameters \(\log \lambda \), \(\log \sigma \) and \(\xi \), respectively. Middle and bottom rows: observed maximum log relative belief statistic (vertical lines) and histogram of reference distribution values for imputations when \(S_U\) is imputed from \((N,S_L)\) (middle row) and \(S_L\) is imputed from \((N,S_U)\) (bottom row). All imputation is done using the local linear MICE approach

Fig. 13 Top row: estimated marginal posterior densities using quantile regression forests with different summary statistics and imputations of summary statistic subsets in Example 5.3. Panels (a), (b) and (c) are for the parameters \(\log r\), \(\log \sigma \) and \(\log \phi \), respectively. Middle and bottom rows: observed maximum log relative belief statistic (vertical lines) and histogram of reference distribution values for imputations when \(S_U\) is imputed from \(S_L\) (middle row) and \(S_L\) is imputed from \(S_U\) (bottom row). All imputation is done using the local linear MICE approach


About this article

Cite this article

Mao, Y., Wang, X., Nott, D.J. et al. Detecting conflicting summary statistics in likelihood-free inference. Stat Comput 31, 78 (2021). https://doi.org/10.1007/s11222-021-10053-3
