A hybrid deterministic–deterministic approach for high-dimensional Bayesian variable selection with a default prior

  • Original paper
  • Published in Computational Statistics

Abstract

Identifying relevant variables among numerous potential predictors is a primary concern in modern regression analysis. While stochastic search algorithms have emerged as a dominant tool for Bayesian variable selection, their practicality is challenged by high computational cost and slow convergence when the number of potential predictors is large. In this paper, we propose a new Bayesian variable selection scheme based on a hybrid deterministic–deterministic variable selection (HD-DVS) algorithm that asymptotically ensures rapid convergence to the global mode of the posterior model distribution. A key feature of HD-DVS is that it circumvents the iterative computation of matrix inverses, a common computational bottleneck in Bayesian variable selection. A simulation study demonstrates that the proposed method outperforms existing Bayesian and frequentist methods. An analysis of the Bardet–Biedl syndrome gene expression data illustrates the applicability of HD-DVS to real data.
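
The claim about avoiding repeated matrix inversion can be illustrated with a standard block-inverse (Schur complement) identity for Gram matrices (cf. Lu and Shiou 2002 in the references): when a single column is appended to \({\textbf{X}}_\gamma\), the new inverse of \({\textbf{X}}_\gamma^{\textrm{T}}{\textbf{X}}_\gamma\) follows from the current one at \(O(p_\gamma^2)\) rather than \(O(p_\gamma^3)\) cost. The Python sketch below is only a minimal illustration of that identity under simulated data; the helper name and data are hypothetical and this is not the HD-DVS implementation itself.

import numpy as np

def add_column_inverse(A_inv, X_gamma, x_new):
    # Hypothetical helper (not the authors' code): update (X_gamma^T X_gamma)^{-1}
    # after appending the column x_new, using the 2 x 2 block-inverse
    # (Schur complement) formula, so no inverse is recomputed from scratch.
    b = X_gamma.T @ x_new                    # cross-products with the new column
    u = A_inv @ b
    s = float(x_new @ x_new - b @ u)         # Schur complement (a scalar)
    top_left = A_inv + np.outer(u, u) / s
    top_right = -u[:, None] / s
    return np.block([[top_left, top_right],
                     [top_right.T, np.array([[1.0 / s]])]])

# Quick check against direct inversion on simulated data.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
x = rng.standard_normal(50)
A_inv = np.linalg.inv(X.T @ X)
X_new = np.column_stack([X, x])
print(np.allclose(add_column_inverse(A_inv, X, x),
                  np.linalg.inv(X_new.T @ X_new)))   # True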

References

  • Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679

  • Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32(3):870–897

  • Bhattacharya A, Chakraborty A, Mallick BK (2016) Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika 103:985–991

  • Carvalho CM, Polson NG, Scott JG (2009) Handling sparsity via the horseshoe. In: Artificial intelligence and statistics. PMLR, pp 73–80

  • Carvalho CM, Polson NG, Scott JG (2010) The horseshoe estimator for sparse signals. Biometrika 97(2):465–480

  • Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101(473):157–167

  • Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95(3):759–771

  • Cibis H, Biyanee A, Dörner W, Mootz HD, Klempnauer KH (2020) Characterization of the zinc finger proteins ZMYM2 and ZMYM4 as novel B-MYB binding proteins. Sci Rep 10(1):8390

  • Deng HX, Shi Y, Yang Y, Ahmeti KB, Miller N, Huang C, Cheng L, Zhai H, Deng S, Nuytemans K et al (2016) Identification of TMEM230 mutations in familial Parkinson’s disease. Nat Genet 48(7):733–739

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360

  • George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88(423):881–889

  • Hans C, Dobra A, West M (2007) Shotgun stochastic search for large p regression. J Am Stat Assoc 102(478):507–516

  • Hindmarch C, Fry M, Yao ST, Smith PM, Murphy D, Ferguson AV (2008) Microarray analysis of the transcriptome of the subfornical organ in the rat: regulation by fluid and food deprivation. Am J Physiol Regul Integr Comp Physiol 295(6):R1914–R1920

  • Jin S, Goh G (2021) Bayesian selection of best subsets via hybrid search. Comput Stat 36(3):1991–2007

  • Johndrow J, Orenstein P, Bhattacharya A (2020) Scalable approximate MCMC algorithms for the horseshoe prior. J Mach Learn Res 21(73):1–61

  • Kass RE, Wasserman L (1995) A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J Am Stat Assoc 90(431):928–934

  • Koslovsky M, Swartz MD, Leon-Novelo L, Chan W, Wilkinson AV (2018) Using the EM algorithm for Bayesian variable selection in logistic regression models with related covariates. J Stat Comput Simul 88(3):575–596

  • Lu TT, Shiou SH (2002) Inverses of 2 × 2 block matrices. Comput Math Appl 43(1–2):119–129

  • Moreno E, Girón J, Casella G (2015) Posterior model consistency in variable selection as the model dimension grows. Stat Sci 30(2):228–241

  • Narisetty NN, Shen J, He X (2018) Skinny Gibbs: a consistent and scalable Gibbs sampler for model selection. J Am Stat Assoc 114(527):1205–1217

  • Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103(482):681–686

  • Raftery AE, Madigan D, Hoeting JA (1997) Bayesian model averaging for linear regression models. J Am Stat Assoc 92(437):179–191

  • Ročková V, George EI (2014) EMVS: the EM approach to Bayesian variable selection. J Am Stat Assoc 109(506):828–846

  • Ročková V, George EI (2018) The spike-and-slab lasso. J Am Stat Assoc 113(521):431–444

  • Ročková V, Moran G (2021) EMVS Vignette

  • Scheetz TE, Kim KYA, Swiderski RE, Philp AR, Braun TA, Knudtson KL, Dorrance AM, DiBona GF, Huang J, Casavant TL, Sheffield VC, Stone EM (2006) Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc Natl Acad Sci 103(39):14429–14434

  • Scott JG, Berger JO (2010) Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. Ann Stat 38:2587–2619

  • Tadesse MG, Vannucci M (2021) Handbook of Bayesian variable selection. CRC Press, Boca Raton

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol) 58(1):267–288

  • Wang H (2009) Forward regression for ultra-high dimensional variable screening. J Am Stat Assoc 104(488):1512–1524

  • Yang Y, Wainwright MJ, Jordan MI (2016) On the computational complexity of high-dimensional Bayesian variable selection. Ann Stat 44(6):2497–2532

  • Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques. Elsevier, New York, pp 233–243

  • Zhang Z (2014) The matrix ridge approximation: algorithms and applications. Mach Learn 97(3):227–258

  • Zhao K, Lian H (2016) The expectation–maximization approach for Bayesian quantile regression. Comput Stat Data Anal 96:1–11

Author information

Corresponding author

Correspondence to Gyuhyeong Goh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Theorem 3.1

Let \(\Sigma _{X_{\gamma },y}\), \(\Sigma _{X_\gamma ,X_{\gamma }}\) and \(\Sigma _{y,y}\) denote the probability limits of \(n^{-1} {\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}\), \(n^{-1} {\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma\) and \(n^{-1}{\textbf{y}}^{\textrm{T}}{\textbf{y}}\), respectively. Note that

$$\begin{aligned} \frac{1}{n}{\textbf{y}}^{\textrm{T}}{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}= (n^{-1}{\textbf{y}}^{\textrm{T}}{\textbf{X}}_\gamma )(n^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1}(n^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}), \end{aligned}$$

which converges to \(\Sigma _{X_{\gamma },y}^{{\textrm{T}}} \Sigma _{X_\gamma ,X_{\gamma }}^{-1} \Sigma _{X_{\gamma },y}\) in probability as \(n\rightarrow \infty\). This implies that

$$\begin{aligned} \frac{1}{n}{\textbf{y}}^{\textrm{T}}{\textbf{P}}_\gamma ^{\perp } {\textbf{y}}\rightarrow \Sigma _{y,y}-\Sigma _{X_{\gamma },y}^{{\textrm{T}}} \Sigma _{X_\gamma ,X_{\gamma }}^{-1} \Sigma _{X_{\gamma },y} \end{aligned}$$

in probability as \(n\rightarrow \infty\), where \({\textbf{P}}_\gamma ^\perp = {\textbf{I}}_n-{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1} {\textbf{X}}_\gamma ^{\textrm{T}}\). Hence, we can write

$$\begin{aligned} {\textbf{y}}^{\textrm{T}}{\textbf{y}}-\frac{n}{1+n}{\textbf{y}}^{\textrm{T}}{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}&={\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}}+ \frac{1}{1+n}{\textbf{y}}^{\textrm{T}}{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}\\&={\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}}\left( 1 + \frac{1}{1+n}\frac{{\textbf{y}}^{\textrm{T}}{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}}{{\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}}}\right) \\&= {\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}}\{1 + O_p(n^{-1}) \}. \end{aligned}$$

It follows that

$$\begin{aligned} n\log \left\{ {\textbf{y}}^{\textrm{T}}{\textbf{y}}-\frac{n}{1+n}{\textbf{y}}^{\textrm{T}}{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}\right\} = n \log ({\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}}) + O_p(1). \end{aligned}$$
(A1)
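
As a quick numerical sanity check of (A1), the following sketch simulates a Gaussian design (the design, coefficients, and seed are hypothetical, not from the paper) and shows that the gap between the left-hand side of (A1) and \(n \log ({\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}})\) stays bounded as \(n\) grows, consistent with the \(O_p(1)\) remainder.

import numpy as np

def a1_gap(n, p_gamma=3, seed=1):
    # Difference between the two sides of (A1) for one simulated data set.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p_gamma))
    y = X @ np.ones(p_gamma) + rng.standard_normal(n)
    fit = y @ X @ np.linalg.inv(X.T @ X) @ X.T @ y   # y^T X (X^T X)^{-1} X^T y
    rss = y @ y - fit                                # y^T P_perp y
    lhs = n * np.log(y @ y - n / (1 + n) * fit)
    return lhs - n * np.log(rss)

for n in (10**2, 10**3, 10**4, 10**5):
    print(n, round(a1_gap(n), 3))   # the gap stays O(1) while n grows by orders of magnitude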

Also, note that

$$\begin{aligned} p_\gamma \log (n+1) = p_\gamma \log n + p_\gamma \log \left( \frac{n+1}{n}\right) =p_\gamma \log n + o(1). \end{aligned}$$
(A2)

Using (A1) and (A2), we can write \(D(\gamma )\) as

$$\begin{aligned} D(\gamma )&=n \log \left( {\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}}\right) +p_\gamma \log n+ 2 \log \left( {\begin{array}{c}p\\ p_\gamma \end{array}}\right) +O_p(1)\nonumber \\&=\left\{ n \log \left( {\textbf{y}}^{\textrm{T}}{\textbf{P}}^{\perp }_\gamma {\textbf{y}}\right) +p_\gamma \log n+ 2 \log \left( {\begin{array}{c}p\\ p_\gamma \end{array}}\right) \right\} \left\{ 1+o_p(1)\right\} \nonumber \\&=\text {EBIC}(\gamma )\{1+o_p(1)\}, \end{aligned}$$
(A3)

where \(\text {EBIC}(\cdot )\) denotes the extended Bayesian information criterion (Chen and Chen 2008). Note that Theorem 1 of Chen and Chen (2008) implies that

$$\begin{aligned} \text {EBIC}(\gamma _{\text {HPM}})< \text {EBIC}(\gamma _1) < \text {EBIC}(\gamma _2) \end{aligned}$$

in probability as \(n\rightarrow \infty\) for any \(\gamma _1\) and \(\gamma _2\) such that (a) \(\gamma _{\text {HPM}}\subsetneq \gamma _1 \subsetneq \gamma _2\) or (b) \(\gamma _{\text {HPM}}\subset \gamma _1\) and \(\gamma _{\text {HPM}} \not \subset \gamma _2\). By the asymptotic equivalence in (A3), we therefore obtain the results of Theorem 3.1.
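
For illustration only (this is not part of the proof), the sketch below evaluates \(\text {EBIC}(\gamma )\) as written in (A3) on simulated data, for a nested sequence \(\gamma _{\text {HPM}}\subsetneq \gamma _1 \subsetneq \gamma _2\) and for a model that omits a true predictor. The design, coefficients, seed, and model sizes are hypothetical; with a moderate sample size the two orderings above already hold in this example.

import numpy as np
from math import comb, log

def ebic(y, X, gamma, p):
    # EBIC(gamma) = n log(y^T P_perp y) + |gamma| log n + 2 log C(p, |gamma|)
    n = len(y)
    Xg = X[:, gamma]
    resid = y - Xg @ np.linalg.lstsq(Xg, y, rcond=None)[0]
    return n * log(resid @ resid) + len(gamma) * log(n) + 2 * log(comb(p, len(gamma)))

rng = np.random.default_rng(2)
n, p = 500, 20
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(n)

gamma_hpm = [0, 1, 2]        # true (highest posterior probability) model
gamma_1   = [0, 1, 2, 3]     # strict superset of gamma_hpm
gamma_2   = [0, 1, 2, 3, 4]  # still larger superset
gamma_bad = [0, 1, 5]        # omits one true predictor

print(ebic(y, X, gamma_hpm, p) < ebic(y, X, gamma_1, p) < ebic(y, X, gamma_2, p))  # case (a)
print(ebic(y, X, gamma_1, p) < ebic(y, X, gamma_bad, p))                           # case (b)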

Appendix B: Proof of Theorem 3.2

Suppose that, in Step 1, the HD-DVS algorithm (Algorithm 1) visits \(\tilde{\gamma }\) such that \(\tilde{\gamma }\supset \gamma _{\text {HPM}}\). Then, by Theorem 3.1(a), the probability that the HD-DVS algorithm converges to \(\gamma _{\text {HPM}}\) goes to one as \(n\rightarrow \infty\).

Suppose instead that, in Step 1, the HD-DVS algorithm never visits \(\tilde{\gamma }\) such that \(\tilde{\gamma }\supset \gamma _{\text {HPM}}\). In this case, by Theorem 2 of Wang (2009) and (A3), the probability that Step 2 of the HD-DVS algorithm converges to \(\tilde{\gamma }_+(\supset \gamma _\text {HPM})\) goes to one as \(n\rightarrow \infty\). The algorithm then returns to Step 1 with the initial value \(\hat{\gamma }=\tilde{\gamma }_+\) and, by Theorem 3.1(a), converges to \(\gamma _\text {HPM}\) in probability. This completes the proof.
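
The argument above uses only the control flow of Algorithm 1: run a deterministic search (Step 1); if it has not yet reached a superset of \(\gamma _{\text {HPM}}\), expand the model with a forward-regression-type Step 2 (Wang 2009) and restart Step 1 from the expanded model. The sketch below mimics that loop with greedy EBIC moves as stand-ins for the actual HD-DVS updates, which are defined in the main text; every function name, rule, and data choice here is a hypothetical placeholder, not the authors' algorithm.

import numpy as np
from math import comb, log

def ebic(y, X, gamma):
    # EBIC of the model indexed by the set `gamma` (the empty set is allowed).
    n, p = X.shape
    if gamma:
        Xg = X[:, sorted(gamma)]
        resid = y - Xg @ np.linalg.lstsq(Xg, y, rcond=None)[0]
    else:
        resid = y
    return n * log(resid @ resid) + len(gamma) * log(n) + 2 * log(comb(p, len(gamma)))

def step1_local_search(y, X, gamma):
    # Stand-in Step 1: deterministic add/drop moves that strictly decrease EBIC.
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            cand = gamma - {j} if j in gamma else gamma | {j}
            if ebic(y, X, cand) < ebic(y, X, gamma):
                gamma, improved = cand, True
    return gamma

def step2_forward_expand(y, X, gamma, size):
    # Stand-in Step 2: forward-regression expansion to a model of `size` variables.
    while len(gamma) < size:
        scores = {j: ebic(y, X, gamma | {j}) for j in range(X.shape[1]) if j not in gamma}
        gamma = gamma | {min(scores, key=scores.get)}
    return gamma

def hybrid_search(y, X, max_size=10, max_iter=10):
    # Step 1 first; then alternate Step 2 (expand) with a restarted Step 1 (prune)
    # until the selected model no longer changes.
    gamma = step1_local_search(y, X, set())
    for _ in range(max_iter):
        expanded = step2_forward_expand(y, X, gamma, max_size)
        refined = step1_local_search(y, X, expanded)
        if refined == gamma:
            return gamma
        gamma = refined
    return gamma

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 50))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(300)
print(sorted(hybrid_search(y, X)))   # typically recovers the true support {0, 1, 2}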

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lee, J., Goh, G. A hybrid deterministic–deterministic approach for high-dimensional Bayesian variable selection with a default prior. Comput Stat 39, 1659–1681 (2024). https://doi.org/10.1007/s00180-023-01368-y
