Abstract
Identifying relevant variables among numerous potential predictors is a central task in modern regression analysis. Although stochastic search algorithms have become a dominant tool for Bayesian variable selection, their practicality is challenged by high computational cost and slow convergence when the number of potential predictors is large. In this paper, we propose a new Bayesian variable selection scheme based on a hybrid deterministic–deterministic variable selection (HD-DVS) algorithm that asymptotically guarantees rapid convergence to the global mode of the posterior model distribution. A key feature of HD-DVS is that it circumvents the iterative computation of matrix inverses, a common computational bottleneck in Bayesian variable selection. A simulation study demonstrates that the proposed method outperforms existing Bayesian and frequentist methods, and an analysis of the Bardet–Biedl syndrome gene expression data illustrates the applicability of HD-DVS to real data.
References
Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679
Barbieri MM, Berger JO (2004) Optimal predictive model selection. Ann Stat 32(3):870–897
Bhattacharya A, Chakraborty A, Mallick BK (2016) Fast sampling with Gaussian scale mixture priors in high-dimensional regression. Biometrika 103:985–991
Carvalho CM, Polson NG, Scott JG (2009) Handling sparsity via the horseshoe. In: Artificial intelligence and statistics. PMLR, pp 73–80
Carvalho CM, Polson NG, Scott JG (2010) The horseshoe estimator for sparse signals. Biometrika 97(2):465–480
Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101(473):157–167
Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95(3):759–771
Cibis H, Biyanee A, Dörner W, Mootz HD, Klempnauer KH (2020) Characterization of the zinc finger proteins ZMYM2 and ZMYM4 as novel B-MYB binding proteins. Sci Rep 10(1):8390
Deng HX, Shi Y, Yang Y, Ahmeti KB, Miller N, Huang C, Cheng L, Zhai H, Deng S, Nuytemans K et al (2016) Identification of TMEM230 mutations in familial Parkinson’s disease. Nat Genet 48(7):733–739
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
George EI, McCulloch RE (1993) Variable selection via Gibbs sampling. J Am Stat Assoc 88(423):881–889
Hans C, Dobra A, West M (2007) Shotgun stochastic search for large p regression. J Am Stat Assoc 102(478):507–516
Hindmarch C, Fry M, Yao ST, Smith PM, Murphy D, Ferguson AV (2008) Microarray analysis of the transcriptome of the subfornical organ in the rat: regulation by fluid and food deprivation. Am J Physiol Regul Integr Comp Physiol 295(6):R1914–R1920
Jin S, Goh G (2021) Bayesian selection of best subsets via hybrid search. Comput Stat 36(3):1991–2007
Johndrow J, Orenstein P, Bhattacharya A (2020) Scalable approximate MCMC algorithms for the horseshoe prior. J Mach Learn Res 21(73):1–61
Kass RE, Wasserman L (1995) A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J Am Stat Assoc 90(431):928–934
Koslovsky M, Swartz MD, Leon-Novelo L, Chan W, Wilkinson AV (2018) Using the EM algorithm for Bayesian variable selection in logistic regression models with related covariates. J Stat Comput Simul 88(3):575–596
Lu TT, Shiou SH (2002) Inverses of \(2\times 2\) block matrices. Comput Math Appl 43(1–2):119–129
Moreno E, Girón J, Casella G (2015) Posterior model consistency in variable selection as the model dimension grows. Stat Sci 30(2):228–241
Narisetty NN, Shen J, He X (2018) Skinny Gibbs: a consistent and scalable Gibbs sampler for model selection. J Am Stat Assoc 114(527):1205–1217
Park T, Casella G (2008) The Bayesian lasso. J Am Stat Assoc 103(482):681–686
Raftery AE, Madigan D, Hoeting JA (1997) Bayesian model averaging for linear regression models. J Am Stat Assoc 92(437):179–191
Ročková V, George EI (2014) EMVS: the EM approach to Bayesian variable selection. J Am Stat Assoc 109(506):828–846
Ročková V, George EI (2018) The spike-and-slab lasso. J Am Stat Assoc 113(521):431–444
Ročková V, Moran G (2021) EMVS vignette
Scheetz TE, Kim KYA, Swiderski RE, Philp AR, Braun TA, Knudtson KL, Dorrance AM, DiBona GF, Huang J, Casavant TL, Sheffield VC, Stone EM (2006) Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc Natl Acad Sci 103(39):14429–14434
Scott JG, Berger JO (2010) Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. Ann Stat 38:2587–2619
Tadesse MG, Vannucci M (2021) Handbook of Bayesian variable selection. CRC Press, Boca Raton
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc Ser B (Methodol) 58(1):267–288
Wang H (2009) Forward regression for ultra-high dimensional variable screening. J Am Stat Assoc 104(488):1512–1524
Yang Y, Wainwright MJ, Jordan MI (2016) On the computational complexity of high-dimensional Bayesian variable selection. Ann Stat 44(6):2497–2532
Zellner A (1986) On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In: Goel PK, Zellner A (eds) Bayesian inference and decision techniques. Elsevier, New York, pp 233–243
Zhang Z (2014) The matrix ridge approximation: algorithms and applications. Mach Learn 97(3):227–258
Zhao K, Lian H (2016) The expectation–maximization approach for Bayesian quantile regression. Comput Stat Data Anal 96:1–11
Appendices
Appendix A: Proof of Theorem 3.1
Let \(\Sigma _{X_{\gamma },y}\), \(\Sigma _{X_\gamma ,X_{\gamma }}\) and \(\Sigma _{y,y}\) denote the probability limits of \(n^{-1} {\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}\), \(n^{-1} {\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma\) and \(n^{-1}{\textbf{y}}^{\textrm{T}}{\textbf{y}}\), respectively. Note that
which converges to \(\Sigma _{X_{\gamma },y}^{{\textrm{T}}} \Sigma _{X_\gamma ,X_{\gamma }}^{-1} \Sigma _{X_{\gamma },y}\) in probability as \(n\rightarrow \infty\). This implies that
in probability as \(n\rightarrow \infty\), where \({\textbf{P}}_\gamma ^\perp = {\textbf{I}}_n-{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1} {\textbf{X}}_\gamma ^{\textrm{T}}\). Hence, we can write
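For concreteness, the two limits being used here can be written out explicitly; this is a restatement under the notation defined at the start of the appendix and standard moment conditions, and the exact displays in the published proof may differ in presentation:
\[
n^{-1}{\textbf{y}}^{\textrm{T}}{\textbf{X}}_\gamma ({\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma )^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}} = \big(n^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}\big)^{\textrm{T}}\big(n^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{X}}_\gamma \big)^{-1}\big(n^{-1}{\textbf{X}}_\gamma ^{\textrm{T}}{\textbf{y}}\big) \;\overset{p}{\longrightarrow }\; \Sigma _{X_{\gamma },y}^{{\textrm{T}}}\Sigma _{X_\gamma ,X_{\gamma }}^{-1}\Sigma _{X_{\gamma },y},
\]
and hence
\[
n^{-1}{\textbf{y}}^{\textrm{T}}{\textbf{P}}_\gamma ^\perp {\textbf{y}} \;\overset{p}{\longrightarrow }\; \Sigma _{y,y}-\Sigma _{X_{\gamma },y}^{{\textrm{T}}}\Sigma _{X_\gamma ,X_{\gamma }}^{-1}\Sigma _{X_{\gamma },y}.
\]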
It follows that
Also, note that
Using (A1) and (A2), we can write \(D(\gamma )\) as
where \(\text {EBIC}(\cdot )\) denotes the extended Bayesian information criterion (Chen and Chen 2008). Note that Theorem 1 of Chen and Chen (2008) implies that
in probability as \(n\rightarrow \infty\) for any \(\gamma _1\) and \(\gamma _2\) such that (a) \(\gamma _{\text {HPM}}\subsetneq \gamma _1 \subsetneq \gamma _2\) or (b) \(\gamma _{\text {HPM}}\subset \gamma _1\) and \(\gamma _{\text {HPM}} \not \subset \gamma _2\). By the asymptotic equivalence in (A3), we therefore obtain the results of Theorem 3.1.
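For reference, in the Gaussian linear model a commonly used form of the extended BIC of Chen and Chen (2008), with tuning parameter \(\zeta \in [0,1]\), is
\[
\text {EBIC}(\gamma ) = n\log \big(\text {RSS}_\gamma /n\big) + |\gamma |\log n + 2\zeta |\gamma |\log p,
\]
where \(\text {RSS}_\gamma = {\textbf{y}}^{\textrm{T}}{\textbf{P}}_\gamma ^\perp {\textbf{y}}\) is the residual sum of squares of model \(\gamma\). Loosely speaking, Theorem 1 of Chen and Chen (2008) gives the model-comparison consistency of this criterion that is invoked in the display above.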
Appendix B: Proof of Theorem 3.2
Suppose that, in Step 1, the HD-DVS algorithm (Algorithm 1) visits \(\tilde{\gamma }\) such that \(\tilde{\gamma }\supset \gamma _{\text {HPM}}\). Then, by Theorem 3.1(a), the probability that the HD-DVS algorithm converges to \(\gamma _{\text {HPM}}\) goes to one as \(n\rightarrow \infty\).
Suppose instead that, in Step 1, the HD-DVS algorithm never visits a \(\tilde{\gamma }\) such that \(\tilde{\gamma }\supset \gamma _{\text {HPM}}\). In this case, by Theorem 2 of Wang (2009) and (A3), the probability that Step 2 of the HD-DVS algorithm converges to \(\tilde{\gamma }_+\supset \gamma _\text {HPM}\) goes to one as \(n\rightarrow \infty\). The algorithm then returns to Step 1 with the initial value \(\hat{\gamma }=\tilde{\gamma }_+\) and, by the first case, converges to \(\gamma _\text {HPM}\) in probability. This completes the proof.
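To make the kind of deterministic, criterion-guided search discussed in this proof concrete, the following is a minimal sketch of a single-flip hill climb on EBIC. It is an illustration only, not the authors' HD-DVS implementation: the helper names (ebic, greedy_search), the single-flip neighborhood, and the default value of the EBIC parameter zeta are assumptions of this sketch.

import numpy as np

def ebic(y, X, gamma, zeta=0.5):
    # Extended BIC for a Gaussian linear model (hypothetical helper, not from the paper).
    n, p = X.shape
    k = int(gamma.sum())
    if k == 0:
        rss = float(y @ y)
    else:
        Xg = X[:, gamma]
        beta_hat, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        rss = float(np.sum((y - Xg @ beta_hat) ** 2))
    return n * np.log(rss / n) + k * np.log(n) + 2.0 * zeta * k * np.log(p)

def greedy_search(y, X, gamma0=None, zeta=0.5, max_sweeps=100):
    # Deterministic single-flip hill climbing: at each sweep, flip the inclusion
    # indicator that most decreases EBIC; stop at a local minimum.
    n, p = X.shape
    gamma = np.zeros(p, dtype=bool) if gamma0 is None else gamma0.copy()
    score = ebic(y, X, gamma, zeta)
    for _ in range(max_sweeps):
        best_j, best_score = None, score
        for j in range(p):
            cand = gamma.copy()
            cand[j] = not cand[j]
            s = ebic(y, X, cand, zeta)
            if s < best_score:
                best_j, best_score = j, s
        if best_j is None:
            break
        gamma[best_j] = not gamma[best_j]
        score = best_score
    return gamma, score

# Toy usage: three truly active predictors out of twenty.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.standard_normal(n)
gamma_hat, score_hat = greedy_search(y, X)
print(np.flatnonzero(gamma_hat), round(score_hat, 2))

Restarting such a deterministic search from a superset of the target model, as in the two-step argument above, is what allows the second pass to terminate at the highest-posterior-probability model.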