Abstract
In this study, we investigate a penalty-based two-stage least squares estimator for regression models in which the explanatory variables are correlated with the error term. We propose a two-stage Bridge estimator to overcome this endogeneity problem in high-dimensional data. The proposed estimator has desirable statistical properties, including consistency and asymptotic normality. As special cases, the method handles ill-conditioned situations such as multicollinearity and sparsity. The performance of the proposed estimator is demonstrated through simulation studies and compared with that of existing estimators. An application to a real data set is presented for illustration.
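To make the idea concrete, here is a minimal Python sketch of a control-function two-stage fit with a Bridge penalty. It is an illustration under stated assumptions, not the authors' implementation: the function names `bridge_fit` and `two_stage_bridge`, the local quadratic approximation used to handle the penalty, the choice to penalize the control-function coefficients along with the structural ones, and the simulated data are all hypothetical. The penalty assumed here is lam * sum(|b_j|^gamma), which reduces to a ridge penalty at gamma = 2 and approaches a lasso-type penalty as gamma tends to 1.

```python
import numpy as np


def bridge_fit(X, y, lam, gamma, n_iter=200, eps=1e-8, tol=1e-10):
    """Approximately minimise ||y - X b||^2 + lam * sum(|b_j|^gamma).

    Uses a local quadratic approximation (an assumption of this sketch):
    |b|^gamma is replaced by (gamma / 2) * |b_old|^(gamma - 2) * b^2,
    so each iteration is a weighted ridge solve.
    """
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y
    # Ridge-type start so the first set of weights is finite.
    b = np.linalg.solve(XtX + (lam + 1e-8) * np.eye(p), Xty)
    for _ in range(n_iter):
        w = 0.5 * gamma * (np.abs(b) + eps) ** (gamma - 2.0)
        b_new = np.linalg.solve(XtX + lam * np.diag(w), Xty)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b


def two_stage_bridge(y, X_endog, Z, lam, gamma):
    """Control-function two-stage estimator with a Bridge penalty (sketch)."""
    n = len(y)
    Z1 = np.column_stack([np.ones(n), Z])
    # Stage 1: regress each endogenous regressor on the instruments
    # and keep the residuals as control-function terms.
    coef, *_ = np.linalg.lstsq(Z1, X_endog, rcond=None)
    V = X_endog - Z1 @ coef
    # Stage 2: Bridge regression of centred y on centred [X, V].
    # Centring absorbs the intercept, which is left unpenalised.
    D = np.column_stack([X_endog, V])
    D = D - D.mean(axis=0)
    b = bridge_fit(D, y - y.mean(), lam, gamma)
    return b[: X_endog.shape[1]]  # structural coefficients only


# Simulated example (hypothetical data): one endogenous regressor,
# three instruments, true structural coefficient equal to 2.
rng = np.random.default_rng(0)
n = 2000
Z = rng.normal(size=(n, 3))
v = rng.normal(size=n)                    # first-stage error
x = Z @ np.ones(3) + v                    # endogenous regressor
y = 2.0 * x + 0.8 * v + rng.normal(size=n)  # error is correlated with x

X = x.reshape(-1, 1)
xc = x - x.mean()
beta_ols = float(xc @ (y - y.mean()) / (xc @ xc))  # naive OLS slope
beta_cf = float(two_stage_bridge(y, X, Z, lam=1.0, gamma=1.5)[0])
```

In this simulation the naive OLS slope absorbs the correlation between the regressor and the error, while the control-function fit with a light penalty should land much closer to the true value. The tuning values `lam=1.0` and `gamma=1.5` are arbitrary; in practice they would be chosen by a data-driven criterion such as cross-validation.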
Notes
We use the R software, and all code is available upon request.
About this article
Cite this article
Bahador, F., Sheikhi, A. & Arabpour, A. A two-stage Bridge estimator for regression models with endogeneity based on control function method. Comput Stat 39, 1351–1370 (2024). https://doi.org/10.1007/s00180-023-01379-9
DOI: https://doi.org/10.1007/s00180-023-01379-9