Abstract
Variable selection and parameter estimation are of great significance in all regression analyses. A variety of approaches have been proposed to tackle this problem; among them, penalty-based shrinkage approaches have been the most popular, owing to their ability to carry out variable selection and parameter estimation simultaneously. However, little work is available on variable selection for generalized partially linear models (GPLMs) with longitudinal data. In this paper, we propose a variable selection procedure for GPLMs with longitudinal data. The inference is based on SCAD-penalized quadratic inference functions, obtained after approximating the non-parametric function in the model by B-splines. The proposed approach efficiently utilizes the within-cluster correlation information, which improves estimation efficiency, and it has the virtue of low computational cost. With the tuning parameter chosen by BIC, the correct model is identified with probability tending to 1. The resulting estimator of the parametric component is asymptotically normal, and that of the non-parametric function achieves the optimal convergence rate. The performance of the proposed methods is evaluated through extensive simulation studies, and a real data analysis shows that the proposed approach succeeds in excluding insignificant variables.
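As a rough illustration of the shrinkage-plus-BIC idea described above (not the paper's quadratic inference function machinery), the following sketch fits a SCAD-penalized least-squares regression via the local quadratic approximation of Fan and Li (2001) and selects the tuning parameter by a BIC-type criterion; the simplified least-squares objective, the function names, and the candidate grid for \(\lambda\) are illustrative assumptions.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative p'_lambda(|t|) of the SCAD penalty (Fan and Li 2001)."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam))

def scad_lqa(X, y, lam, n_iter=100, tol=1e-8):
    """SCAD-penalized least squares via local quadratic approximation (a
    simplified stand-in for minimizing the penalized quadratic inference
    function of the paper)."""
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # unpenalized start
    for _ in range(n_iter):
        # LQA step: replace p_lambda(|b_j|) by a quadratic with weight p'/|b_j|
        w = scad_deriv(beta, lam) / np.maximum(np.abs(beta), 1e-10)
        beta_new = np.linalg.solve(X.T @ X / n + np.diag(w), X.T @ y / n)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    beta[np.abs(beta) < 1e-6] = 0.0                      # prune numerical zeros
    return beta

def bic_select(X, y, lams):
    """Choose lambda by a BIC-type criterion (cf. Wang, Li and Tsai 2007)."""
    n = len(y)
    scored = []
    for lam in lams:
        b = scad_lqa(X, y, lam)
        rss = np.sum((y - X @ b) ** 2)
        scored.append((np.log(rss / n) + np.sum(b != 0) * np.log(n) / n, lam, b))
    return min(scored, key=lambda s: s[0])

# Toy check: only the first three of eight covariates matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X[:, :3] @ np.array([3.0, 1.5, 2.0]) + rng.normal(size=200)
bic, lam, beta = bic_select(X, y, np.linspace(0.05, 1.0, 20))
print(lam, np.round(beta, 2))   # zeros expected in positions 4-8
```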
References
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37:373–384
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc Ser B 58:267–288
Fu WJ (1998) Penalized regression: the bridge versus the LASSO. J Comput Graph Stat 7:397–416
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Wang L, Li H, Huang JZ (2008) Variable selection in non-parametric varying coefficient models for analysis of repeated measurements. J Am Stat Assoc 103:1556–1569
Xue L, Qu A, Zhou J (2010) Consistent model selection in marginal generalized additive models for correlated data. J Am Stat Assoc 105:1518–1530
Tian RQ, Xue LG, Liu CL (2014) Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data. J Multivar Anal 132:94–110
Fan J, Li R (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723
Fan J, Zhang W (2008) Statistical methods with varying coefficient models. Stat Interface 1:179–195
Zhao PX, Xue LG (2009) Variable selection for semi-parametric varying coefficient partially linear models. Stat Probab Lett 79:2148–2157
Wang L, Xue L, Qu A, Liang H (2014) Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. Ann Stat 42:592–624
Liang KY, Zeger SL (1986) Longitudinal data analysis using generalized estimating equations. Biometrika 73:13–22
Qu A, Lindsay BG, Li B (2000) Improving generalized estimating equations using quadratic inference functions. Biometrika 87:823–836
Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50(4):1029–1054
Qu A, Li R (2006) Quadratic inference functions for varying coefficient models with longitudinal data. Biometrics 62:379–391
Bai Y, Zhu ZY, Fung WK (2008) Partially linear models for longitudinal data based on quadratic inference functions. Scand J Stat 35:104–118
Zhang JH, Xue LG (2017) Quadratic inference functions for generalized partially linear models with longitudinal data. Chin J Appl Probab Stat 33:417–432
Bai Y, Fung WK, Zhu ZY (2009) Penalized quadratic inference functions for single-index models with longitudinal data. J Multivar Anal 100:152–161
Cho H, Qu A (2013) Model selection for correlated data with diverging number of parameters. Stat Sin 23:901–927
Lin XH, Carroll RJ (2000) Nonparametric function estimation for clustered data when the predictor is measured without/with error. J Am Stat Assoc 95:520–534
Lin XH, Carroll RJ (2001) Semiparametric regression for clustered data with generalized estimating equations. J Am Stat Assoc 96:1045–1056
He XM, Fung WK, Zhu ZY (2005) Robust estimation in generalized partial linear models for clustered data. J Am Stat Assoc 100:1176–1184
Qin GY, Bai Y, Zhu ZY (2012) Robust empirical likelihood inference for generalized partially linear models with longitudinal data. J Multivar Anal 105:32–44
Qu A, Song PXK (2004) Assessing robustness of generalized estimating equations and quadratic inference functions. Biometrika 91:447–459
Schumaker LL (1981) Spline functions: basic theory. Wiley, New York
Wang H, Li R, Tsai C (2007) Tuning parameter selection for the smoothly clipped absolute deviation method. Biometrika 94:553–556
Wang HS, Xia YC (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36:261–286
Oman SD (2009) Easily simulated multivariate binary distributions with given positive and negative correlations. Comput Stat Data Anal 53(4):999–1005
Zeger SL, Karim MR (1991) Generalized linear models with random effects: a Gibbs sampling approach. J Am Stat Assoc 86:79–86
Diggle PJ, Liang KY, Zeger SL (1994) Analysis of longitudinal data. Oxford University Press, Oxford
Chang XJ, Ma ZG, Yang Y, Zeng ZQ, Hauptmann AG (2017) Bi-level semantic representation analysis for multimedia event detection. IEEE Trans Cybern 47(5):1180–1197
Galiautdinov R (2020) The math model of drone behavior in the hive, providing algorithmic architecture. Int J Softw Sci Comput Intell 12(2):15–33
Zhang L (2019) Evaluating the effects of size and precision of training data on ANN training performance for the prediction of chaotic time series patterns. Int J Softw Sci Comput Intell 11(1):16–30
Acknowledgements
The research is funded by the National Natural Science Foundation of China (11571025) and the Beijing Natural Science Foundation (1182008).
Appendix: Proofs of the main results
For convenience and simplicity, let \(C\) denote a positive constant that may take different values at each appearance throughout this paper, and let \(\parallel A\parallel\) denote the modulus of the largest singular value of a matrix or vector \(A\).
Let \(\eta_{ij} = X_{ij}^{T} \beta + W_{ij}^{T} \gamma\); then \(\mu_{ij} = h\left( {\eta_{ij} } \right)\). Let \(\eta_{i} = (\eta_{i1} , \ldots ,\eta_{im} )^{T}\), \(\mu_{i} = (\mu_{i1} , \ldots ,\mu_{im} )^{T}\), \(\theta = (\beta^{T} ,\gamma^{T} )^{T}\), \(Y_{i} = (Y_{i1} , \ldots ,Y_{im} )^{T}\), and \(X_{i} = (X_{i1} , \ldots ,X_{im} )^{T}\).
Similarly, let \(P_{ij} = (X_{ij}^{T} ,W_{ij}^{T} )^{T}\), \(W_{i} = \left( {W_{i1} , \ldots ,W_{im} } \right)^{T}\), and \(P_{i} = (P_{i1} , \ldots ,P_{im} )^{T} = \left( {X_{i} ,W\left( {U_{i} } \right)} \right)\); then \(\eta_{ij} = P_{ij}^{T} \theta\), \(\eta_{i} = P_{i} \theta\), \(\frac{{\partial \eta_{ij} }}{\partial \theta } = P_{ij}\), and \(\frac{{\partial \eta_{i} }}{\partial \theta } = P_{i}^{T}\).
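Here \(W_{ij} = B\left( {U_{ij} } \right)\) is the vector of B-spline basis functions evaluated at \(U_{ij}\), as recalled in the proof of Theorem 3 below. A minimal sketch of constructing such a basis matrix follows; the knot vector, cubic degree, and helper name are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(u, knots, degree=3):
    """Evaluate every B-spline basis function at the points u; row j of the
    result plays the role of W_ij = B(U_ij) in the spline approximation
    gamma^T B(u) of the non-parametric function."""
    nbasis = len(knots) - degree - 1
    B = np.empty((len(u), nbasis))
    for k in range(nbasis):
        coef = np.zeros(nbasis)
        coef[k] = 1.0                       # isolate the k-th basis function
        B[:, k] = BSpline(knots, coef, degree)(u)
    return B

inner = np.linspace(0.0, 1.0, 7)[1:-1]      # 5 interior knots on (0, 1)
knots = np.r_[[0.0] * 4, inner, [1.0] * 4]  # clamped cubic knot vector
u = np.random.default_rng(0).uniform(size=10)
W = bspline_basis(u, knots)                 # 10 x 9 design block
print(W.sum(axis=1))                        # rows sum to 1 (partition of unity)
```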
Let \(h^{\prime} \left( t \right) = \frac{dh\left( t \right)}{dt}\); then \(\frac{{\partial \mu_{ij} }}{\partial \theta } = h^{\prime} \left( {\eta_{ij} } \right)P_{ij}\). Let
then
Proof of Theorem 1
Let \(\delta = n^{{ - \frac{1}{2}}}\), \(\beta = \beta_{0} + \delta D_{1}\), \(\gamma = \gamma_{0} + \delta D_{2}\), and \(D = (D_{1}^{T} ,D_{2}^{T} )^{T}\). We first show that for any given \(\varepsilon > 0\) there exists a large constant \(C\) such that

\[ P\left\{ \inf_{\parallel D\parallel = C} {\mathcal{Q}}_{n}^{p} \left( {\beta_{0} + \delta D_{1} ,\gamma_{0} + \delta D_{2} } \right) > {\mathcal{Q}}_{n}^{p} \left( {\beta_{0} ,\gamma_{0} } \right) \right\} \ge 1 - \varepsilon . \quad (9.1) \]
Note that \(\beta_{0l} = 0\) for all \(l = p_{1} + 1, \cdots ,p\) and \(\gamma_{0k} = 0\) for all \(k = q_{1} , \cdots ,q\); together with assumption (A1) and \(p_{\lambda } \left( 0 \right) = 0\), we have
By Taylor expansion and assumption (A4), we have
Invoking the proof of Theorem 2 in [8],
By choosing a sufficiently large \(C\), \(I_{1}\) dominates \(I_{2}\); similarly, \(I_{1}\) dominates \(I_{3}\) for sufficiently large \(C\). Thus (9.1) holds; that is, with probability at least \(1 - \varepsilon\), there exists a local minimizer \(\hat{\theta }\) satisfying \(\parallel \hat{\theta } - \theta_{0} \parallel = O_{p} \left( \delta \right)\). Therefore \(\parallel \hat{\gamma } - \gamma_{0} \parallel = O_{p} \left( {n^{ - 1/2} } \right)\) and \(\parallel \hat{\beta } - \beta_{0} \parallel = O_{p} \left( {n^{ - 1/2} } \right)\). Following the proof of Theorem 1 of [8], we have
Thus, we complete the proof of Theorem 1.
Proof of Theorem 2
According to Theorem 1, to prove the first part of Theorem 2 we only need to show that, for any \(\gamma\) satisfying \(\parallel \gamma - \gamma_{0} \parallel = O_{p} \left( {n^{ - 1/2} } \right)\) and any \(\beta_{l}\) satisfying \(\parallel \beta_{l} - \beta_{0l} \parallel = O_{p} \left( {n^{ - 1/2} } \right)\), \(l = 1, \ldots ,p_{1}\), there exists a certain \(\epsilon = Cn^{- 1/2}\) such that, as \(n \to \infty\), with probability tending to 1,

\[ \frac{\partial {\mathcal{Q}}_{n}^{p} \left( {\beta ,\gamma } \right)}{\partial \beta_{l} } > 0 \quad \text{for } 0 < \beta_{l} < \epsilon ,\; l = p_{1} + 1, \ldots ,p, \quad (9.2) \]

and

\[ \frac{\partial {\mathcal{Q}}_{n}^{p} \left( {\beta ,\gamma } \right)}{\partial \beta_{l} } < 0 \quad \text{for } -\epsilon < \beta_{l} < 0,\; l = p_{1} + 1, \ldots ,p. \quad (9.3) \]
These imply that the PQIF \({\mathcal{Q}}_{n}^{p} \left( {\beta ,\gamma } \right)\) reaches its minimum at \(\beta_{l} = 0,l = p_{1} + 1, \ldots ,p\).
Following Lemmas 3 and 4 of [18], we have
According to (3.7), the expression for the derivative of the SCAD penalty function, it is easy to see that \({ \lim }_{n \to \infty } { \liminf }_{{\beta_{l} \to 0}} \lambda^{ - 1} p'_{\lambda } \left( {\left| {\beta_{l} } \right|} \right) = 1\). Together with Assumption (A10), \(\lambda n^{1/2} > \lambda_{ \hbox{min} } n^{1/2} \to \infty\), it is clear that the sign of (9.4) is determined by that of \(\beta_{l}\). This implies that (9.2) and (9.3) hold, which completes the proof of Theorem 2.
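This behaviour of the SCAD derivative is easy to verify numerically with the scad_deriv helper from the sketch following the abstract: for any fixed \(\lambda\), the ratio \(\lambda^{ - 1} p'_{\lambda } \left( {\left| {\beta_{l} } \right|} \right)\) equals 1 exactly once \(\left| {\beta_{l} } \right| \le \lambda\).

```python
lam = 0.5
for b in [2.0, 1.0, 0.4, 0.01, 1e-8]:
    print(b, scad_deriv(b, lam) / lam)   # 0.0, ~0.63, then 1.0 once |b| <= lam
```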
Proof of Theorem 3
Let \(\theta^{ *} = (\beta^{ *T} ,\gamma^{ *T} )^{T}\), and let \(P_{i}^{ *} = (X_{i}^{ *T} ,W_{i}^{ *T} )^{T} , i = 1, \ldots ,n\), denote the covariates corresponding to \(\theta^{ *}\). Let \({\dot{\mathcal{Q}}}_{1n} \left( {\beta ,\gamma } \right)\) and \({\dot{\mathcal{Q}}}_{2n} \left( {\beta ,\gamma } \right)\) denote the first derivatives of the PQIF \({\mathcal{Q}}_{n}^{p}\) with respect to \(\beta\) and \(\gamma\), respectively; that is,
By Theorems 1 and 2, \((\hat{\beta }^{ *T} ,0^{T} )^{T}\) and \(\hat{\gamma }^{ *T}\) satisfy
By Taylor expansion, we have
where \(\tilde{\theta }\) lies between \(((\beta_{0}^{ *T} ,{\mathbf{0}}^{T} )^{T} ,\gamma_{0}^{ *T} )^{T}\) and \(((\hat{\beta }^{ *T} ,{\mathbf{0}}^{T} )^{T} ,\hat{\gamma }^{ *T} )^{T}\). Applying a Taylor expansion to \(p_{\lambda }^{\prime } \left( {\left| {\hat{\beta }_{l} } \right|} \right)\), we obtain
By assumption (A10), \(p''_{\lambda } \left( {\left| {\beta_{0l} } \right|} \right) = o_{p} \left( 1 \right)\). Note that \(p'_{\lambda } \left( {\left| {\beta_{0l} } \right|} \right) = 0\) as \(\lambda_{ \hbox{max} } \to 0\); therefore, by Lemma 4 of [18] and some calculation, we have
where \(\tilde{X}_{i} = H^{\prime} \left( {\eta_{i} } \right)X_{i}^{ *}\), \(\tilde{R}\left( {U_{i} } \right) = H^{\prime} \left( {\eta_{i} } \right)R\left( {U_{i} } \right)\), \(\varOmega_{kl}^{ - 1}\) is the \(\left( {k,l} \right)\) block of \(\varOmega^{ - 1}\), and
Similarly, we have
where \(\tilde{W}\left( {U_{j} } \right) = H^{\prime} \left( {\eta_{j} } \right)W\left( {U_{j} } \right)\), \(W\left( {U_{j} } \right) = (W_{j1} , \ldots ,W_{jm} )^{T}\), and \(W_{ij} = B\left( {U_{ij} } \right)\). Hence
Following the proof of Theorem 2 in [18], we prove (4.3). This completes the proof of Theorem 3.