Abstract
An essential assumption in traditional regression techniques is that predictors are measured without error. Failing to account for measurement error in predictors may result in severely biased inferences. Correcting measurement-error bias is an extremely difficult problem when estimating a regression function nonparametrically. We propose an approach to deal with measurement errors in predictors when modelling flexible regression functions. The approach rests on directly modelling the mean and the variance of the response variable after integrating out the true unobserved predictors in a penalized splines model. We demonstrate through simulation studies that our approach provides satisfactory prediction accuracy, largely outperforming previously suggested local polynomial estimators even when the model is incorrectly specified, and that it is competitive with the Bayesian estimator.
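To fix ideas, the display below is a minimal sketch of the conditional-mean calculation under a truncated-line penalized spline, with the unobserved predictor \(x_i\) integrated out given its error-prone measurement \(w_i\). The basis, knots \(\kappa_1, \ldots, \kappa_K\), and error structure here are illustrative notation, not a restatement of the paper's full model:
\[ y_i = f(x_i) + \varepsilon_i, \qquad f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k (x - \kappa_k)_+ , \]
\[ \mathrm{E}(y_i \mid w_i, \varvec{u}) = \beta_0 + \beta_1\, \mathrm{E}(x_i \mid w_i) + \sum_{k=1}^{K} u_k\, \mathrm{E}\bigl[(x_i - \kappa_k)_+ \mid w_i\bigr], \]
with the conditional variance of \(y_i\) obtained analogously from the second conditional moments of the basis functions.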
Appendices
Appendix A: Proof of Theorem 2
We demonstrate (10) only, since (7), (8), and (9) follow similarly. First, since \(b > a\),
The first term in (20)
Now, the second term in (20)
Substituting (21) and (22) in (20) gives (10).
Appendix B: \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\)
Using Result 1, the (i, j) entry of \(\mathrm{Cov}(\varvec{X} \varvec{\beta }|\varvec{w})\) is
\(\mathrm{Cov}(\varvec{X} \varvec{\beta }|\varvec{w})\) is a diagonal matrix since the \(x_i\)’s are independent given \(\varvec{w}\). Similarly, the ith element of the vector \(\varvec{Z} \varvec{u} \) is a function of \(x_i\) only. Therefore, \(\mathrm{Cov}(\varvec{Z} \varvec{u}|\varvec{w}, \varvec{u})\) is a diagonal matrix as well. The (i, i) entry of this matrix is
More compactly using vectors, (24) can be re-expressed as
where \(\varvec{z}_i\) is the ith row vector of \(\varvec{Z}\). Specifically, \(\varvec{z}_i = \bigl [(x_i - k_1)_+, \ \ldots , (x_i-k_k)_+ \bigr ]^\top \). The (l, r) entry of \(\mathrm{Cov}(\varvec{z}_i|w_i)\) is
The last term can be found similarly to (12), whereas the first term, by (10), is
assuming \(l\le r\).
Finally, as above, \(\mathrm{Cov}(\varvec{X} \varvec{\beta }, \varvec{Z} \varvec{u}|\varvec{w}, \varvec{u}) \) is a diagonal matrix since the ith elements of both \(\varvec{X} \varvec{\beta }\) and \(\varvec{Z} \varvec{u}\) depend on \(x_i\) alone. The (i, i) entry of this matrix is
which can be directly found using (9).
Let \(v_i\) denote the (i, i) entry of \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\). Putting (5), (23), (27), and (28) together shows that \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\) is a diagonal matrix with \(v_i\) given by
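As a numerical check of the diagonality argument above, the following sketch approximates \(\mathrm{E}(\varvec{Y}|\varvec{w}, \varvec{u})\) and the diagonal entries \(v_i\) of \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\) by Monte Carlo. The conditional law of \(x_i\) given \(w_i\), the knot locations, and the variance components are illustrative assumptions, not the paper's closed-form expressions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spline_design(x, knots):
    """Truncated-line spline design: X = [1, x], Z[i, k] = (x_i - knot_k)_+."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return X, Z

# Illustrative settings (not the paper's): x ~ N(0, 1), classical error
# w = x + e with e ~ N(0, sigma_me2), so x | w ~ N(rho * w, rho * sigma_me2)
# where rho = 1 / (1 + sigma_me2).
sigma_me2, sigma_eps2 = 0.3, 0.2
rho = 1.0 / (1.0 + sigma_me2)

knots = np.linspace(-2, 2, 10)                 # assumed knot locations
beta = np.array([0.0, 1.0])                    # assumed fixed effects
u = rng.normal(0.0, 0.5, size=knots.size)      # assumed spline coefficients u
w = rng.normal(0.0, np.sqrt(1 + sigma_me2), size=5)   # a few observed w_i

B = 200_000                                    # Monte Carlo draws per observation
mean_y = np.empty(w.size)
v = np.empty(w.size)
for i, wi in enumerate(w):
    xi = rng.normal(rho * wi, np.sqrt(rho * sigma_me2), size=B)  # draws from x_i | w_i
    Xi, Zi = spline_design(xi, knots)
    fi = Xi @ beta + Zi @ u                    # f(x_i) draws given w_i and u
    mean_y[i] = fi.mean()                      # E(Y_i | w_i, u)
    v[i] = fi.var() + sigma_eps2               # v_i = Var(f(x_i) | w_i, u) + sigma_eps^2

print("E(Y | w, u):          ", np.round(mean_y, 3))
print("diag of Cov(Y | w, u):", np.round(v, 3))
# The off-diagonal entries of Cov(Y | w, u) are zero because the x_i are
# independent given w, matching the diagonality argument above.
```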
Appendix C: Varying the number of knots
Figure 7 shows the MSE performance of the conditional approach for Case 1 while varying the number of knots \(k\). It repeats the analysis in panel (a) of Fig. 2 for the conditional approach with \(k=10, 20, 30, 40\), and 50. Except when \(n\) is small, the performance of the estimator is only slightly affected by the choice of \(k\).
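For readers who want to reproduce this kind of sensitivity check, the sketch below loops a simple simulation over several numbers of knots. It fits an ordinary ridge-penalized truncated-line spline naively to the error-prone predictor, which is not the paper's conditional estimator; the true function, sample size, variances, and penalty are all assumed values chosen only to illustrate the mechanics of varying \(k\).

```python
import numpy as np

rng = np.random.default_rng(1)

def trunc_line_basis(x, knots):
    """Columns [1, x, (x - knot_1)_+, ..., (x - knot_K)_+]."""
    return np.column_stack([np.ones_like(x), x,
                            np.maximum(x[:, None] - knots[None, :], 0.0)])

def fit_pspline(w, y, knots, lam):
    """Ridge-penalized least squares; only the truncated-line columns are penalized."""
    C = trunc_line_basis(w, knots)
    D = np.diag([0.0, 0.0] + [1.0] * knots.size)   # no penalty on intercept and slope
    return np.linalg.solve(C.T @ C + lam * D, C.T @ y)

f = lambda x: np.sin(2.0 * x)                # assumed true regression function
grid = np.linspace(-2, 2, 101)
n, sigma_me, sigma_eps, lam, n_rep = 200, 0.4, 0.3, 1.0, 100

for k in (10, 20, 30, 40, 50):               # numbers of knots, as in Appendix C
    knots = np.linspace(-2, 2, k + 2)[1:-1]  # equally spaced interior knots
    mse = 0.0
    for _ in range(n_rep):
        x = rng.uniform(-2, 2, n)
        w = x + rng.normal(0.0, sigma_me, n)     # error-prone predictor
        y = f(x) + rng.normal(0.0, sigma_eps, n)
        coef = fit_pspline(w, y, knots, lam)     # naive fit ignoring the error
        fhat = trunc_line_basis(grid, knots) @ coef
        mse += np.mean((fhat - f(grid)) ** 2) / n_rep
    print(f"k = {k:2d}: average MSE over the grid = {mse:.4f}")
```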
Appendix D: Pointwise MSE assessment
Figure 8 provides the pointwise MSE at six points for the conditional and Bayesian approaches for Case 1: two boundary points of the grid, \(\{-2, 2\}\); three critical points, \(\{-1, 0, 0.43\}\); and one inflection point, at 0.81.
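A pointwise MSE of this form can be accumulated as in the short sketch below. The fitted values here are fabricated stand-ins for whichever estimator (conditional or Bayesian) is being assessed, and the true function is an assumption, not the one used in the paper.

```python
import numpy as np

eval_points = np.array([-2.0, -1.0, 0.0, 0.43, 0.81, 2.0])  # boundary, critical, inflection points
f_true = lambda x: np.sin(2.0 * x)      # placeholder for the true regression function

def pointwise_mse(fits):
    """fits: (n_rep, 6) array; each row holds the estimates at eval_points from one replication."""
    return np.mean((fits - f_true(eval_points)) ** 2, axis=0)

# Stand-in estimates in place of the conditional or Bayesian fits:
fits = f_true(eval_points) + 0.1 * np.random.default_rng(2).normal(size=(100, 6))
print(dict(zip(eval_points.tolist(), np.round(pointwise_mse(fits), 4).tolist())))
```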