Abstract
An essential assumption in traditional regression techniques is that predictors are measured without error. Failing to account for measurement error in predictors may result in severely biased inferences. Correcting measurement-error bias is an extremely difficult problem when estimating a regression function nonparametrically. We propose an approach to deal with measurement errors in predictors when modelling flexible regression functions. The approach rests on directly modelling the mean and the variance of the response variable after integrating out the true unobserved predictors in a penalized splines model. We demonstrate through simulation studies that our approach provides satisfactory prediction accuracy, largely outperforming previously suggested local polynomial estimators even when the model is incorrectly specified, and that it is competitive with the Bayesian estimator.
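To fix ideas, the display below is a minimal sketch of the conditional-mean calculation under a truncated-line penalized spline, with the unobserved predictor \(x_i\) integrated out given its error-prone measurement \(w_i\). The basis, knots \(\kappa_1, \ldots, \kappa_K\), and error structure here are illustrative notation, not a restatement of the paper's full model:
\[ y_i = f(x_i) + \varepsilon_i, \qquad f(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k (x - \kappa_k)_+ , \]
\[ \mathrm{E}(y_i \mid w_i, \varvec{u}) = \beta_0 + \beta_1\, \mathrm{E}(x_i \mid w_i) + \sum_{k=1}^{K} u_k\, \mathrm{E}\bigl[(x_i - \kappa_k)_+ \mid w_i\bigr], \]
with the conditional variance of \(y_i\) obtained analogously from the second conditional moments of the basis functions.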
Appendices
Appendix A: Proof of Theorem 2
We demonstrate (10) only, since (7), (8), and (9) follow similarly. First, since \(b > a\),
The first term in (20)
Now, the second term in (20)
Substituting (21) and (22) in (20) gives (10).
Appendix B: \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\)
Using Result 1, the (i, j) entry of \(\mathrm{Cov}(\varvec{X} \varvec{\beta }|\varvec{w})\) is
\(\mathrm{Cov}(\varvec{X} \varvec{\beta }|\varvec{w})\) is a diagonal matrix since the \(x_i\)’s are independent given \(\varvec{w}\). Similarly, the ith element of the vector \(\varvec{Z} \varvec{u} \) is a function of \(x_i\) only. Therefore, \(\mathrm{Cov}(\varvec{Z} \varvec{u}|\varvec{w}, \varvec{u})\) is a diagonal matrix as well. The (i, i) entry of this matrix is
More compactly using vectors, (24) can be re-expressed as
where \(\varvec{z}_i\) is the ith row vector of \(\varvec{Z}\). Specifically, \(\varvec{z}_i = \bigl [(x_i - k_1)_+, \ \ldots , (x_i-k_k)_+ \bigr ]^\top \). The (l, r) entry of \(\mathrm{Cov}(\varvec{z}_i|w_i)\) is
The last term can be found similarly to (12), whereas the first term, by (10), is
assuming \(l\le r\).
Finally, as above, \(\mathrm{Cov}(\varvec{X} \varvec{\beta }, \varvec{Z} \varvec{u}|\varvec{w}, \varvec{u}) \) is a diagonal matrix since the ith elements of both \(\varvec{X} \varvec{\beta }\) and \(\varvec{Z} \varvec{u}\) depend on \(x_i\) alone. The (i, i) entry of this matrix is
which can be directly found using (9).
Let \(v_i\) denote the (i, i) entry of \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\). Putting (5), (23), (27), and (28) together shows that \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\) is a diagonal matrix with \(v_i\) given by
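As a numerical check of the diagonality argument above, the following sketch approximates \(\mathrm{E}(\varvec{Y}|\varvec{w}, \varvec{u})\) and the diagonal entries \(v_i\) of \(\mathrm{Cov}(\varvec{Y}|\varvec{w}, \varvec{u})\) by Monte Carlo. The conditional law of \(x_i\) given \(w_i\), the knot locations, and the variance components are illustrative assumptions, not the paper's closed-form expressions.

```python
import numpy as np

rng = np.random.default_rng(0)

def spline_design(x, knots):
    """Truncated-line spline design: X = [1, x], Z[i, k] = (x_i - knot_k)_+."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return X, Z

# Illustrative settings (not the paper's): x ~ N(0, 1), classical error
# w = x + e with e ~ N(0, sigma_me2), so x | w ~ N(rho * w, rho * sigma_me2)
# where rho = 1 / (1 + sigma_me2).
sigma_me2, sigma_eps2 = 0.3, 0.2
rho = 1.0 / (1.0 + sigma_me2)

knots = np.linspace(-2, 2, 10)                 # assumed knot locations
beta = np.array([0.0, 1.0])                    # assumed fixed effects
u = rng.normal(0.0, 0.5, size=knots.size)      # assumed spline coefficients u
w = rng.normal(0.0, np.sqrt(1 + sigma_me2), size=5)   # a few observed w_i

B = 200_000                                    # Monte Carlo draws per observation
mean_y = np.empty(w.size)
v = np.empty(w.size)
for i, wi in enumerate(w):
    xi = rng.normal(rho * wi, np.sqrt(rho * sigma_me2), size=B)  # draws from x_i | w_i
    Xi, Zi = spline_design(xi, knots)
    fi = Xi @ beta + Zi @ u                    # f(x_i) draws given w_i and u
    mean_y[i] = fi.mean()                      # E(Y_i | w_i, u)
    v[i] = fi.var() + sigma_eps2               # v_i = Var(f(x_i) | w_i, u) + sigma_eps^2

print("E(Y | w, u):          ", np.round(mean_y, 3))
print("diag of Cov(Y | w, u):", np.round(v, 3))
# The off-diagonal entries of Cov(Y | w, u) are zero because the x_i are
# independent given w, matching the diagonality argument above.
```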
Appendix C: Varying the number of knots
Figure 7 shows the MSE performance of the conditional approach for Case 1 while varying the number of knots \(k\). It repeats the analysis in panel (a) of Fig. 2 for the conditional approach with \(k=10, 20, 30, 40\), and 50. Except when \(n\) is small, the performance of the estimator is only slightly affected by the choice of \(k\).
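For readers who want to reproduce this kind of sensitivity check, the sketch below loops a simple simulation over several numbers of knots. It fits an ordinary ridge-penalized truncated-line spline naively to the error-prone predictor, which is not the paper's conditional estimator; the true function, sample size, variances, and penalty are all assumed values chosen only to illustrate the mechanics of varying \(k\).

```python
import numpy as np

rng = np.random.default_rng(1)

def trunc_line_basis(x, knots):
    """Columns [1, x, (x - knot_1)_+, ..., (x - knot_K)_+]."""
    return np.column_stack([np.ones_like(x), x,
                            np.maximum(x[:, None] - knots[None, :], 0.0)])

def fit_pspline(w, y, knots, lam):
    """Ridge-penalized least squares; only the truncated-line columns are penalized."""
    C = trunc_line_basis(w, knots)
    D = np.diag([0.0, 0.0] + [1.0] * knots.size)   # no penalty on intercept and slope
    return np.linalg.solve(C.T @ C + lam * D, C.T @ y)

f = lambda x: np.sin(2.0 * x)                # assumed true regression function
grid = np.linspace(-2, 2, 101)
n, sigma_me, sigma_eps, lam, n_rep = 200, 0.4, 0.3, 1.0, 100

for k in (10, 20, 30, 40, 50):               # numbers of knots, as in Appendix C
    knots = np.linspace(-2, 2, k + 2)[1:-1]  # equally spaced interior knots
    mse = 0.0
    for _ in range(n_rep):
        x = rng.uniform(-2, 2, n)
        w = x + rng.normal(0.0, sigma_me, n)     # error-prone predictor
        y = f(x) + rng.normal(0.0, sigma_eps, n)
        coef = fit_pspline(w, y, knots, lam)     # naive fit ignoring the error
        fhat = trunc_line_basis(grid, knots) @ coef
        mse += np.mean((fhat - f(grid)) ** 2) / n_rep
    print(f"k = {k:2d}: average MSE over the grid = {mse:.4f}")
```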
Appendix D: Pointwise MSE assessment
Figure 8 provides the pointwise MSE at six points for the conditional and Bayesian approaches for Case 1: two boundary points of the grid, \(\{-2, 2\}\); three critical points, \(\{-1, 0, 0.43\}\); and one inflection point, at 0.81.
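A pointwise MSE of this form can be accumulated as in the short sketch below. The fitted values here are fabricated stand-ins for whichever estimator (conditional or Bayesian) is being assessed, and the true function is an assumption, not the one used in the paper.

```python
import numpy as np

eval_points = np.array([-2.0, -1.0, 0.0, 0.43, 0.81, 2.0])  # boundary, critical, inflection points
f_true = lambda x: np.sin(2.0 * x)      # placeholder for the true regression function

def pointwise_mse(fits):
    """fits: (n_rep, 6) array; each row holds the estimates at eval_points from one replication."""
    return np.mean((fits - f_true(eval_points)) ** 2, axis=0)

# Stand-in estimates in place of the conditional or Bayesian fits:
fits = f_true(eval_points) + 0.1 * np.random.default_rng(2).normal(size=(100, 6))
print(dict(zip(eval_points.tolist(), np.round(pointwise_mse(fits), 4).tolist())))
```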