
Multivariate Local Fitting with General Basis Functions


Summary

In this paper we combine the concepts of local smoothing and fitting with basis functions for multivariate predictor variables. We start with arbitrary basis functions and show that the asymptotic variance at interior points is independent of the choice of the basis. Moreover, we calculate the asymptotic variance at boundary points. We are not able to compute the asymptotic bias in full generality, since a Taylor theorem for arbitrary basis functions does not exist. For this reason we focus on basis functions without interactions and derive a Taylor theorem which covers this case. This theorem enables us to calculate the asymptotic bias at interior as well as at boundary points. We demonstrate the benefits of local fitting with general basis functions by means of a simulated data set, and also provide a data-driven tool for optimizing the basis.
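To fix ideas, the following minimal sketch (ours, not taken from the paper) implements the kind of estimator studied below: at a target point x it solves a kernel-weighted least-squares problem whose design is built from the centred basis Φ(X_i) − Φ(x), and returns the intercept \(e_1^T \hat\beta\) as \(\hat m(x)\). The kernel, bandwidth, regression function and basis used here are illustrative assumptions.

```python
# Minimal sketch of local fitting with a general basis (illustrative only):
# \hat m(x) = e_1^T (X_x^T W_x X_x)^{-1} X_x^T W_x y, where X_x has rows
# (1, (Phi(X_i) - Phi(x))^T) and W_x = diag(K_H(X_i - x)).
import numpy as np

def local_basis_fit(x, X, y, phi, h):
    """Local fit at x with componentwise basis phi and H = h^2 * I_d."""
    d = X.shape[1]
    U = (X - x) / h
    # product Epanechnikov kernel: K_H(X_i - x) = h^{-d} prod_j k(U_ij)
    k = np.where(np.abs(U) <= 1.0, 0.75 * (1.0 - U**2), 0.0)
    w = k.prod(axis=1) / h**d
    Xx = np.hstack([np.ones((len(X), 1)), phi(X) - phi(x)])
    WxXx = Xx * w[:, None]
    beta = np.linalg.solve(Xx.T @ WxXx, WxXx.T @ y)
    return beta[0]                        # intercept = estimate of m(x)

# usage with a basis without interactions (phi_j acts on the j-th coordinate)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]**2 + 0.3 * rng.normal(size=500)
phi = np.log1p                            # strictly monotone, gradient nonzero
print(local_basis_fit(np.array([0.5, 0.5]), X, y, phi, h=0.25))
```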



References

Fahrmeir, L. & Hamerle, A. (1984), Multivariate statistische Verfahren, de Gruyter, Berlin.

Ruppert, D. & Wand, M. P. (1994), 'Multivariate locally weighted least squares regression', The Annals of Statistics 22(3), 1346–1370.

Acknowledgements

The author is grateful to Gerhard Tutz (LMU, Dep. of Statistics) for helpful inspiration during this work, to Daniel Rost (LMU, Math. Inst.) for contributions concerning the extendibility of Taylor's theorem, and to the referees for many suggestions which improved this paper. This work was finished while the author spent a term at the University of São Paulo. Many thanks especially to Carmen D. S. de André and Júlio M. Singer (USP) for their support in various fields.

Parts of this work were presented and discussed at the Euroworkshop on Statistical Modelling (2001), where a variety of valuable comments and suggestions were given that enhanced the paper.


Appendix

A Regularity conditions

  • (A1) The kernel K is bounded with compact support, \(\int u u^T K(u)\, du = \mu_2 \mathbf{I}_d\), where \(\mu_2\) is a scalar and \(\mathbf{I}_d\) the d × d identity matrix. In addition, all odd-order moments of K vanish, i.e. \(\int u_1^{l_1} \cdots u_d^{l_d} K(u)\, du = 0\) for all nonnegative integers \(l_1, \ldots, l_d\) with an odd sum.

  • (A2) The point x lies in supp(f). At x, \(\sigma^2\) is continuous, f is continuously differentiable and all second-order derivatives of m are continuous. Further, f(x) > 0 and \(\sigma^2(x) > 0\).

  • (A3) The sequence of bandwidth matrices \(H^{1/2}\) is such that \(n^{-1}|H|^{-1/2}\) and each entry of H tend to zero as n → ∞.

  • (A4) For a boundary point x, there exists a point \(x_b\) on the boundary of supp(f) with \(x = x_b + H^{1/2} c\), where c is a fixed element of supp(K), and a convex set \({\cal C}\) with nonnull interior containing \(x_b\), such that \(x \in {\cal C}\).

  • (A5) At x, all basis functions are continuously differentiable (for the variance expressions in Theorems 1, 3 and 4) and twice continuously differentiable (for the bias expressions in Theorems 3 and 4). In either case, the point x is non-singular for all basis functions, i.e. \(\nabla \phi_j(x) \neq 0\) for j = 1, …, q.

For explanations and interpretations of conditions (A1) to (A4) see Ruppert & Wand (1994).
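For illustration (an example of ours, not from the paper): a product kernel \(K(u) = \prod_{j=1}^{d} k(u_j)\), built from a bounded, symmetric univariate density k with compact support, satisfies (A1), since

$$\int u u^T K(u)\, du = \left(\int v^2 k(v)\, dv\right) \mathbf{I}_d = \mu_2 \mathbf{I}_d,$$

and all odd-order moments of K vanish by symmetry. For the Epanechnikov choice \(k(v) = \frac{3}{4}(1 - v^2)\) on [−1, 1] this gives \(\mu_2 = 1/5\).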

B Proofs

B.1 Proof of Theorem 1

Let 1 be a matrix of appropriate dimension with all entries equal to 1; further let

$$A_H = \left(\begin{array}{cc} 1 & 0 \\ 0 & H^{1/2} \end{array}\right) \in \mathbb{R}^{d+1, d+1}, \quad \text{and} \quad A_1 = \left(\begin{array}{cc} 1 & 0 \\ 0 & \mathbf{1} \end{array}\right) \in \mathbb{R}^{d+1, q+1}.$$

Note that for any u ∈ ℝd

$$\Phi\left(x + H^{1/2} u\right) - \Phi(x) = D_x^T H^{1/2} u + o\left(H^{1/2} \mathbf{1}\right)$$

holds. Let \({C_{x,H}} = \left\{ {t:{H^{ - 1/2}}\left( {t - x} \right) \in {{\cal D}_{x,H}}} \right\}\). For interior and boundary points we derive

$$\begin{array}{rcl} X_x^T W_x X_x &=& \sum\limits_{i=1}^{n} K_H(X_i - x) \left(\begin{array}{cc} 1 & \left(\Phi(X_i) - \Phi(x)\right)^T \\ \Phi(X_i) - \Phi(x) & \left(\Phi(X_i) - \Phi(x)\right)\left(\Phi(X_i) - \Phi(x)\right)^T \end{array}\right) \\[2ex] &=& n \int_{C_{x,H}} K_H(t - x) \left(\begin{array}{cc} 1 & \left(\Phi(t) - \Phi(x)\right)^T \\ \Phi(t) - \Phi(x) & \left(\Phi(t) - \Phi(x)\right)\left(\Phi(t) - \Phi(x)\right)^T \end{array}\right) f(t)\, dt + n\, o_P\!\left(A_1^T A_H \mathbf{1} A_H A_1\right) \\[2ex] &=& n f(x) \int_{\mathcal{D}_{x,H}} K(u) \left(\begin{array}{cc} 1 & u^T H^{1/2} D_x \\ D_x^T H^{1/2} u & D_x^T H^{1/2} u\, u^T H^{1/2} D_x \end{array}\right) du + n\, o_P\!\left(A_1^T A_H \mathbf{1} A_H A_1\right) \end{array}$$
(18)
$$= n f(x) \left(A_{D_x}^T A_H M_x A_H A_{D_x} + o_P\!\left(A_1^T A_H \mathbf{1} A_H A_1\right)\right),$$
(19)

and analogously

$$X_x^T \Sigma_x X_x = n |H|^{-1/2} f(x)\, \sigma^2(x) \left(A_{D_x}^T A_H N_x A_H A_{D_x} + o_P\!\left(A_1^T A_H \mathbf{1} A_H A_1\right)\right).$$
(20)

Substituting (19) and (20) into (6) leads to (8). In the special case of an interior point we have \(M_x = \left(\begin{array}{cc} 1 & 0 \\ 0 & \mu_2 \mathbf{I}_d \end{array}\right)\) and \(N_x = \left(\begin{array}{cc} \nu_0 & 0 \\ 0 & \int u u^T K^2(u)\, du \end{array}\right)\). Thus (8) reduces to

$$\mathrm{Var}\left(\hat m(x)\,|\,\mathbb{X}\right) = \frac{\sigma^2(x)}{n f(x)} |H|^{-1/2}\, e_1^T N_x e_1 \left(1 + o_P(1)\right)$$
(21)
$$= \frac{\sigma^2(x)}{n f(x)} |H|^{-1/2}\, \nu_0 \left(1 + o_P(1)\right).$$
(22)

B.2 Proof of Theorem 2

We introduce the function M : [0, 1] → ℝ,

$$M(t) = m\left(y_{\Phi}(t)\right) = m\left(\Phi^{-1}\left[\Phi(x) + t\left(\Phi(z) - \Phi(x)\right)\right]\right).$$

Then we have M(0) = m(x) and M(1) = m(z). We apply the univariate Taylor theorem to the function \(M \in C^{p+1}([0, 1])\) and obtain

$$M(1) = M(0) + M'(0) + \frac{1}{2!} M''(0) + \ldots + \frac{1}{p!} M^{(p)}(0) + r_{p+1},$$
(23)

where

$$r_{p+1} = \frac{1}{(p+1)!} M^{(p+1)}(\tau) \qquad (\tau \in [0, 1]).$$

Using the Inverse Function Theorem we obtain

$$y_{\Phi}'(t) = \left[\frac{1}{\phi_i'\left(y_{\Phi}(t)_{(i)}\right)} \left(\phi_i(z_i) - \phi_i(x_i)\right)\right]_{(1 \le i \le d)}.$$

Repeated application of the chain rule to \(M = m \circ y_{\Phi}\) leads to

$$\begin{array}{rcl} M'(t) &=& \nabla m\left(y_{\Phi}(t)\right) \cdot y_{\Phi}'(t) = \left[\left(\left(\Phi(z) - \Phi(x)\right) \cdot \nabla_{\Phi}\right) m\right]\left(y_{\Phi}(t)\right) \\ M''(t) &=& \left[\left(\left(\Phi(z) - \Phi(x)\right) \cdot \nabla_{\Phi}\right)^2 m\right]\left(y_{\Phi}(t)\right) \\ &\vdots& \\ M^{(n)}(t) &=& \left[\left(\left(\Phi(z) - \Phi(x)\right) \cdot \nabla_{\Phi}\right)^n m\right]\left(y_{\Phi}(t)\right). \end{array}$$

Applying these formulas in (23) and substituting \(\zeta = y_{\Phi}(\tau)\) proves the assertion.
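To see the theorem at work (our example, not from the paper), take d = 1 and \(\phi(x) = \log x\) on (0, ∞). Then \(y_{\phi}(t) = x (z/x)^t\) and \(1/\phi'(y) = y\), so the formulas above give

$$M'(0) = (\log z - \log x)\, x\, m'(x), \qquad M''(0) = (\log z - \log x)^2 \left(x^2 m''(x) + x\, m'(x)\right),$$

and (23) turns into the expansion

$$m(z) = m(x) + (\log z - \log x)\, x\, m'(x) + \frac{(\log z - \log x)^2}{2} \left(x^2 m''(x) + x\, m'(x)\right) + \ldots,$$

which collapses to the ordinary Taylor expansion when \(\phi\) is the identity.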

B.3 Proof of Theorem 3

The proof is kept short since it mainly follows the ideas of the corresponding proof for multivariate local linear fitting; see Ruppert & Wand (1994).

Asymptotic Bias

First note that, applying (10), we have

$$m=X_{x}\left(\begin{array}{c}{m(x)} \\ {P_{x}^{-1} \nabla m(x)}\end{array}\right)+\frac{1}{2} Q_{m}(x)+S_{m}(x)$$
(24)

with

$$Q_{m}(x)=\left[\left(\Phi\left(X_{i}\right)-\Phi(x)\right)^{T} P_{x}^{-1} N_{m}(x) P_{x}^{-1}\left(\Phi\left(X_{i}\right)-\Phi(x)\right)\right]_{1 \leq i \leq n}$$

and Sm(x) = o(Qm(x)). Plugging (24) into (5) shows that

$$\operatorname{Bias}(\hat{m}(x) | \mathbb{X})=\frac{1}{2} e_{1}^{T}\left(X_{x}^{T} W_{x} X_{x}\right)^{-1} X_{x}^{T} W_{x} Q_{m}(x)(1+o(1)).$$
(25)

Let \(w_i = K_H(X_i - x)\). Using matrix algebra (see e.g. Fahrmeir & Hamerle (1984)) we derive

$$\begin{array}{rcl} \left(X_x^T W_x X_x\right)^{-1} &=& \left(\begin{array}{cc} \sum w_i & \sum w_i \left(\Phi(X_i) - \Phi(x)\right)^T \\ \sum w_i \left(\Phi(X_i) - \Phi(x)\right) & \sum w_i \left(\Phi(X_i) - \Phi(x)\right)\left(\Phi(X_i) - \Phi(x)\right)^T \end{array}\right)^{-1} \\[2ex] &=& \left[n \left(\begin{array}{cc} f(x) + o_P(1) & o_P\!\left(\mathbf{1}^T H^{1/2}\right) \\ o_P\!\left(H^{1/2} \mathbf{1}\right) & \mu_2 P_x H P_x f(x) + o_P(H) \end{array}\right)\right]^{-1} \\[2ex] &=& \frac{1}{n} \left(\begin{array}{cc} \frac{1}{f(x)} + o_P(1) & o_P\!\left(\mathbf{1}^T H^{-1/2}\right) \\ o_P\!\left(H^{-1/2} \mathbf{1}\right) & \frac{1}{\mu_2 f(x)} P_x^{-1} H^{-1} P_x^{-1} + o_P\!\left(H^{-1}\right) \end{array}\right) \end{array}$$
(26)

and

$$X_{x}^{T} W_{x} Q_{m}(x)=n\left(\begin{array}{c}{\mu_{2} f(x) \operatorname{tr}\left\{H N_{m}(x)\right\}+o_{P}(\operatorname{tr}(H))} \\ {O_{P}\left(H^{3 / 2} \mathbf{1}\right)}\end{array}\right)$$
(27)

so that substituting (26) and (27) into (25) gives \(\operatorname{Bias}\left(\hat m(x)\,|\,\mathbb{X}\right) = \frac{\mu_2}{2} \operatorname{tr}\left\{H N_m(x)\right\} + o_P\!\left(\operatorname{tr}(H)\right)\), which proves (11).

Asymptotic variance

Similarly to the above, we obtain

$$\begin{array}{rcl} X_x^T \Sigma_x X_x &=& \sum w_i^2\, \sigma^2(X_i) \left(\begin{array}{cc} 1 & \left(\Phi(X_i) - \Phi(x)\right)^T \\ \Phi(X_i) - \Phi(x) & \left(\Phi(X_i) - \Phi(x)\right)\left(\Phi(X_i) - \Phi(x)\right)^T \end{array}\right) \\[2ex] &=& n |H|^{-1/2} \left(\begin{array}{cc} \nu_0\, \sigma^2(x) f(x) + o_P(1) & \mathbf{1}^T H^{1/2} \left(1 + o_P(1)\right) \\ H^{1/2} \mathbf{1} \left(1 + o_P(1)\right) & G(x, H) + o_P(H) \end{array}\right), \end{array}$$

where

$$G(x, H)=\left(\int K^{2}(u) u u^{T} d u\right) P_{x} H P_{x} \sigma^{2}(x) f(x).$$

Plugging this result and (26) into (6) leads to (12).

B.4 Proof of Theorem 4

Let

$$A_H = \left(\begin{array}{cc} 1 & 0 \\ 0 & H^{1/2} \end{array}\right), \quad A_{P_x} = \left(\begin{array}{cc} 1 & 0 \\ 0 & P_x \end{array}\right).$$

Asymptotic bias

Note that

$$\begin{array}{rcl} X_x^T W_x X_x &=& n \int_{C_{x,H}} K_H(t - x) \left(\begin{array}{cc} 1 & \left(\Phi(t) - \Phi(x)\right)^T \\ \Phi(t) - \Phi(x) & \left(\Phi(t) - \Phi(x)\right)\left(\Phi(t) - \Phi(x)\right)^T \end{array}\right) f(t)\, dt + n\, o_P\!\left(A_H \mathbf{1} A_H\right) \\[2ex] &=& n f(x) \left(A_{P_x} A_H M_x A_H A_{P_x} + o_P\!\left(A_H \mathbf{1} A_H\right)\right), \end{array}$$
(28)

where \(C_{x,H}\) was defined in the proof of Theorem 1. Using the first step in the derivation of (27),

$$X_x^T W_x Q_m(x) = n f(x) \left(\begin{array}{c} \int_{\mathcal{D}_{x,H}} K(u)\, u^T H^{1/2} N_m(x) H^{1/2} u\, du \\ P_x H^{1/2} \int_{\mathcal{D}_{x,H}} u\, K(u) \left\{u^T H^{1/2} N_m(x) H^{1/2} u\right\} du \end{array}\right) + o_P\!\left(\begin{array}{c} n \operatorname{tr}(H) \\ n H^{1/2} \mathbf{1} \operatorname{tr}(H) \end{array}\right)$$
(29)

holds. Assuming (A4), \(M_x\) is nonsingular and we have

$$M_x^{-1} = \left(\begin{array}{cc} \mu_x^{11} & \mu_x^{12} \\ \mu_x^{21} & \mu_x^{22} \end{array}\right),$$

where \(\mu_x^{11} = \left(\mu_{x,11} - \mu_{x,12}\, \mu_{x,22}^{-1}\, \mu_{x,21}\right)^{-1}\), \(\mu_x^{12} = -\left(\mu_{x,12} / \mu_{x,11}\right) \mu_x^{22}\) and \(\mu_x^{22} = \left(\mu_{x,22} - \mu_{x,21}\, \mu_{x,12} / \mu_{x,11}\right)^{-1}\). Then substituting (28) and (29) into (25) and noticing that

$$e_1^T A_{P_x}^{-1} A_H^{-1} M_x^{-1} A_H^{-1} A_{P_x}^{-1} = \left(\begin{array}{cc} \mu_x^{11} & \mu_x^{12} P_x^{-1} H^{-1/2} \end{array}\right)$$

yields formula (14).
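The block-inverse identities for \(M_x^{-1}\) used above are standard and can be checked numerically; the following snippet (ours, with an arbitrary random symmetric positive definite matrix standing in for \(M_x\)) confirms them.

```python
# Numerical check of the block-inverse formulas for a symmetric positive
# definite matrix whose upper-left block is the scalar mu_{x,11}.
import numpy as np

rng = np.random.default_rng(2)
d = 3
A = rng.normal(size=(d + 1, d + 1))
M = A @ A.T                          # random symmetric positive definite
m11, m12 = M[0, 0], M[0:1, 1:]       # mu_{x,11} (scalar), mu_{x,12} (1 x d)
m21, m22 = M[1:, 0:1], M[1:, 1:]     # mu_{x,21} (d x 1), mu_{x,22} (d x d)

mu22 = np.linalg.inv(m22 - m21 @ m12 / m11)           # mu_x^{22}
mu12 = -(m12 / m11) @ mu22                            # mu_x^{12}
mu11 = 1.0 / (m11 - m12 @ np.linalg.inv(m22) @ m21)   # mu_x^{11}

Minv = np.linalg.inv(M)
print(np.allclose(Minv[0, 0], mu11))     # True
print(np.allclose(Minv[0:1, 1:], mu12))  # True
print(np.allclose(Minv[1:, 1:], mu22))   # True
```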

Asymptotic variance

Considerations similar to those in (28) lead to

$$X_{x}^{T} W_{x}^{2} X_{x}=n f(x)|H|^{-1 / 2}\left(A_{P_{x}} A_{H} N_{x} A_{H} A_{P_{x}}+o_{P}\left(A_{H} \mathbf{1} A_{H}\right)\right).$$
(30)

With (6), (28) and (30) we get

$$\begin{array}{rcl} \operatorname{Var}\left(\hat m(x)\,|\,\mathbb{X}\right) &=& e_1^T \left(X_x^T W_x X_x\right)^{-1} \left(X_x^T W_x^2 X_x\right) \left(X_x^T W_x X_x\right)^{-1} e_1 \left(\sigma^2(x) + o_P(1)\right) \\[1ex] &=& \frac{\sigma^2(x)}{n f(x)} |H|^{-1/2} \left(e_1^T M_x^{-1} N_x M_x^{-1} e_1 + o_P(1)\right), \end{array}$$

which completes the proof.


Cite this article

Einbeck, J. Multivariate Local Fitting with General Basis Functions. Computational Statistics 18, 185–203 (2003). https://doi.org/10.1007/s001800300140
