Abstract
Principal component analysis (PCA) is a well-established tool for identifying the main sources of variation in multivariate data. However, as a linear method it cannot describe complex nonlinear structures. To overcome this limitation, a novel nonlinear generalization of PCA is developed in this paper. The method obtains the nonlinear principal components from ridges of the underlying density of the data. The density is estimated using Gaussian kernels. Projection onto a ridge of such a density estimate is formulated as a solution to a differential equation, and a predictor-corrector method is developed for this purpose. The method is further extended to time series data by applying it to the phase space representation of the time series. This extension can be viewed as a nonlinear generalization of singular spectrum analysis (SSA). The ability of the nonlinear PCA to capture complex nonlinear shapes, and of its SSA-based extension to identify periodic patterns in time series, is demonstrated on climate data.
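As a minimal, illustrative sketch of the construction described above (standard NumPy; a naive fixed-step iteration stands in for the paper's predictor-corrector scheme, and all function names are hypothetical), the following code estimates the gradient and Hessian of the log of a Gaussian kernel density estimate and moves a point toward an \(r\)-dimensional ridge by ascending the gradient within the span of the \(d-r\) Hessian eigenvectors with the smallest eigenvalues.

```python
import numpy as np

def log_kde_derivatives(x, Y, h):
    """Gradient and Hessian of log p_h(x) for a Gaussian KDE with bandwidth H = h^2 I.

    Y is an (N, d) array of sample points and x a point in R^d.
    """
    d = Y.shape[1]
    diff = Y - x                                    # rows are y_i - x
    w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)
    w /= w.sum()                                    # normalized kernel weights
    m = w @ diff                                    # sum_i w_i (y_i - x)
    grad = m / h**2
    S = (diff * w[:, None]).T @ diff                # sum_i w_i (y_i - x)(y_i - x)^T
    hess = (S - np.outer(m, m)) / h**4 - np.eye(d) / h**2
    return grad, hess

def project_to_ridge(x, Y, h, r, steps=200, step_size=0.05):
    """Move x toward an r-dimensional ridge of the log-density by ascending the
    gradient restricted to the span of the d - r eigenvectors of the Hessian
    with the smallest eigenvalues (a naive fixed-step iteration, not the
    predictor-corrector scheme of the paper; the step size is ad hoc)."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        grad, hess = log_kde_derivatives(x, Y, h)
        vals, vecs = np.linalg.eigh(hess)           # eigenvalues in ascending order
        U = vecs[:, : x.size - r]                   # d - r smallest-eigenvalue directions
        x += step_size * (U @ (U.T @ grad))
    return x

# Example: noisy circle in R^2; a point is pulled toward the one-dimensional ridge.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 2.0 * np.pi, 500)
Y = np.c_[np.cos(t), np.sin(t)] + 0.1 * rng.standard_normal((500, 2))
x_proj = project_to_ridge(np.array([1.2, 0.1]), Y, h=0.3, r=1)
```

Restricting the ascent to the small-eigenvalue directions is what distinguishes ridge projection from ordinary gradient ascent toward a mode.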
Notes
The explained variances for the nonlinear principal components were obtained from the covariance matrix of the corresponding scores.
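A minimal sketch of that computation (assuming the scores are available as an \((N,k)\) array; normalizing by the total score variance, rather than by the total variance of the data, is an assumption not stated in the note):

```python
import numpy as np

def explained_variance_ratios(scores):
    """Explained variances of nonlinear principal components, computed from the
    covariance matrix of the component scores as described in the note above.
    Normalizing by the total score variance is an assumption of this sketch."""
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    component_var = np.diag(cov)
    return component_var / component_var.sum()
```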
References
Berkooz, G., Holmes, P., Lumley, J.L.: The proper orthogonal decomposition in the analysis of turbulent flows. Annu. Rev. Fluid Mech. 25, 539–575 (1993)
Chacón, J.E., Duong, T., Wand, M.P.: Asymptotics for general multivariate kernel density derivative estimators. Stat. Sin. 21, 807–840 (2011)
Christiansen, B.: The shortcomings of nonlinear principal component analysis in identifying circulation regimes. J. Clim. 18(22), 4814–4823 (2005)
Damon, J.: Generic structure of two-dimensional images under Gaussian blurring. SIAM J. Appl. Math. 59(1), 97–138 (1998)
Delworth, T.L., Broccoli, A.J., Rosati, A., Stouffer, R.J., Balaji, V.: GFDL’s CM2 global coupled climate models. Part I: formulation and simulation characteristics. J. Clim. 19(5), 643–674 (2006)
Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100(10), 5591–5596 (2003)
Duong, T.: ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. J. Stat. Softw. 21(7), 1–16 (2007)
Einbeck, J., Tutz, G., Evers, L.: Local principal curves. Stat. Comput. 15(4), 301–313 (2005)
Einbeck, J., Evers, L., Bailer-Jones, C.: Representing complex data using localized principal components with application to astronomical data. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, volume 58 of Lecture Notes in Computational Science and Engineering, pp. 178–201. Springer, Berlin, Heidelberg (2008)
Genovese, C.R., Perone-Pacifico, M., Verdinelli, I., Wasserman, L.: Nonparametric ridge estimation. Ann. Stat. 42(4), 1511–1545 (2014)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996)
Golyandina, N., Nekrutkin, V., Zhigljavsky, A.A.: Analysis of Time Series Structure: SSA and Related Techniques. Chapman and Hall/CRC Press, London (2001)
Greengard, L., Strain, J.: The fast Gauss transform. SIAM J. Sci. Stat. Comput. 12(1), 79–94 (1991)
Higham, N.J.: Functions of Matrices: Theory and Computation. SIAM, Philadelphia (2008)
Hsieh, W.W.: Nonlinear multivariate and time series analysis by neural network methods. Rev. Geophys. 42(1), 1–25 (2004)
Hsieh, W.W., Hamilton, K.: Nonlinear singular spectrum analysis of the tropical stratospheric wind. Quart. J. R. Meteorol. Soc. 129(592), 2367–2382 (2003)
Jaromczyk, J.W., Toussaint, G.T.: Relative neighborhood graphs and their relatives. Proc. IEEE 80(9), 1502–1517 (1992)
Jolliffe, I.T.: Principal Component Analysis. Springer-Verlag, Berlin (1986)
Kambhatla, N., Leen, K.T.: Dimension reduction by local principal component analysis. Neural Comput. 9(7), 1493–1516 (1997)
Kramer, M.A.: Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37(2), 233–243 (1991)
Loève, M.: Probability Theory: Foundations, Random Sequences. van Nostrand, Princeton (1955)
Magnus, J.R.: On differentiating eigenvalues and eigenvectors. Econ. Theory 1(2), 179–191 (1985)
Miller, J.: Relative critical sets in \(R^n\) and applications to image analysis. PhD thesis, University of North Carolina (1998)
Monahan, A.H.: Nonlinear principal component analysis: tropical Indo-Pacific sea surface temperature and sea level pressure. J. Clim. 14(2), 219–233 (2001)
Newbigging, S.C., Mysak, L.A., Hsieh, W.W.: Improvements to the non-linear principal component analysis method, with applications to ENSO and QBO. Atmos.-Ocean 41(4), 291–299 (2003)
Ortega, J.M.: Numerical Analysis: A Second Course. SIAM, Philadelphia (1990)
Ozertem, U., Erdogmus, D.: Locally defined principal curves and surfaces. J. Mach. Learn. Res. 12, 1249–1286 (2011)
Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. Ser. 6 2(11), 559–572 (1901)
Pulkkinen, S.: Ridge-based method for finding curvilinear structures from noisy data. Comput. Stat. Data Anal. 82, 89–109 (2015)
Pulkkinen, S., Mäkelä, M.M., Karmitsa, N.: A generative model and a generalized trust region Newton method for noise reduction. Comput. Optim. Appl. 57(1), 129–165 (2014)
Rangayyan, R.M.: Biomedical Signal Analysis: A Case-Study Approach. IEEE Press, New York (2002)
Renardy, M., Rogers, R.C.: An Introduction to Partial Differential Equations. Texts in Applied Mathematics, vol. 13, 2nd edn. Springer-Verlag, New York (2004)
Ross, I.: Nonlinear dimensionality reduction methods in climate data analysis. PhD thesis, University of Bristol, United Kingdom (2008)
Ross, I., Valdes, P.J., Wiggins, S.: ENSO dynamics in current climate models: an investigation using nonlinear dimensionality reduction. Nonlinear Process. Geophys. 15, 339–363 (2008)
Schölkopf, B., Smola, A., Müller, K.-R.: Kernel principal component analysis. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) Artificial Neural Networks-ICANN’97, volume 1327 of Lecture Notes in Computer Science, pp. 583–588. Springer, Berlin (1997)
Scholz, M., Kaplan, F., Guy, C.L., Kopka, J., Selbig, J.: Non-linear PCA: a missing data approach. Bioinformatics 21(20), 3887–3895 (2005)
Scholz, M., Fraunholz, M., Selbig, J.: Nonlinear principal component analysis: Neural network models and applications. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, volume 58 of Lecture Notes in Computational Science and Engineering, pp. 44–67. Springer, Berlin (2008)
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B 61(3), 611–622 (1999)
Vautard, R., Yiou, P., Ghil, M.: Singular-spectrum analysis: a toolkit for short, noisy chaotic signals. Physica D 58(1–4), 95–126 (1992)
Weare, B.C., Navato, A.R., Newell, E.R.: Empirical orthogonal analysis of Pacific sea surface temperatures. J. Phys. Oceanogr. 6(5), 671–678 (1976)
Weinberger, K., Saul, L.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vis. 70(1), 77–90 (2006)
Whittlesey, E.F.: Fixed points and antipodal points. Am. Math. Mon. 70(8), 807–821 (1963)
Yang, C., Duraiswami, R., Gumerov, N.A., Davis, L.: Improved fast Gauss transform and efficient kernel density estimation. In: Ninth IEEE International Conference on Computer Vision, vol. 1, pp. 664–671. Nice, France (2003)
Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput. 26(1), 313–338 (2004)
Acknowledgments
The author was financially supported by the TUCS Graduate Programme, the Academy of Finland grant No. 255718 and the Finnish Funding Agency for Technology and Innovation (Tekes) grant No. 3155/31/2009. He would also like to thank Prof. Marko Mäkelä, Doc. Napsu Karmitsa and Prof. Hannu Oja (University of Turku) and the reviewers for their valuable comments.
Appendix
In this appendix we give the proofs of Theorems 3.2 and 3.3. Throughout, we assume that the given set of sample points \(\varvec{Y}=\{\varvec{y}_i\}_{i=1}^n\subset \mathbb {R}^d\) is fixed. The proofs rely on the following simplifying assumption, which can be made without loss of generality.
Assumption 7.1
The points \(\varvec{y}_i\) satisfy the condition \(\sum _{i=1}^n\varvec{y }_i=\varvec{0}\).
First, we recall the density estimate defined by equations (20) and (21) with the bandwidth parametrization \(\varvec{H}= h^2\varvec{I}\). That is,
The following limits hold for the logarithm of the Gaussian kernel density estimate and its derivatives as \(h\) approaches infinity. Uniform convergence on a given compact set \(U\) can be verified by showing that the functions are uniformly bounded (i.e. bounded for all \(\varvec{x}\in U\) and all \(h\ge h_0\) for some \(h_0>0\)) and that they have a common Lipschitz constant on \(U\) for all \(h\ge h_0\); uniform convergence then follows from the Arzelà-Ascoli theorem (e.g. Renardy and Rogers 2004). The proof of the following lemma is omitted due to space constraints.
Lemma 7.1
Let \(\hat{p}_h:\mathbb {R}^d\rightarrow \mathbb {R}\) be a Gaussian kernel density estimate and let Assumption 7.1 be satisfied. Then
for all \(\varvec{x}\in \mathbb {R}^d\). Furthermore, convergence to these limits is uniform in any compact set.
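The displayed limits themselves are omitted above. For orientation, a direct computation for the Gaussian kernel with \(\varvec{H}=h^2\varvec{I}\) (a standard calculation; whether it matches the omitted displays (31) and (32) exactly, including the normalization of \(\hat{\varvec{\Sigma }}_{\varvec{Y}}\) in Eq. (3), is an assumption) gives, with weights \(w_i(\varvec{x})=\exp (-\Vert \varvec{x}-\varvec{y}_i\Vert ^2/(2h^2))/\sum _{j=1}^n\exp (-\Vert \varvec{x}-\varvec{y}_j\Vert ^2/(2h^2))\),
\[
\nabla \log {\hat{p}_h}(\varvec{x})=\frac{1}{h^2}\sum _{i=1}^n w_i(\varvec{x})(\varvec{y}_i-\varvec{x}),\qquad \nabla ^2\log {\hat{p}_h}(\varvec{x})=\frac{1}{h^4}\Big (\sum _{i=1}^n w_i(\varvec{x})(\varvec{y}_i-\varvec{x})(\varvec{y}_i-\varvec{x})^T-\varvec{m}(\varvec{x})\varvec{m}(\varvec{x})^T\Big )-\frac{1}{h^2}\varvec{I},
\]
where \(\varvec{m}(\varvec{x})=\sum _{i=1}^n w_i(\varvec{x})(\varvec{y}_i-\varvec{x})\). Since \(w_i(\varvec{x})\rightarrow 1/n\) uniformly on compact sets as \(h\rightarrow \infty \), it follows under Assumption 7.1 that \(h^2\nabla \log {\hat{p}_h}(\varvec{x})\rightarrow -\varvec{x}\) and \(h^4\big (\nabla ^2\log {\hat{p}_h}(\varvec{x})+h^{-2}\varvec{I}\big )\rightarrow \frac{1}{n}\sum _{i=1}^n\varvec{y}_i\varvec{y}_i^T\).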
The following two lemmata facilitate the proof of Theorem 3.3.
Lemma 7.2
Let \(\hat{p}_h:\mathbb {R}^d\rightarrow \mathbb {R}\) be a Gaussian kernel density estimate and let Assumptions \(3.2\) and \(7.1\) be satisfied. Denote the eigenvalues of \(\nabla ^2\log {\hat{p}_h}\) by \(\lambda _1(\cdot ;h)\ge \lambda _2(\cdot ;h)\ge \dots \ge \lambda _d(\cdot ;h)\) and the corresponding eigenvectors by \(\{\varvec{w}_i(\cdot ;h)\}_{i=1}^d\). Then for any compact set \(U\subset \mathbb {R}^d\) there exists \(h_0>0\) such that
for all \(\varvec{x}\in U, h\ge h_0\) and \(i,j=1,2,\dots ,r+1\) such that \(i\ne j\). Furthermore, if we define
and
where \(\{\varvec{v}_i\}_{i=1}^r\) denote the eigenvectors of the matrix \(\hat{\varvec{\Sigma }}_{\varvec{Y}}\) defined by Eq. (3) corresponding to its \(r\) greatest eigenvalues, then for all \(\varepsilon >0\) there exists \(h_0>0\) such that
Proof
Let \(\tilde{\lambda }_1\ge \tilde{\lambda }_2\ge \dots \ge \tilde{\lambda }_d\) denote the eigenvalues of the matrix \(\hat{\varvec{\Sigma }}_{\varvec{Y}}\) and let \(\{h_k\}\) be a sequence such that \(\lim _{k\rightarrow \infty }h_k=\infty \). By uniform convergence to the limit (32) under Assumption 7.1 and continuity of the eigenvalues of a matrix as a function of its elements (e.g. Ortega 1990, Theorem 3.1.2), for all \(\varepsilon >0\) there exists \(k_0\) such that
for all \(i=1,2,\dots ,r+1\), \(\varvec{x}\in U\) and \(k\ge k_0\). Consequently, condition (33) holds for all \(\varvec{x}\in U\) for any sufficiently large \(h\) by Assumption 3.2. It also follows from Assumption 3.2, condition (36) and the reverse triangle inequality that for all \(\varepsilon >0\) and \(i,j=1,2,\dots ,r+1\) such that \(i\ne j\) and \(|\tilde{ \lambda }_i-\tilde{\lambda }_j|>\varepsilon \) there exists \(k_1\) such that
for all \(\varvec{x}\in U\) and \(k\ge k_1\). This implies condition (34). Similarly, condition (35) follows from uniform convergence to the limit (32) under Assumption 7.1, condition (34) and continuity of the eigenvectors as a function of the matrix elements when the eigenvalues are distinct (e.g. Ortega 1990, Theorem 3.1.3). \(\square \)
Lemma 7.3
Let Assumptions \(3.2\) and \(7.1\) be satisfied and define the function
where the function \(\varvec{W}\) is defined as in Lemma 7.2, and the set \(S^r_{\infty }\) as in Theorem 3.3. Then the limit
exists for all \(\varvec{x}\in \mathbb {R}^d\). Furthermore, \(\varvec{ x}\in S^r_{\infty }\) if and only if the limit (37) is zero.
Proof
By the limits (31) and (35) the limit (37) exists for all \(\varvec{x} \in \mathbb {R}^d\). Furthermore, for any \(\varvec{x}\in \mathbb {R}^d\), the condition that the limit (37) is zero is equivalent to the condition that
where the vectors \(\varvec{v}_i\) are defined as in Lemma 7.2. By the orthogonality of the vectors \(\varvec{v}_i\), the definition of the set \(S^r_{\infty }\) and Assumption 7.1, this condition is equivalent to the condition that \(\varvec{x}\in S^r_{ \infty }\).\(\square \)
For the proof of Theorem 3.3, we define the set
where the function \(\tilde{\varvec{W}}\) is defined as in Lemma 7.3. Under Assumption 7.1, we prove both claims of Theorem 3.3 by the following two lemmata.
Lemma 7.4
Let \(U\subset \mathbb {R}^d\) be a compact set such that \(U\cap S^r_{\infty }\ne \emptyset \) for some \(0\le r<d\). If Assumptions \(3.2\) and \(7.1\) are satisfied, then for all \(\varepsilon >0\) there exists \(h_0>0\) such that
Proof
The proof is by contradiction. Let \(0\le r<d\) and let \(U\subset \mathbb {R}^d\) be a compact set such that \(U\cap S^r_{\infty }\ne \emptyset \). Assume that there exists \(\varepsilon _1>0\) such that for all \(h_0>0\) there exists \(h\ge h_0\) such that condition (39) is not satisfied. This implies that for all \(h_0>0\) there exists \(h\ge h_0\) such that
Let \(\{\varvec{x}_k\}\) denote a sequence of such points with the corresponding bandwidths \(\{h_k\}\). Since the set \(S^r_h\cap U\) is compact by the compactness of \(U\) and the continuity of \(\tilde{\varvec{W}}(\cdot , h)\) in \(U\) for any sufficiently large \(h\), the sequence \(\{\varvec{x}_k\}\) has a convergent subsequence \(\{\varvec{z}_k\}\) whose limit point we denote by \(\varvec{z}^*\). Clearly \(\varvec{z}^*\notin S^r_{\infty }\) by condition (40). Thus, by Lemma 7.3 we deduce that for some \(c>0\),
In view of the definition (38), the above limit implies that there exists \(\varepsilon _2>0\) and \(k_0\) such that for all \(k\ge k_0\),
On the other hand, if we define the function \(\varvec{F}^*(\varvec{x}) =-(\varvec{I}-\varvec{V}\varvec{V}^T)\varvec{x}\), the triangle inequality yields
Combining this with the inequality
and noting that \(\varvec{F}(\cdot ;h_k)\) converges to the function \(\varvec{F}^*\) uniformly in \(U\) as \(k\rightarrow \infty \) (by Lemmata 7.1 and 7.2), we deduce from (41) that for all \(\varepsilon _2>\varepsilon _3>0\) there exists \(k_1\) such that
for all \(\varvec{y}\in S^r_{h_k}\cap U\) and \(k\ge k_1\).
Condition (42) implies that for all \(0< \varepsilon _3<\varepsilon _2\) there exists \(k_1\) such that
On the other hand, for all \(\varepsilon >0\) we have \(\varvec{z}_k\in B(\varvec{z}^*;\varepsilon )\) for any sufficiently large \(k\), since \(\varvec{z}_k\) converges to \(\varvec{z}^*\). If we choose \(0<\varepsilon <\varepsilon _2\), then by condition (43) the sequence \(\{\varvec{x}_k\}\), of which \(\{\varvec{z}_k\}\) is a subsequence, has an element \(\varvec{x}_k\notin S^r_{h_k}\cap U\) for some \(k\). This contradicts the construction of the sequence \(\{\varvec{x}_k\}\), by which \(\varvec{x}_k\in S^r_{h_k}\cap U\) for all \(k\). \(\square \)
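As an aside (a standard linear-algebra observation, not part of the original argument), assuming, as the notation suggests, that \(\varvec{V}=[\varvec{v}_1\;\varvec{v}_2\;\cdots \;\varvec{v}_r]\) has the eigenvectors of Lemma 7.2 as its orthonormal columns, \(\varvec{I}-\varvec{V}\varvec{V}^T\) is the orthogonal projector onto \(\text{ span }(\varvec{v}_1,\dots ,\varvec{v}_r)^{\perp }\), so that
\[
\varvec{F}^*(\varvec{x})=-(\varvec{I}-\varvec{V}\varvec{V}^T)\varvec{x}=\varvec{0}\quad \Longleftrightarrow \quad \varvec{x}\in \text{ span }(\varvec{v}_1,\varvec{v}_2,\dots ,\varvec{v}_r);
\]
that is, the zeros of the limiting field form the \(r\)-dimensional linear principal subspace of the centered data.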
Lemma 7.5
Let \(\hat{p}_h\) be a Gaussian kernel density estimate, let \(0\le r<d\), let Assumptions \(3.2\) and \(7.1\) be satisfied and define the set \(S^r_{\infty }\) as in Theorem 3.3. Then for any compact set \(U\subset \mathbb {R}^d\) such that \(U\cap S^r_{\infty } \ne \emptyset \) and \(\varepsilon >0\) there exists \(h_0>0\) such that
Proof
Let \(0\le r<d\) and let \(\{\varvec{v}_i\}_{i=r+1}^d\) denote a set of orthonormal eigenvectors of the matrix \(\hat{\varvec{\Sigma }}_{\varvec{Y}}\) corresponding to its \(d-r\) smallest eigenvalues. The subspace spanned by \(\{\varvec{v}_i\}_{i=r+1}^d\) is uniquely determined (the vectors themselves only up to the choice of orthonormal basis), because the eigenvectors \(\{\varvec{v}_i\}_{i=1}^r\) spanning its orthogonal complement are uniquely determined by Assumption 3.2. Define the sets
and
for some orthonormal eigenvectors \(\{\varvec{v}_i\}_{i=r+1}^d\) spanning the orthogonal complement of \(\text{ span }(\varvec{v}_1,\varvec{v}_2,\dots , \varvec{v}_r)\).
Let \(\{\varvec{u}_i(\cdot ;h)\}_{i=r+1}^d\) denote a set of orthonormal vectors that are orthogonal to the eigenvectors \(\{\varvec{w}_i(\cdot ;h)\}_{i=1}^r\) of \(\nabla ^2\log {\hat{p}_h}\) corresponding to the \(r\) greatest eigenvalues. Define the functions
and
where
and \(\bar{\varvec{V}}=[\varvec{v}_{r+1}\quad \varvec{v}_{r+2}\quad \cdots \quad \varvec{v}_d]\) assuming that the orientation is chosen so that \(\text{ det }(\bar{\varvec{V}})=1\). To fix the orientation of the vectors \(\varvec{u}_i(\varvec{x};h)\), we impose the constraint
Here \(\Vert \cdot \Vert _F\) denotes the Frobenius norm,
\(O(d,d-r)\) denotes the set of \(d\times (d-r)\) matrices having orthonormal columns, and the matrix \(\varvec{W}(\varvec{x};h)\) is defined as in Lemma 7.2. It can be shown that the function \(\varvec{U}(\cdot ;h)\) is well-defined for any \(h>0\). Since they span the orthogonal complement of the columns of \(\varvec{W}(\cdot ;h)\), the columns of \(\varvec{U}(\cdot ;h)\) are also continuous in a given compact set whenever \(\varvec{W}(\cdot ;h)\) is continuous there, that is, whenever condition (34) is satisfied in that set by Lemma 7.2.
The above definitions and condition (35) in the compact set \(D_{\varepsilon }\) imply that for all \(\varepsilon _1,\varepsilon _2 >0\) there exists \(h_0>0\) such that
Consequently, uniform convergence to the limit (31) as \(h\rightarrow \infty \) by Lemma 7.3 together with the property that
following from Assumption 7.1 implies that for all \(\varepsilon _1,\varepsilon _2>0\) there exists \(h_0>0\) such that
where \(\tilde{D}_{\varepsilon }=\{\varvec{y}\in \mathbb {R}^{d-r}\mid \Vert \varvec{y}\Vert \le \varepsilon \}\).
By the above condition, for any \(0<\varepsilon _2<\varepsilon _1\) there exists \(h_0>0\) such that for all \(h\ge h_0\) and \(\varvec{x}_0\in S^r_{\infty } \cap U\) we have \(-\tilde{\varvec{F}}_{\varvec{x}_0}(\varvec{y};h )^T\varvec{y}>0\) for all \(\varvec{y}\in \partial \tilde{D}_{\varepsilon _1}\), where \(\partial \) denotes the boundary of a set. On the other hand, \(-\varvec{y}\) is the inward-pointing normal vector of the disk \(\tilde{D}_{\varepsilon _1}\) at any \(\varvec{y}\in \partial \tilde{D }_{\varepsilon _1}\). Together with the continuity of \(\tilde{\varvec{F}}_{ \varvec{x}_0}(\cdot ;h)\) in \(\tilde{D}_{\varepsilon _1}\) when \(h\) is sufficiently large, the well-known results from topology (e.g. Whittlesey 1963) then imply that \(\tilde{\varvec{F}}_{\varvec{x}_0}(\cdot ;h)\) has at least one zero point \(\varvec{y}^*\) in the interior of \(\tilde{D}_{ \varepsilon _1}\) for all \(\varvec{x}_0\in S^r_{\infty }\cap U\) and \(h\ge h_0\). Clearly, for any such \(\varvec{y}^*\) and \(\varvec{x}_0\) the point \(\varvec{x}^*=\bar{\varvec{V}}\varvec{y}^*+\varvec{x}_0\) lies in the set \(D_{\varvec{x}_0,\varepsilon }\) and \(\varvec{F}( \varvec{x}^*;h)=\tilde{\varvec{F}}_{\varvec{x}_0}(\varvec{y}^* ;h)=\varvec{0}\).
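The topological result invoked here is, presumably, the standard inward-pointing vector field argument, a consequence of Brouwer's fixed point theorem (cf. Whittlesey 1963): if a mapping \(\varvec{G}\) is continuous on the closed ball \(\tilde{D}_{\varepsilon _1}\) and satisfies \(\varvec{G}(\varvec{y})^T\varvec{y}<0\) for every \(\varvec{y}\in \partial \tilde{D}_{\varepsilon _1}\), then \(\varvec{G}\) has a zero in the interior of \(\tilde{D}_{\varepsilon _1}\). Here it is applied with \(\varvec{G}=\tilde{\varvec{F}}_{\varvec{x}_0}(\cdot ;h)\).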
From the above we conclude that for all \(\varepsilon >0\) there exists \(h_0>0\) such that for all \(\varvec{x}_0\in S^r_{\infty }\cap U\) condition (6a) holds for \(\log {\hat{p}_h}\) at least at one point in \(D_{\varvec{x}_0,\varepsilon }\) for all \(h\ge h_0\). On the other hand, for all \(\varepsilon >0\) conditions (6b) and (6c) are satisfied in the compact set \(D_{\varepsilon }\) for all sufficiently large \(h\) by conditions (33) and (34). Hence, we have proven that for all \(\varepsilon >0\) condition (44) holds for all sufficiently large \(h\). \(\square \)
Proof (Theorem 3.3)
Follows directly from Lemmata 7.4 and 7.5 by the property that \(\mathcal {R}^r_{ \hat{p}_h}=\mathcal {R}^r_{\log {\hat{p}_h}}\subseteq S^r_h\) for all \(0\le r<d\) and \(h>0\) by Lemma 3.1 and Definition 3.1. \(\square \)
Next, we prove Theorem 3.2 under Assumption 7.1 by using the following lemma.
Lemma 7.6
Let \(\hat{p}_h\) be a Gaussian kernel density estimate, let Assumption 7.1 be satisfied and define the set
Then for some \(r>\max _{i=1,2,\dots ,n}\Vert \varvec{y}_i\Vert \) there exists \(h_0>0\) such that \(U_h\subseteq B(\varvec{0};r)\) for all \(h\ge h_0\).
Proof
The proof is by contradiction. Assume that for all \(r>r_0=\max _{i=1,2,\dots ,n}\Vert \varvec{y}_i\Vert \) and \(h_0>0\) there exist \(h\ge h_0\) and a point \(\varvec{x}\in U_h\setminus B(\varvec{0};r)\). Let \(\{\varvec{x}_k\}\), \(\{r_k\}\) and \(\{h_k\}\) denote sequences of such points and parameters such that \(\{r_k\}\) and \(\{h_k\}\) are monotonically increasing. This implies that
and also that for all \(k\ge k_0\),
By Assumption 7.1 and condition (46) we have that \(\Vert \varvec{x}_k-\varvec{y}_i\Vert \ge \Vert \varvec{x}_k\Vert -r_0\) for all \(k\ge k_0\) and \(i=1,2,\dots ,n\). Consequently,
for all \(k\ge k_0\). By equation (29), this implies that
for all \(k\ge k_0\). On the other hand, by the limit (30), Assumption 7.1 and the choice of \(r_0\) we have
for all \(j=1,2,\dots ,n\). Plugging the limits (48) and (49) into inequality (47) then leads to a contradiction for any sufficiently large \(k\), since \(\lim _{k\rightarrow \infty }\Vert \varvec{x}_k\Vert =\infty \) by condition (46) and the assumption that the sequence \(\{r_k\}\) is monotonically increasing. \(\square \)
Proof (Theorem 3.2)
By Lemma 7.6 there exists \(r>\max _{i=1,2,\dots ,n}\Vert \varvec{y}_i\Vert \) such that \(U_h\subseteq B(\varvec{0};r)\) for all sufficiently large \(h\). Thus, condition (9) holds for all \(\varvec{x}\in U_h\) and all such \(h\) by Assumption 3.2, the compactness of the set \(B(\varvec{0};r)\) and Lemma 7.2. Finally, compactness and connectedness of the set \(U_h\) for all sufficiently large \(h\) follows from the strict concavity of \(\log {\hat{p}_h}\) in \(B(\varvec{0};r)\supseteq U_h\) by condition (33). \(\square \)