
Nested Kriging predictions for datasets with a large number of observations

Statistics and Computing

Abstract

This work falls within the context of predicting the value of a real-valued function at some input locations, given a limited number of observations of this function. The Kriging interpolation technique (or Gaussian process regression) is often used to tackle such a problem, but the method suffers from a prohibitive computational burden when the number of observation points is large. We introduce in this article nested Kriging predictors, which are constructed by aggregating sub-models based on subsets of the observation points. This approach is proven to have better theoretical properties than other aggregation methods found in the literature: in particular, contrary to some of these methods, the proposed aggregation can be shown to be consistent. Finally, the practical interest of the method is illustrated on simulated datasets and on an industrial test case with \(10^4\) observations in a 6-dimensional space.
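
To make the construction concrete, the following minimal sketch (in Python, with numpy) illustrates this kind of two-layer aggregation for the predicted mean, assuming simple Kriging sub-models that share a known squared-exponential kernel; the kernel choice, the function names and the absence of hyper-parameter estimation are illustrative simplifications, not the paper's implementation.

    import numpy as np

    def kernel(A, B, theta=0.5):
        # Squared-exponential covariance (an illustrative choice of kernel)
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2 / (2 * theta ** 2))

    def nested_kriging_mean(x, X_groups, y_groups):
        # Aggregate simple Kriging sub-model means at a single prediction point x.
        p = len(X_groups)
        m = np.empty(p)    # sub-model means M_i(x)
        lam = []           # Kriging weights K(X_i, X_i)^{-1} k(X_i, x)
        for i, (Xi, yi) in enumerate(zip(X_groups, y_groups)):
            w = np.linalg.solve(kernel(Xi, Xi), kernel(Xi, x[None, :]))[:, 0]
            lam.append(w)
            m[i] = w @ yi
        # Covariances between sub-models, and between each sub-model and Y(x)
        K = np.empty((p, p))
        k = np.empty(p)
        for i in range(p):
            k[i] = lam[i] @ kernel(X_groups[i], x[None, :])[:, 0]
            for j in range(p):
                K[i, j] = lam[i] @ kernel(X_groups[i], X_groups[j]) @ lam[j]
        # Aggregated mean: the weighted combination k^T K^{-1} m of the sub-models
        return np.linalg.solve(K, k) @ m

A full implementation would also propagate prediction variances and organize the sub-models into a tree, as detailed in the appendix below.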





Acknowledgements

Part of this research was conducted within the frame of the Chair in Applied Mathematics OQUAIDO, gathering partners in technological research (BRGM, CEA, IFPEN, IRSN, Safran, Storengy) and academia (Ecole Centrale de Lyon, Mines Saint-Etienne, University of Grenoble, University of Nice, University of Toulouse and CNRS) around advanced methods for Computer Experiments. The authors would like to warmly thank Dr. Géraud Blatman and EDF R&D for providing us with the industrial test case. They also thank the editor and the reviewers for their precise and constructive comments on this paper. This paper was completed during a stay of D. Rullière at the Vietnam Institute for Advanced Study in Mathematics (VIASM); he thanks the VIASM and the DAMI research chair (Data Analytics & Models for Insurance) for their support.

Author information


Corresponding author

Correspondence to Nicolas Durrande.

Appendix: Proof of Proposition 4

Complexities. Under the chosen assumptions on the \(\alpha \) and \(\beta \) coefficients, for a regular tree and in the case of simple Kriging sub-models, \(\mathcal {C}_\alpha =\sum _{\nu =1}^{\bar{\nu }} \sum _{i=1}^{n_\nu } \alpha c_\nu ^3 =\alpha \sum _{\nu =1}^{\bar{\nu }} c_\nu ^3 n_\nu \) and \(\mathcal {C}_\beta =\sum _{\nu =1}^{\bar{\nu }} \sum _{i=2}^{n_\nu } \sum _{j=1}^{i-1}\beta c^2_{\nu } =\frac{\beta }{2} \sum _{\nu =1}^{\bar{\nu }} n_\nu (n_{\nu }-1) c^2_{\nu }\). Notice that the sums start from \(\nu =1\) in order to include the computation of the sub-models.

Equilibrated tree complexities. In a constant child number setting, where \(c_\nu =c\) for all \(\nu \), the tree structure ensures that \(n_{\nu }=n/c^{\nu }\). Thus, as \(c=n^{1/\bar{\nu }}\), we get when \(n \rightarrow +\infty \) that \(\mathcal {C}_\alpha \sim \alpha n^{1+\frac{2}{\bar{\nu }}}\) and \(\mathcal {C}_\beta \sim \frac{\beta }{2} n^2\). The result for the equilibrated two-layer tree, where \(\bar{\nu }=2\), derives directly from this one; in this case \(\mathcal {C}_\alpha \sim \alpha n^{2}\) and \(\mathcal {C}_\beta \sim \frac{\beta }{2} n^2\) (it also derives from the expressions of \(\mathcal {C}_\alpha \) and \(\mathcal {C}_\beta \) when \(c_1=c_2=\sqrt{n}\), \(n_1=\sqrt{n}\), \(n_2=1\)).

Optimal tree complexities. One easily shows that under the chosen assumptions \(\mathcal {C}_\beta \sim \frac{\beta }{2}n^2\); thus it is indeed not possible to reduce the whole complexity to orders lower than \(O(n^2)\). However, one can choose the tree structure in order to reduce the complexity \(\mathcal {C}_\alpha \). For a regular tree, \(n_\nu =n/(c_1 \cdots c_{\nu })\), so that \(\frac{\partial }{\partial c_k} n_{\nu } = -\mathbf {1}_{{\lbrace {\nu \ge k}\rbrace }} n_\nu /c_k\). Using a Lagrange multiplier \(\ell \), one defines \(\xi (k)=c_k \frac{\partial }{\partial c_k} \left( \mathcal {C}_\alpha - \ell (c_1 \cdots c_{\bar{\nu }} -n) \right) = 3\alpha c_k^3 n_k - \alpha \sum _{\nu =k}^{\bar{\nu }}c_\nu ^3 n_\nu - \ell c_1 \cdots c_{\bar{\nu }}\). The tree structure that minimizes \(\mathcal {C}_\alpha \) is such that \(\xi (k)=\xi (k+1)=0\) for all \(k<\bar{\nu }\). Using \(c_{k+1} n_{k+1}=n_k\), one gets \(3c_{k+1}^2=2 c_{k}^3\) for all \(k<\bar{\nu }\), and imposing \(c_1\cdots c_{\bar{\nu }}=n\) yields \(c_\nu = \delta \left( \delta ^{-\bar{\nu }} n\right) ^{\frac{\delta ^{\nu -1}}{2(\delta ^{\bar{\nu }}-1)}}\), \(\nu =1, \ldots , \bar{\nu }\), with \(\delta =\frac{3}{2}\). Setting \(\gamma =\frac{27}{4}\delta ^{-\frac{\bar{\nu }}{\delta ^{\bar{\nu }}-1}}\left( 1-\delta ^{-\bar{\nu }}\right) \), some direct calculations show that this tree structure corresponds to the complexities \(\mathcal {C}_\alpha = \gamma \alpha n^{1+\frac{1}{\delta ^{\bar{\nu }}-1}}\) and \(\mathcal {C}_\beta \sim \frac{\beta }{2}n^2\). In a two-layer setting one gets \(c_1=\left( \frac{3}{2}\right) ^{1/5} n^{2/5}\) and \(c_2=\left( \frac{3}{2}\right) ^{-1/5} n^{3/5}\), which leads to \(\mathcal {C}_\alpha = \gamma \alpha n^{9/5}\) and \(\mathcal {C}_\beta = \frac{\beta }{2} n^2 - \frac{\beta }{2} \left( \frac{3}{2}\right) ^{\frac{1}{5}}n^{\frac{7}{5}}\), where \(\gamma =(\frac{2}{3})^{-2/5}+(\frac{2}{3})^{3/5}\simeq 1.96\). Note finally that even for values of \(n\) of order \(10^5\), terms of order \(n^{9/5}\) are not necessarily negligible compared to those of order \(n^2\), and that while \(\mathcal {C}_\beta \) is only slightly affected by the choice of the tree structure, the global complexity benefits from the optimization of \(\mathcal {C}_\alpha \).
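
As a numerical illustration of these expressions, the short sketch below (in Python; the values \(n=10^4\) and \(\alpha =\beta =1\) are arbitrary placeholders) evaluates the optimal child numbers \(c_\nu \) and the resulting complexities \(\mathcal {C}_\alpha \) and \(\mathcal {C}_\beta \); in practice the \(c_\nu \) would of course be rounded to integers.

    import numpy as np

    def optimal_children(n, nu_bar, delta=1.5):
        # Child numbers c_1, ..., c_{nu_bar} minimizing C_alpha for a regular tree
        nu = np.arange(1, nu_bar + 1)
        return delta * (delta ** (-nu_bar) * n) ** (delta ** (nu - 1) / (2 * (delta ** nu_bar - 1)))

    def complexities(c, n, alpha=1.0, beta=1.0):
        # C_alpha = alpha * sum_nu c_nu^3 n_nu, C_beta = (beta/2) * sum_nu n_nu (n_nu - 1) c_nu^2
        n_nu = n / np.cumprod(c)          # n_nu = n / (c_1 ... c_nu)
        return alpha * np.sum(c ** 3 * n_nu), beta / 2 * np.sum(n_nu * (n_nu - 1) * c ** 2)

    n = 10 ** 4
    c = optimal_children(n, nu_bar=2)     # approx. (3/2)^{1/5} n^{2/5} and (3/2)^{-1/5} n^{3/5}
    print(c, np.prod(c))                  # the product of the c_nu recovers n
    print(complexities(c, n))             # C_alpha of order gamma * n^{9/5}, C_beta close to n^2 / 2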

Storage footprint. First, covariances can be stored in triangular matrices, so the temporary objects \(M\), \(k\) and \(K\) in Algorithm 1 require the storage of \(c_{\max }(c_{\max }+5)/2\) real values. For a given step \(\nu \), \(\nu \ge 2\), building all vectors \(\alpha _i\) requires the storage of \(\sum _{i=1}^{n_{\nu }} c_i^{\nu }=n_{\nu -1}\) values. Finally, for a given step \(\nu \), we simultaneously need the objects \(M_{\nu -1}, K_{\nu -1}, M_{\nu }, K_{\nu }\), which require the storage of \(n_{\nu -1}(n_{\nu -1}+3)/2 + n_{\nu }(n_{\nu }+3)/2\) real values. In a regular tree, as \(n_\nu \) is decreasing in \(\nu \), the storage footprint is \(\mathcal {S} = (c_{\max }(c_{\max }+5) + n_1(n_1+5) + n_2(n_2+3))/2\). Hence the equivalents of \(\mathcal {S}\) for the different tree structures: \(\mathcal {S}\sim n\) for the two-layer equilibrated tree, \(\mathcal {S}\sim \frac{1}{2}n^{2-2/\bar{\nu }}\) for the \(\bar{\nu }\)-layer equilibrated tree with \(\bar{\nu }>2\), and the indicated result for the optimal tree. Simpler orders are given in the proposition, which avoids separating the case \(\bar{\nu }=2\) and avoids a cumbersome constant for the optimal tree.
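
For completeness, a sketch in the same spirit (again with a placeholder \(n=10^4\)) evaluates \(\mathcal {S}\) for a regular tree and confirms the order \(\mathcal {S}\sim n\) for the two-layer equilibrated tree.

    import numpy as np

    def storage_footprint(c, n):
        # S = (c_max (c_max + 5) + n_1 (n_1 + 5) + n_2 (n_2 + 3)) / 2 for a regular tree
        n_nu = n / np.cumprod(np.asarray(c, dtype=float))
        c_max, n1, n2 = max(c), n_nu[0], n_nu[1]
        return (c_max * (c_max + 5) + n1 * (n1 + 5) + n2 * (n2 + 3)) / 2

    # Two-layer equilibrated tree: c_1 = c_2 = sqrt(n), so n_1 = sqrt(n) and n_2 = 1
    n = 10 ** 4
    print(storage_footprint([n ** 0.5, n ** 0.5], n))   # about 1.05e4, i.e. O(n)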


Cite this article

Rullière, D., Durrande, N., Bachoc, F. et al. Nested Kriging predictions for datasets with a large number of observations. Stat Comput 28, 849–867 (2018). https://doi.org/10.1007/s11222-017-9766-2
