Sequential Batch Design for Gaussian Processes Employing Marginalization †
Figure 1. One-dimensional test case: expectation values of the target function (prediction) from the Markov chain Monte Carlo (MCMC) calculation, where the grey-shaded area represents the uncertainty range. The utility (scaled and normalized) is plotted at the bottom of each panel; its maximum U_max(x_opt) marks the next input vector added to the pool of data. Top to bottom: increasing number of optimized points in the input data pool. (a–c) Without marginalized point; (d–f) with marginalized point at x̂₁ = 0.775.

Figure 2. Deviation of the Gaussian-process results from the exact model outcome as a function of the number N_opt of data obtained at optimized parameter settings. Dotted line: without marginalized input values; dashed line: with one marginalized input value. Both show quadratic decay behaviour.

Figure 3. Two test cases for one-dimensional input vectors, N_ROI = 81. (a–c) Gaussian model; (d–f) damped cosine model. Top row (a,d): model (solid line), initial input data value (filled circle), optimized approach (dotted line) vs. randomly chosen parameter settings (dashed line). Panels (b,c,e,f), with the number of added points on the right: top of each panel, hyper-parameters λ and σ_n; bottom, total difference between target and model for σ_d = 0.1 (dotted line), σ_d = 0.01 (dashed line) and σ_d = 0.001 (dot-dashed line). Middle row (b,e): optimized approach. Bottom row (c,f): random parameter setting. The solid lines represent dedicated decay powers.

Figure 4. Two test cases for two-dimensional input vectors, N_ROI = 21 × 21 (ROI: region of interest). (a–c) Gaussian model; (d–f) damped cosine model. Top row (a,d): model and five initial input vectors (plus signs in the base plane). Panels (b,c,e,f), with the number of added points on the right: top of each panel, hyper-parameters λ and σ_n; bottom, total difference between target and model for σ_d = 0.1 (dotted line), σ_d = 0.01 (dashed line) and σ_d = 0.001 (dot-dashed line). Middle row (b,e): optimized approach. Bottom row (c,f): random parameter setting. The solid lines represent dedicated decay powers. The small inset in (e) shows the deviations between target and model for σ_d = 0.01 and σ_d = 0.001, which settle into square-root behaviour at around 400 points.
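The selection step shown in Figure 1 (evaluate the utility over the region of interest, take its maximum, and add the corresponding input vector to the data pool) can be pictured with a short sketch. The snippet below is only an illustration: it assumes a squared-exponential covariance with the hyper-parameters λ, σ_f and σ_n of the notation table and uses the predictive variance as a stand-in utility; the covariance function and utility criterion actually employed in the paper are defined in the main text and may differ.

```python
import numpy as np

def se_cov(a, b, lam=0.3, sigma_f=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs (assumed form)."""
    d = a[:, None] - b[None, :]
    return sigma_f ** 2 * np.exp(-0.5 * (d / lam) ** 2)

def next_point(x_data, y_data, x_roi, sigma_n=0.05):
    """Return the ROI grid point with the largest (here: variance-based) utility."""
    K = se_cov(x_data, x_data) + sigma_n ** 2 * np.eye(len(x_data))
    k_star = se_cov(x_roi, x_data)               # covariances test grid vs. data
    mean = k_star @ np.linalg.solve(K, y_data)   # GP predictive mean over the ROI
    v = np.linalg.solve(K, k_star.T)
    var = np.diag(se_cov(x_roi, x_roi)) - np.einsum('ij,ji->i', k_star, v)
    utility = var                                # illustrative choice, not the paper's utility
    return x_roi[int(np.argmax(utility))], mean, var

# Usage: one selection step on a toy data pool over N_ROI = 81 grid points.
x_pool = np.array([0.1, 0.5, 0.9])
y_pool = np.sin(2.0 * np.pi * x_pool)
x_grid = np.linspace(0.0, 1.0, 81)
x_opt, mu, sigma2 = next_point(x_pool, y_pool, x_grid)
```

In the closed-loop scheme this step is repeated: the target value obtained at the selected point is appended to the data pool before the utility is re-evaluated.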
Abstract
1. Introduction
2. Prediction of Function Values
3. Marginalizing the Hyper-Parameters
4. Closed Loop Optimization Scheme
5. Marginalizing Test Points
6. Validation in One Dimension
7. Convergence Study
8. Summary and Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Appendix A. Notation Table
| Symbol | Meaning |
| --- | --- |
| N | number of input data vectors |
| | number of elements in an input data vector |
| | number of marginalized test data |
| N_opt | number of points added to the data pool by optimization |
| | number of points added to the data pool chosen randomly |
| N_ROI | number of test points in the region of interest |
| | test input vector |
| x̂₁ | first test input vector, for which the target value is marginalized |
| | test input vector found by the utility criterion |
| | test input vector after the first marginalized test input vector was found |
| | i-th input data vector |
| | matrix with the input data vectors as columns |
| | matrix of the input data vectors expanded by the vector of grid points |
| | vector of grid points within the region of interest |
| | grid point with the largest utility |
| | target value at the test input vector |
| | function of the input data describing the target data |
| | first target value, to be marginalized |
| | target value at the test point, obtained after marginalization of the first target value |
| | vector of the N target data |
| ϵ | uncertainty of the target data |
| | variance of the i-th target datum |
| | (i, j)-th element of the matrix of variances of the target data |
| λ | length scale setting up the notion of distance between input data vectors |
| | signal variance of the distribution over functions f |
| σ_n | overall noise in the data |
| | vector of the hyper-parameters |
| | covariance of two input data vectors |
| | i-th element of the vector of covariances between the test input vector and the input data vectors |
| | (i, j)-th element of the covariance matrix of the input data vectors |
| | covariance matrix of the expanded input |
| ROI | region of interest over which the Gaussian process is run |
| U(ξ) | utility of a target value obtained at input vector ξ |
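As a reading aid, these quantities enter the Gaussian-process prediction in the usual way (cf. Rasmussen and Williams, Gaussian Processes for Machine Learning). The generic form below assumes a squared-exponential covariance for illustration; the paper's own expressions are developed in Sections 2 and 3.

```latex
% Generic GP relations, stated only as a reading aid for the notation table
% (requires amsmath); the paper's own equations are given in Sections 2 and 3.
\begin{align}
  k(\mathbf{x}_i,\mathbf{x}_j) &= \sigma_f^2
      \exp\!\left(-\frac{\|\mathbf{x}_i-\mathbf{x}_j\|^2}{2\lambda^2}\right), \\
  \bar{y}_* &= \mathbf{k}_*^{\mathsf{T}}\bigl(K + \sigma_n^2 I\bigr)^{-1}\mathbf{y}, \\
  \operatorname{var}(y_*) &= k(\mathbf{x}_*,\mathbf{x}_*)
      - \mathbf{k}_*^{\mathsf{T}}\bigl(K + \sigma_n^2 I\bigr)^{-1}\mathbf{k}_* .
\end{align}
```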
Appendix B. Algorithm for Computer Simulation
- Compose the input data vector from the data base;
- Set up the batch run with the available processors;
- Processor #1: code running without any marginalized point;
- Processor #(i + 1): code running for i marginalized points;
- Return the outcome, i.e., the most promising parameter settings for the long-running simulation code, ready for batch execution (a minimal sketch of this dispatch is given after this list).
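The sketch below only illustrates the dispatch logic of the list above. The processor count N_PROC and the worker run_design_code are placeholders, not names from the paper; in a real run the worker would execute the utility optimization with the given number of marginalized test points and return its proposed parameter setting.

```python
from multiprocessing import Pool

N_PROC = 4  # processors available for the batch run (placeholder value)

def run_design_code(n_marg):
    """Stand-in for the design code: processor #1 receives n_marg = 0 (no
    marginalized point), processor #(i + 1) receives n_marg = i."""
    return {"marginalized_points": n_marg, "proposed_setting": None}

def sequential_batch_design():
    # one task per processor: 0, 1, ..., N_PROC - 1 marginalized points
    with Pool(processes=N_PROC) as pool:
        proposals = pool.map(run_design_code, range(N_PROC))
    # outcome: candidate parameter settings, ready for batch execution
    # of the long-running simulation code
    return proposals

if __name__ == "__main__":
    print(sequential_batch_design())
```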
References
- Barber, D. Bayesian Reasoning and Machine Learning; Cambridge University Press: Cambridge, UK, 2012.
- Bishop, C. Neural Networks for Pattern Recognition; Oxford University Press: Oxford, UK, 1996.
- MacKay, D.J.C. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
- Cohn, D. Neural Network Exploration Using Optimal Experiment Design. Neural Netw. 1996, 9, 1071–1083.
- Managing Uncertainty in Complex Models. Available online: http://www.mucm.ac.uk/Pages/Dissemination/RelatedPapers.html (accessed on 18 February 2017).
- Seo, S.; Wallat, M.; Graepel, T.; Obermayer, K. Gaussian process regression: Active data selection and test point rejection. In Mustererkennung 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 241–246.
- Gramacy, R.B.; Lee, H.K.H. Adaptive Design and Analysis of Supercomputer Experiments. Technometrics 2009, 51, 130–145.
- Mockus, J. Bayesian Approach to Global Optimization; Springer: Berlin/Heidelberg, Germany, 1989.
- Sacks, J.; Welch, W.; Mitchell, T.; Wynn, H. Design and Analysis of Computer Experiments. Stat. Sci. 1989, 4, 409–435.
- Locatelli, M. Bayesian Algorithms for One-Dimensional Global Optimization. J. Glob. Optim. 1997, 10, 57–76.
- Azimi, J.; Fern, A.; Fern, X. Batch Bayesian Optimization via Simulation Matching. In Advances in Neural Information Processing Systems 23; Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A., Eds.; Curran Associates: Red Hook, NY, USA, 2010; pp. 109–117.
- Azimi, J.; Jalali, A.; Fern, X. Hybrid Batch Bayesian Optimization. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, 26 June–1 July 2012.
- Gonzalez, J.; Osborne, M.; Lawrence, N. GLASSES: Relieving the Myopia of Bayesian Optimisation. J. Mach. Learn. Res. 2016, 51, 790–799.
- Rasmussen, C.; Williams, C. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
- Preuss, R.; von Toussaint, U. Gaussian Processes for SOLPS Data Emulation. Fusion Sci. Technol. 2016, 69, 605–610.
- Coster, D. (IPP, Garching, Germany). Private communication, 2014.
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Preuss, R.; von Toussaint, U. Sequential Batch Design for Gaussian Processes Employing Marginalization †. Entropy 2017, 19, 84. https://doi.org/10.3390/e19020084