Abstract
One of the major challenges in time series analysis is changing data distributions, especially when processing data streams. To ensure an up-to-date model delivering useful predictions at all times, model reconfigurations are required to adapt to such evolving streams. For Gaussian processes, this might require the adaptation of the internal kernel expression. In this paper, we present dynamically self-adjusting Gaussian processes by introducing Event-Triggered Kernel Adjustments in Gaussian process modelling (ETKA), a novel data stream modelling algorithm that can handle evolving and changing data distributions. To this end, we enhance the recently introduced Adjusting Kernel Search with a novel online change point detection method. Our experiments on simulated data with varying change point patterns suggest a broad applicability of ETKA. On real-world data, ETKA outperforms comparison partners that differ regarding the model adjustment and its refitting trigger in nine and ten out of 14 cases, respectively. These results confirm ETKA’s ability to enable a more accurate and, in some settings, also more efficient data stream processing via Gaussian processes.
J.D. Hüwel and F. Haselbeck—Both authors contributed equally.
1 Introduction
For many applications, accurate real-time analysis of data streams is essential to guarantee a constant workflow. In order to analyze data streams, incoming data points need to be incorporated into an online-generated data model. However, changing data distributions at so-called change points are a common challenge in data stream modelling [4]. As a consequence, an outdated prediction model might impair a downstream application. For instance, this could lead to overstocking and missed sales in demand forecasting or power supply issues in the case of smart grid systems [2, 11, 12]. Providing an up-to-date model is therefore a major objective when modelling data streams. However, identifying the correct time point for model reconfiguration is challenging. Simply adjusting the current model periodically bypasses this challenge, but it might lead to prolonged periods with inaccurate models or increased computational costs due to unnecessary reconfigurations. Because of these drawbacks, many algorithms aim to detect change points online and consequently trigger model adjustments [3, 18].
A Gaussian process (GP) is a stochastic process based on the Gaussian distribution and is commonly used as a non-parametric machine learning model [17]. GPs’ probabilistic nature makes them excel at dealing with small and noisy datasets. To incorporate knowledge on the general behavior of the data, GPs use positive semi-definite covariance functions, often called kernels. If no prior knowledge about this behavior is available, an automatic kernel search can determine a suitable covariance function for the given data [8]. However, this is usually a computationally expensive process and requires the optimization of numerous GP models. Recently, the Adjusting Kernel Search (AKS) [13] algorithm was introduced to accelerate this process on data streams. If multiple GP models are used to represent consecutive segments of a stream, it is often reasonable to assume that the models’ covariance functions will be similar. AKS enables a search for similar kernels based on a given kernel expression and circumvents the construction of novel expressions from scratch.
In this paper, we enhance AKS with a novel GP-based change point detection (CPD) method in order to propose the Event-Triggered Kernel Adjustments in Gaussian process modelling (ETKA) algorithm. The major objective of this algorithm is to deliver an up-to-date GP model describing the current data behavior at all times. We evaluate ETKA based on simulated as well as real-world data and compare it to alternatives in CPD and model inference. Beyond that, we present multiple ways to further expand and improve this method.
The rest of the paper is structured as follows: In Sect. 2, we outline relevant literature about GPs, kernel search algorithms and CPD methods. Sect. 3 introduces the ETKA algorithm. In Sect. 4 we describe our experimental setup. Afterwards, in Sect. 5, we show and discuss the results, before we conclude our findings in Sect. 6.
2 Related Work
The research we present in this paper combines the fields of CPD and GP-based data stream modelling. In this section, we briefly introduce relevant works from these fields. Due to space constraints, we include a formal introduction of GPs and kernel search approaches in Appendix 1.
GPs are commonly used probabilistic machine learning models that mainly depend on their inherent kernel function. An appropriate kernel expression can be chosen by an expert based on previous knowledge about the data. Without such expert knowledge, automatic kernel search algorithms can be employed to find an optimal fit for the given data [6, 8, 14, 15]. In 2013, Duvenaud et al. [8] introduced the first such algorithm, the Compositional Kernel Search (CKS). Lloyd et al. [15] expanded the method to the Automatic Bayesian Covariance Discovery (ABCD) by including change point kernels. Since both algorithms require the exact evaluation of numerous GP models per iteration, they are restricted to small and medium-sized datasets.
The problem of scalability was addressed in different ways: Kim et al. [14] introduced the Scalable Kernel Composition (SKC) in 2018, which performs model selection via lower and upper bounds for the GPs’ marginal likelihood instead of the exact evaluations. Berns et al. developed new approaches that use a prior segmentation of the data to accelerate the search [5] or perform the segmentation themselves [6]. We aim to expand this idea by performing a dynamic segmentation via online CPD during the modelling process. Recently, Hüwel et al. [13] introduced the Adjusting Kernel Search (AKS) algorithm, which exploits prior assumptions about the data without ascertaining the kernel function. More details about AKS can be found in Appendix 1.
Haselbeck et al. [10] developed EVARS-GPR, a framework to update a GP model online at certain change points. While this approach is restricted to output scale changes, it can be seen as a predecessor of this work due to its retraining of a GP at online-detected change points.
CPD approaches can be separated into offline and online methods. The former were extensively reviewed by Truong et al. [18]. In this paper, we focus on online methods to enable real-time model adjustments. Aminikhanghahi and Cook [3] provided an elaborate overview of available approaches in this area. One widely used method is Bayesian Online Change Point Detection [1]. It uses Bayes’ rule to determine the number of observations since the last change point and a hazard function to predict a new one. Another commonly used method is the cumulative sum (CUSUM) [16]. It tracks an accumulated deviation score over multiple data points and detects a change point when that score exceeds a custom threshold. CUSUM’s simplicity and adaptability make it a suitable basis for ETKA.
3 ETKA
Previous applications of AKS employed periodic adjustments of the GP model [13]. This potentially leads to extended periods of incoming data points being processed with an outdated model. Conversely, unnecessarily frequent model reconfigurations increase computational costs. For this reason, we present the Event-Triggered Kernel Adjustments in Gaussian process modelling (ETKA) algorithm, an enhancement of AKS with a novel GP-based online CPD approach.
Combining a kernel search algorithm with a CPD method is a natural step, but the nature of GPs results in specific requirements for an optimal CPD method. Changes should not be detected in primary statistics of the incoming data, such as its mean or variance, as a GP model does not need to be adjusted to regular changes of that kind. Rather, we need to find changes in the abstract behavior and tendencies underlying the data. For example, if the periodicity changes, we want to adjust the model to find a new period length value. Aside from enabling accurate modelling, the CPD method should be simple, computationally efficient and easily comprehensible in order to maintain high explainability.
Within ETKA, we achieve these goals by employing a CUSUM approach [16] based on the current GP’s performance. We hypothesize that data deviating from the current model’s predictions over multiple consecutive points signifies a change point. In this case, a kernel search using AKS is triggered to adjust the model to the novel data. The exact procedure of ETKA is explained below and presented in Algorithm 1.
First, we construct a GP model using CKS on an initial window of w data points. Using this model with kernel \(k:\mathbb {R} \times \mathbb {R} \rightarrow \mathbb {R}\), we make a prediction \(\hat{y}_{i}\) for the next value, starting from \(i=w+1\) after this initial window of length w, via the GP posterior mean [17]:
$$\hat{y}_i = \boldsymbol{k}_i^{\top } \left( \boldsymbol{K} + \sigma ^2 \boldsymbol{I} \right) ^{-1} \boldsymbol{y},$$
where \(\boldsymbol{K}\) denotes the kernel matrix over the current window, \(\boldsymbol{k}_i\) the vector of kernel values between the window inputs and \(x_i\), \(\boldsymbol{y}\) the observed values in the window and \(\sigma ^2\) the GP’s noise variance.
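As a minimal illustration of this prediction step, the following numpy sketch computes the posterior mean and variance for a single squared exponential kernel with hand-set hyperparameters; in ETKA, the kernel is instead the expression found by the kernel search and its hyperparameters are optimized. All function and parameter names are ours, for illustration only.

```python
import numpy as np

def sq_exp_kernel(a: np.ndarray, b: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    """Squared exponential base kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_win: np.ndarray, y_win: np.ndarray, x_next: float,
               noise: float = 0.1, lengthscale: float = 1.0) -> tuple[float, float]:
    """Posterior mean and variance of a zero-mean GP at the next input x_next,
    conditioned on the w data points in the current sliding window."""
    K = sq_exp_kernel(x_win, x_win, lengthscale) + noise ** 2 * np.eye(len(x_win))
    k_star = sq_exp_kernel(x_win, np.array([x_next]), lengthscale)[:, 0]
    mean = k_star @ np.linalg.solve(K, y_win)
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)  # k(x*, x*) = 1 for this kernel
    return float(mean), float(var)
```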
Then, we calculate the absolute deviation \(|y_{i}-\hat{y}_{i}|\) between the observed value \(y_{i}\) and our prediction \(\hat{y}_{i}\). Afterwards, the window slides one position further and the consecutive data points are used for the next prediction step employing the current GP. The accumulating error is used together with the GP’s noise \(\sigma ^2\) and a tolerance factor \(\delta \in \mathbb {R}\) to compute a change point score s, updated in every step as
$$s_i = \max \left( 0,\; s_{i-1} + |y_i - \hat{y}_i| - 2 \delta \sigma \right) , \quad s_w = 0.$$
If this score surpasses a certain threshold \(\varepsilon \in \mathbb {R}_{>0}\), a change point is detected at \(x_{i}\) and s is reset to zero. With this CPD approach, incoming data points need to be within the inner \(100 \cdot \delta \%\) of the GP’s confidence interval in order to count as accurately predicted. The further a data point lies outside this interval, the more it increases s and the faster a change point is detected. When a model adjustment is triggered by a detected change point, the GP’s kernel is adjusted with AKS on the current window of w data points. Then, the procedure with the CUSUM-based CPD and a potential trigger of AKS continues with this updated model. In settings where other kernel search methods are considered more useful, this step can easily be substituted with the corresponding approach.
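The core loop of this procedure can be sketched as follows, reusing gp_predict from above. This is an illustrative reduction: the slack term \(2 \delta \sigma \) reflects our reading of the inner \(100 \cdot \delta \%\) of an (approximately 95%) \(\pm 2\sigma \) confidence interval, and the AKS refit after each detection is only indicated by a comment.

```python
def etka_cpd(y: np.ndarray, w: int, delta: float = 0.5, eps: float = 5.0,
             noise: float = 0.1) -> list[int]:
    """Slide a window over the stream, predict one step ahead with the current
    GP and accumulate a CUSUM-style change point score s."""
    x = np.arange(len(y), dtype=float)
    s, change_points = 0.0, []
    for i in range(w, len(y)):
        y_hat, _ = gp_predict(x[i - w:i], y[i - w:i], x[i], noise=noise)
        # Deviations within the inner 100*delta% of the ~95% (±2σ) band shrink
        # the score (assumed slack 2δσ); larger deviations let it grow.
        s = max(0.0, s + abs(y[i] - y_hat) - 2.0 * delta * noise)
        if s > eps:
            change_points.append(i)  # change point at x_i: trigger AKS refit here
            s = 0.0                  # reset the accumulated score
    return change_points
```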
4 Experimental Setup
In this section, we present the experimental settings that we employed to produce the results shown in Sect. 5. All experiments were conducted on an 11th generation Intel Core i9-11900H processor with 8 cores at 2.50 GHz. For reproducibility, we published all code and data on GitHub.
4.1 Simulated Data
For the development and evaluation of ETKA under controlled and predefined settings, we generated artificial datasets. The simulations are based on univariate time series of length n with values \(\boldsymbol{b} \in \mathbb {R}^n\) that follow a periodicity of length \(n_{per}\) and have an amplitude of size a. Furthermore, we consider the multiplicative components linear trend \(\boldsymbol{l} \in \mathbb {R}^n\) and random noise \(\boldsymbol{\eta } \in \mathbb {R}^n\). The size of the linear trend \(\boldsymbol{l}\) is defined by the coefficient m of a linear model. The random noise \(\boldsymbol{\eta }\) is sampled from a normal distribution with a mean value of 1 and a standard deviation s. These components enable the simulation of time series data \(\boldsymbol{y} \in \mathbb {R}^n\) with
$$y_i = b_i \cdot l_i \cdot \eta _i,$$
where \(b_i\) is the value of the base signal \(\boldsymbol{b}\) at index i, \(l_i\) the factor of the linear trend \(\boldsymbol{l}\) at index i and \(\eta _i\) the factor of the random noise \(\boldsymbol{\eta }\) at index i. A factor can be left out to simulate that a component is not present. Finally, we can generate change points by fading between time series \(\boldsymbol{y}\) of different properties. The abruptness of the change is adjustable via a fading window \(w_{fade}\). We used this framework to model for instance changes of the period length (parameter \(n_{per}\) of \(\boldsymbol{b}\)), the output scale (parameter a of \(\boldsymbol{b}\)), the size of the linear trend (parameter m of \(\boldsymbol{l}\)) or the noise level (parameter s of \(\boldsymbol{\eta }\)). We considered time series with both single and multiple change points, all having a length of 2000 data points. An overview of all simulation settings can be found in Table 3 in Appendix 2.
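To make the construction concrete, the following sketch generates one such series and fades between two of them. The sinusoidal base signal and the linear fading scheme are our assumptions for illustration; the exact functional forms are defined in the published simulation code.

```python
import numpy as np

def simulate(n: int = 2000, n_per: int = 100, a: float = 1.0, m: float = 0.001,
             s: float = 0.05, seed: int = 0) -> np.ndarray:
    """Periodic base signal b with multiplicative linear trend l and noise eta."""
    rng = np.random.default_rng(seed)
    i = np.arange(n)
    b = a * np.sin(2.0 * np.pi * i / n_per)        # base signal: period n_per, amplitude a
    l = 1.0 + m * i                                # linear trend factor with coefficient m
    eta = rng.normal(loc=1.0, scale=s, size=n)     # noise factor ~ N(1, s^2)
    return b * l * eta

def fade(y1: np.ndarray, y2: np.ndarray, cp: int, w_fade: int) -> np.ndarray:
    """Blend two series around change point index cp over a fading window
    w_fade; w_fade = 1 yields an (almost) abrupt change."""
    alpha = np.clip((np.arange(len(y1)) - cp) / max(w_fade, 1) + 0.5, 0.0, 1.0)
    return (1.0 - alpha) * y1 + alpha * y2
```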
4.2 Real-World Data
We further included 14 real-world datasets from various domains, see Table 4 in the appendix. Besides the different domains, the datasets vary widely in the number of samples, ranging from 180 (Call centre) up to 7718 (Airquality). Examining the minimum (min) and maximum (max) target values of each time series, we further observe that many datasets have a large value range. Beyond that, several datasets have a standard deviation (std) that is rather high relative to their value range, indicating a strong variation in the time series. In summary, this collection of common time series datasets with varying characteristics and domains enables a broad evaluation of ETKA.
4.3 Evaluation
As comparison partners in our experiments, we included alternatives to ETKA in both main aspects: the choice of when to adjust the model and of how to adjust it. Hence, we compare the previously described CPD-based approach to data-agnostic periodic model adjustments (PER AKS). These model adjustments are performed three times at equidistant intervals within the dataset after the initial kernel search. Furthermore, only the data in the current window is used, disregarding everything before that. This comparison examines the benefit of CPD-driven adjustments, which come at the cost of the frequent model predictions needed to enable CPD. Furthermore, as mentioned in Sect. 2, different kernel search algorithms exist [6, 8, 14, 15]. For this proof of concept, we included a hyperparameter optimization (no change of the GP’s kernel expression) after a detected change point as a comparison partner (CPD HPO). This approach also performs its retraining on the current window exclusively.
The two evaluation criteria we consider in this work are the runtime and the mean absolute error of the prediction. Regarding the former, we measured the total runtime for processing each dataset, excluding the initial model construction as it is identical for all comparison partners. With respect to the prediction performance, we calculated the mean absolute error \(\frac{1}{N-w}\sum _{i=w+1}^{N}|y_i-\hat{y}_i|\) over every step after the initial model construction. Finally, we set the results in relation to all comparison partners. Thereby, a negative value represents a shorter runtime or a more accurate modelling by ETKA, respectively, i.e. an improvement.
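In code, both criteria and the relative comparison reduce to a few lines; the exact normalization of the relative values is our assumption.

```python
import numpy as np

def stream_mae(y: np.ndarray, y_hat: np.ndarray, w: int) -> float:
    """Mean absolute error over all one-step-ahead predictions after the
    initial window of length w (0-indexed arrays, i.e. steps w+1..N)."""
    return float(np.mean(np.abs(y[w:] - y_hat[w:])))

def relative_difference(etka_value: float, partner_value: float) -> float:
    """Result of ETKA relative to a comparison partner in percent;
    negative values indicate an improvement by ETKA."""
    return 100.0 * (etka_value - partner_value) / partner_value
```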
For all experiments, we use a window size w equivalent to \(20\%\) of the whole dataset. We allow kernels consisting of up to three base kernels and employ one iteration of adjustments when using AKS. The base kernel set B consists of the periodic, the linear and the squared exponential kernel. Before the initial kernel search with CKS, the data is rescaled using a Z-normalization.
5 Experimental Results
In this section, we provide an overview of our experimental results, both on simulated as well as on real-world data, and discuss our findings.
5.1 Simulated Data
As outlined in Sect. 4.1, we conducted experiments based on simulated data to evaluate ETKA under controlled and predefined settings. We further considered different configurations of the online CPD integrated in ETKA. These differ with respect to the defining parameters, i.e. the tolerance factor \(\delta \) and the threshold \(\varepsilon \). With the former, one can control how strict ETKA’s CPD is: a higher value increases the tolerance range for deviations between real and predicted values. A change point is declared if the threshold \(\varepsilon \) is exceeded, so a higher value allows higher change point scores and leads to a less sensitive CPD.
We show detailed results with absolute evaluation values for all simulated datasets and CPD configurations in Figs. 2 and 3 in Appendix 3. In Table 1, we provide a summary of these results. Besides the absolute evaluation values of ETKA, all results are shown in relation to the comparison partners PER AKS and CPD HPO. The table shows values averaged over all simulated datasets along with their standard deviations. As CPD HPO also differs based on \(\delta \) and \(\varepsilon \), the values of the relative comparison with ETKA cannot be compared across CPD settings, but they give an impression of which algorithm has the advantage for the specific CPD configuration.
We observe that ETKA consistently outperforms both comparison partners with respect to the prediction error. The predictions of ETKA tend to be more accurate with a more sensitive CPD. In comparison with PER AKS, ETKA shows the largest improvement for \(\delta =0.5\) and \(\varepsilon =5.0\). With respect to CPD HPO, ETKA’s top result is achieved for \(\delta =0.5\) as well, but with \(\varepsilon =7.0\). In terms of the absolute prediction error of ETKA, we see that the best overall result is achieved for \(\delta =0.5\) and \(\varepsilon =5.0\). Regarding the runtime, CPD HPO is in all cases more computationally efficient, while ETKA is faster or at least on a par with PER AKS. As expected, we observe that the runtime of ETKA decreases with a less sensitive CPD due to higher values for \(\delta \) and \(\varepsilon \). This decrease is larger for an increasing \(\delta \) with a constant \(\varepsilon \) than vice versa.
Assessing Fig. 2 in Appendix 3 for the best performing CPD configuration with \(\delta =0.5\) and \(\varepsilon =5.0\), we observe that ETKA outperforms its comparison partners on the majority of the simulated datasets (PER AKS is best in 7 and CPD HPO in 4 out of 24 simulation settings). Furthermore, in cases for which ETKA does not deliver the best outcome, it is generally close to the top performer. We further see that results for lower noise levels tend to be better (scenarios with variable noise are also at most as noisy as those with \(s=0.05\), since 0.05 is the maximum value of s for them).
The main goal of ETKA is an up-to-date GP model at all times. For that reason, the prediction error is more important than the runtime. Consequently, we set \(\delta \) to 0.5 and \(\varepsilon \) to 5.0 at the cost of longer runtimes for the experiments on real-world data.
5.2 Real-World Data
Besides simulated data, we evaluated ETKA on real-world data from different domains (see Sect. 4.2). An overview of the results with absolute evaluation values is shown in Fig. 4 in Appendix 3. In Table 2, we provide the comparison of ETKA with PER AKS and CPD HPO. On average, ETKA outperforms the comparison partners in terms of the prediction error by \(2.73\%\) and \(6.19\%\), respectively. Considering the individual datasets, ETKA is more accurate than PER AKS and CPD HPO in 9 and 10 out of 14 cases, respectively. In two and three of the remaining cases, respectively, ETKA is only slightly outperformed. Both comparison partners deliver better predictions than ETKA for Unemployment. Beyond that, PER AKS is the top performer for Internet and Gas production.
We observe the largest improvement of ETKA over both comparison partners in terms of the prediction error for Airline, which we show in Fig. 1a. ETKA detected two change points, so fewer refittings than for PER AKS were employed. Furthermore, the reconfigurations that ETKA performed are closer to the actual changes in the dataset, which leads to advantages regarding the prediction error. For Gas production, shown in Fig. 1b, ETKA performs significantly better than CPD HPO, but is outperformed by PER AKS. By chance, the periodic refitting points of PER AKS are well timed, leading to better predictions.
Regarding the overall runtime, both PER AKS and CPD HPO are more efficient, with the latter requiring the lowest runtimes, as expected. Furthermore, this disadvantage of ETKA is more pronounced on the real-world data than on the simulated data. However, when focusing on the datasets for which the runtime of ETKA is more than \(100\%\) higher than for PER AKS (Wheat, Internet, Radio and Airquality), we observe lower prediction errors in three out of four cases. In comparison with PER AKS, Radio and Airquality are among the three datasets with the largest improvement in prediction error. Internet, which was already conspicuous with respect to the prediction error, is also problematic regarding the runtime.
5.3 Discussion
With respect to synthetic data, we overall observe a broad applicability of ETKA. Despite different change point patterns and noise levels, ETKA mostly outperforms its comparison partners. We further observe that ETKA does not perform worse in the case of multiple changes compared with single changes. This indicates that the CPD within ETKA delivers the intended results and is applicable to data with multiple change points. Simulation settings B and C (see Table 3 and Fig. 2 in the Appendix) lead to the worst results for all three noise levels, while ETKA performs best for four of these six datasets. Setting B triggers a slow change of the periodicity. In contrast, settings E and H contain abrupt shifts of the period length, for which all three prediction models show lower prediction errors. Furthermore, the slow change of the amplitude a in setting C is problematic, whereas an abrupt shift of a in setting H is captured better by all methods. This lets us assume that slow changes are problematic for all three prediction models. A potential explanation for this phenomenon for both CPD-based approaches (ETKA and CPD HPO) is that abrupt changes increase the change point score s faster. Consequently, this might lead to quicker model adjustments. In contrast to abrupt changes, slow shifts can lead to prolonged inaccuracies without ever triggering a model adjustment.
On real-world data, ETKA still delivered the best performance in terms of the mean absolute error. However, this comes at the cost of higher runtimes compared to both alternative approaches. The primary goal of ETKA is to deliver an up-to-date model at all times. As we therefore chose a rather sensitive CPD configuration based on the results on simulated data, higher runtimes were expected. A notable exception is the Mauna Loa dataset, for which ETKA had the lowest runtime. The prediction error comparison to CPD HPO lets us infer that no change point was detected. Furthermore, the slightly lower prediction error of ETKA in comparison with PER AKS indicates that model adjustments are not beneficial for Mauna Loa. Hence, it might be valid that ETKA does not detect any change points. On the other datasets, the runtime difference varies greatly, while the performance improvement is, albeit not constant, more stable. For Airline, Radio and Airquality, we observe the highest improvement in terms of the prediction error. As outlined, for the former, ETKA detected fewer change points, and more accurately, than its comparison partners. In contrast to the improved prediction error, Radio and Airquality lead to higher runtimes for ETKA. Both datasets contain several change points, which were detected by ETKA and led to multiple model adjustments. Consequently, ETKA delivered more accurate GP models, albeit at the cost of computational resources. Periodic refittings depend highly on chance for an appropriate timing of model reconfigurations, as observed, for instance, for Gas production. Not depending on a fortuitously chosen refitting time is a major advantage of ETKA’s CPD-based approach. With respect to Internet, we observe both a higher runtime and a higher prediction error for ETKA in comparison with PER AKS. This is probably caused by poor results of the initial kernel search using CKS, indicating data that would require a more complex kernel expression. For such data, AKS, and consequently ETKA, should intuitively be advantageous, as its search is based on the current kernel expression instead of restarting from scratch.
In settings focused on fast processing, ETKA offers multiple options to trade potential prediction performance for lowered computational costs, e.g. by employing a smaller window size w or a reduced set of base kernels. In contrast, more iterations of AKS per adjustment or a more sensitive CPD can increase the model quality at the cost of longer processing times. The effect of the CPD configuration can be seen in our results on simulated data, cf. Table 1. As expected, settings with higher values for \(\delta \) and \(\varepsilon \) lead to lower runtimes at the cost of a higher prediction error.
ETKA’s performance is highly dependent on the integrated CPD and consequently on its parameters. For this study, the CPD parameters were determined using simulated data. Despite having generated a broad variety of change point patterns, this might not lead to the optimal parameters for all settings. Therefore, future work enabling a parameter determination based on the processed dataset could improve ETKA, and the evaluation of other state-of-the-art CPD approaches is a further interesting direction for future research. Beyond that, for severe changes of the data behavior, a kernel search from scratch might be more efficient than AKS. A classification of detected change points could enable ETKA to always choose the most appropriate kernel search approach. A further potential improvement could be a dynamically determined window size w in contrast to the fixed value we applied. This could be beneficial both in terms of efficiency and prediction performance.
6 Conclusion
In this paper, we enhanced AKS, a recently introduced kernel search algorithm for GPs, with a novel CUSUM-based CPD approach. The resulting algorithm, ETKA, offers the ability to automatically deliver an up-to-date GP model for data streams. In our experiments, ETKA proved to be broadly applicable to data of different behavior and noise levels. Compared to intuitive alternatives, ETKA delivered improved predictions. Especially on simulated data, the results were significantly better, while the improvement on real-world datasets was smaller. Overall, ETKA reached its main goal of an up-to-date model at all times and is therefore especially well suited for applications in which accurate modelling is paramount.
References
Adams, R.P., MacKay, D.J.C.: Bayesian online changepoint detection (2007)
Alagu Dharshini, M.P., Antelin Vijila, S.: Survey of machine learning and deep learning approaches on sales forecasting. In: 2021 4th International Conference on Computing and Communications Technologies (ICCCT), pp. 59–64 (2021). https://doi.org/10.1109/ICCCT53315.2021.9711878
Aminikhanghahi, S., Cook, D.J.: A survey of methods for time series change point detection. Knowl. Inf. Syst. 51(2), 339–367 (2016). https://doi.org/10.1007/s10115-016-0987-z
Bahri, M., Bifet, A., Gama, J., Gomes, H.M., Maniu, S.: Data stream analysis: foundations, major tasks and tools. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 11(3), e1405 (2021)
Berns, F., Beecks, C.: Automatic Gaussian process model retrieval for big data. In: CIKM, pp. 1965–1968. ACM (2020)
Berns, F., Schmidt, K., Bracht, I., Beecks, C.: 3CS algorithm for efficient Gaussian process model retrieval. In: ICPR, pp. 1773–1780. IEEE (2020)
De Vito, S., Massera, E., Piga, M., Martinotto, L., Di Francia, G.: On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sens. Actuators B: Chem. 129(2), 750–757 (2008). https://doi.org/10.1016/j.snb.2007.09.060
Duvenaud, D., Lloyd, J.R., Grosse, R.B., Tenenbaum, J.B., Ghahramani, Z.: Structure discovery in nonparametric regression through compositional kernel search. In: ICML (3). JMLR Workshop and Conference Proceedings, vol. 28, pp. 1166–1174. JMLR.org (2013)
Ghahramani, Z.: Probabilistic machine learning and artificial intelligence. Nature 521(7553), 452–459 (2015)
Haselbeck, F., Grimm, D.G.: EVARS-GPR: EVent-triggered augmented refitting of Gaussian process regression for seasonal data. In: Edelkamp, S., Möller, R., Rueckert, E. (eds.) KI 2021. LNCS (LNAI), vol. 12873, pp. 135–157. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87626-5_11
Haselbeck, F., Killinger, J., Menrad, K., Hannus, T., Grimm, D.G.: Machine learning outperforms classical forecasting on horticultural sales predictions. Mach. Learn. Appl. 7, 100239 (2022). https://doi.org/10.1016/j.mlwa.2021.100239
Hernandez, L., et al.: A survey on electric power demand forecasting: future trends in smart grids, microgrids and smart buildings. IEEE Commun. Surv. Tutorials 16(3), 1460–1495 (2014). https://doi.org/10.1109/SURV.2014.032014.00094
Hüwel, J.D., Berns, F., Beecks, C.: Automated kernel search for Gaussian processes on data streams. In: IEEE BigData, pp. 3584–3588. IEEE (2021)
Kim, H., Teh, Y.W.: Scaling up the automatic statistician: scalable structure discovery using Gaussian processes. In: AISTATS. Proceedings of Machine Learning Research, vol. 84, pp. 575–584. PMLR (2018)
Lloyd, J., Duvenaud, D., Grosse, R., Tenenbaum, J., Ghahramani, Z.: Automatic construction and natural-language description of nonparametric regression models. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 28 (2014)
Page, E.S.: Continuous inspection schemes. Biometrika 41(1/2), 100–115 (1954)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning, MIT Press, Cambridge (2006)
Truong, C., Oudre, L., Vayatis, N.: Selective review of offline change point detection methods. Sig. Process. 167, 107299 (2020)
Funding
This research was supported by the research training group “Dataninja” (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia.
The project is supported in parts by funds of the Federal Ministry of Food and Agriculture (BMEL), Germany based on a decision of the Parliament of the Federal Republic of Germany via the Federal Office for Agriculture and Food (BLE) under the innovation support program [grant number 2818504A18, DGG].
Appendices
Appendix 1: Background
In the following, we formally introduce the foundations of ETKA, namely GPs and the AKS algorithm.
A GP is a non-parametric probabilistic machine learning model [9, 17]. A model GP(m, k) is uniquely defined by its mean function \(m: \mathbb {R} \rightarrow \mathbb {R}\) and its covariance function or kernel \(k: \mathbb {R} \times \mathbb {R} \rightarrow \mathbb {R}\). While the mean function can often be set to constant zero [17], the kernel contains assumptions about the GP’s behavior. There are various well-understood kernels that describe specific patterns, like the periodic kernel \(\mathcal {K}_{PER}\) and the linear kernel \(\mathcal {K}_{LIN}\). These simple kernels will be referred to as base kernels throughout this paper.
Kernels often depend on parameters such as a lengthscale or a period length. These parameters are referred to as hyperparameters of the GP model and can be (locally) optimized for given data. One possible measure of performance for such optimization is the GP’s log marginal likelihood \(\mathcal {L}\left( GP(m,k), \boldsymbol{D}\right) \) on the data \(\boldsymbol{D} = (x_i,y_i)_{i=1,\ldots ,N}\) [17]. While there exist alternative measures [8, 17], we will use the log marginal likelihood for model optimization in our experiments.
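For targets \(\boldsymbol{y}\), kernel matrix \(\boldsymbol{K}\) and noise variance \(\sigma ^2\), the log marginal likelihood has the well-known closed form \(\mathcal {L} = -\frac{1}{2}\boldsymbol{y}^{\top }(\boldsymbol{K}+\sigma ^2\boldsymbol{I})^{-1}\boldsymbol{y} - \frac{1}{2}\log |\boldsymbol{K}+\sigma ^2\boldsymbol{I}| - \frac{N}{2}\log 2\pi \) [17]. A minimal sketch of its computation via a Cholesky factorization:

```python
import numpy as np

def log_marginal_likelihood(K: np.ndarray, y: np.ndarray, noise: float) -> float:
    """Log marginal likelihood of a zero-mean GP with kernel matrix K,
    targets y and noise standard deviation `noise` [17]."""
    n = len(y)
    L = np.linalg.cholesky(K + noise ** 2 * np.eye(n))   # K + σ²I = L Lᵀ
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + σ²I)⁻¹ y
    return float(-0.5 * y @ alpha                        # data fit term
                 - np.log(np.diag(L)).sum()              # equals ½·log|K + σ²I|
                 - 0.5 * n * np.log(2.0 * np.pi))        # normalization constant
```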
Individual base kernels can be combined into more complex kernel expressions via addition or multiplication [8]. This way, a kernel that optimally describes the data’s behavior can be constructed by experts. Alternatively, an algorithmic approach can be employed to find such an optimal kernel expression automatically. An example for this type of algorithm is CKS [8], which is depicted in Algorithm 2.
In each iteration, the algorithm adjusts the current best kernel expression K given a set of base kernels B by

1. adding any base kernel to any subexpression of K,
2. multiplying any base kernel with any subexpression of K, or
3. replacing any base kernel in K with a different base kernel from B.
The abstract functions AddViaAddition, AddViaMultiplication and ReplaceKernel in Algorithm 2 correspond to these three options. The best performing candidate from the resulting set is used as the basis for the next iteration. This way, the algorithm can build arbitrarily complex kernels at the cost of multiple model optimizations and evaluations per iteration. Since the candidate generation cannot lower the kernel’s complexity, any search for a new kernel needs to start from scratch.
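The candidate generation can be sketched with kernel expressions represented as small trees. This representation is ours for illustration; a real implementation additionally carries hyperparameters and fits and scores every candidate by its log marginal likelihood.

```python
from typing import Union

Expr = Union[str, tuple]            # a base kernel name or ("+"/"*", left, right)
BASE = ("PER", "LIN", "SE")         # periodic, linear, squared exponential

def candidates(expr: Expr) -> list:
    """One CKS expansion step: add or multiply a base kernel to any
    subexpression, or replace a base kernel with a different one."""
    out = []
    for b in BASE:                  # AddViaAddition / AddViaMultiplication
        out.append(("+", expr, b))
        out.append(("*", expr, b))
    if isinstance(expr, str):       # ReplaceKernel at a leaf
        out.extend(b for b in BASE if b != expr)
    else:                           # recurse into both operands, keep the rest
        op, left, right = expr
        out.extend((op, new, right) for new in candidates(left))
        out.extend((op, left, new) for new in candidates(right))
    return out
```

Starting from a single base kernel such as "SE", one iteration yields eight candidates; the best of them seeds the next iteration.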
Our goal is to model consecutive segments of potentially infinite time series. We utilize the AKS algorithm [13] as it has inherent advantages in this specific setting. In particular, the AKS algorithm is able to adjust a given kernel to fit new data instead of starting from zero. This is accomplished by adding a fourth possibility in the candidate generation: the removal of a base kernel from the current expression. This procedure is depicted in Algorithm 3.
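On the same tree representation as above, this additional removal operation can be sketched as follows (again an illustrative reduction, not the paper's implementation):

```python
def removals(expr: Expr) -> list:
    """All expressions obtained by deleting exactly one base kernel from expr."""
    if isinstance(expr, str):
        return []                                   # a lone base kernel stays
    op, left, right = expr
    out = []
    if isinstance(right, str):
        out.append(left)                            # drop the base kernel right
    if isinstance(left, str):
        out.append(right)                           # drop the base kernel left
    out.extend((op, new, right) for new in removals(left))
    out.extend((op, left, new) for new in removals(right))
    return out

def candidates_aks(expr: Expr) -> list:
    """AKS candidate set: the three CKS operations plus base kernel removal."""
    return candidates(expr) + removals(expr)
```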
It has been shown before that AKS can lead to a faster modelling process than CKS [13]. Especially for high-complexity models, the computational cost of constructing a kernel from zero is much higher than that of a few iterations of adjustment. However, for low-complexity models, the larger candidate set in AKS can lead to longer processing compared to CKS, even if fewer iterations are needed.
Appendix 2: Simulated and Real-World Data Characteristics
Appendix 3: Results Overviews