Abstract
While there is an increasing interest in Machine Learning (ML) based solutions, little research has been devoted to the deployment and monitoring of ML models. In this work, we address this research gap by proposing a new data drift ML update strategy that only considers changes in the input features. Using the realistic Growing Window (GW) and Rolling Window (RW) ML deployment simulation schemes, we propose two Drift variants (DGW and DRW), which are compared with three other ML update approaches: Single Training (ST) and two Periodic Retraining methods (PGW and PRW). Several computational experiments were conducted, using the XGBoost regression learner and 8 public-domain datasets related to energy production and consumption. Overall, when considering both the predictive performance and the computational effort, the proposed DGW and DRW obtained competitive results. In particular, low predictive errors were achieved (overall values of 1.33% for DGW and 1.43% for DRW), while requiring around half of the computational effort of the periodic update versions (PGW and PRW).
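As an illustration of the abstract's core idea, the sketch below shows a drift-triggered Rolling Window update loop in Python: the model is retrained only when the distribution of the input features changes between the current training window and the newly arrived data block. The specific drift detector (a two-sample Kolmogorov-Smirnov test per feature via scipy.stats.ks_2samp), the 0.05 significance threshold, the window and step sizes, and the XGBRegressor settings are illustrative assumptions, not the parameters used in the paper.

```python
# Minimal sketch of a drift-triggered Rolling Window (DRW-style) update loop.
# Assumptions (not taken from the paper): a two-sample Kolmogorov-Smirnov test
# per input feature as the drift signal, a 0.05 significance threshold, and
# illustrative window/step sizes. The learner is an XGBoost regressor.
import numpy as np
from scipy.stats import ks_2samp
from xgboost import XGBRegressor


def input_drift(X_train, X_new, alpha=0.05):
    """Flag drift if any input feature's distribution changed (KS test)."""
    return any(
        ks_2samp(X_train[:, j], X_new[:, j]).pvalue < alpha
        for j in range(X_train.shape[1])
    )


def rolling_window_deploy(X, y, train_size=1000, step=100):
    """Slide a fixed-size window; retrain only when input drift is detected."""
    model = XGBRegressor(n_estimators=100)
    X_win, y_win = X[:train_size], y[:train_size]
    model.fit(X_win, y_win)
    preds = []
    for start in range(train_size, len(X) - step + 1, step):
        X_new, y_new = X[start:start + step], y[start:start + step]
        preds.append(model.predict(X_new))  # predict before any update
        if input_drift(X_win, X_new):       # drift check uses inputs only
            # Roll the window: keep the most recent train_size observations.
            X_win = X[start + step - train_size:start + step]
            y_win = y[start + step - train_size:start + step]
            model = XGBRegressor(n_estimators=100)
            model.fit(X_win, y_win)          # retrain on the rolled window
    return np.concatenate(preds)
```

A Growing Window variant (DGW-style) would differ only in the window update step: instead of discarding the oldest observations, the new block would be appended to the training set before retraining.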
Acknowledgements
This work has been supported by the European Union under the NextGenerationEU, through a grant of the Portuguese Republic’s Recovery and Resilience Plan (PRR) Partnership Agreement, within the scope of the project ATE – Aliança para a Transição Energética, aiming at enhancing the competitiveness and resilience of energy sector companies, thus propelling Portugal to a leadership position on decarbonization and promoting an effective energy transition (Project ref. nr. 56 - C644914747-00000023).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Teixeira, H. et al. (2025). A Data Drift Approach to Update Deployed Energy Prediction Machine Learning Models. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science, vol 14969. Springer, Cham. https://doi.org/10.1007/978-3-031-73503-5_13
DOI: https://doi.org/10.1007/978-3-031-73503-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73502-8
Online ISBN: 978-3-031-73503-5
eBook Packages: Computer Science, Computer Science (R0)