Abstract
While there is an increasing interest in Machine Learning (ML) based solutions, little research has been devoted to the deployment and monitoring of ML models. In this work, we address this research gap by proposing a new data drift ML update strategy that only considers changes in the input features. Using the realistic Growing Window (GW) and Rolling Window (RW) ML deployment simulation schemes, we propose two Drift variants (DGW and DRW), which are compared with three other ML update approaches: Single Training (ST) and two Periodic Retraining methods (PGW and PRW). Several computational experiments were conducted, using the XGBoost regression learner and 8 public-domain datasets related to energy production and consumption. Overall, when considering both the predictive performance and the computational effort, the proposed DGW and DRW obtained competitive results. In particular, low predictive errors were achieved (overall values of 1.33% for DGW and 1.43% for DRW), while requiring around half of the computational effort of the periodic update versions (PGW and PRW).
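As an illustration of the abstract's core idea, the sketch below shows a drift-triggered Rolling Window update loop in Python: the model is retrained only when the distribution of the input features changes between the current training window and the newly arrived data block. The specific drift detector (a two-sample Kolmogorov-Smirnov test per feature via scipy.stats.ks_2samp), the 0.05 significance threshold, the window and step sizes, and the XGBRegressor settings are illustrative assumptions, not the parameters used in the paper.

```python
# Minimal sketch of a drift-triggered Rolling Window (DRW-style) update loop.
# Assumptions (not taken from the paper): a two-sample Kolmogorov-Smirnov test
# per input feature as the drift signal, a 0.05 significance threshold, and
# illustrative window/step sizes. The learner is an XGBoost regressor.
import numpy as np
from scipy.stats import ks_2samp
from xgboost import XGBRegressor


def input_drift(X_train, X_new, alpha=0.05):
    """Flag drift if any input feature's distribution changed (KS test)."""
    return any(
        ks_2samp(X_train[:, j], X_new[:, j]).pvalue < alpha
        for j in range(X_train.shape[1])
    )


def rolling_window_deploy(X, y, train_size=1000, step=100):
    """Slide a fixed-size window; retrain only when input drift is detected."""
    model = XGBRegressor(n_estimators=100)
    X_win, y_win = X[:train_size], y[:train_size]
    model.fit(X_win, y_win)
    preds = []
    for start in range(train_size, len(X) - step + 1, step):
        X_new, y_new = X[start:start + step], y[start:start + step]
        preds.append(model.predict(X_new))  # predict before any update
        if input_drift(X_win, X_new):       # drift check uses inputs only
            # Roll the window: keep the most recent train_size observations.
            X_win = X[start + step - train_size:start + step]
            y_win = y[start + step - train_size:start + step]
            model = XGBRegressor(n_estimators=100)
            model.fit(X_win, y_win)          # retrain on the rolled window
    return np.concatenate(preds)
```

A Growing Window variant (DGW-style) would differ only in the window update step: instead of discarding the oldest observations, the new block would be appended to the training set before retraining.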
Acknowledgements
This work has been supported by the European Union under the NextGenerationEU, through a grant of the Portuguese Republic’s Recovery and Resilience Plan (PRR) Partnership Agreement, within the scope of the project ATE – Aliança para a Transição Energética, aiming at enhancing the competitiveness and resilience of energy sector companies, thus propelling Portugal to a leadership position on decarbonization and promoting an effective energy transition (Project ref. nr. 56 - C644914747-00000023).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Teixeira, H. et al. (2025). A Data Drift Approach to Update Deployed Energy Prediction Machine Learning Models. In: Santos, M.F., Machado, J., Novais, P., Cortez, P., Moreira, P.M. (eds) Progress in Artificial Intelligence. EPIA 2024. Lecture Notes in Computer Science, vol 14969. Springer, Cham. https://doi.org/10.1007/978-3-031-73503-5_13
DOI: https://doi.org/10.1007/978-3-031-73503-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73502-8
Online ISBN: 978-3-031-73503-5
eBook Packages: Computer Science, Computer Science (R0)