Energy Demand Response in a Food-Processing Plant: A Deep Reinforcement Learning Approach
Figure 1. Scheme of the food-processing plant, including the building envelope and the warehouse, showing the considered mass flows, heat flows, and the electrical power of the industrial cooler.
Figure 2. Schematic of the agent–environment interface in RL, showing the agent, the environment, and their interaction via the action, the reward, and the state.
Figure 3. Results for load shifting over three consecutive example days, comparing RL and MILP with the reference scenario. (a) The refrigeration system’s electrical power consumption. (b) Cooling hall temperature variations, where 0 °C corresponds to a fully charged TES and 5 °C represents an empty TES. (c) The electricity price profile over the same period, illustrating the price-driven adjustments in system operation.
Figure 4. Results for load shifting over the complete year, evaluating RL and MILP depending on the electricity price. (a) The weekly energy savings via RL and MILP, (b) the relative weekly energy savings, and (c) the EXAA spot market price for the test period (May 2022 to April 2023).
Figure 5. EXAA spot market price from May 2020 to April 2023, showing the electricity prices and their fluctuations. The white area is the training data, whereas the testing period is shaded in grey.
Figure 6. RL training process showing the return of single episodes and the moving average with a window length of 100. The return is proportional to the negative energy costs; its absolute value is irrelevant and was only chosen to give an appropriate scale for the training process. Maximizing the negative energy costs is equivalent to minimizing the energy costs.
Figure 7. RL training process showing the cost reduction (a) and runtime (b) for different training period lengths.
Abstract
1. Introduction
1.1. Food-Processing Plants for Demand Response
1.2. Reinforcement Learning for Demand Response
1.3. Contributions
- Instead of directly controlling the cooling power, the set point temperature of a PI controller was optimized. This enhances stability and simplifies practical implementation.
- DDQL, a state-of-the-art RL algorithm, was applied to load shifting in order to reduce energy costs in an RTP scenario.
- The problem was additionally formulated as a MILP to compare RL with a state-of-the-art MPC controller.
- The respective energy cost savings and computation times of RL and MILP were evaluated.
2. Methods
2.1. Data Acquisition and Processing
2.2. Industrial Warehouse Model
Algorithm 1 PI controller with saturation
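The listing for Algorithm 1 is not reproduced here, so the following is only a minimal sketch of what a discrete PI controller with output saturation might look like. The class name, the anti-windup rule, and the sign convention (more cooling power when the hall is warmer than the set point) are assumptions for illustration. The gains, power limits, and time step are taken from the parameter table on the assumption that those rows are the controller settings.

```python
from dataclasses import dataclass


@dataclass
class SaturatedPI:
    """Discrete PI controller whose output is clamped to [p_min, p_max]."""
    kp: float              # proportional gain in W/K
    ki: float              # integral gain in W/(K s)
    p_min: float           # lower power limit in W
    p_max: float           # upper power limit in W
    dt: float              # time step in s
    integral: float = 0.0  # accumulated error in K s

    def step(self, temperature: float, set_point: float) -> float:
        # Cooling: a hall warmer than the set point yields a positive error.
        error = temperature - set_point
        candidate = self.integral + error * self.dt
        raw = self.kp * error + self.ki * candidate
        power = min(max(raw, self.p_min), self.p_max)
        # Simple anti-windup (an added assumption): only keep the new
        # integral value while the output is not saturated.
        if power == raw:
            self.integral = candidate
        return power


# Hypothetical usage with the values listed in the parameter table.
controller = SaturatedPI(kp=500_000.0, ki=2.0, p_min=0.0, p_max=202_511.0, dt=60.0)
power_warm = controller.step(temperature=4.0, set_point=2.0)  # saturates at p_max
power_cold = controller.step(temperature=1.0, set_point=2.0)  # saturates at p_min
```

With such a controller in the loop, the optimizer only chooses the set point; the controller keeps the electrical power inside its limits at every step.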
2.3. Optimization via Reinforcement Learning
Algorithm 2 Double Deep Q-Learning: Training
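The training listing itself is not reproduced here. As a rough illustration of the step that distinguishes double Q-learning from plain deep Q-learning, the sketch below computes the bootstrapped target for a single transition: the online network selects the next action and the target network evaluates it. The function names and the example Q-values are hypothetical. The discount factor 0.999 and the Huber parameter 1 are the values given in the hyperparameter table.

```python
from typing import Sequence


def double_q_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    discount: float,
    terminal: bool,
) -> float:
    """Bootstrapped double Q-learning target for one transition."""
    if terminal:
        return reward
    # The online network selects the greedy next action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates it, which curbs overestimation.
    return reward + discount * next_q_target[best_action]


def huber_loss(prediction: float, target: float, delta: float = 1.0) -> float:
    """Quadratic for small errors, linear for large ones."""
    error = abs(prediction - target)
    if error <= delta:
        return 0.5 * error * error
    return delta * (error - 0.5 * delta)


# Made-up Q-values for three actions in the next state.
target = double_q_target(
    reward=-1.0,
    next_q_online=[1.0, 3.0, 2.0],   # online net prefers action 1
    next_q_target=[5.0, 0.5, 4.0],   # target net scores action 1 at 0.5
    discount=0.999,
    terminal=False,
)
loss = huber_loss(prediction=0.0, target=target)
```

In this example a plain deep Q-learning target would bootstrap from the target network's own maximum (5.0), whereas the double variant bootstraps from 0.5.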
2.4. Validation via Mixed Integer Linear Programming
3. Results and Discussion
3.1. Evaluation of the RL Training Process
3.2. Analysis of Computational Complexity
3.3. Practical Applicability and Future Research Directions
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
DR | Demand response |
DDQL | Double deep Q-learning |
DDPG | Deep deterministic policy gradient |
DQL | Deep Q-learning |
DQN | Deep Q-networks |
DRL | Deep reinforcement learning |
DSM | Demand side management |
EXAA | Energy Exchange Austria |
HVAC | Heating, ventilation and air conditioning |
IoT | Internet of things |
LP | Linear programming |
LSTM | Long Short-Term Memory |
MILP | Mixed integer linear programming |
MINLP | Mixed integer nonlinear programming |
ML | Machine learning |
MPC | Model predictive control |
PI | Proportional integral |
PPO | Proximal policy optimization |
PV | Photovoltaic |
RL | Reinforcement learning |
RTP | Real-time pricing |
SAC | Soft actor-critic |
TES | Thermal energy storage |
TOU | Time-of-use |
xLSTM | Extended Long Short-Term Memory |
Appendix A. MILP Formulation
- Set point temperature of the warehouse during the time period p.
- Warehouse temperature at the time point t.
- Output signal of the P-controller before saturation during the time period p.
- Helper variable for calculating the saturation during the time period p.
- Binary variables for calculating the saturation during the time period p.
- Electrical power consumption of the industrial refrigeration system during the time period p.
- Price signal during the time period p.
- Heat flow rate of the load during the time period p.
- Length of a time period.
- Initial warehouse temperature at the time point 0.
- Energy efficiency ratio of the industrial refrigeration system.
- Proportional factor of the controller.
- Integral factor of the controller.
- Minimum electrical power.
- Maximum electrical power.
- Minimum set point temperature.
- Maximum set point temperature.
- N is the number of time periods.
- Two big-M constants.
- Set of time period indices.
- Set of time point indices.
- Set indexing every hour of a day.
- Set indexing every minute in an hour.
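The list above mentions a helper variable, binary variables, and big-M terms for modelling the controller saturation, but the constraints themselves are not shown here. The sketch below is one common textbook way to encode p = min(max(u, p_min), p_max) with two binaries, and it checks by enumeration that the encoding admits exactly the clipped value. The names u, h, p, b1, b2, and big_m are assumptions, and the formulation in the paper may differ.

```python
from itertools import product


def feasible_powers(u: float, p_min: float, p_max: float, big_m: float) -> set:
    """Return every power value allowed by a big-M encoding of
    p = min(max(u, p_min), p_max), found by enumerating the binaries."""
    tol = 1e-9
    solutions = set()
    for b1, b2 in product((0, 1), repeat=2):
        # b1 = 1 forces h = u, b1 = 0 forces h = p_min (h models max(u, p_min)).
        h = u if b1 else p_min
        # b2 = 1 forces p = h, b2 = 0 forces p = p_max (p models min(h, p_max)).
        p = h if b2 else p_max
        constraints = [
            h >= u - tol,
            h >= p_min - tol,
            h <= u + big_m * (1 - b1) + tol,
            h <= p_min + big_m * b1 + tol,
            p <= h + tol,
            p <= p_max + tol,
            p >= h - big_m * (1 - b2) - tol,
            p >= p_max - big_m * b2 - tol,
        ]
        if all(constraints):
            solutions.add(p)
    return solutions


# Illustrative controller outputs above, inside, and below the power limits.
limits = dict(p_min=0.0, p_max=202_511.0, big_m=1e7)
above = feasible_powers(300_000.0, **limits)
inside = feasible_powers(100_000.0, **limits)
below = feasible_powers(-50_000.0, **limits)
```

The constant big_m has to exceed the largest possible gap between the unsaturated controller output and the power limits. Beyond that it should be kept as small as possible, because loose big-M values weaken the linear relaxation a MILP solver works with.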
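Parameter | Value |
---|---|
Thermal capacity | 4360 MJ/K |
Mass, component 1 | 500 t |
Mass, component 2 | 1260 t |
Specific heat capacity, component 1 | 480 J/(kg K) |
Specific heat capacity, component 2 | 3270 J/(kg K) |
Energy efficiency ratio | 4.938 |
Maximum electrical power | 202,511 W |
Minimum electrical power | 0 W |
Maximum set point temperature | 5 °C |
Minimum set point temperature | 0 °C |
Length of a time period | 60 s |
Proportional factor of the controller | 500,000 W/K |
Integral factor of the controller | 2 W/(K s) |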
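The parameter values can be cross-checked with a little arithmetic. Interpreting the rows by their units (an assumption), the 4360 MJ/K entry is the sum of two mass times specific-heat products. The sketch also gives a rough estimate of the thermal storage between 0 °C and 5 °C. That estimate ignores heat loads and losses, so it is an upper-bound illustration and not a result from the paper.

```python
# Values from the parameter table; the variable names are illustrative.
mass_a_kg = 500e3     # 500 t
c_a = 480.0           # J/(kg K)
mass_b_kg = 1260e3    # 1260 t
c_b = 3270.0          # J/(kg K)

# Lumped thermal capacity as the sum of mass times specific heat capacity.
thermal_capacity = mass_a_kg * c_a + mass_b_kg * c_b  # J/K

eer = 4.938           # energy efficiency ratio
p_max_w = 202_511.0   # maximum electrical power in W
band_k = 5.0 - 0.0    # usable temperature band in K

# Heat to remove when cooling the hall across the whole band.
stored_heat_j = thermal_capacity * band_k
# Electrical energy needed for that, ignoring loads and losses.
electrical_kwh = stored_heat_j / eer / 3.6e6
# Time at full power to traverse the band, ignoring loads and losses.
full_charge_hours = stored_heat_j / (eer * p_max_w) / 3600.0
```

The sum comes to about 4360 MJ/K, which matches the first table entry. The product of the energy efficiency ratio and the maximum electrical power is very close to 1 MW of cooling power.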
Parameter | Value |
---|---|
Training episodes | 2000 |
Batch size | 1250 |
Memory buffer size | 10,000 |
Update rate | 0.005 |
Adam learning rate | |
Initial exploration rate | 0.9 |
End exploration rate | 0.05 |
Exploration decay rate | 2000 |
Discount factor | 0.999 |
Neural net layers | 3 |
Layer 1 | (50, 256), ReLU activation |
Layer 2 | (256, 256), ReLU activation |
Layer 3 | (256, 101), ReLU activation |
Huber loss parameter | 1 |
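The table does not spell out how the exploration and update rates enter the training loop. A common convention in deep Q-learning implementations, assumed here, is an exponentially decaying epsilon-greedy exploration rate and a soft (Polyak) update of the target network. The function names are hypothetical, and the weights are shown as plain lists rather than network tensors.

```python
import math
from typing import List


def exploration_rate(step: int, start: float = 0.9, end: float = 0.05,
                     decay: float = 2000.0) -> float:
    """Exponentially annealed epsilon (assumed schedule)."""
    return end + (start - end) * math.exp(-step / decay)


def soft_update(target_weights: List[float], online_weights: List[float],
                rate: float = 0.005) -> List[float]:
    """Move each target weight a small fraction toward the online weight."""
    return [(1.0 - rate) * t + rate * o
            for t, o in zip(target_weights, online_weights)]


eps_first = exploration_rate(0)        # starts at the initial exploration rate
eps_later = exploration_rate(2000)     # one decay constant later
eps_late = exploration_rate(100_000)   # approaches the end exploration rate
new_target = soft_update([0.0, 1.0], [1.0, 1.0])
```

Under this reading, a decay value of 2000 means the gap between the initial and the end exploration rate shrinks by a factor of e every 2000 steps.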
Optimization | Costs (EUR) | Savings (EUR) | Relative Savings (%) | Relative Costs (EUR/MWh) |
---|---|---|---|---|
RL | 116,831 | 24,911 | 17.57 | 208.00 |
MILP | 115,301 | 26,441 | 18.65 | 205.30 |
Reference | 141,742 | - | - | 252.10 |
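The savings columns follow directly from the cost column. The short sketch below recomputes them from the table values and also expresses the RL savings as a share of the MILP savings. That last ratio is a derived quantity added here for illustration and is not stated in the table.

```python
# Annual costs in EUR, copied from the results table.
costs_eur = {"RL": 116_831, "MILP": 115_301, "Reference": 141_742}

reference = costs_eur["Reference"]
# Absolute savings relative to the reference scenario.
savings = {name: reference - cost
           for name, cost in costs_eur.items() if name != "Reference"}
# Relative savings in percent, rounded as in the table.
relative = {name: round(100.0 * value / reference, 2)
            for name, value in savings.items()}
# Share of the MILP savings that the RL controller reaches.
rl_share_of_milp = savings["RL"] / savings["MILP"]
```

On these numbers the RL controller reaches roughly 94% of the savings obtained with the MILP benchmark.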
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wohlgenannt, P.; Hegenbart, S.; Eder, E.; Kolhe, M.; Kepplinger, P. Energy Demand Response in a Food-Processing Plant: A Deep Reinforcement Learning Approach. Energies 2024, 17, 6430. https://doi.org/10.3390/en17246430