Building Surrogate Models Using Trajectories of Agents Trained by Reinforcement Learning

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15019)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Sample efficiency in the face of computationally expensive simulations is a common concern in surrogate modeling. Current strategies to minimize the number of samples needed are less effective in simulated environments with wide state spaces. In response to this challenge, we propose a novel method to efficiently sample simulated deterministic environments by using policies trained by Reinforcement Learning. We provide an extensive analysis of these surrogate-building strategies compared with Latin-Hypercube sampling or Active Learning and Kriging, cross-validating performance with all sampled datasets. The analysis shows that a mixed dataset that includes samples acquired by random agents, expert agents, and agents trained to explore the regions of maximum entropy of the state transition distribution provides the best scores across all datasets, which is crucial for a meaningful representation of the state space. We conclude that the proposed method improves on the state of the art and clears the path to applying surrogate-aided Reinforcement Learning policy optimization strategies to complex simulators.

Notes

1. https://github.com/dmlc/xgboost

Abbreviations

AL: Active Learning
Random: Random sampling
Sobol: Sobol sampling
LHS: Latin-Hypercube sampling
RA: Random agent
EA: Expert agent
MEA: Maximum Entropy agent
MA: Mixed agent
MPA: Mixed (Partially) agent
PA: Partial agents

Funding

This research has been supported by the Spanish Ministry (NextGenerationEU Funds) through Project IA4TES (Grant Number: MIA.2021.M04.0008).

Author information

Authors and Affiliations

Politecnico di Milano, Milano, Italy
Julen Cestero & Marcello Restelli
Vicomtech, San Sebastian, Spain
Julen Cestero & Marco Quartulli

Corresponding author

Correspondence to Julen Cestero.

Editor information

Editors and Affiliations

IDSIA USI-SUPSI, Lugano, Switzerland
Michael Wand
Comenius University, Bratislava, Slovakia
Kristína Malinovská
KAUST Center of Generative AI, Thuwal, Saudi Arabia
Jürgen Schmidhuber
Helmholtz Zentrum München, Neuherberg, Germany
Igor V. Tetko

A Algorithm implementation details

The XGBoost models are trained using the default parameters of the XGBoost Python package (see Note 1).

The ANNs have undergone a hyperparameter optimization process. The main resulting parameters are:

2 hidden layers with 512 and 256 neurons, respectively,
learning rate of 0.001,
batch size of 64,
25 epochs for the MuJoCo environments and 10 epochs for the other environments,
early stopping if the validation loss increased by more than 0.001.
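
The early-stopping rule in the list above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `EarlyStopping` and `run_epochs` are hypothetical, and the sketch assumes that the increase is measured against the lowest validation loss seen so far, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    """Signal a stop once the validation loss rises by more than `min_delta`.

    Assumption: the rise is measured against the best (lowest) validation
    loss seen so far; the source does not state the reference point.
    """
    min_delta: float = 0.001
    best: float = float("inf")

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best:
            # New best loss: remember it and keep training.
            self.best = val_loss
            return False
        return (val_loss - self.best) > self.min_delta


def run_epochs(val_losses, max_epochs):
    """Replay per-epoch validation losses; return how many epochs were run."""
    stopper = EarlyStopping()
    epochs_run = 0
    for epoch, loss in zip(range(max_epochs), val_losses):
        epochs_run = epoch + 1
        if stopper.should_stop(loss):
            break
    return epochs_run
```

In a real training loop, `val_losses` would be replaced by the loss measured on the validation split after each epoch, with `max_epochs` set to 25 or 10 depending on the environment.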

The Gaussian process surrogate (used in Kriging together with AL) has the following specifications:

It uses LHS for the general sampling procedure of each training step.
It samples \(100\,000\) points of the space every epoch.
An epoch is halted if the maximum standard deviation over the sampled space is less than 0.01, or once 300 points have been added to the training set.
It is trained for 3 epochs. For each epoch, the initial space sample is repeated to prevent overfitting.
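
One possible reading of a single epoch of this procedure is sketched below in plain Python. Everything beyond the listed numbers is an assumption made for illustration: the function names, the squared-exponential kernel with unit prior variance and length scale 0.2, the jitter term, the unit hypercube domain, and the choice to add the candidate with the largest posterior standard deviation at each step. The outer loop over the 3 epochs is left out because the text does not detail how the initial sample is repeated.

```python
import math
import random


def latin_hypercube(n, dim, rng):
    """Draw n points in [0, 1)^dim with exactly one point per stratum on each axis."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit prior variance (assumed kernel)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))


def cholesky(K):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def posterior_std(L, X, x):
    """GP posterior standard deviation at x, given the Cholesky factor of K(X, X)."""
    v = []
    for i, xi in enumerate(X):  # forward substitution: solve L v = k(X, x)
        v.append((rbf(xi, x) - sum(L[i][k] * v[k] for k in range(i))) / L[i][i])
    return math.sqrt(max(1.0 - sum(t * t for t in v), 0.0))


def active_learning_epoch(simulator, X, y, dim, rng, n_candidates=100_000,
                          std_tol=0.01, max_added=300, jitter=1e-6):
    """Extend (X, y) in place with the most uncertain candidates; return the number added."""
    candidates = latin_hypercube(n_candidates, dim, rng)
    added = 0
    while added < max_added:  # stop condition 2: point budget exhausted
        K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
             for i, a in enumerate(X)]
        L = cholesky(K)
        stds = [posterior_std(L, X, c) for c in candidates]
        worst = max(range(len(candidates)), key=stds.__getitem__)
        if stds[worst] < std_tol:  # stop condition 1: surrogate is confident everywhere
            break
        X.append(candidates[worst])
        y.append(simulator(candidates[worst]))  # one expensive simulator call per added point
        added += 1
    return added
```

Because the kernel matrix is refactored after every added point, the cost of each step grows with the size of the training set, which is consistent with the sharply increasing training time that motivates the stop conditions. At the full scale of \(100\,000\) candidates, a vectorized Gaussian process library would be needed in place of this pure-Python sketch.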

Note that the stopping conditions for the Kriging process were added to keep the computation tractable, since the training time increased drastically after every step.

Cite this paper

Cestero, J., Quartulli, M., Restelli, M. (2024). Building Surrogate Models Using Trajectories of Agents Trained by Reinforcement Learning. In: Wand, M., Malinovská, K., Schmidhuber, J., Tetko, I.V. (eds) Artificial Neural Networks and Machine Learning – ICANN 2024. ICANN 2024. Lecture Notes in Computer Science, vol 15019. Springer, Cham. https://doi.org/10.1007/978-3-031-72341-4_23

DOI: https://doi.org/10.1007/978-3-031-72341-4_23
Published: 17 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72340-7
Online ISBN: 978-3-031-72341-4
eBook Packages: Computer Science, Computer Science (R0)
