Abstract
Traditional multi-objective reinforcement learning focuses on the expected return of each objective under different preferences. In practice, however, the diversity of the resulting policies also matters. This paper proposes Multi-objective RL with Preference Exploration (MoPE), an algorithm that covers the optimal solutions under different objective preferences as fully as possible with a single trained model. Specifically, coverage of the optimal solutions is improved by exploring the preference space during the sampling stage and reusing samples with similar preferences during the training stage. Furthermore, for different preference inputs, a variety of diverse policies consistent with each preference can be generated by maximizing the mutual information between preference and state, following an information-theoretic approach. Compared with existing methods, our algorithm produces more diverse policies while still guaranteeing coverage of the optimal solutions.
This work was supported by the National Natural Science Foundation of China (Grant 62073176). All authors are with the Institute of Robotics and Automatic Information System, College of Artificial Intelligence, Nankai University, China.
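To make the two mechanisms in the abstract concrete, the sketch below illustrates one possible reading of them in PyTorch: a preference sampled from the simplex conditions a single policy network, a discriminator estimates q(w | s) so that log q(w | s) can serve as a mutual-information bonus (in the style of DIAYN [20]), and a replay filter reuses transitions collected under nearby preferences. This is a minimal sketch under our own assumptions; every name (PreferencePolicy, PreferenceDiscriminator, reuse_mask, the softmax surrogate for q(w | s)) is a hypothetical placeholder, not the authors' implementation.

```python
# Minimal illustrative sketch of the abstract's two ideas; all names and
# design choices here are assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_OBJ, N_ACT = 8, 2, 4  # toy sizes

def sample_preference(n_obj: int = N_OBJ) -> torch.Tensor:
    """Preference exploration: draw a weight vector uniformly from the simplex."""
    return torch.distributions.Dirichlet(torch.ones(n_obj)).sample()

class PreferencePolicy(nn.Module):
    """A single model conditioned on both the state and the preference vector w."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_OBJ, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACT))
    def forward(self, s: torch.Tensor, w: torch.Tensor):
        return torch.distributions.Categorical(
            logits=self.net(torch.cat([s, w], dim=-1)))

class PreferenceDiscriminator(nn.Module):
    """q(w | s): guesses the conditioning preference from the visited state."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_OBJ))
    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # log q(w | s), using a softmax over objectives as a crude surrogate.
        return F.log_softmax(self.net(s), dim=-1)

def shaped_reward(vec_r: torch.Tensor, w: torch.Tensor,
                  disc: PreferenceDiscriminator, s: torch.Tensor,
                  beta: float = 0.1) -> torch.Tensor:
    """Scalarized return plus a mutual-information bonus: states that reveal
    which preference produced them earn extra reward, pushing the policies
    for different w apart."""
    return (vec_r * w).sum(-1) + beta * (disc(s) * w).sum(-1)

def reuse_mask(stored_w: torch.Tensor, train_w: torch.Tensor,
               tol: float = 0.1) -> torch.Tensor:
    """Sample reuse: keep replayed transitions whose behavior preference is
    within `tol` (L1 distance) of the preference currently being trained on."""
    return (stored_w - train_w).abs().sum(-1) < tol

if __name__ == "__main__":
    w = sample_preference()
    s = torch.randn(STATE_DIM)
    pi, disc = PreferencePolicy(), PreferenceDiscriminator()
    a = pi(s, w).sample()
    vec_r = torch.randn(N_OBJ)  # stand-in vector-valued reward
    print(a.item(), shaped_reward(vec_r, w, disc, s).item())
```

A full training loop would additionally fit the discriminator to predict w from on-policy states and optimize the policy on the shaped reward with a standard algorithm such as PPO [24]; those steps are omitted here.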
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Xi, W., Guo, X.: Multi-objective RL with Preference Exploration. In: Liu, H., et al. (eds.) Intelligent Robotics and Applications. ICIRA 2022. Lecture Notes in Computer Science, vol. 13455. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-13844-7_62
Print ISBN: 978-3-031-13843-0
Online ISBN: 978-3-031-13844-7