Abstract
Reinforcement learning (RL) is a method in which an agent learns actions through trial and error. Recently, multi-objective reinforcement learning (MORL) and safe reinforcement learning (SafeRL) have been studied. The objective of conventional RL is to maximize the expected reward; however, because safety is not considered, the agent may reach a fatal state. Therefore, RL methods that consider safety during or after learning have been proposed. SafeRL is similar to MORL because it considers two objectives, i.e., maximizing expected rewards and satisfying safety constraints. However, to the best of our knowledge, no study has investigated the relationship between MORL and SafeRL to demonstrate that SafeRL methods can be applied to MORL tasks. This paper combines MORL with SafeRL and proposes a method for Multi-Objective SafeRL (MOSafeRL). We applied the proposed method to the resource gathering task, which is a standard benchmark task for MORL.
Additional information
This work was presented in part at the 23rd International Symposium on Artificial Life and Robotics, Beppu, Oita, January 18–20, 2018.
About this article
Cite this article
Horie, N., Matsui, T., Moriyama, K. et al. Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning. Artif Life Robotics 24, 352–359 (2019). https://doi.org/10.1007/s10015-019-00523-3