Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning

Naoto Horie¹, Tohgoroh Matsui², Koichi Moriyama¹, Atsuko Mutoh¹ & Nobuhiro Inuzuka¹

Abstract

Reinforcement learning (RL) is a learning method in which an agent learns actions through trial and error. Recently, multi-objective reinforcement learning (MORL) and safe reinforcement learning (SafeRL) have been studied. The objective of conventional RL is to maximize the expected reward; however, because safety is not considered, the agent may reach a fatal state. Therefore, RL methods that consider safety during or after learning have been proposed. SafeRL is similar to MORL because it considers two objectives: maximizing the expected reward and satisfying safety constraints. However, to the best of our knowledge, no study has investigated the relationship between MORL and SafeRL to demonstrate that a SafeRL method can be applied to MORL tasks. This paper combines MORL with SafeRL and proposes a method for Multi-Objective SafeRL (MOSafeRL). We applied the proposed method to the resource gathering task, which is a standard benchmark task in MORL.
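The observation that SafeRL weighs two objectives can be made concrete with a Pareto-dominance check, a standard tool in multi-objective optimization. The sketch below is only an illustration under stated assumptions and is not the MOSafeRL method proposed in the paper: the function names `dominates` and `pareto_front` and the numeric policy scores are hypothetical, and each policy is assumed to be summarized by a pair (expected return, probability of staying safe), both to be maximized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (every objective is maximized)."""
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    # a must be at least as good everywhere and strictly better somewhere.
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (expected_return, probability_of_staying_safe) scores
# for four candidate policies; the numbers are made up for illustration.
policies = [(10.0, 0.60), (8.0, 0.90), (7.0, 0.85), (10.0, 0.55)]
front = pareto_front(policies)
print(front)
```

Under these assumed scores, two policies remain on the front: one trades safety for a higher return and the other trades return for higher safety, which is the kind of trade-off that both MORL and SafeRL have to resolve.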

Author information

Naoto Horie
Present address: Meitetsucom Co. Ltd., 1-21-12 Meieki-minami, Nakamura-ku, Nagoya, 450-0003, Japan

Authors and Affiliations

Department of Computer Science, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya, 466-8555, Japan
Naoto Horie, Koichi Moriyama, Atsuko Mutoh & Nobuhiro Inuzuka
Department of Clinical Engineering, College of Life and Health Sciences, Chubu University, 1200 Matsumoto-cho, Kasugai, 487-8501, Japan
Tohgoroh Matsui

Corresponding author

Correspondence to Tohgoroh Matsui.

Additional information

This work was presented in part at the 23rd International Symposium on Artificial Life and Robotics, Beppu, Oita, January 18–20, 2018.

About this article

Cite this article

Horie, N., Matsui, T., Moriyama, K. et al. Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning. Artif Life Robotics 24, 352–359 (2019). https://doi.org/10.1007/s10015-019-00523-3

Received: 10 April 2018
Accepted: 02 December 2018
Published: 08 February 2019
Issue Date: September 2019
DOI: https://doi.org/10.1007/s10015-019-00523-3
