Reinforcement learning approach to multi-stage decision making problems with changes in action sets

Takuya Etoh¹,
Hirotaka Takano¹ &
Junichi Murata¹

Abstract

In practical situations, multi-stage decision making (MSDM) problems are often subject to change. For example, in shortest-route selection problems on road networks, the travelling times of road sections vary with traffic conditions. Such changes create risks in adopting any particular solution to an MSDM problem. This paper therefore proposes a method for solving MSDM problems that takes these risks into account. Reinforcement learning (RL) is adopted as the solution method, and stochastic changes of action sets are treated. Risks must be evaluated from the subjective viewpoints of decision makers (DMs), because risk evaluation is by nature subjective and differs from one DM to another. The proposed RL approach therefore uses a new method for evaluating the risks of such changes; the method can easily incorporate the DM's subjective view and can be readily embedded in RL algorithms. The effectiveness of the method is illustrated with a road network path selection problem.
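The abstract does not give the details of the authors' risk-evaluation method, so the following is only a minimal sketch under stated assumptions, not the method of the paper. It runs tabular Q-learning on a small hypothetical road network (the nodes `S`, `A`, `B`, `C`, `G` and the table `ROADS` are invented for illustration) in which each road section is open at a given stage with a fixed probability, so the action set changes stochastically from stage to stage. As a stand-in for a decision maker's subjective risk attitude, it swaps in the asymmetric temporal-difference weighting of Mihatsch and Neuneier's risk-sensitive RL, controlled by a hypothetical parameter `kappa`; the helper names `sample_action_set`, `train` and `greedy_first_road` are likewise hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy road network (not from the paper):
# node -> list of (road_id, next_node, travel_time, probability_road_is_open)
ROADS = {
    "S": [("S-A", "A", 1.0, 1.0), ("S-B", "B", 2.0, 1.0)],
    "A": [("A-G", "G", 1.0, 0.7), ("A-C", "C", 5.0, 1.0)],  # A-G may be closed
    "B": [("B-G", "G", 2.0, 1.0)],
    "C": [("C-G", "G", 1.0, 1.0)],
}
GOAL = "G"


def sample_action_set(node, rng):
    """Draw the roads open at `node` for this stage.

    Assumption: each road is open independently with its own probability.
    Every node here has at least one always-open road, so the set is never empty.
    """
    return [road for road in ROADS[node] if rng.random() < road[3]]


def train(kappa, episodes=20_000, epsilon=0.2, seed=0):
    """Tabular Q-learning on travel-time costs with stochastic action sets.

    kappa in (-1, 1) is a hypothetical risk-attitude parameter: kappa > 0
    overweights outcomes costlier than expected (risk-averse); kappa = 0
    gives ordinary risk-neutral Q-learning.
    """
    rng = random.Random(seed)
    q = defaultdict(float)     # q[(node, road_id)]: estimated cost-to-goal
    visits = defaultdict(int)  # update counts, used for a 1/n step size
    for _ in range(episodes):
        node = "S"
        available = sample_action_set(node, rng)
        while node != GOAL:
            # Epsilon-greedy choice restricted to the roads open right now.
            if rng.random() < epsilon:
                road_id, nxt, cost, _ = rng.choice(available)
            else:
                road_id, nxt, cost, _ = min(available, key=lambda r: q[(node, r[0])])
            # Sample the next stage's action set first, so the backup's min
            # ranges only over roads that are actually open there.
            next_available = [] if nxt == GOAL else sample_action_set(nxt, rng)
            v_next = min((q[(nxt, r[0])] for r in next_available), default=0.0)
            delta = cost + v_next - q[(node, road_id)]
            # Asymmetric TD weighting (after Mihatsch & Neuneier); for costs,
            # delta > 0 means "worse than expected".
            weight = 1.0 + kappa if delta > 0 else 1.0 - kappa
            visits[(node, road_id)] += 1
            q[(node, road_id)] += weight * delta / visits[(node, road_id)]
            node, available = nxt, next_available
    return q


def greedy_first_road(q, start="S"):
    """Road the learned policy takes out of `start` (all roads at S are always open)."""
    return min(ROADS[start], key=lambda r: q[(start, r[0])])[0]


if __name__ == "__main__":
    for kappa in (0.0, 0.5):
        q = train(kappa)
        print(f"kappa={kappa}: first road {greedy_first_road(q)}, "
              f"Q(S,S-A)={q[('S', 'S-A')]:.2f}, Q(S,S-B)={q[('S', 'S-B')]:.2f}")
```

In this toy network the expected cost via S-A is 1 + (0.7 × 1 + 0.3 × 6) = 3.5, against a certain 4.0 via S-B, so a risk-neutral learner (`kappa = 0`) should settle on S-A even though a closed A-G forces a costly detour through C. With `kappa = 0.5`, the fixed point for S-A solves 0.7 × 0.5 × (x − 2) = 0.3 × 1.5 × (7 − x), giving x ≈ 4.81, so the risk-averse learner should prefer the detour-free S-B. Sampling the next stage's action set before the backup is the design choice that keeps each target consistent with the roads the agent will actually face.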

Acknowledgments

This work has been partly supported by JSPS KAKENHI Grant Number 24560499.

Author information

Authors and Affiliations

Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, Japan
Takuya Etoh, Hirotaka Takano & Junichi Murata

Corresponding author

Correspondence to Takuya Etoh.

About this article

Cite this article

Etoh, T., Takano, H. & Murata, J. Reinforcement learning approach to multi-stage decision making problems with changes in action sets. Artif Life Robotics 17, 293–299 (2012). https://doi.org/10.1007/s10015-012-0058-9

Received: 14 March 2012
Accepted: 27 August 2012
Published: 06 November 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s10015-012-0058-9
