Abstract
Multi-stage decision making (MSDM) problems in practice often involve conditions that change over time. For example, in shortest-route selection on a road network, the travel time of each road section varies with traffic conditions. Such changes make adopting any particular solution to an MSDM problem risky. This paper therefore proposes a method for solving MSDM problems that takes these risks into account. Reinforcement learning (RL) is adopted as the solution method, and stochastic changes in action sets are treated. Risk evaluation is subjective by nature and depends on the decision maker (DM), so risks must be evaluated on the basis of the DM’s subjective view. The proposed RL approach therefore uses a new risk evaluation method for such changes that can easily incorporate the DM’s subjective view and can be readily embedded in RL algorithms. The effectiveness of the method is illustrated with a path selection problem on a road network.
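To make the setting concrete, the following is a minimal sketch of tabular Q-learning on a toy road network in which one road section is randomly closed, so the action set changes stochastically. It is not the authors' algorithm: the network, the travel times, the closure probability, and the `risk_weight` blend between the currently open roads and the always-open roads are all hypothetical choices made for illustration. The abstract does not specify the paper's risk evaluation method.

```python
import random

# Hypothetical toy network: node -> {next_node: (travel_time, closure_probability)}.
# Every node keeps at least one road that never closes, so a move always exists.
ROADS = {
    "A": {"B": (1.0, 0.0), "C": (2.0, 0.0)},
    "B": {"D": (1.0, 0.3), "C": (3.0, 0.0)},
    "C": {"D": (2.0, 0.0)},
}
GOAL = "D"


def open_roads(node, rng):
    """Sample the action set (open outgoing roads) currently available at node."""
    return [nxt for nxt, (_, p_closed) in ROADS[node].items()
            if rng.random() >= p_closed]


def state_cost(q, node, available, risk_weight):
    """Cost-to-go of node, blended between open roads and always-open roads.

    risk_weight = 0 trusts whatever is open right now; risk_weight = 1 plans
    as if every unreliable road were closed (an assumed stand-in for the
    decision maker's subjective attitude to risk).
    """
    if node == GOAL:
        return 0.0
    best_open = min(q[(node, nxt)] for nxt in available)
    reliable = [nxt for nxt, (_, p) in ROADS[node].items() if p == 0.0]
    pessimistic = min(q[(node, nxt)] for nxt in reliable)
    return (1.0 - risk_weight) * best_open + risk_weight * pessimistic


def learn(risk_weight, episodes=5000, seed=0):
    """Q-learning over travel-time costs with a random exploration policy."""
    rng = random.Random(seed)
    q = {(n, nxt): 0.0 for n, outs in ROADS.items() for nxt in outs}
    visits = dict.fromkeys(q, 0)
    for _ in range(episodes):
        node = "A"
        available = open_roads(node, rng)
        while node != GOAL:
            nxt = rng.choice(available)  # explore uniformly among open roads
            travel_time = ROADS[node][nxt][0]
            next_available = open_roads(nxt, rng) if nxt != GOAL else []
            target = travel_time + state_cost(q, nxt, next_available, risk_weight)
            visits[(node, nxt)] += 1
            # Sample-average step size so the estimates settle despite noise.
            q[(node, nxt)] += (target - q[(node, nxt)]) / visits[(node, nxt)]
            node, available = nxt, next_available
    return q


def first_move(q):
    """Greedy (minimum estimated cost) first road out of the start node A."""
    return min(ROADS["A"], key=lambda nxt: q[("A", nxt)])


if __name__ == "__main__":
    for weight in (0.0, 1.0):
        print(weight, first_move(learn(risk_weight=weight)))
```

With these assumed numbers, the route through B has an expected travel time of 1 + 0.7 × 1 + 0.3 × 5 = 3.2 against 4 for the route through C, so a risk-neutral learner favors B. A fully pessimistic learner prices the route through B at 6 and favors C.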
Acknowledgments
This work has been partly supported by JSPS KAKENHI Grant Number 24560499.
Cite this article
Etoh, T., Takano, H. & Murata, J. Reinforcement learning approach to multi-stage decision making problems with changes in action sets. Artif Life Robotics 17, 293–299 (2012). https://doi.org/10.1007/s10015-012-0058-9
DOI: https://doi.org/10.1007/s10015-012-0058-9