Abstract
An important and often neglected aspect of probabilistic planning is how to account for different attitudes towards risk. In goal-driven problems, modeled as Stochastic Shortest Path (SSP) problems, risk arises from uncertainty about future events and how they can lead to goal states. An SSP agent that minimizes the expected accumulated cost is considered risk-neutral, while under a different optimization criterion it may adopt one of two extreme attitudes: risk-averse or risk-prone. In this work we consider the Risk-Sensitive SSP (RS-SSP), which uses an expected exponential utility parameterized by a risk factor \(\lambda \) that defines the agent's risk attitude. A value of \(\lambda \) is feasible if it admits a policy with finite expected exponential utility. Several algorithms can determine an optimal policy for an RS-SSP once a feasible value of \(\lambda \) is fixed; however, so far there has been only one approach to finding the extreme feasible \(\lambda \), i.e., an extreme risk-averse policy. In this work we propose and compare new approaches for finding the extreme feasible \(\lambda \) of a given RS-SSP and returning the corresponding extreme risk-averse policy. Experiments on three benchmark domains show that our proposals outperform the previous approach, allowing the solution of larger problems.
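To make the exponential-utility criterion concrete, the following sketch runs risk-sensitive value iteration on a tiny hypothetical SSP. All states, actions, costs, and transition probabilities here are invented for illustration and are not taken from the paper; the Bellman update is the standard one for the exponential criterion, V(s) = min_a exp(λ·c(s,a)) · Σ_s' P(s'|s,a)·V(s'), with V(goal) = 1, where smaller V is better for a risk-averse λ > 0.

```python
import math

GOAL = "g"
# Hypothetical toy SSP: from s0, "safe" reaches the goal for sure at cost 2;
# "risky" costs 1 per attempt but only succeeds with probability 0.5.
# Both actions have the same expected accumulated cost (2), so a risk-neutral
# agent is indifferent, while a risk-averse agent (lambda > 0) prefers "safe".
transitions = {
    "s0": {
        "safe":  (2.0, [("g", 1.0)]),
        "risky": (1.0, [("g", 0.5), ("s0", 0.5)]),
    },
}

def rs_value_iteration(lam, n_iter=1000):
    """Value iteration for the exponential-utility criterion with risk factor lam."""
    V = {"s0": 1.0, GOAL: 1.0}   # exp(lam * 0) = 1 at the goal
    policy = {}
    for _ in range(n_iter):
        for s, acts in transitions.items():
            best_q, best_a = None, None
            for a, (cost, succ) in acts.items():
                # Exponential-utility Bellman backup for action a.
                q = math.exp(lam * cost) * sum(p * V[sp] for sp, p in succ)
                if best_q is None or q < best_q:
                    best_q, best_a = q, a
            V[s], policy[s] = best_q, best_a
    return V, policy

V, policy = rs_value_iteration(lam=0.1)
print(policy["s0"], V["s0"])
```

In this toy problem the "risky" self-loop makes the exponential utility of the risky-only policy diverge once exp(λ) ≥ 2 (λ ≥ ln 2), but λ remains feasible because the "safe" policy keeps the optimal value finite; this is the kind of boundary the extreme feasible λ characterizes.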
Acknowledgments
We thank the CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for the financial support.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fernandez, M.C., de Barros, L.N., Mauá, D., Delgado, K.V., Freire, V. (2020). Finding Feasible Policies for Extreme Risk-Averse Agents in Probabilistic Planning. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science(), vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_7
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8
eBook Packages: Computer Science (R0)