Abstract
An important and often neglected aspect of probabilistic planning is how to account for different attitudes towards risk. In goal-driven problems, modeled as Stochastic Shortest Path (SSP) problems, risk arises from uncertainty about future events and how they can lead to goal states. An SSP agent that minimizes the expected accumulated cost is considered risk-neutral, while under a different optimization criterion it may adopt one of two extreme attitudes: risk-averse or risk-prone. In this work we consider the Risk-Sensitive SSP (RS-SSP), which uses an expected exponential utility parameterized by a risk factor \(\lambda \) that defines the agent's risk attitude. A value of \(\lambda \) is feasible if it admits a policy with finite expected exponential utility. Several algorithms can determine an optimal policy for an RS-SSP once a feasible value of \(\lambda \) is fixed; however, so far there has been only one approach to finding the extreme feasible \(\lambda \), i.e., an extreme risk-averse policy. In this work we propose and compare new approaches for finding the extreme feasible \(\lambda \) of a given RS-SSP and returning the corresponding extreme risk-averse policy. Experiments on three benchmark domains show that our proposals outperform the previous approach, allowing the solution of larger problems.
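To make the exponential-utility criterion concrete, the following sketch runs risk-sensitive value iteration on a tiny hypothetical SSP. All states, actions, costs, and transition probabilities here are invented for illustration and are not taken from the paper; the Bellman update is the standard one for the exponential criterion, V(s) = min_a exp(λ·c(s,a)) · Σ_s' P(s'|s,a)·V(s'), with V(goal) = 1, where smaller V is better for a risk-averse λ > 0.

```python
import math

GOAL = "g"
# Hypothetical toy SSP: from s0, "safe" reaches the goal for sure at cost 2;
# "risky" costs 1 per attempt but only succeeds with probability 0.5.
# Both actions have the same expected accumulated cost (2), so a risk-neutral
# agent is indifferent, while a risk-averse agent (lambda > 0) prefers "safe".
transitions = {
    "s0": {
        "safe":  (2.0, [("g", 1.0)]),
        "risky": (1.0, [("g", 0.5), ("s0", 0.5)]),
    },
}

def rs_value_iteration(lam, n_iter=1000):
    """Value iteration for the exponential-utility criterion with risk factor lam."""
    V = {"s0": 1.0, GOAL: 1.0}   # exp(lam * 0) = 1 at the goal
    policy = {}
    for _ in range(n_iter):
        for s, acts in transitions.items():
            best_q, best_a = None, None
            for a, (cost, succ) in acts.items():
                # Exponential-utility Bellman backup for action a.
                q = math.exp(lam * cost) * sum(p * V[sp] for sp, p in succ)
                if best_q is None or q < best_q:
                    best_q, best_a = q, a
            V[s], policy[s] = best_q, best_a
    return V, policy

V, policy = rs_value_iteration(lam=0.1)
print(policy["s0"], V["s0"])
```

In this toy problem the "risky" self-loop makes the exponential utility of the risky-only policy diverge once exp(λ) ≥ 2 (λ ≥ ln 2), but λ remains feasible because the "safe" policy keeps the optimal value finite; this is the kind of boundary the extreme feasible λ characterizes.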
Acknowledgments
We thank the CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for the financial support.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fernandez, M.C., de Barros, L.N., Mauá, D., Delgado, K.V., Freire, V. (2020). Finding Feasible Policies for Extreme Risk-Averse Agents in Probabilistic Planning. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science(), vol 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_7
Print ISBN: 978-3-030-61379-2
Online ISBN: 978-3-030-61380-8
eBook Packages: Computer Science (R0)