Finding Feasible Policies for Extreme Risk-Averse Agents in Probabilistic Planning

  • Conference paper
Intelligent Systems (BRACIS 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12320)


Abstract

An important and often neglected aspect of probabilistic planning is how to account for different attitudes towards risk. In goal-driven problems, modeled as Stochastic Shortest Path (SSP) problems, risk arises from uncertainty about future events and about how they can lead to goal states. An SSP agent that minimizes the expected accumulated cost is considered risk-neutral, while under a different optimization criterion it could adopt one of two extreme attitudes: risk-averse or risk-prone. In this work we consider the Risk-Sensitive SSP (RS-SSP), which uses an expected exponential utility parameterized by a risk factor \(\lambda\) that defines the agent's risk attitude; a \(\lambda\)-value is feasible if it admits a policy with finite expected cost. Several algorithms can determine an optimal policy for an RS-SSP once a feasible value of \(\lambda\) is fixed; however, so far there has been only one approach for finding the extreme feasible \(\lambda\), i.e., an extreme risk-averse policy. In this work we propose and compare new approaches for finding the extreme feasible \(\lambda\)-value of a given RS-SSP and returning the corresponding extreme risk-averse policy. Experiments on three benchmark domains show that our proposals outperform the previous approach, allowing the solution of larger problems.
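
For context, the exponential-utility criterion referred to above is commonly written as follows. This is a sketch under one standard sign convention (positive \(\lambda\) over accumulated cost for risk aversion, in the spirit of Howard and Matheson's risk-sensitive MDPs); the cost-to-go notation \(C^\pi\) is ours, not the paper's:

\[ V_\lambda^*(s) \;=\; \min_{\pi} \, \mathbb{E}\!\left[ e^{\lambda\, C^\pi(s)} \right], \qquad \lambda > 0, \]

where \(C^\pi(s)\) is the cost accumulated from state \(s\) until a goal is reached under policy \(\pi\); a \(\lambda\)-value is feasible exactly when this expectation is finite for at least one policy.

As a concrete illustration of solving an RS-SSP once a feasible \(\lambda\) is fixed, here is a minimal risk-sensitive value iteration sketch. The function name, the matrix encoding of the domain, and the divergence check are our assumptions for illustration; this is not one of the paper's algorithms:

    import numpy as np

    def rs_value_iteration(P, C, goal, lam, eps=1e-8, max_iter=10000):
        """Sketch of risk-sensitive VI under exponential utility (hypothetical API).

        P[a] is an (n x n) transition matrix for action a, C[a] an n-vector of
        costs c(s, a), and goal the index of an absorbing zero-cost goal state."""
        n = P[0].shape[0]
        V = np.ones(n)          # V(s) approximates min_pi E[exp(lam * cost-to-go)]
        for _ in range(max_iter):
            # Bellman backup: V(s) = min_a exp(lam * c(s,a)) * sum_s' P(s'|s,a) V(s')
            Q = np.stack([np.exp(lam * C[a]) * (P[a] @ V) for a in range(len(P))])
            V_new = Q.min(axis=0)
            V_new[goal] = 1.0   # exp(lam * 0) = 1 at the goal
            if not np.all(np.isfinite(V_new)):
                raise ValueError("values diverged: this lambda looks infeasible")
            if np.max(np.abs(V_new - V)) < eps:
                V = V_new
                break
            V = V_new
        return V, Q.argmin(axis=0)   # exponential values and a greedy policy

For an infeasible \(\lambda\) (too risk-averse for the problem at hand), the backups grow without bound; searching for the extreme feasible \(\lambda\), as this paper does, can be seen as locating the boundary of that divergence.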



Acknowledgments

We thank CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for their financial support.

Author information

Correspondence to Milton Condori Fernandez, Leliane N. de Barros, Denis Mauá, Karina V. Delgado or Valdinei Freire.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Fernandez, M.C., de Barros, L.N., Mauá, D., Delgado, K.V., Freire, V. (2020). Finding Feasible Policies for Extreme Risk-Averse Agents in Probabilistic Planning. In: Cerri, R., Prati, R.C. (eds) Intelligent Systems. BRACIS 2020. Lecture Notes in Computer Science (LNAI), vol. 12320. Springer, Cham. https://doi.org/10.1007/978-3-030-61380-8_7

  • DOI: https://doi.org/10.1007/978-3-030-61380-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61379-2

  • Online ISBN: 978-3-030-61380-8

  • eBook Packages: Computer Science, Computer Science (R0)
