An active-set strategy to solve Markov decision processes with good-deal risk measure

  • Original Paper
Optimization Letters

Abstract

This paper proposes a quasi closed-form solution for the reweighting of transition probabilities in finite-state, finite-action distributionally robust Markov decision processes with a good-deal risk measure. It discusses the relation to the expected (risk-neutral) and minimax (worst-case) discounted cumulative cost objectives, as well as possible methods for choosing the risk measure parameters. Numerical results illustrate the computational effectiveness of the proposed approach.

Author information

Corresponding author

Correspondence to Boris Defourny.

About this article

Cite this article

Tu, S., Defourny, B. An active-set strategy to solve Markov decision processes with good-deal risk measure. Optim Lett 13, 1239–1257 (2019). https://doi.org/10.1007/s11590-019-01413-0

  • DOI: https://doi.org/10.1007/s11590-019-01413-0
