Abstract
This paper proposes a quasi-closed-form solution for the reweighting of transition probabilities in finite-state, finite-action distributionally robust Markov decision processes with a good-deal risk measure. The relation to the expected (risk-neutral) and minimax (worst-case) discounted cumulative cost objectives is discussed, as are possible methods for choosing the risk measure parameters. Numerical results illustrate the computational effectiveness of the proposed approach.
Cite this article
Tu, S., Defourny, B. An active-set strategy to solve Markov decision processes with good-deal risk measure. Optim Lett 13, 1239–1257 (2019). https://doi.org/10.1007/s11590-019-01413-0
DOI: https://doi.org/10.1007/s11590-019-01413-0