DOI: 10.5555/2969033.2969218
Article

Algorithms for CVaR optimization in MDPs

Published: 08 December 2014

Abstract

In many sequential decision-making problems, we may want to manage risk by minimizing some measure of variability in costs, in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures, and because of its computational efficiency it has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
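For context on the objective: CVaR at confidence level alpha admits the Rockafellar-Uryasev characterization CVaR_alpha(C) = min over nu of { nu + E[(C - nu)^+] / (1 - alpha) }, whose minimizer is the value-at-risk (the alpha-quantile of the cost C); the gradient formula derived in the paper builds on this representation. Below is a minimal Python sketch of a sample-based CVaR estimator and a likelihood-ratio CVaR gradient in this spirit. It is an illustration under stated assumptions, not the paper's algorithm: the names (empirical_cvar, cvar_gradient) and the score_grads input (per-trajectory gradients of the policy's log-likelihood) are hypothetical.

    import numpy as np

    def empirical_cvar(costs, alpha):
        # Rockafellar-Uryasev estimator: nu + E[(C - nu)^+] / (1 - alpha),
        # where the minimizing nu is the empirical alpha-quantile (the VaR).
        nu = np.quantile(costs, alpha)
        return nu + np.maximum(costs - nu, 0.0).mean() / (1.0 - alpha)

    def cvar_gradient(costs, score_grads, alpha):
        # Likelihood-ratio estimate of the CVaR gradient: average
        # (C - nu)^+ * grad log p(trajectory) over all sampled trajectories,
        # scaled by 1 / (1 - alpha); only tail trajectories (C > nu) contribute.
        nu = np.quantile(costs, alpha)
        excess = np.maximum(costs - nu, 0.0)
        return (excess[:, None] * score_grads).mean(axis=0) / (1.0 - alpha)

    # Toy usage: 1000 synthetic trajectory costs, 3 policy parameters.
    rng = np.random.default_rng(0)
    costs = rng.normal(loc=1.0, scale=0.5, size=1000)
    score_grads = rng.normal(size=(1000, 3))
    print(empirical_cvar(costs, alpha=0.95))
    print(cvar_gradient(costs, score_grads, alpha=0.95))

A mean-CVaR descent step would then combine this gradient with the usual policy gradient of the expected cost, matching the risk-sensitive objective the abstract describes.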

Published In

NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
December 2014
3697 pages

Publisher

MIT Press

Cambridge, MA, United States

Cited By

  • (2020) Cautious adaptation for reinforcement learning in safety-critical settings. Proceedings of the 37th International Conference on Machine Learning, pp. 11055-11065. 10.5555/3524938.3525963. Online publication date: 13-Jul-2020.
  • (2018) A block coordinate ascent algorithm for mean-variance optimization. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 1073-1083. 10.5555/3326943.3327042. Online publication date: 3-Dec-2018.
  • (2018) RAIL. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2062-2063. 10.5555/3237383.3238072. Online publication date: 9-Jul-2018.
  • (2017) Optimizing quantiles in preference-based Markov decision processes. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3569-3575. 10.5555/3298023.3298087. Online publication date: 4-Feb-2017.
  • (2017) Risk-constrained reinforcement learning with percentile risk criteria. The Journal of Machine Learning Research, 18(1):6070-6120. 10.5555/3122009.3242024. Online publication date: 1-Jan-2017.
  • (2017) Portfolio Optimization for Influence Spread. Proceedings of the 26th International Conference on World Wide Web, pp. 977-985. 10.1145/3038912.3052628. Online publication date: 3-Apr-2017.
