DOI: 10.5555/2969033.2969218
Article

Algorithms for CVaR optimization in MDPs

Published: 08 December 2014

Abstract

In many sequential decision-making problems, we may want to manage risk by minimizing some measure of variability in costs, in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures, and because of its computational efficiency it has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
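For context on the objective: CVaR at confidence level alpha admits the Rockafellar-Uryasev characterization CVaR_alpha(C) = min over nu of { nu + E[(C - nu)^+] / (1 - alpha) }, whose minimizer is the value-at-risk (the alpha-quantile of the cost C); the gradient formula derived in the paper builds on this representation. Below is a minimal Python sketch of a sample-based CVaR estimator and a likelihood-ratio CVaR gradient in this spirit. It is an illustration under stated assumptions, not the paper's algorithm: the names (empirical_cvar, cvar_gradient) and the score_grads input (per-trajectory gradients of the policy's log-likelihood) are hypothetical.

    import numpy as np

    def empirical_cvar(costs, alpha):
        # Rockafellar-Uryasev estimator: nu + E[(C - nu)^+] / (1 - alpha),
        # where the minimizing nu is the empirical alpha-quantile (the VaR).
        nu = np.quantile(costs, alpha)
        return nu + np.maximum(costs - nu, 0.0).mean() / (1.0 - alpha)

    def cvar_gradient(costs, score_grads, alpha):
        # Likelihood-ratio estimate of the CVaR gradient: average
        # (C - nu)^+ * grad log p(trajectory) over all sampled trajectories,
        # scaled by 1 / (1 - alpha); only tail trajectories (C > nu) contribute.
        nu = np.quantile(costs, alpha)
        excess = np.maximum(costs - nu, 0.0)
        return (excess[:, None] * score_grads).mean(axis=0) / (1.0 - alpha)

    # Toy usage: 1000 synthetic trajectory costs, 3 policy parameters.
    rng = np.random.default_rng(0)
    costs = rng.normal(loc=1.0, scale=0.5, size=1000)
    score_grads = rng.normal(size=(1000, 3))
    print(empirical_cvar(costs, alpha=0.95))
    print(cvar_gradient(costs, score_grads, alpha=0.95))

A mean-CVaR descent step would then combine this gradient with the usual policy gradient of the expected cost, matching the risk-sensitive objective the abstract describes.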

Published In

NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
December 2014
3697 pages

Publisher

MIT Press

Cambridge, MA, United States

Cited By

  • (2020) Cautious adaptation for reinforcement learning in safety-critical settings. Proceedings of the 37th International Conference on Machine Learning, pp. 11055-11065. 10.5555/3524938.3525963. Online publication date: 13-Jul-2020.
  • (2018) A block coordinate ascent algorithm for mean-variance optimization. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 1073-1083. 10.5555/3326943.3327042. Online publication date: 3-Dec-2018.
  • (2018) RAIL. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2062-2063. 10.5555/3237383.3238072. Online publication date: 9-Jul-2018.
  • (2017) Optimizing quantiles in preference-based Markov decision processes. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3569-3575. 10.5555/3298023.3298087. Online publication date: 4-Feb-2017.
  • (2017) Risk-constrained reinforcement learning with percentile risk criteria. The Journal of Machine Learning Research, 18(1):6070-6120. 10.5555/3122009.3242024. Online publication date: 1-Jan-2017.
  • (2017) Portfolio Optimization for Influence Spread. Proceedings of the 26th International Conference on World Wide Web, pp. 977-985. 10.1145/3038912.3052628. Online publication date: 3-Apr-2017.
