
Optimal decisions for continuous time Markov decision processes over finite planning horizons

Published: 01 January 2017

Abstract

The computation of ε-optimal policies for continuous time Markov decision processes (CTMDPs) over finite time intervals is a sophisticated problem because the optimal policy may change at arbitrary times. Numerical algorithms based on time discretization or uniformization have been proposed for the computation of optimal policies. The uniformization based algorithm has been shown to be more reliable and often also more efficient, but it is currently only available for processes where the gain or reward does not depend on the decision taken in a state. In this paper, we present two new uniformization based algorithms for computing ε-optimal policies for CTMDPs with decision dependent rewards over a finite time horizon. Due to a new and tighter upper bound, the newly proposed algorithms can not only be applied to decision dependent rewards, they also outperform the available approach for rewards that do not depend on the decision. In particular, for models where the policy only rarely changes, optimal policies can be computed much faster.

Highlights

- A new algorithm to compute accumulated rewards for continuous time Markov decision processes with action dependent rewards over finite horizons.
- A proof that the algorithm guarantees a global error in O(ε) for time step ε.
- An experimental comparison of available algorithms for analyzing accumulated rewards for continuous time Markov decision processes with action dependent rewards over finite horizons.
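As background, the uniformization based approach the abstract refers to can be illustrated with a minimal first-order sketch: bound all exit rates by a uniformization rate Λ, cut the horizon [0, T] into a grid of small steps, turn each action's generator into a stochastic kernel, and run backward value iteration that picks the maximizing action on every step. The model data, the helper name, and the grid choice below are all illustrative assumptions; this plain grid scheme is not the paper's refined algorithm with its tighter error bound.

```python
import numpy as np

# Illustrative 2-state, 2-action CTMDP (all numbers are made up):
# Q[a] is the generator matrix and r[a] the reward-rate vector under action a.
Q = np.array([
    [[-2.0, 2.0], [1.0, -1.0]],    # generator under action 0
    [[-3.0, 3.0], [0.5, -0.5]],    # generator under action 1
])
r = np.array([
    [1.0, 0.0],    # reward rates under action 0
    [1.5, -0.2],   # reward rates under action 1
])

def finite_horizon_values(Q, r, T, uniformization_rate=None):
    """First-order approximation of the optimal accumulated reward over [0, T].

    The uniformization rate Lambda bounds all exit rates; the horizon is cut
    into N = ceil(Lambda * T) steps of length h = T / N, each generator is
    turned into a stochastic kernel P_a = I + h * Q_a, and backward value
    iteration selects a maximizing action on every grid step.
    """
    n_actions, n_states, _ = Q.shape
    Lam = uniformization_rate
    if Lam is None:
        # Slightly exceed the largest exit rate so that h * max exit rate < 1.
        Lam = 1.1 * np.max(-np.diagonal(Q, axis1=1, axis2=2))
    N = int(np.ceil(Lam * T))
    h = T / N                              # grid step length
    P = np.eye(n_states) + h * Q           # one stochastic kernel per action
    V = np.zeros(n_states)                 # zero terminal reward
    policy = np.empty((N, n_states), dtype=int)
    for k in range(N - 1, -1, -1):         # backward in time
        # q_values[a, i]: step reward plus expected continuation value.
        q_values = h * r + np.einsum('aij,j->ai', P, V)
        policy[k] = np.argmax(q_values, axis=0)
        V = np.max(q_values, axis=0)
    return V, policy
```

The global error of such a plain grid scheme is first order in the step length, which is exactly why the number of steps, and hence the run time, grows quickly for tight tolerances; the abstract's tighter upper bound targets this cost, especially when the optimal policy rarely changes.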

    Published In

    Computers and Operations Research  Volume 77, Issue C
    January 2017
    279 pages

    Publisher

    Elsevier Science Ltd.

    United Kingdom


    Author Tags

    1. Continuous time Markov decision process
    2. Finite horizon
    3. Numerical techniques
    4. Optimization
    5. Uniformization

    Qualifiers

    • Research-article
