
Optimal decisions for continuous time Markov decision processes over finite planning horizons

Published: 01 January 2017

Abstract

The computation of ε-optimal policies for continuous time Markov decision processes (CTMDPs) over finite time intervals is a sophisticated problem because the optimal policy may change at arbitrary times. Numerical algorithms based on time discretization or uniformization have been proposed for the computation of optimal policies. The uniformization based algorithm has been shown to be more reliable and often also more efficient, but it is currently only available for processes where the gain or reward does not depend on the decision taken in a state. In this paper, we present two new uniformization based algorithms for computing ε-optimal policies for CTMDPs with decision dependent rewards over a finite time horizon. Due to a new and tighter upper bound, the newly proposed algorithms can not only be applied to decision dependent rewards, they also outperform the available approach for rewards that do not depend on the decision. In particular, for models where the policy only rarely changes, optimal policies can be computed much faster.

Highlights

- A new algorithm to compute accumulated rewards for continuous time Markov decision processes with action dependent rewards over finite horizons.
- A proof that the algorithm guarantees a global error in O(ε) for time step ε.
- An experimental comparison of available algorithms for analyzing accumulated rewards for continuous time Markov decision processes with action dependent rewards over finite horizons.
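As background, the uniformization based approach the abstract refers to can be illustrated with a minimal first-order sketch: bound all exit rates by a uniformization rate Λ, cut the horizon [0, T] into a grid of small steps, turn each action's generator into a stochastic kernel, and run backward value iteration that picks the maximizing action on every step. The model data, the helper name, and the grid choice below are all illustrative assumptions; this plain grid scheme is not the paper's refined algorithm with its tighter error bound.

```python
import numpy as np

# Illustrative 2-state, 2-action CTMDP (all numbers are made up):
# Q[a] is the generator matrix and r[a] the reward-rate vector under action a.
Q = np.array([
    [[-2.0, 2.0], [1.0, -1.0]],    # generator under action 0
    [[-3.0, 3.0], [0.5, -0.5]],    # generator under action 1
])
r = np.array([
    [1.0, 0.0],    # reward rates under action 0
    [1.5, -0.2],   # reward rates under action 1
])

def finite_horizon_values(Q, r, T, uniformization_rate=None):
    """First-order approximation of the optimal accumulated reward over [0, T].

    The uniformization rate Lambda bounds all exit rates; the horizon is cut
    into N = ceil(Lambda * T) steps of length h = T / N, each generator is
    turned into a stochastic kernel P_a = I + h * Q_a, and backward value
    iteration selects a maximizing action on every grid step.
    """
    n_actions, n_states, _ = Q.shape
    Lam = uniformization_rate
    if Lam is None:
        # Slightly exceed the largest exit rate so that h * max exit rate < 1.
        Lam = 1.1 * np.max(-np.diagonal(Q, axis1=1, axis2=2))
    N = int(np.ceil(Lam * T))
    h = T / N                              # grid step length
    P = np.eye(n_states) + h * Q           # one stochastic kernel per action
    V = np.zeros(n_states)                 # zero terminal reward
    policy = np.empty((N, n_states), dtype=int)
    for k in range(N - 1, -1, -1):         # backward in time
        # q_values[a, i]: step reward plus expected continuation value.
        q_values = h * r + np.einsum('aij,j->ai', P, V)
        policy[k] = np.argmax(q_values, axis=0)
        V = np.max(q_values, axis=0)
    return V, policy
```

The global error of such a plain grid scheme is first order in the step length, which is exactly why the number of steps, and hence the run time, grows quickly for tight tolerances; the abstract's tighter upper bound targets this cost, especially when the optimal policy rarely changes.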

    Published In

    Computers and Operations Research  Volume 77, Issue C
    January 2017
    279 pages

    Publisher

    Elsevier Science Ltd.

    United Kingdom


    Author Tags

    1. Continuous time Markov decision process
    2. Finite horizon
    3. Numerical techniques
    4. Optimization
    5. Uniformization

    Qualifiers

    • Research-article
