Abstract
For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$. We then show that there are policies which are simultaneously $\varepsilon$-optimal for all durations $t$ and stationary except possibly for a final, finite segment. Further, the length of this final segment depends on $\varepsilon$ but, for large enough $t$, not on $t$, while the initial stationary part of the policy is independent of both $\varepsilon$ and $t$.
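In symbols, the two claims can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: $V(t)$ denotes the vector of maximal total rewards over duration $t$, $g$ the vector of average returns per unit time, $V_{\pi}(t)$ the total-reward vector of a policy $\pi$, and $\mathbf{1}$ the vector of ones. Inequalities are read componentwise, and $\varepsilon$-optimality is taken in the usual sense of being within $\varepsilon$ of the maximum.

$$\lim_{t \rightarrow \infty} \bigl( V(t) - t\,g \bigr) \ \text{exists}, \qquad V_{\pi_\varepsilon}(t) \ \geq\ V(t) - \varepsilon\,\mathbf{1} \quad \text{for all } t \geq 0.$$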
Mark R. Lembersky. "On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains." Ann. Statist. 2 (1) 159-169, January 1974. https://doi.org/10.1214/aos/1176342621