Abstract
The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by a given threshold. This verification problem is often overly pessimistic, as the policies it considers may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which policies make their decisions based on (the history of) the observations emitted by the system. We present an abstraction-refinement framework extending previous instantiations of the Lovejoy approach. Our experiments show that this framework significantly improves the scalability of the approach.
This work has been supported by the ERC Advanced Grant 787914 (FRAPPANT), the DFG RTG 2236 ‘UnRAVeL’, NSF grants 1545126 (VeHICaL) and 1646208, the DARPA Assured Autonomy program, Berkeley Deep Drive, and by Toyota under the iCyPhy center.
Notes
1. More general observation functions can be efficiently encoded in this formalism [11].
2. The implementation discussed in Sect. 5 supports all these combinations.
3. In the formula, we use Iverson brackets: \([x] = 1\) if x is true and 0 otherwise.
4. In general, the set of states of the belief MDP is uncountable. However, a given belief state \(\textit{\textbf{b}}\) has only a finite number of successors for each action \(\alpha\), i.e. \(post^{bel(M)}(\textit{\textbf{b}},\alpha)\) is finite, and thus the belief MDP is countably infinite. Acyclic POMDPs always give rise to finite belief MDPs (which may, however, be exponentially large). The belief update behind this observation is spelled out after these notes.
5. The implementation actually still connects \(\textit{\textbf{b}}\) with already explored successors and only redirects the ‘missing’ probabilities w.r.t. \(U(\textit{\textbf{b}}')\), \(\textit{\textbf{b}}' \in post^{db_{\mathcal{F}}(\mathcal{M})}(s,\alpha) \setminus S_{expl}\); see the code sketch after these notes.
6. We guess policies in \(\varSigma^{\mathcal{M}}_\text{obs}\) by distributing over the actions of optimal policies for the MDP \(M\).
7. \(\rho_{gap}\) is initially set to 0.1; after each iteration we update it to \(\rho_{gap}/4\).
8. \(\rho_{step}\) is initially set to \(\infty\); after each iteration we update it to \(4 \cdot |S^{\mathcal{A}}|\). Both schedules are sketched after these notes.
9. A policy \(\sigma\) is \(\rho_{\varSigma}\)-optimal if \(\forall \textit{\textbf{b}}: V_{\sigma(\textit{\textbf{b}})}(\textit{\textbf{b}}) + \rho_{\varSigma} \ge V(\textit{\textbf{b}})\). We set \(\rho_{\varSigma} = 0.001\).
10. In refinement step i, we explore \(2^{i-1} \cdot |S| \cdot \max_{z \in Z} |O^{-1}(z)|\) states.
11. Storm uses one core; Prism uses four cores, but only for garbage collection.
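To make Note 4 concrete: assuming the usual state-based observation function \(O : S \to Z\) (cf. Note 1), writing \(\mathbf{P}(s,\alpha,s')\) for the transition probabilities (notation assumed here for illustration, not taken from the paper), and using the Iverson brackets of Note 3, the standard belief update reads:

\[ \Pr(z \mid \textit{\textbf{b}}, \alpha) = \sum_{s' \in S} [O(s') = z] \cdot \sum_{s \in S} \textit{\textbf{b}}(s) \cdot \mathbf{P}(s,\alpha,s'), \qquad \textit{\textbf{b}}'(s') = \frac{[O(s') = z] \cdot \sum_{s \in S} \textit{\textbf{b}}(s) \cdot \mathbf{P}(s,\alpha,s')}{\Pr(z \mid \textit{\textbf{b}}, \alpha)}. \]

Each observation \(z\) with \(\Pr(z \mid \textit{\textbf{b}}, \alpha) > 0\) induces exactly one successor belief, so \(|post^{bel(M)}(\textit{\textbf{b}},\alpha)| \le |Z|\).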
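Note 5 describes how the finite abstraction is closed off at unexplored beliefs. The following minimal sketch (not the actual Storm implementation; `post`, `prob`, `U`, and the auxiliary `goal`/`sink` states are hypothetical stand-ins) illustrates redirecting the ‘missing’ probability mass w.r.t. the upper bound \(U\):

```python
def cutoff_distribution(b, alpha, post, prob, U, explored):
    """Distribution over explored successor beliefs plus two auxiliary
    states absorbing the probability mass of unexplored successors.
    Beliefs are assumed hashable (e.g. tuples of probabilities)."""
    dist = {}
    for b2 in post(b, alpha):          # enumerate successor beliefs
        p = prob(b, alpha, b2)         # transition probability to b2
        if b2 in explored:
            # Already explored successors stay connected as-is.
            dist[b2] = dist.get(b2, 0.0) + p
        else:
            # Redirect the 'missing' mass w.r.t. the upper bound U(b2):
            # reach the goal with p * U(b2), the sink with the remainder.
            dist["goal"] = dist.get("goal", 0.0) + p * U(b2)
            dist["sink"] = dist.get("sink", 0.0) + p * (1.0 - U(b2))
    return dist
```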
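Similarly, the heuristic schedules of Notes 7 and 8 amount to the following loop skeleton (a sketch; `refine_once` and `done` are hypothetical stand-ins for one refinement iteration and the termination check):

```python
import math

def refinement_loop(refine_once, done):
    rho_gap, rho_step = 0.1, math.inf       # initial values (Notes 7 and 8)
    while not done():
        # refine_once performs one iteration and returns the current
        # number of abstract states |S^A|.
        num_abstract_states = refine_once(rho_gap, rho_step)
        rho_gap /= 4                        # Note 7: rho_gap <- rho_gap / 4
        rho_step = 4 * num_abstract_states  # Note 8: rho_step <- 4 * |S^A|
```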
References
Amato, C., Bernstein, D.S., Zilberstein, S.: Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Auton. Agent. Multi-Agent Syst. 21(3), 293–320 (2010)
Ashok, P., Butkova, Y., Hermanns, H., Křetínský, J.: Continuous-time Markov decisions based on partial exploration. In: Lahiri, S.K., Wang, C. (eds.) ATVA 2018. LNCS, vol. 11138, pp. 317–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01090-4_19
Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press, Cambridge (2008)
Bonet, B., Geffner, H.: Solving POMDPs: RTDP-Bel vs. point-based algorithms. In: IJCAI, pp. 1641–1646 (2009)
Bork, A., Junges, S., Katoen, J.P., Quatmann, T.: Experiments for ‘Verification of indefinite-horizon POMDPs’. https://doi.org/10.5281/zenodo.3924577
Bork, A., Junges, S., Katoen, J.P., Quatmann, T.: Verification of indefinite-horizon POMDPs. CoRR abs/2007.00102 (2020)
Bouton, M., Tumova, J., Kochenderfer, M.J.: Point-based methods for model checking in partially observable Markov decision processes. In: AAAI, pp. 10061–10068. AAAI Press (2020)
Brázdil, T., et al.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8
Braziunas, D., Boutilier, C.: Stochastic local search for POMDP controllers. In: AAAI, pp. 690–696. AAAI Press/The MIT Press (2004)
Černý, P., Chatterjee, K., Henzinger, T.A., Radhakrishna, A., Singh, R.: Quantitative synthesis for concurrent programs. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 243–259. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_20
Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Qualitative analysis of POMDPs with temporal logic specifications for robotics applications. In: ICRA. pp. 325–330. IEEE (2015)
Freudenthal, H.: Simplizialzerlegungen von beschränkter Flachheit. Ann. Math. 43(3), 580–582 (1942)
Hansen, E.A.: Solving POMDPs by searching in policy space. In: UAI, pp. 211–219. Morgan Kaufmann (1998)
Hartmanns, A., Hermanns, H.: The Modest Toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51
Hartmanns, A., Kaminski, B.L.: Optimistic value iteration. In: Lahiri, S.K., Wang, C. (eds.) CAV 2020. LNCS, vol. 12225. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-53291-8_26
Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilistic model checker Storm. CoRR abs/2002.07080 (2020)
Horák, K., Bosanský, B., Chatterjee, K.: Goal-HSVI: heuristic search value iteration for goal POMDPs. In: IJCAI, pp. 4764–4770. ijcai.org (2018)
Jansen, N., Dehnert, C., Kaminski, B.L., Katoen, J.-P., Westhofen, L.: Bounded model checking for probabilistic programs. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 68–85. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_5
Junges, S., et al.: Finite-state controllers of POMDPs using parameter synthesis. In: UAI, pp. 519–529. AUAI Press (2018)
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)
Kochenderfer, M.J.: Decision Making Under Uncertainty. The MIT Press, Cambridge (2015)
Kurniawati, H., Hsu, D., Lee, W.S.: SARSOP: efficient point-based POMDP planning by approximating optimally reachable belief spaces. In: Robotics: Science and Systems. The MIT Press (2008)
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Oper. Res. 39(1), 162–175 (1991)
Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)
Meuleau, N., Kim, K., Kaelbling, L.P., Cassandra, A.R.: Solving POMDPs by searching the space of finite policies. In: UAI, pp. 417–426. Morgan Kaufmann (1999)
Norman, G., Parker, D., Zou, X.: Verification and control of partially observable probabilistic systems. Real-Time Syst. 53(3), 354–402 (2017)
Pajarinen, J., Peltonen, J.: Periodic finite state controllers for efficient POMDP and DEC-POMDP planning. In: NIPS, pp. 2636–2644 (2011)
Pineau, J., Gordon, G.J., Thrun, S.: Point-based value iteration: an anytime algorithm for POMDPs. In: IJCAI, pp. 1025–1032. Morgan Kaufmann (2003)
Russell, S.J., Norvig, P.: Artificial Intelligence - A Modern Approach. Pearson Education (2010)
Shani, G., Pineau, J., Kaplow, R.: A survey of point-based POMDP solvers. Auton. Agent. Multi-Agent Syst. 27(1), 1–51 (2013)
Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge (2005)
Volk, M., Junges, S., Katoen, J.P.: Fast dynamic fault tree analysis by model checking techniques. IEEE Trans. Ind. Inform. 14(1), 370–379 (2018)
Walraven, E., Spaan, M.T.J.: Point-based value iteration for finite-horizon POMDPs. J. Artif. Intell. Res. 65, 307–341 (2019)
Winterer, L., et al.: Motion planning under partial observability using game-based abstraction. In: CDC, pp. 2201–2208. IEEE (2017)
Wongpiromsarn, T., Frazzoli, E.: Control of probabilistic systems under dynamic, partially known environments with temporal logic specifications. In: CDC, pp. 7644–7651. IEEE (2012)