Abstract
The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by a given threshold. This verification problem is often overly pessimistic, as the policies it considers may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which policies make their decisions based on (the history of) the observations emitted by the system. We present an abstraction-refinement framework extending previous instantiations of the Lovejoy approach. Our experiments show that this framework significantly improves the scalability of the approach.
This work has been supported by the ERC Advanced Grant 787914 (FRAPPANT), the DFG RTG 2236 ‘UnRAVeL’, NSF grants 1545126 (VeHICaL) and 1646208, the DARPA Assured Autonomy program, Berkeley Deep Drive, and by Toyota under the iCyPhy center.
Notes
1. More general observation functions can be efficiently encoded in this formalism [11].
2. The implementation discussed in Sect. 5 supports all these combinations.
3. In the formula, we use Iverson brackets: \([x] = 1\) if x is true and 0 otherwise.
4. In general, the set of states of the belief MDP is uncountable. However, a given belief state \(\textit{\textbf{b}}\) has only a finite number of successors for each action \(\alpha\), i.e. \(post^{bel(M)}(\textit{\textbf{b}},\alpha)\) is finite, and thus the belief MDP is countably infinite. Acyclic POMDPs always give rise to finite belief MDPs (which may, however, be exponentially large). The belief update behind this observation is spelled out after these notes.
5. The implementation actually still connects \(\textit{\textbf{b}}\) with already explored successors and only redirects the ‘missing’ probabilities w.r.t. \(U(\textit{\textbf{b}}')\), \(\textit{\textbf{b}}' \in post^{db_{\mathcal{F}}(\mathcal{M})}(s,\alpha) \setminus S_{expl}\); see the code sketch after these notes.
6. We guess policies in \(\varSigma^{\mathcal{M}}_\text{obs}\) by distributing over the actions of optimal policies for the MDP \(M\).
7. \(\rho_{gap}\) is initially set to 0.1; after each iteration we update it to \(\rho_{gap}/4\).
8. \(\rho_{step}\) is initially set to \(\infty\); after each iteration we update it to \(4 \cdot |S^{\mathcal{A}}|\). Both schedules are sketched after these notes.
9. A policy \(\sigma\) is \(\rho_{\varSigma}\)-optimal if \(\forall \textit{\textbf{b}}: V_{\sigma(\textit{\textbf{b}})}(\textit{\textbf{b}}) + \rho_{\varSigma} \ge V(\textit{\textbf{b}})\). We set \(\rho_{\varSigma} = 0.001\).
10. In refinement step i, we explore \(2^{i-1} \cdot |S| \cdot \max_{z \in Z} |O^{-1}(z)|\) states.
11. Storm uses one core; Prism uses four cores, but only for garbage collection.
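To make Note 4 concrete: assuming the usual state-based observation function \(O : S \to Z\) (cf. Note 1), writing \(\mathbf{P}(s,\alpha,s')\) for the transition probabilities (notation assumed here for illustration, not taken from the paper), and using the Iverson brackets of Note 3, the standard belief update reads:

\[ \Pr(z \mid \textit{\textbf{b}}, \alpha) = \sum_{s' \in S} [O(s') = z] \cdot \sum_{s \in S} \textit{\textbf{b}}(s) \cdot \mathbf{P}(s,\alpha,s'), \qquad \textit{\textbf{b}}'(s') = \frac{[O(s') = z] \cdot \sum_{s \in S} \textit{\textbf{b}}(s) \cdot \mathbf{P}(s,\alpha,s')}{\Pr(z \mid \textit{\textbf{b}}, \alpha)}. \]

Each observation \(z\) with \(\Pr(z \mid \textit{\textbf{b}}, \alpha) > 0\) induces exactly one successor belief, so \(|post^{bel(M)}(\textit{\textbf{b}},\alpha)| \le |Z|\).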
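Note 5 describes how the finite abstraction is closed off at unexplored beliefs. The following minimal sketch (not the actual Storm implementation; `post`, `prob`, `U`, and the auxiliary `goal`/`sink` states are hypothetical stand-ins) illustrates redirecting the ‘missing’ probability mass w.r.t. the upper bound \(U\):

```python
def cutoff_distribution(b, alpha, post, prob, U, explored):
    """Distribution over explored successor beliefs plus two auxiliary
    states absorbing the probability mass of unexplored successors.
    Beliefs are assumed hashable (e.g. tuples of probabilities)."""
    dist = {}
    for b2 in post(b, alpha):          # enumerate successor beliefs
        p = prob(b, alpha, b2)         # transition probability to b2
        if b2 in explored:
            # Already explored successors stay connected as-is.
            dist[b2] = dist.get(b2, 0.0) + p
        else:
            # Redirect the 'missing' mass w.r.t. the upper bound U(b2):
            # reach the goal with p * U(b2), the sink with the remainder.
            dist["goal"] = dist.get("goal", 0.0) + p * U(b2)
            dist["sink"] = dist.get("sink", 0.0) + p * (1.0 - U(b2))
    return dist
```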
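Similarly, the heuristic schedules of Notes 7 and 8 amount to the following loop skeleton (a sketch; `refine_once` and `done` are hypothetical stand-ins for one refinement iteration and the termination check):

```python
import math

def refinement_loop(refine_once, done):
    rho_gap, rho_step = 0.1, math.inf       # initial values (Notes 7 and 8)
    while not done():
        # refine_once performs one iteration and returns the current
        # number of abstract states |S^A|.
        num_abstract_states = refine_once(rho_gap, rho_step)
        rho_gap /= 4                        # Note 7: rho_gap <- rho_gap / 4
        rho_step = 4 * num_abstract_states  # Note 8: rho_step <- 4 * |S^A|
```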
References
Amato, C., Bernstein, D.S., Zilberstein, S.: Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Auton. Agent. Multi-Agent Syst. 21(3), 293–320 (2010)
Ashok, P., Butkova, Y., Hermanns, H., Křetínský, J.: Continuous-time Markov decisions based on partial exploration. In: Lahiri, S.K., Wang, C. (eds.) ATVA 2018. LNCS, vol. 11138, pp. 317–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01090-4_19
Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press, Cambridge (2008)
Bonet, B., Geffner, H.: Solving POMDPs: RTDP-Bel vs. point-based algorithms. In: IJCAI, pp. 1641–1646 (2009)
Bork, A., Junges, S., Katoen, J.P., Quatmann, T.: Experiments for ‘Verification of indefinite-horizon POMDPs’. https://doi.org/10.5281/zenodo.3924577
Bork, A., Junges, S., Katoen, J.P., Quatmann, T.: Verification of indefinite-horizon POMDPs. CoRR abs/2007.00102 (2020)
Bouton, M., Tumova, J., Kochenderfer, M.J.: Point-based methods for model checking in partially observable Markov decision processes. In: AAAI, pp. 10061–10068. AAAI Press (2020)
Brázdil, T., et al.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8
Braziunas, D., Boutilier, C.: Stochastic local search for POMDP controllers. In: AAAI, pp. 690–696. AAAI Press/The MIT Press (2004)
Černý, P., Chatterjee, K., Henzinger, T.A., Radhakrishna, A., Singh, R.: Quantitative synthesis for concurrent programs. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 243–259. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_20
Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Qualitative analysis of POMDPs with temporal logic specifications for robotics applications. In: ICRA. pp. 325–330. IEEE (2015)
Freudenthal, H.: Simplizialzerlegungen von beschränkter Flachheit. Ann. Math. 43(3), 580–582 (1942)
Hansen, E.A.: Solving POMDPs by searching in policy space. In: UAI, pp. 211–219. Morgan Kaufmann (1998)
Hartmanns, A., Hermanns, H.: The Modest Toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51
Hartmanns, A., Kaminski, B.L.: Optimistic value iteration. In: Lahiri, S.K., Wang, C. (eds.) CAV 2020. LNCS, vol. 12225. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-53291-8_26
Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilistic model checker Storm. CoRR abs/2002.07080 (2020)
Horák, K., Bosanský, B., Chatterjee, K.: Goal-HSVI: heuristic search value iteration for goal POMDPs. In: IJCAI, pp. 4764–4770. ijcai.org (2018)
Jansen, N., Dehnert, C., Kaminski, B.L., Katoen, J.-P., Westhofen, L.: Bounded model checking for probabilistic programs. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 68–85. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_5
Junges, S., et al.: Finite-state controllers of POMDPs using parameter synthesis. In: UAI, pp. 519–529. AUAI Press (2018)
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)
Kochenderfer, M.J.: Decision Making Under Uncertainty. The MIT Press, Cambridge (2015)
Kurniawati, H., Hsu, D., Lee, W.S.: SARSOP: efficient point-based POMDP planning by approximating optimally reachable belief spaces. In: Robotics: Science and Systems. The MIT Press (2008)
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47
Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Oper. Res. 39(1), 162–175 (1991)
Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)
Meuleau, N., Kim, K., Kaelbling, L.P., Cassandra, A.R.: Solving POMDPs by searching the space of finite policies. In: UAI, pp. 417–426. Morgan Kaufmann (1999)
Norman, G., Parker, D., Zou, X.: Verification and control of partially observable probabilistic systems. Real-Time Syst. 53(3), 354–402 (2017)
Pajarinen, J., Peltonen, J.: Periodic finite state controllers for efficient POMDP and DEC-POMDP planning. In: NIPS, pp. 2636–2644 (2011)
Pineau, J., Gordon, G.J., Thrun, S.: Point-based value iteration: an anytime algorithm for POMDPs. In: IJCAI, pp. 1025–1032. Morgan Kaufmann (2003)
Russell, S.J., Norvig, P.: Artificial Intelligence - A Modern Approach. Pearson Education (2010)
Shani, G., Pineau, J., Kaplow, R.: A survey of point-based POMDP solvers. Auton. Agent. Multi-Agent Syst. 27(1), 1–51 (2013)
Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge (2005)
Volk, M., Junges, S., Katoen, J.P.: Fast dynamic fault tree analysis by model checking techniques. IEEE Trans. Ind. Inform. 14(1), 370–379 (2018)
Walraven, E., Spaan, M.T.J.: Point-based value iteration for finite-horizon POMDPs. J. Artif. Intell. Res. 65, 307–341 (2019)
Winterer, L., et al.: Motion planning under partial observability using game-based abstraction. In: CDC, pp. 2201–2208. IEEE (2017)
Wongpiromsarn, T., Frazzoli, E.: Control of probabilistic systems under dynamic, partially known environments with temporal logic specifications. In: CDC, pp. 7644–7651. IEEE (2012)