
Verification of Indefinite-Horizon POMDPs

  • Conference paper
Automated Technology for Verification and Analysis (ATVA 2020)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12302)

Abstract

The verification problem in MDPs asks whether, for any policy resolving the nondeterminism, the probability that something bad happens is bounded by some given threshold. This verification problem is often overly pessimistic, as the policies it considers may depend on the complete system state. This paper considers the verification problem for partially observable MDPs, in which the policies make their decisions based on (the history of) the observations emitted by the system. We present an abstraction-refinement framework extending previous instantiations of the Lovejoy-approach. Our experiments show that this framework significantly improves the scalability of the approach.

This work has been supported by the ERC Advanced Grant 787914 (FRAPPANT), the DFG RTG 2236 ‘UnRAVeL’, NSF grants 1545126 (VeHICaL) and 1646208, the DARPA Assured Autonomy program, Berkeley Deep Drive, and by Toyota under the iCyPhy center.


Notes

  1. More general observation functions can be efficiently encoded in this formalism [11].

  2. The implementation discussed in Sect. 5 supports all these combinations.

  3. In the formula, we use Iverson brackets: \([ x ]=1\) if x is true and 0 otherwise.

  4. In general, the set of states of the belief MDP is uncountable. However, a given belief state \(\textit{\textbf{b}}\) has only a finite number of successors for each action \(\alpha \), i.e. \( post ^{ bel (M)}(\textit{\textbf{b}},\alpha )\) is finite, and thus the belief MDP is countably infinite. Acyclic POMDPs always give rise to finite belief MDPs (which may, however, be exponentially large).

  5. The implementation actually still connects \(\textit{\textbf{b}}\) with already explored successors and only redirects the ‘missing’ probabilities w.r.t. \(U_{}({\textit{\textbf{b}}'})\), \(\textit{\textbf{b}}'\in post ^{ db _{\mathcal {F}}(\mathcal {M})}(s,\alpha ) \setminus S_ expl \).

  6. We guess policies in \(\varSigma ^{\mathcal {M}}_\text {obs}\) by distributing over actions of optimal policies for the MDP \(M\).

  7. \(\rho _ gap \) is set to 0.1 initially; after each iteration we update it to \(\rho _ gap /4\).

  8. \(\rho _ step \) is set to \(\infty \) initially; after each iteration we update it to \(4 \cdot |S^\mathcal {A}|\).

  9. A policy \(\sigma \) is \(\rho _{\varSigma ^{}}\)-optimal if \(\forall \textit{\textbf{b}}:V_{\sigma (\textit{\textbf{b}})}({\textit{\textbf{b}}}) + \rho _{\varSigma ^{}} \ge V_{}({\textit{\textbf{b}}})\). We set \(\rho _{\varSigma ^{}} = 0.001\).

  10. In refinement step i, we explore \(2^{i-1} \cdot |S| \cdot \max _{z\in Z}| O^{-1}(z)|\) states.

  11. Storm uses one core; Prism uses four cores in garbage collection only.
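Note 4's finiteness argument is easy to make concrete: the belief reached from \(\textit{\textbf{b}}\) under action \(\alpha \) is determined by the observation emitted, so there are at most \(|Z|\) successor beliefs per action. Below is a minimal Python sketch of this grouped belief update on a hypothetical three-state POMDP; the dict-based encoding and all identifiers are illustrative choices for this sketch, not the paper's implementation (which is built on Storm).

```python
from collections import defaultdict

# Hypothetical toy POMDP with state-based observations, as in the
# paper's formalism. P[s][a] maps each successor state to its
# probability; O[s] is the observation emitted by state s.
P = {
    "s0": {"a": {"s1": 0.5, "s2": 0.5}},
    "s1": {"a": {"s1": 1.0}},
    "s2": {"a": {"s2": 1.0}},
}
O = {"s0": "z0", "s1": "z1", "s2": "z2"}

def belief_successors(belief, action):
    """Return {observation: (probability, successor belief)} for one action.

    Grouping successor states by the observation they emit bounds the
    number of successor beliefs by |Z| per action, so each belief has
    finitely many successors even though the belief space is uncountable.
    """
    mass = defaultdict(float)  # (observation, state) -> unnormalised mass
    for s, p_s in belief.items():
        for t, p_t in P[s][action].items():
            mass[(O[t], t)] += p_s * p_t
    obs_prob = defaultdict(float)
    for (z, _), w in mass.items():
        obs_prob[z] += w
    return {
        z: (p_z, {t: mass[(zz, t)] / p_z for (zz, t) in mass if zz == z})
        for z, p_z in obs_prob.items()
    }

succ = belief_successors({"s0": 1.0}, "a")
# Two successor beliefs, one per observation reachable in one step:
# z1 -> {'s1': 1.0} and z2 -> {'s2': 1.0}, each with probability 0.5.
```

Exploring the belief MDP then amounts to repeatedly expanding such successor maps from the initial belief, which is exactly where the paper's clipping and cut-off mechanisms intervene.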

References

  1. Amato, C., Bernstein, D.S., Zilberstein, S.: Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs. Auton. Agent. Multi-Agent Syst. 21(3), 293–320 (2010)

  2. Ashok, P., Butkova, Y., Hermanns, H., Křetínský, J.: Continuous-time Markov decisions based on partial exploration. In: Lahiri, S.K., Wang, C. (eds.) ATVA 2018. LNCS, vol. 11138, pp. 317–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01090-4_19

  3. Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press, Cambridge (2008)

  4. Bonet, B., Geffner, H.: Solving POMDPs: RTDP-Bel vs. point-based algorithms. In: IJCAI, pp. 1641–1646 (2009)

  5. Bork, A., Junges, S., Katoen, J.P., Quatmann, T.: Experiments for ‘Verification of indefinite-horizon POMDPs’. https://doi.org/10.5281/zenodo.3924577

  6. Bork, A., Junges, S., Katoen, J.P., Quatmann, T.: Verification of indefinite-horizon POMDPs. CoRR abs/2007.00102 (2020)

  7. Bouton, M., Tumova, J., Kochenderfer, M.J.: Point-based methods for model checking in partially observable Markov decision processes. In: AAAI, pp. 10061–10068. AAAI Press (2020)

  8. Brázdil, T., et al.: Verification of Markov decision processes using learning algorithms. In: Cassez, F., Raskin, J.-F. (eds.) ATVA 2014. LNCS, vol. 8837, pp. 98–114. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11936-6_8

  9. Braziunas, D., Boutilier, C.: Stochastic local search for POMDP controllers. In: AAAI, pp. 690–696. AAAI Press/The MIT Press (2004)

  10. Černý, P., Chatterjee, K., Henzinger, T.A., Radhakrishna, A., Singh, R.: Quantitative synthesis for concurrent programs. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 243–259. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_20

  11. Chatterjee, K., Chmelik, M., Gupta, R., Kanodia, A.: Qualitative analysis of POMDPs with temporal logic specifications for robotics applications. In: ICRA, pp. 325–330. IEEE (2015)

  12. Freudenthal, H.: Simplizialzerlegungen von beschränkter Flachheit. Ann. Math. 43(3), 580–582 (1942)

  13. Hansen, E.A.: Solving POMDPs by searching in policy space. In: UAI, pp. 211–219. Morgan Kaufmann (1998)

  14. Hartmanns, A., Hermanns, H.: The Modest Toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8_51

  15. Hartmanns, A., Kaminski, B.L.: Optimistic value iteration. In: Lahiri, S., Wang, C. (eds.) Computer Aided Verification. CAV 2020. LNCS, vol. 12225. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-53291-8_26

  16. Hensel, C., Junges, S., Katoen, J.P., Quatmann, T., Volk, M.: The probabilistic model checker Storm. CoRR abs/2002.07080 (2020)

  17. Horák, K., Bosanský, B., Chatterjee, K.: Goal-HSVI: heuristic search value iteration for goal POMDPs. In: IJCAI, pp. 4764–4770. ijcai.org (2018)

  18. Jansen, N., Dehnert, C., Kaminski, B.L., Katoen, J.-P., Westhofen, L.: Bounded model checking for probabilistic programs. In: Artho, C., Legay, A., Peled, D. (eds.) ATVA 2016. LNCS, vol. 9938, pp. 68–85. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46520-3_5

  19. Junges, S., et al.: Finite-state controllers of POMDPs using parameter synthesis. In: UAI, pp. 519–529. AUAI Press (2018)

  20. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1–2), 99–134 (1998)

  21. Kochenderfer, M.J.: Decision Making Under Uncertainty. The MIT Press, Cambridge (2015)

  22. Kurniawati, H., Hsu, D., Lee, W.S.: SARSOP: efficient point-based POMDP planning by approximating optimally reachable belief spaces. In: Robotics: Science and Systems. The MIT Press (2008)

  23. Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_47

  24. Lovejoy, W.S.: Computationally feasible bounds for partially observed Markov decision processes. Oper. Res. 39(1), 162–175 (1991)

  25. Madani, O., Hanks, S., Condon, A.: On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147(1–2), 5–34 (2003)

  26. Meuleau, N., Kim, K., Kaelbling, L.P., Cassandra, A.R.: Solving POMDPs by searching the space of finite policies. In: UAI, pp. 417–426. Morgan Kaufmann (1999)

  27. Norman, G., Parker, D., Zou, X.: Verification and control of partially observable probabilistic systems. Real-Time Syst. 53(3), 354–402 (2017)

  28. Pajarinen, J., Peltonen, J.: Periodic finite state controllers for efficient POMDP and DEC-POMDP planning. In: NIPS, pp. 2636–2644 (2011)

  29. Pineau, J., Gordon, G.J., Thrun, S.: Point-based value iteration: an anytime algorithm for POMDPs. In: IJCAI, pp. 1025–1032. Morgan Kaufmann (2003)

  30. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson Education (2010)

  31. Shani, G., Pineau, J., Kaplow, R.: A survey of point-based POMDP solvers. Auton. Agents Multi Agent Syst. 27(1), 1–51 (2013)

  32. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge (2005)

  33. Volk, M., Junges, S., Katoen, J.P.: Fast dynamic fault tree analysis by model checking techniques. IEEE Trans. Ind. Inform. 14(1), 370–379 (2018)

  34. Walraven, E., Spaan, M.T.J.: Point-based value iteration for finite-horizon POMDPs. J. Artif. Intell. Res. 65, 307–341 (2019)

  35. Winterer, L., et al.: Motion planning under partial observability using game-based abstraction. In: CDC, pp. 2201–2208. IEEE (2017)

  36. Wongpiromsarn, T., Frazzoli, E.: Control of probabilistic systems under dynamic, partially known environments with temporal logic specifications. In: CDC, pp. 7644–7651. IEEE (2012)


Author information

Correspondence to Sebastian Junges.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bork, A., Junges, S., Katoen, J.P., Quatmann, T. (2020). Verification of Indefinite-Horizon POMDPs. In: Hung, D.V., Sokolsky, O. (eds.) Automated Technology for Verification and Analysis. ATVA 2020. Lecture Notes in Computer Science, vol. 12302. Springer, Cham. https://doi.org/10.1007/978-3-030-59152-6_16

  • DOI: https://doi.org/10.1007/978-3-030-59152-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59151-9

  • Online ISBN: 978-3-030-59152-6

  • eBook Packages: Computer Science, Computer Science (R0)
