DOI: 10.1007/978-3-031-37706-8_21

Policy Synthesis and Reinforcement Learning for Discounted LTL

Published: 17 July 2023

Abstract

The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.
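
To make the reduction concrete, consider the single-discount-factor case. A reward machine is a finite-state transducer that reads the labels produced by the MDP and emits scalar rewards on its transitions; choosing the rewards so that the machine pays 1 exactly when the formula becomes satisfied turns the discounted satisfaction value of the formula into an ordinary discounted-sum return. The following Python sketch illustrates this for the toy objective "eventually goal"; the RewardMachine class, its labels, and the objective are hypothetical choices made for this example, not the paper's construction.

```python
# Illustrative sketch (not the paper's construction): a reward machine
# for the discounted objective "eventually goal" with a single discount
# factor gamma. State u0 = goal not yet seen; u1 = satisfied (absorbing).
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state transducer mapping label sequences to rewards."""
    initial: str
    # delta[(state, label)] = (next_state, reward)
    delta: dict = field(default_factory=dict)

    def discounted_return(self, labels, gamma):
        """Discounted-sum reward collected along a finite label sequence."""
        state, total, discount = self.initial, 0.0, 1.0
        for label in labels:
            state, reward = self.delta[(state, label)]
            total += discount * reward
            discount *= gamma
        return total


# Reward machine for "eventually goal": reward 1 is emitted exactly once,
# on the transition that first observes the label `goal`.
rm = RewardMachine(
    initial="u0",
    delta={
        ("u0", "goal"): ("u1", 1.0),   # formula becomes satisfied here
        ("u0", "other"): ("u0", 0.0),  # still waiting for goal
        ("u1", "goal"): ("u1", 0.0),   # already satisfied, no more reward
        ("u1", "other"): ("u1", 0.0),
    },
)

# A trajectory whose labels first hit `goal` at step 2 earns gamma**2,
# matching the discounted value of satisfying "eventually goal" at t = 2.
print(rm.discounted_return(["other", "other", "goal", "other"], gamma=0.9))
# -> 0.81
```

A standard discounted RL algorithm run on the product of the environment MDP and such a machine then optimizes this return; richer discounted LTL formulas require larger machines, but with identical discount factors the reduction has the same shape.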


Cited By

  • (2024) Reinforcement learning from reachability specifications. In: Proceedings of the 41st International Conference on Machine Learning, pp. 47331–47344. DOI: 10.5555/3692070.3693998. Online publication date: 21 July 2024.


Published In

Computer Aided Verification: 35th International Conference, CAV 2023, Paris, France, July 17–22, 2023, Proceedings, Part I
Jul 2023
511 pages
ISBN: 978-3-031-37705-1
DOI: 10.1007/978-3-031-37706-8
Editors: Constantin Enea, Akash Lal
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 17 July 2023
