DOI: 10.5555/3618408.3619930

Enforcing hard constraints with soft barriers: safe reinforcement learning in unknown stochastic environments

Published: 23 July 2023

Abstract

Reinforcement Learning (RL) has long grappled with the challenge of ensuring agent safety in unpredictable and stochastic environments, particularly under hard constraints that require the system state never to reach unsafe regions. Conventional safe RL methods, such as those based on the Constrained Markov Decision Process (CMDP) paradigm, formulate safety violations as a cost function and constrain the expected cumulative cost to stay below a threshold. However, such indirect constraints on safety-violation cost often fail to effectively capture and enforce hard reachability-based safety requirements. In this work, we leverage the notion of barrier functions to explicitly encode hard safety chance constraints and, since the environment is unknown, relax them into generative-model-based soft barrier functions of our design. Building on these soft barriers, we propose a novel safe RL approach with bi-level optimization that jointly learns the unknown environment and optimizes the control policy, while effectively avoiding unsafe regions through safety probability optimization. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety chance constraints and significantly outperforms CMDP-based baselines in the system safe rate measured via simulations.
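The paper gives the formal construction of its soft barriers and bi-level optimization; as a loose illustration only, the Python sketch below shows how a soft (hinge-relaxed) barrier penalty might enter a policy objective. All names here (soft_barrier, theta, lam) and the quadratic barrier are illustrative placeholders, not the authors' implementation, and the method's coupling of the barrier with a learned generative environment model is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_barrier(x, theta):
    """Hypothetical soft barrier B_theta(x): shaped so that B <= 0 on
    states we hope to certify as safe and B > 0 on the unsafe region.
    A simple quadratic form stands in for a learned network here."""
    return float(x @ theta @ x) - 1.0

def barrier_penalty(states, theta, margin=0.0):
    """Hinge relaxation of the hard constraint B(x_t) <= 0 for all t:
    only states whose barrier value exceeds `margin` are penalized."""
    values = np.array([soft_barrier(x, theta) for x in states])
    return np.maximum(values - margin, 0.0).sum()

def soft_objective(rewards, states, theta, lam=10.0):
    """Outer-level policy objective: task return minus a weighted soft
    barrier penalty; `lam` trades off reward against safety."""
    return sum(rewards) - lam * barrier_penalty(states, theta)

# Toy usage: score two rollouts under a fixed, illustrative barrier
# whose zero level set is the unit disk (exterior treated as unsafe).
theta = np.eye(2)
safe_traj = [rng.normal(scale=0.3, size=2) for _ in range(20)]
risky_traj = [rng.normal(scale=1.5, size=2) for _ in range(20)]
rewards = [1.0] * 20
print(soft_objective(rewards, safe_traj, theta))   # close to the raw return
print(soft_objective(rewards, risky_traj, theta))  # heavily penalized
```

In the paper's actual bi-level scheme, the barrier parameters and the generative environment model are optimized in an inner loop against the current policy, rather than held fixed as in this toy score.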



Published In

ICML'23: Proceedings of the 40th International Conference on Machine Learning, July 2023. 43479 pages.

Publisher

JMLR.org
