DOI: 10.1145/3576841.3585936
research-article
Open access

Self-Preserving Genetic Algorithms for Safe Learning in Discrete Action Spaces

Published: 09 May 2023

Abstract

Self-Preserving Genetic Algorithms (SPGA) combine the evolutionary strategy of a genetic algorithm (GA) with safety assurance methods commonly implemented in safe reinforcement learning (SRL), a branch of reinforcement learning (RL) that accounts for safety in the exploration and decision-making process of the agent. Safe learning approaches are especially important in safety-critical environments, where failing to account for the safety of the controlled system can cost millions of dollars in hardware or cause bodily harm to people working nearby, as is true of many cyber-physical systems. While SRL is a viable approach to safe learning, training agents with it raises several challenges, such as sample efficiency, stability, and exploration, an issue the evolutionary strategy of a genetic algorithm readily addresses. By combining GAs with the safety mechanisms used in SRL, SPGA offers a safe learning alternative that can explore large areas of the solution space, addressing SRL's exploration challenge. This work implements SPGA with both action masking and run time assurance safety strategies to evolve safe controllers for three types of discrete action space environments applicable to cyber-physical systems (control, routing, and operations) under various safety conditions. Training and testing evaluation metrics are compared with results from SRL-trained controllers for validation. SPGA and SRL controllers are trained across 5 random seeds and evaluated over 500 episodes to compute average wall time to train, average expected return, and percentage of safe actions. SPGA achieves comparable reward and safety performance with significantly improved training efficiency (55x faster on average), demonstrating the effectiveness of this safe learning approach.
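To make the two safety strategies named above concrete, the following is a minimal, hypothetical sketch of a genetic algorithm evolving a tabular policy for a toy discrete action space, with both an action-masking filter and a run time assurance (RTA) fallback. This is not the paper's implementation; `ToyEnv`, `action_mask`, `rta_filter`, and `evolve` are illustrative names, and the environment, safe region, and GA hyperparameters are invented for the example.

```python
import random

class ToyEnv:
    """Hypothetical environment: an integer position on [0, 10];
    actions move left, stay, or move right; positions outside
    [1, 9] are considered unsafe, and the goal is position 7."""
    ACTIONS = [-1, 0, +1]

    def reset(self):
        self.pos = 5
        return self.pos

    def step(self, action_idx):
        self.pos += self.ACTIONS[action_idx]
        reward = 1.0 if self.pos == 7 else 0.0
        return self.pos, reward

def action_mask(pos):
    """Safety strategy 1 (action masking): flag actions that would
    leave the safe region [1, 9] as invalid before selection."""
    return [1 <= pos + d <= 9 for d in ToyEnv.ACTIONS]

def rta_filter(pos, action_idx):
    """Safety strategy 2 (run time assurance): pass the learned action
    through unless it is unsafe, then switch to a backup action (stay)."""
    if 1 <= pos + ToyEnv.ACTIONS[action_idx] <= 9:
        return action_idx
    return 1  # backup controller: index of the 'stay' action

def evaluate(policy, env, steps=20, use_mask=True):
    """Fitness = episode return. The policy (the GA 'chromosome')
    is a lookup table mapping each state to an action index."""
    pos, total = env.reset(), 0.0
    for _ in range(steps):
        a = policy[pos]
        if use_mask:
            mask = action_mask(pos)
            if not mask[a]:              # masked: substitute a valid action
                a = mask.index(True)
        else:
            a = rta_filter(pos, a)       # RTA intervenes only when unsafe
        pos, r = env.step(a)
        total += r
    return total

def evolve(pop_size=20, generations=30, seed=0):
    """Plain GA loop: rank by fitness, keep the elite half, then
    refill with one-point crossover plus occasional mutation."""
    rng = random.Random(seed)
    env = ToyEnv()
    pop = [[rng.randrange(3) for _ in range(11)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: evaluate(p, env), reverse=True)
        elite = scored[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(elite)):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(11)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:
                child[rng.randrange(11)] = rng.randrange(3)
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda p: evaluate(p, env))
```

The design point the sketch illustrates is that both safety filters sit between the evolved policy and the environment, so every fitness evaluation, and therefore all exploration, is constrained to safe actions; the GA itself needs no modification.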


        Published In

        ICCPS '23: Proceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023)
        May 2023
        291 pages
        ISBN:9798400700361
        DOI:10.1145/3576841
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. genetic algorithms
        2. safe learning
        3. safe reinforcement learning
        4. run time assurance
        5. action masking

        Acceptance Rates

Overall acceptance rate: 25 of 91 submissions (27%)

