DOI: 10.5555/3635637.3663035
research-article

LgTS: Dynamic Task Sampling using LLM-generated Sub-Goals for Reinforcement Learning Agents

Published: 06 May 2024

Abstract

Recent advancements in the reasoning abilities of Large Language Models (LLMs) have promoted their use in problems that require high-level planning for artificial agents. However, current techniques that utilize LLMs for such planning tasks make certain key assumptions: access to datasets that permit fine-tuning, meticulously engineered prompts that provide only relevant and essential information to the LLM, and, most importantly, a deterministic way to execute the LLM's responses, either through existing policies or plan operators. In this work, we propose LgTS (LLM-guided Teacher-Student learning), a novel approach that leverages the planning abilities of LLMs to provide a graphical representation of sub-goals to a reinforcement learning (RL) agent that does not have access to the transition dynamics of the environment. The RL agent uses a Teacher-Student learning algorithm to learn a set of successful policies for reaching the goal state from the start state while simultaneously minimizing the number of environmental interactions. Unlike previous methods that utilize LLMs, our approach does not assume access to a fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM. Through experiments on a gridworld-based DoorKey domain and a search-and-rescue-inspired domain, we show that an LLM-proposed graphical structure of sub-goals, combined with a Teacher-Student RL algorithm, yields sample-efficient policies.
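To make the dynamic-sampling idea concrete, below is a minimal, hypothetical Python sketch of a Teacher-Student loop over an LLM-proposed sub-goal graph. Everything here is an illustrative assumption rather than the paper's method: the example DAG and node names (start, key, door, goal), the train_and_evaluate stub, and the success threshold are invented, and the teacher's epsilon-greedy, learning-progress-based sampling follows the general Teacher-Student curriculum-learning recipe, not necessarily LgTS's exact algorithm.

import random

# Hypothetical LLM-proposed sub-goal DAG for a DoorKey-style task.
# Each edge (u, v) is a candidate RL task: "reach sub-goal v from u".
# The LLM may propose infeasible edges (e.g., start -> door without the
# key); those simply never cross the success threshold.
SUBGOAL_GRAPH = {
    "start": ["key", "door"],
    "key": ["door"],
    "door": ["goal"],
    "goal": [],
}

def reachable_via(graph, learned):
    """Sub-goals reachable from 'start' by chaining already-learned policies."""
    seen, frontier = {"start"}, ["start"]
    while frontier:
        u = frontier.pop()
        for v in graph[u]:
            if (u, v) in learned and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen

def train_and_evaluate(task):
    """Stub for one round of RL training (e.g., PPO) on `task`,
    returning the student's evaluated success rate on that task."""
    return random.random()

def teacher_student_loop(graph, budget=500, eps=0.1, threshold=0.9):
    learned, last_rate, progress = set(), {}, {}
    for _ in range(budget):
        reach = reachable_via(graph, learned)
        if "goal" in reach:
            break  # a chain of successful policies now connects start to goal
        # Active tasks: unlearned edges whose source sub-goal is reachable.
        tasks = [(u, v) for u in reach for v in graph[u] if (u, v) not in learned]
        if not tasks:
            break
        # Teacher: eps-greedy over estimated learning progress; untried
        # tasks get priority via the +inf default.
        if random.random() < eps:
            task = random.choice(tasks)
        else:
            task = max(tasks, key=lambda t: progress.get(t, float("inf")))
        rate = train_and_evaluate(task)
        progress[task] = abs(rate - last_rate.get(task, 0.0))
        last_rate[task] = rate
        if rate >= threshold:
            learned.add(task)  # task solved; its successor edges become active
    return learned

print(teacher_student_loop(SUBGOAL_GRAPH))

The key design point the sketch illustrates is that the teacher allocates training rounds by how quickly the student is improving on each active task, so infeasible or already-mastered sub-goal transitions stop consuming environmental interactions.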


Cited By

  • (2024) Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap. ACM Transactions on Autonomous and Adaptive Systems 19(3), 1-60. https://doi.org/10.1145/3686803. Online publication date: 30-Sep-2024.

Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024, 2898 pages
ISBN: 9798400704864

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

1. curriculum learning
2. large language models
3. reinforcement learning

Qualifiers

• Research-article

Funding Sources

• FA875022C0501

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
