DOI: 10.5555/3635637.3663035
research-article

LgTS: Dynamic Task Sampling using LLM-generated Sub-Goals for Reinforcement Learning Agents

Published: 06 May 2024

Abstract

Recent advancements in the reasoning abilities of Large Language Models (LLMs) have promoted their use in problems that require high-level planning for artificial agents. However, current techniques that utilize LLMs for such planning tasks make certain key assumptions: access to datasets that permit fine-tuning, meticulously engineered prompts that provide only relevant and essential information to the LLM, and, most importantly, a deterministic way to execute the LLM's responses, either through existing policies or plan operators. In this work, we propose LgTS (LLM-guided Teacher-Student learning), a novel approach that leverages the planning abilities of LLMs to provide a graphical representation of sub-goals to a reinforcement learning (RL) agent that does not have access to the transition dynamics of the environment. The RL agent uses a Teacher-Student learning algorithm to learn a set of successful policies for reaching the goal state from the start state while simultaneously minimizing the number of environmental interactions. Unlike previous methods that utilize LLMs, our approach does not assume access to a fine-tuned LLM, nor does it require pre-trained policies that achieve the sub-goals proposed by the LLM. Through experiments on a gridworld-based DoorKey domain and a search-and-rescue-inspired domain, we show that an LLM-proposed graphical structure of sub-goals, combined with a Teacher-Student RL algorithm, yields sample-efficient policies.
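To make the dynamic-sampling idea concrete, below is a minimal, hypothetical Python sketch of a Teacher-Student loop over an LLM-proposed sub-goal graph. Everything here is an illustrative assumption rather than the paper's method: the example DAG and node names (start, key, door, goal), the train_and_evaluate stub, and the success threshold are invented, and the teacher's epsilon-greedy, learning-progress-based sampling follows the general Teacher-Student curriculum-learning recipe, not necessarily LgTS's exact algorithm.

import random

# Hypothetical LLM-proposed sub-goal DAG for a DoorKey-style task.
# Each edge (u, v) is a candidate RL task: "reach sub-goal v from u".
# The LLM may propose infeasible edges (e.g., start -> door without the
# key); those simply never cross the success threshold.
SUBGOAL_GRAPH = {
    "start": ["key", "door"],
    "key": ["door"],
    "door": ["goal"],
    "goal": [],
}

def reachable_via(graph, learned):
    """Sub-goals reachable from 'start' by chaining already-learned policies."""
    seen, frontier = {"start"}, ["start"]
    while frontier:
        u = frontier.pop()
        for v in graph[u]:
            if (u, v) in learned and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen

def train_and_evaluate(task):
    """Stub for one round of RL training (e.g., PPO) on `task`,
    returning the student's evaluated success rate on that task."""
    return random.random()

def teacher_student_loop(graph, budget=500, eps=0.1, threshold=0.9):
    learned, last_rate, progress = set(), {}, {}
    for _ in range(budget):
        reach = reachable_via(graph, learned)
        if "goal" in reach:
            break  # a chain of successful policies now connects start to goal
        # Active tasks: unlearned edges whose source sub-goal is reachable.
        tasks = [(u, v) for u in reach for v in graph[u] if (u, v) not in learned]
        if not tasks:
            break
        # Teacher: eps-greedy over estimated learning progress; untried
        # tasks get priority via the +inf default.
        if random.random() < eps:
            task = random.choice(tasks)
        else:
            task = max(tasks, key=lambda t: progress.get(t, float("inf")))
        rate = train_and_evaluate(task)
        progress[task] = abs(rate - last_rate.get(task, 0.0))
        last_rate[task] = rate
        if rate >= threshold:
            learned.add(task)  # task solved; its successor edges become active
    return learned

print(teacher_student_loop(SUBGOAL_GRAPH))

The key design point the sketch illustrates is that the teacher allocates training rounds by how quickly the student is improving on each active task, so infeasible or already-mastered sub-goal transitions stop consuming environmental interactions.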


Cited By

  • (2024) Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap. ACM Transactions on Autonomous and Adaptive Systems 19(3), 1-60. https://doi.org/10.1145/3686803. Online publication date: 30-Sep-2024.

Published In

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
May 2024, 2898 pages
ISBN: 9798400704864

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

1. curriculum learning
2. large language models
3. reinforcement learning

Qualifiers

• Research-article

Funding Sources

• FA875022C0501

Conference

AAMAS '24

Acceptance Rates

Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
