Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue

Aishwarya Padmakumar, Mert Inan, Spandana Gella, Patrick Lange, Dilek Hakkani-Tur

Abstract

Embodied task completion is a challenge where an agent in a simulated environment must predict environment actions to complete tasks based on natural language instructions and ego-centric visual observations. We propose a variant of this problem where the agent predicts actions at a higher level of abstraction called a plan, which helps make agent actions more interpretable and can be obtained from the appropriate prompting of large language models. We show that multimodal transformer models can outperform language-only models for this problem but fall significantly short of oracle plans. Since collecting human-human dialogues for embodied environments is expensive and time-consuming, we propose a method to synthetically generate such dialogues, which we then use as training data for plan prediction. We demonstrate that multimodal transformer models can attain strong zero-shot performance from our synthetic data, outperforming language-only models trained on human-human data.

Anthology ID: 2023.emnlp-main.374
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6114–6131
URL: https://aclanthology.org/2023.emnlp-main.374/
DOI: 10.18653/v1/2023.emnlp-main.374
Cite (ACL): Aishwarya Padmakumar, Mert Inan, Spandana Gella, Patrick Lange, and Dilek Hakkani-Tur. 2023. Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6114–6131, Singapore. Association for Computational Linguistics.
Cite (Informal): Multimodal Embodied Plan Prediction Augmented with Synthetic Embodied Dialogue (Padmakumar et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.374.pdf
Video: https://aclanthology.org/2023.emnlp-main.374.mp4
