
Boosting Vision-and-Language Navigation with Direction Guiding and Backtracing

Published: 05 January 2023

Abstract

Vision-and-Language Navigation (VLN) is an emerging and fast-developing research topic in which an embodied agent must navigate a real-world environment by following natural language instructions. In this article, we present a Direction-guided Navigator Agent (DNA) that integrates direction clues derived from the instructions into the standard encoder-decoder navigation framework. Specifically, DNA couples the instruction encoder with an additional direction branch that sequentially encodes the direction clues in the instructions to boost navigation. Furthermore, an Instruction Flipping mechanism is devised to enable fast data augmentation as well as follow-up backtracing, allowing the agent to navigate in the backward direction. This design amplifies the grounding of the instruction in the local visual scenes along both forward and backward directions, thereby strengthening the alignment between the instruction and the action sequence. Extensive experiments on the Room-to-Room (R2R) dataset validate our proposal and demonstrate quantitatively compelling results.
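
Two components of the abstract are easy to picture with a toy example: a direction branch that consumes the ordered direction clues found in an instruction, and an Instruction Flipping step that reverses an instruction so the same path can be supervised in the backward direction. The sketch below is only a rough illustration of those two ideas under an assumed tokenization, a hypothetical direction vocabulary, and a hypothetical antonym table; it is not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of (1) extracting direction clues for a
# separate direction branch and (2) flipping an instruction for backtracing.
# DIRECTION_WORDS, FLIP_MAP, and the clause-level flipping rule are assumptions
# made purely for illustration.

DIRECTION_WORDS = {"left", "right", "forward", "straight", "up", "down", "around"}

# Hypothetical antonym table used when reversing an instruction.
FLIP_MAP = {"left": "right", "right": "left", "up": "down", "down": "up"}


def extract_direction_clues(instruction: str) -> list[str]:
    """Return the ordered sequence of direction words found in the instruction."""
    tokens = instruction.lower().replace(",", " ").replace(".", " ").split()
    return [t for t in tokens if t in DIRECTION_WORDS]


def flip_instruction(instruction: str) -> str:
    """Reverse the clause order and swap direction words, approximating an
    instruction for walking the same path backward."""
    clauses = [c.strip() for c in instruction.lower().split(",") if c.strip()]
    flipped = []
    for clause in reversed(clauses):
        words = [FLIP_MAP.get(w, w) for w in clause.split()]
        flipped.append(" ".join(words))
    return ", ".join(flipped)


if __name__ == "__main__":
    instr = "Walk forward past the sofa, turn left at the stairs, go up and stop at the door"
    print(extract_direction_clues(instr))  # ['forward', 'left', 'up']
    print(flip_instruction(instr))         # clauses reversed, left/right and up/down swapped
```

In the paper's framing, the extracted clue sequence would feed the direction branch alongside the full instruction encoding, while the flipped instruction provides an extra training trajectory traversed in reverse; the string-level heuristics above stand in for whatever clue extraction and flipping procedure the authors actually use.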



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
January 2023, 505 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3572858
Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 January 2023
      Online AM: 17 March 2022
      Accepted: 24 January 2022
      Revised: 06 July 2021
      Received: 18 October 2020
      Published in TOMM Volume 19, Issue 1


      Author Tags

      1. Vision-and-language navigation
      2. cross-modal matching

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Key R&D Program of China
      • NSF of China
