I currently focus on large language models for navigation, including related surveys, multimodal perception, planning and decision making, multi-agent collaboration, and model compression for efficient deployment.
Last Update: 2025/04/08
- [2025] Generative Models in Decision Making: A Survey, arXiv [Paper]
- [2025] A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond, arXiv [Paper]
- [2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives, arXiv [Paper] [Code]
- [2025] The Role of World Models in Shaping Autonomous Driving: A Comprehensive Survey, arXiv [Paper] [Code]
- [2025] A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models, arXiv [Paper]
- [2025] Embodied Intelligence: A Synergy of Morphology, Action, Perception and Learning, ACM Computing Surveys [Paper]
- [2025] Large Language Models for Multi-Robot Systems: A Survey, arXiv [Paper] [Code]
- [2025] Survey on Large Language Model Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods, IEEE TNNLS [Paper]
- [2025] Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning, arXiv [Paper]
- [2025] UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility, arXiv [Paper] [Code]
- [2025] A Survey of World Models for Autonomous Driving, arXiv [Paper]
- [2025] Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, arXiv [Paper]
- [2025] A Survey on Large Language Models with some Insights on their Capabilities and Limitations, arXiv [Paper]
- [2025] A Survey on Embodied Intelligence Systems Based on Large Models (基于大模型的具身智能系统综述), Acta Automatica Sinica (自动化学报) [Paper]
- [2025] Key Issues in Embodied Intelligence Research: Autonomous Perception, Action, and Evolution (具身智能研究的关键问题: 自主感知、行动与进化), Acta Automatica Sinica (自动化学报) [Paper] [Code]
- [2024] Large-Model-Driven Embodied Intelligence: Progress and Challenges (大模型驱动的具身智能: 发展与挑战), Science China (中国科学) [Paper]
- [2024] From Specific-MLLMs to Omni-MLLMs: A Survey on MLLMs Aligned with Multi-modalities, arXiv [Paper] [Code]
- [2024] Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models, TMLR [Paper] [Code]
- [2024] Efficient Large Language Models: A Survey, TMLR [Paper] [Code]
- [2024] A Survey on Multimodal Large Language Models for Autonomous Driving, WACV [Paper]
- [2024] Personalization of Large Language Models: A Survey, arXiv [Paper]
- [2024] A Survey on LLM Inference-Time Self-Improvement, arXiv [Paper]
- [2024] Embodied Navigation with Multi-modal Information: A Survey from Tasks to Methodology, Information Fusion [Paper]
- [2024] Recent Advances in Robot Navigation via Large Language Models: A Review, arXiv [Paper]
- [2024] Large Language Models for Robotics: Opportunities, Challenges, and Perspectives, arXiv [Paper]
- [2024] Advances in Embodied Navigation Using Large Language Models: A Survey, arXiv [Paper]
- [2024] Foundation Models in Robotics: Applications, Challenges, and the Future, IJRR [Paper] [Code]
- [2024] A Survey of Large Language Models, arXiv [Paper] [Code]
- [2024] ChatGPT for Robotics: Design Principles and Model Abilities, IEEE Access [Paper]
- [2023] Large Language Models for Robotics: A Survey, arXiv [Paper]
- [2023] LLM4Drive: A Survey of Large Language Models for Autonomous Driving, arXiv [Paper] [Code]
- [2025] Visual-RFT: Visual Reinforcement Fine-Tuning, arXiv [Paper] [Code]
- [2025] ObjectVLA: End-to-End Open-World Object Manipulation Without Demonstration, arXiv [Paper] [Code]
- [2025] LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token, arXiv [Paper] [Code]
- [2025] Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling, arXiv [Paper] [Code]
- [2024] OVAL-Prompt: Open-Vocabulary Affordance Localization for Robot Manipulation through LLM Affordance-Grounding, arXiv [Paper]
- [2024] NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models, AAAI [Paper]
- [2024] LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation, arXiv [Paper] [Code]
- [2024] OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments, arXiv [Paper]
- [2023] Chat with the Environment: Interactive Multimodal Perception using Large Language Models, IROS [Paper]
- [2023] VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models, arXiv [Paper]
- [2023] Steve-Eye: Equipping LLM-Based Embodied Agents with Visual Perception in Open Worlds, ICLR [Paper]
- [2023] LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, CORL [Paper]
- [2022] Flamingo: a Visual Language Model for Few-Shot Learning, NeurIPS [Paper]
- [2021] Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation, arXiv [Paper]
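
Much of the perception work above rests on one primitive: embed image regions and free-form label text into a shared space and rank them by cosine similarity, as in the open-vocabulary detection entry. Below is a minimal sketch of just that scoring step; the two encoder stubs are hypothetical placeholders for, e.g., CLIP-style text and image towers:

```python
import torch
import torch.nn.functional as F

def embed_text(labels: list[str]) -> torch.Tensor:
    """Hypothetical text encoder (e.g., a CLIP text tower) -> (n_labels, d)."""
    raise NotImplementedError

def embed_regions(crops: torch.Tensor) -> torch.Tensor:
    """Hypothetical image encoder; crops (n_regions, 3, H, W) -> (n_regions, d)."""
    raise NotImplementedError

def classify_regions(crops: torch.Tensor, labels: list[str]) -> torch.Tensor:
    # Normalize both embeddings so the dot product is cosine similarity.
    t = F.normalize(embed_text(labels), dim=-1)
    v = F.normalize(embed_regions(crops), dim=-1)
    sim = v @ t.T                     # (n_regions, n_labels) similarity matrix
    return sim.argmax(dim=-1)         # best open-vocabulary label per region
```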
- [2025] NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning, IEEE TPAMI [Paper]
- [2025] FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks, arXiv [Paper]
- [2025] MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation, arXiv [Paper]
- [2025] NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM, arXiv [Paper]
- [2025] LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs, arXiv [Paper] [Video]
- [2025] Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives, arXiv [Paper] [Video]
- [2025] SD++: Enhancing Standard Definition Maps by Incorporating Road Knowledge using LLMs, arXiv [Paper]
- [2025] FAST: Efficient Action Tokenization for Vision-Language-Action Models, arXiv [Paper] [Video]
- [2025] AdaWM: Adaptive World Model based Planning for Autonomous Driving, arXiv [Paper]
- [2025] Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving, AAAI [Paper]
- [2025] LLM-attacker: Enhancing Closed-loop Adversarial Scenario Generation for Autonomous Driving with Large Language Models, arXiv [Paper] [Video]
- [2024] Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs, EMNLP [Paper]
- [2024] Mastering Board Games by External and Internal Planning with Language Models, arXiv [Paper]
- [2024] TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation, arXiv [Paper]
- [2024] The One RING: a Robotic Indoor Navigation Generalist, arXiv [Paper] [Video]
- [2024] Asynchronous Large Language Model Enhanced Planner for Autonomous Driving, ECCV [Paper]
- [2024] Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving, arXiv [Paper]
- [2024] LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning, arXiv [Paper]
- [2024] SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments, ICAPS [Paper]
- [2024] AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers, ICRA [Paper]
- [2023] ProgPrompt: Generating Situated Robot Task Plans using Large Language Models, ICRA [Paper]
- [2023] Text2Motion: from Natural Language Instructions to Feasible Plans, Autonomous Robots [Paper]
- [2023] LLM as A Robotic Brain: Unifying Egocentric Memory and Control, arXiv [Paper]
- [2023] PaLM-E: An Embodied Multimodal Language Model, arXiv [Paper]
- [2022] Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, arXiv [Paper]
- [2022] Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents, ICML [Paper]
- [2021] Learning a Decision Module by Imitating Driver’s Control Behaviors, CORL [Paper]
- [2021] Neuro-Symbolic Program Search for Autonomous Driving Decision Module Design, CORL [Paper]
- [2021] A Lifelong Learning Approach to Mobile Robot Navigation, IEEE RAL [Paper]
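
A pattern shared by several planning entries above (Language Models as Zero-Shot Planners, SayNav, ProgPrompt): prompt an LLM to decompose an instruction into free-form steps, then ground each step to the closest admissible action. A minimal sketch of that loop; `call_llm` is a hypothetical stand-in for any chat-completion client, and stdlib string similarity replaces the learned embedding matching used in the papers:

```python
import difflib

# Admissible low-level actions the agent can actually execute.
ADMISSIBLE_ACTIONS = [
    "walk to the kitchen", "open the fridge", "grab the milk",
    "close the fridge", "walk to the table", "put down the milk",
]

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in any chat-completion client."""
    raise NotImplementedError

def ground_step(step: str) -> str:
    # Map a free-form step to the closest admissible action; the papers
    # use embedding similarity, difflib is a stdlib stand-in.
    return max(ADMISSIBLE_ACTIONS,
               key=lambda a: difflib.SequenceMatcher(None, step.lower(), a).ratio())

def plan(instruction: str) -> list[str]:
    prompt = (f"Task: {instruction}\n"
              "List the steps to complete the task, one per line.")
    steps = [s.strip("- ").strip()
             for s in call_llm(prompt).splitlines() if s.strip()]
    return [ground_step(s) for s in steps]
```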
- [2025] ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model, arXiv [Paper]
- [2024] π0: A Vision-Language-Action Flow Model for General Robot Control, arXiv [Paper] [Video]
- [2024] NaVILA: Legged Robot Vision-Language-Action Model for Navigation, arXiv [Paper] [Video]
- [2024] Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation, arXiv [Paper]
- [2024] GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment, arXiv [Paper]
- [2024] Probabilistically Correct Language-based Multi-Robot Planning using Conformal Prediction, arXiv [Paper]
- [2024] Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration, arXiv [Paper]
- [2024] Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?, ICRA [Paper]
- [2024] LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination, AAMAS [Paper]
- [2024] VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View, AAAI [Paper]
- [2024] SRLM: Human-in-Loop Interactive Social Robot Navigation with Large Language Model and Deep Reinforcement Learning, arXiv [Paper]
- [2024] RoCo: Dialectic Multi-Robot Collaboration with Large Language Models, ICRA [Paper]
- [2024] Building Cooperative Embodied Agents Modularly with Large Language Models, ICLR [Paper]
- [2024] Lifelong Robot Learning with Human Assisted Language Planners, ICRA [Paper]
- [2024] MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning, arXiv [Paper]
- [2024] LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments, IROS [Paper]
- [2023] Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model, arXiv [Paper]
- [2023] NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning, IROS [Paper]
- [2023] Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration, arXiv [Paper]
- [2023] Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using Large Language Models, arXiv [Paper]
- [2023] Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach, arXiv [Paper]
- [2023] LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models, arXiv [Paper]
- [2023] ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation, ICML [Paper]
- [2023] Code as Policies: Language Model Programs for Embodied Control, ICRA [Paper]
- [2022] Multi-Agent Embodied Visual Semantic Navigation With Scene Prior Knowledge, IEEE RAL [Paper]
- [2022] Multi-Robot Active Mapping via Neural Bipartite Graph Matching, CVPR [Paper]
- [2022] Learning Efficient Multi-agent Cooperative Visual Exploration, ECCV [Paper]
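
A recurring design axis in the collaboration entries above (e.g., Scalable Multi-Robot Collaboration, Co-NavGPT) is centralized versus decentralized control. A minimal sketch of the centralized variant, where a single LLM call assigns every robot a frontier; `call_llm` is again a hypothetical client stub:

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in any chat-completion client."""
    raise NotImplementedError

def assign_goals(robot_states: dict[str, str], frontiers: list[str]) -> dict[str, str]:
    # Centralized coordination: one planner sees every robot's state and
    # emits a full assignment, instead of per-robot decentralized calls.
    prompt = (
        "Robots and their states:\n"
        + "\n".join(f"- {r}: {s}" for r, s in robot_states.items())
        + "\nUnexplored frontiers:\n"
        + "\n".join(f"- {f}" for f in frontiers)
        + "\nReturn a JSON object mapping each robot to one frontier."
    )
    return json.loads(call_llm(prompt))
```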
- [2025] Týr-the-Pruner: Unlocking Accurate 50% Structural Pruning for LLMs via Global Sparsity Distribution Optimization, arXiv [Paper]
- [2025] Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing, ICLR [Paper]
- [2025] Lightweight and Post-Training Structured Pruning for On-Device Large Language Models, arXiv [Paper]
- [2025] FASP: Fast and Accurate Structured Pruning of Large Language Models, arXiv [Paper]
- [2024] FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models, NeurIPS [Paper]
- [2024] Fluctuation-Based Adaptive Structured Pruning for Large Language Models, AAAI [Paper]
- [2024] LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models, ICML [Paper] [Code]
- [2024] SlimGPT: Layer-wise Structured Pruning for Large Language Models, NeurIPS [Paper]
- [2024] Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes, arXiv [Paper]
- [2024] Compact Language Models via Pruning and Knowledge Distillation, arXiv [Paper]
- [2024] A Deeper Look at Depth Pruning of LLMs, ICML [Paper]
- [2024] Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models, arXiv [Paper]
- [2024] Plug-and-Play: An Efficient Post-training Pruning Method for Large Language Models, ICLR [Paper]
- [2024] BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation, arXiv [Paper]
- [2024] ShortGPT: Layers in Large Language Models are More Redundant Than You Expect, arXiv [Paper]
- [2024] NutePrune: Efficient Progressive Pruning with Numerous Teachers for Large Language Models, arXiv [Paper]
- [2024] SliceGPT: Compress Large Language Models by Deleting Rows and Columns, ICLR [Paper] [Code]
- [2023] LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery, arXiv [Paper]
- [2023] LLM-Pruner: On the Structural Pruning of Large Language Models, NeurIPS [Paper] [Code]
- [2023] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning, NeurIPS [Paper] [Code]
- [2023] LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning, arXiv [Paper]
- [2023] LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation, ICML [Paper] [Code]
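
The structured-pruning entries above remove whole rows, columns, heads, or layers, so the pruned model stays dense and needs no sparse kernels. As a toy illustration of the idea (not any one paper's method), the sketch below drops the lowest-L2-norm output channels of a PyTorch `nn.Linear` and rebuilds a smaller dense layer:

```python
import torch
import torch.nn as nn

def prune_linear_rows(layer: nn.Linear, keep_ratio: float = 0.5):
    """Drop the lowest-L2-norm output channels (rows) of a Linear layer.

    Toy channel-importance criterion; the surveyed methods use richer
    scores (gradients, fluctuations, layer-wise reconstruction, ...).
    Returns the smaller dense layer plus the kept row indices, which the
    *next* layer's input dimension must be sliced to match.
    """
    n_keep = max(1, int(layer.out_features * keep_ratio))
    importance = layer.weight.norm(p=2, dim=1)            # one score per row
    keep = torch.topk(importance, n_keep).indices.sort().values
    pruned = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned, keep

# Example: shrink a 1024 -> 4096 projection to 1024 -> 2048.
layer = nn.Linear(1024, 4096)
smaller, kept_rows = prune_linear_rows(layer, keep_ratio=0.5)
```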
- [2025] Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization, ICLR [Paper] [Code]
- [2024] Fast and Effective Weight Update for Pruned Large Language Models, TMLR [Paper] [Code]
- [2024] A Simple and Effective Pruning Approach for Large Language Models, ICLR [Paper] [Code]
- [2024] Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models, ICML [Paper]
- [2024] MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models, NeurIPS [Paper]
- [2024] Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs, ICLR [Paper]
- [2024] A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models, arXiv [Paper]
- [2023] SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot, ICML [Paper] [Code]
- [2023] One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models, arXiv [Paper]
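
The one-shot entries above (SparseGPT, Wanda from "A Simple and Effective Pruning Approach", Pruner-Zero) instead zero individual weights using a saliency score computed from a small calibration set, with no retraining. Below is a minimal Wanda-style sketch, scoring each weight by |W| times the per-input-channel activation norm and pruning each row to 50% sparsity; an illustrative simplification, not the papers' full procedure:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def wanda_style_prune(layer: nn.Linear, calib_x: torch.Tensor, sparsity: float = 0.5):
    """Zero the lowest-scoring weights in-place, scored by |W| * ||x||.

    calib_x: (n_samples, in_features) calibration activations.
    Each row (output channel) is pruned independently, as in Wanda.
    """
    act_norm = calib_x.norm(p=2, dim=0)            # norm per input channel
    score = layer.weight.abs() * act_norm          # broadcasts across rows
    n_prune = int(layer.in_features * sparsity)
    # Indices of the lowest-scoring weights within each row.
    drop = torch.topk(score, n_prune, dim=1, largest=False).indices
    mask = torch.ones_like(layer.weight)
    mask.scatter_(1, drop, 0.0)
    layer.weight.mul_(mask)

# Example with random weights and a random calibration batch.
layer = nn.Linear(512, 512)
wanda_style_prune(layer, calib_x=torch.randn(128, 512), sparsity=0.5)
```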