Stars
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
[ICRA 2025] Official implementation of Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
A Best-of-list of Robot Simulators, re-generated weekly on Wednesdays
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
OpenEQA Embodied Question Answering in the Era of Foundation Models
Official implementations for the paper AnyDoor: Zero-shot Object-level Image Customization
Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
[Paper][AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
Frontier Multimodal Foundation Models for Image and Video Understanding
An open source implementation of CLIP.
[CVPR 2023] CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
Code for ICRA24 paper "Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation". Paper: https://arxiv.org/abs/2310.07968 Video: https://www.youtube.com/watch?v=rN5S8QIhhQc
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. NeurIPS 2022
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Official implementation of SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
A repository accompanying the PARTNR benchmark for using Large Planning Models (LPMs) to solve Human-Robot Collaboration or Robot Instruction Following tasks in the Habitat simulator.
Extending the existing benchmark VideoQA datasets
Code for downloading videos from the HowTo100M dataset
A Datasette instance for searching WebVid-10M
Large-scale text-video dataset. 10 million captioned short videos.
LightGlue: Local Feature Matching at Light Speed (ICCV 2023)