Stars
PyTorch code and models for VJEPA2 self-supervised learning from video.
A Modular Toolkit for Robot Kinematic Optimization
moojink / openvla-oft
Forked from openvla/openvla
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
ICCV 2025 | TesserAct: Learning 4D Embodied World Models
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
Code for Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation
Dataset and Code for ICRA 2024 paper "Grasp-Anything: Large-scale Grasp Dataset from Foundation Models."
[CVPR 2025🎉] Official implementation for paper "Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation".
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning.
ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
[IROS 2025] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
[ICRA 2025] In-Context Imitation Learning via Next-Token Prediction
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Fully open reproduction of DeepSeek-R1
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Vision package for robot manipulation and learning research
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data