Stars
Nymeria: a massive collection of multimodal egocentric daily motion in the wild
A Modular Toolkit for Robot Kinematic Optimization
[RSS 2024] AdaptiGraph: Material-Adaptive Graph-Based Neural Dynamics for Robotic Manipulation
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
A collection of World Models papers for Autonomous Driving (and Robotics).
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Official code for the CVPR 2025 paper "Navigation World Models".
Official code repository for "Web Agents with World Models [ICLR 2025]".
Code for Scaling Language-Free Visual Representation Learning (WebSSL)
ICLR'25 Oral: Improving Probabilistic Diffusion Models With Optimal Covariance Matching
HaMeR: Reconstructing Hands in 3D with Transformers
[ICLR 2025] 6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Muon: An optimizer for hidden layers in neural networks
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)
PyTorch code and models for the DINOv2 self-supervised learning method.
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Distributed Robot Interaction Dataset.
PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
A suite of image and video neural tokenizers
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.