Stars
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
🔥🔥🔥Official Codebase of "DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation"
OneDiff: An out-of-the-box acceleration library for diffusion models.
PyTorch code and models for V-JEPA self-supervised learning from video.
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
Modeling, training, eval, and inference code for OLMo
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Building a quick conversation-based search demo with Lepton AI.
[ICCV 2023] StableVideo: Text-driven Consistency-aware Diffusion Video Editing
Character Animation (AnimateAnyone, Face Reenactment)
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
An open-source framework for training large multimodal models.
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Imitation learning algorithms with Co-training for Mobile ALOHA: ACT, Diffusion Policy, VINN
A simple, performant and scalable JAX LLM!
High-speed Large Language Model Serving for Local Deployment
[T-PAMI 2025] PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
QLoRA: Efficient Finetuning of Quantized LLMs
[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
Code for paper: FUTR3D: A Unified Sensor Fusion Framework for 3D Detection
Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with online demo and local deployment; code and models are fully open source; supports Windows, macOS, Linux)
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
[ICLR 2023] DiffMimic: Efficient Motion Mimicking with Differentiable Physics https://arxiv.org/abs/2304.03274
Metric depth estimation from a single image