Stars
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
OpenOCR: A general OCR system with accuracy and efficiency. It supports 24 Scene Text Recognition methods trained from scratch on large-scale real datasets, and will continue to add the latest methods.
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
A pipeline parallel training script for diffusion models.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Best practices & guides on how to write distributed PyTorch training code
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling
Machine Learning Engineering Open Book
Bootstrap Kubernetes the hard way. No scripts.
Official code for VEnhancer: Generative Space-Time Enhancement for Video Generation
A book on DevOps for Data Scientists with CRC Press.
Official inference repo for FLUX.1 models
A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson
Repository to quickly label lots of images using CLIP embeddings
Combining Segment Anything (SAM) with Grounded DINO for zero-shot object detection and CLIPSeg for zero-shot segmentation
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Metric learning and retrieval pipelines, models and zoo.
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM"
Official code for "Style Aligned Image Generation via Shared Attention"
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Implementation of Conditional ViT on LAION — Referred Visual Search — Fashion
[SIGGRAPH Asia '23] FLARE: Fast Learning of Animatable and Relightable Mesh Avatars
This repository implements the idea of "caption upsampling" from DALL-E 3 with Zephyr-7B and gathers results with SDXL.