Stars
[ICLR 2025] Official code implementation of DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Example models using DeepSpeed
Emu Series: Generative Multimodal Models from BAAI
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[AAAI 2025] Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
[ICLR 2025] Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
High-Resolution Image Synthesis with Latent Diffusion Models
A latent text-to-image diffusion model
High-Resolution Image Synthesis with Latent Diffusion Models
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
[ICLR 2025 spotlight] 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Official inference repo for FLUX.1 models
🚀 [Large Models] Train a 26M-parameter vision multimodal VLM from scratch in just 1 hour!
🚀🚀 [Large Models] Train a small 26M-parameter GPT entirely from scratch in just 2 hours!
Assistant tools for attention visualization in deep learning
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
AcadHomepage: A Modern and Responsive Academic Personal Homepage