Stars
ComfyUI-ReduxFineTune is a custom node for ComfyUI that enables advanced style fine-tuning using the Flux Redux approach. It offers multiple unified fusion modes for precise and consistent control …
✨✨Latest Advances on Multimodal Large Language Models
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
A collection of diffusion model papers categorized by their subareas
Implementation of ColorizeDiffusion
[CVPR 2025] DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
CosmicMan: A Text-to-Image Foundation Model for Humans (CVPR 2024)
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
StyleGAN-Human: A Data-Centric Odyssey of Human Generation
UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer
[CVPR 2025] Attention Distillation: A Unified Approach to Visual Characteristics Transfer
M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment
Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"
An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional variability in sampling steps
SkyReels V1: The first and most advanced open-source human-centric video foundation model
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
🔥ICLR 2025 (Spotlight) One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
https://wavespeed.ai/ [WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast.
Official code for VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
A Training-free Iterative Framework for Long Story Visualization
VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[NeurIPS 2024] Generalizable Implicit Motion Modeling for Video Frame Interpolation