gitlabspy (hanban) / Starred · GitHub

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 2,125 184 Updated Jun 6, 2025

🚀 SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

20 Updated Jun 3, 2025

Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 361 23 Updated Jun 18, 2025

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 1,614 180 Updated May 8, 2025

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Jupyter Notebook 523 18 Updated May 24, 2024
Python 303 13 Updated Jun 12, 2025

We introduce DI*-SDX-1step Model, which is a leading human-preferred 1-step text-to-image model of 1024 resolution.

Jupyter Notebook 18 Updated Jun 6, 2025

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,274 275 Updated Jun 4, 2025

GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

Python 77 2 Updated May 29, 2025

PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation

Python 33 Updated Oct 28, 2024

[CVPR 2025] Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Python 108 3 Updated May 16, 2025
Python 1,207 46 Updated Jun 19, 2025

An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 758 27 Updated Jun 16, 2025
Python 90 7 Updated Nov 27, 2024

DreamO: A Unified Framework for Image Customization

Python 1,535 115 Updated May 30, 2025
Python 219 11 Updated May 27, 2025
Python 113 15 Updated Jun 18, 2025

A SOTA open-source image editing model that aims to provide performance comparable to closed-source models like GPT-4o and Gemini 2 Flash.

Python 1,424 60 Updated Jun 18, 2025

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!

Python 113 3 Updated Mar 4, 2025

Let's make video diffusion practical!

Python 14,526 1,297 Updated May 4, 2025

[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 1,120 52 Updated Mar 16, 2025

PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"

Python 370 20 Updated Apr 22, 2025

Official repository of In-Context LoRA for Diffusion Transformers

1,917 91 Updated Dec 20, 2024

A minimal and universal controller for FLUX.1.

Python 1,639 115 Updated Jun 6, 2025

LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation

36 2 Updated Mar 3, 2025

[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inputs, making it easy to integrate both visual understanding an…

Python 86 4 Updated Apr 23, 2025

Pixel-Space Generative Models

Python 249 14 Updated May 11, 2025

Paper list: deep learning based video compression

219 19 Updated Feb 1, 2025