Stars
The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and a vision-language model based on linear attention
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research into GANs
Python module that lets you specify timeouts when calling any existing function, and that supports stoppable threads
AI-based tool for removing hard-coded subtitles and text-like watermarks from images and videos, producing output files at the original resolution. Runs locally; no third-party API is required.
ImageBind: One Embedding Space to Bind Them All
[ICLR2025] Halton Scheduler for Masked Generative Image Transformer
Official Code for Stable Cascade
Implementation of MagViT2 Tokenizer in PyTorch
Text-to-image samples collected for the evaluation of DALL-E 3 in the whitepaper.
[NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…
Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time (CVPR2023)
A more practical frame interpolation approach.
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
Making large AI models cheaper, faster and more accessible
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models
Web UI for Stable Diffusion prompt generation using a trained GPT-2 model
GRiT: A Generative Region-to-text Transformer for Object Understanding (ECCV2024)
Create amazing Stable Diffusion prompts with minimal prompt knowledge. A Vicuna-based prompt engineering tool for Stable Diffusion
Repo for my TensorFlow/Keras CV experiments, mostly revolving around the Danbooru20xx dataset
MagicEdit: High-Fidelity Temporally Coherent Video Editing
iCartoonFace dataset and baseline approaches. The project is supported by iQIYI
📷 EasyPhoto | Your Smart AI Photo Generator.