Starred repositories
Official repository of In-Context LoRA for Diffusion Transformers
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…
[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
React Native's Animated library reimplemented
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
AI semantic search for local media: search by image, find local assets, match frames to a text description, search video frames, and find videos by scene description. Search local photos and videos through natural language.
"VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos"
Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Your AI Operator for Web, Android, Automation & Testing.
The Swiss Army knife of lossless video/audio editing
A Python tool that uses GPT-4, FFmpeg, and OpenCV to automatically analyze videos, extract the most interesting sections, and crop them for an improved viewing experience.
🚀🎬 ShortGPT - Experimental AI framework for YouTube Shorts / TikTok channel automation
Auto-Editor: Efficient media analysis and rendering
Command line utility for forced alignment using Kaldi
aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
🔥 InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"
[TIP 2025] CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models 🔥
A web-based video editing SDK built on WebCodecs.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
PyTorch code and models for the DINOv2 self-supervised learning method.