- BeiJing
Stars
Frontier Multimodal Foundation Models for Image and Video Understanding
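A live-stream recording tool that supports continuous unattended monitoring and recording multiple streamers at once. It records live streams from 40+ platforms, including Douyin, TikTok, YouTube, Kuaishou, Huya, Douyu, Bilibili, Xiaohongshu, pandatv, sooplive, flextv, popkontv, twitcasting, winktv, Baidu, Weibo, Kugou, 17Live, Twitch, Acfun, CHZZK, and shopee.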
Notes on basic training for the Stable Diffusion series, including LoRA, ControlNet, and IP-Adapter, plus a bit of fun AIGC play!
MAGI-1: Autoregressive Video Generation at Scale
📌 [Arxiv2025] Official implementation of "NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representation"
A suite of image and video neural tokenizers
Janus-Series: Unified Multimodal Understanding and Generation Models
High-Fidelity Lip-Syncing with Wav2Lip and Real-ESRGAN
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
easy_clash_tool is a Python library for clash that makes it easy to switch automatically to an available node
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[CVPR 2025 Highlight🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[CVPR 2025🔥] Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Python-based web automation tool. Powerful and elegant.
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
[ICLR 2025] Pyramidal Flow Matching for Efficient Video Generative Modeling