Stars
GoatWu / CausVid-Plus
Forked from tianweiy/CausVid
Unofficial extension implementation of CausVid
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
[ARXIV'25] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Official implementation of ATI: Any Trajectory Instruction for Controllable Video Generation. https://arxiv.org/pdf/2505.22944
This is the official implementation of Tensor4D: Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering.
📹 A more flexible framework that can generate videos at any resolution and create videos from images.
Official implementation of TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
EditAR: Unified Conditional Generation with Autoregressive Models (CVPR 2025)
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
A SOTA open-source image editing model that aims to match the performance of closed-source models such as GPT-4o and Gemini 2 Flash.
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
XMLGen is a tool for generating native Golang types from XML.
Fast, Flexible and Portable Structured Generation
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
This is the official implementation of our Señorita-2M [Weights and Dataset]: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Official PyTorch implementation for LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior (ICLR 2025 Oral).
Official code for Self-Supervised Speed of Sound Recovery for Aberration-Corrected Photoacoustic Computed Tomography
PyTorch implementation of the paper "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
Let's make video diffusion practical!
Two conversational AI agents switching from English to sound-level protocol after confirming they are both AI agents
HunyuanVideo: A Systematic Framework For Large Video Generation Model
This is the official implementation of Vision-Language-Camera: Introducing Vision Language Models for Unleashing the Power of Camera Manual Mode
This repo contains the code for 1D tokenizer and generator
Official implementation of Continuous 3D Perception Model with Persistent State