Stars
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
GenZI: Zero-Shot 3D Human-Scene Interaction Generation (CVPR 2024)
MikuDance: Animating Character Art with Mixed Motion Dynamics
Wan: Open and Advanced Large-Scale Video Generative Models
The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."
A generative world for general-purpose robotics & embodied AI learning.
Pandora: Towards General World Model with Natural Language Actions and Video States
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
🌍 A Collection of Awesome Large Weather Models (LWMs) | AI for Earth (AI4Earth) | AI for Science (AI4Science)
[CVPR 2025] A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
[CVPR'25] Tora: Trajectory-oriented Diffusion Transformer for Video Generation
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance. [CVPR 2024] Official PyTorch implementation
High-resolution models for human tasks.
Course: Diffusion Generative AI for Computer Vision and Science
[AAAI 2025] Dynamic Protein Data Bank
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
CoTracker is a model for tracking any point (pixel) on a video.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
A little word cloud generator in Python
GPT4V-level open-source multi-modal model based on Llama3-8B