- Wuhan University
- Wuhan, China
- https://hpwang-whu.github.io/
Stars
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Public code release associated with SceneScript.
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
[CVPR 2025] WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
[ICLR 2025 Oral] Official code for "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias"
Aether: Geometric-Aware Unified World Modeling
GGUF Quantization support for native ComfyUI models
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
[CVPR 2025] UniK3D: Universal Camera Monocular 3D Estimation
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Roblox Foundation Model for 3D Intelligence
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
SynCity: Training-Free Generation of 3D Worlds
Understanding R1-Zero-Like Training: A Critical Perspective
SpatialLM: Large Language Model for Spatial Understanding
Solve Visual Understanding with Reinforced VLMs
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer
[ARXIV'25] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Official implementations for paper: VACE: All-in-One Video Creation and Editing