Fudan University, Shanghai AI Laboratory
Shanghai
Stars
[CVPR 2025 Oral] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
The official implementation for "Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos".
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…
Illumination Drawing Tools for Text-to-Image Diffusion Models
Codebase for "VLMaterial: Procedural Material Generation with Large Vision-Language Models"
MM-IFEngine: Towards Multimodal Instruction Following
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
RelightVid: Temporal-Consistent Diffusion Model for Video Relighting
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Official Implementation for Diffusion Models Without Classifier-free Guidance
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
Official implementation of X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models
Official repo for "IDArb: Intrinsic Decomposition for arbitrary number of input views and illuminations"
A generative world for general-purpose robotics & embodied AI learning.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official code of "Imagine360: Immersive 360 Video Generation from Perspective Anchor"
[NeurIPS 2024 D&B Track] Implementation for "FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models"
Material Anything: Generating Materials for Any 3D Object via Diffusion
[CVPR2025] We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference ima…