- National University of Singapore
- https://yujun-shi.github.io/
Stars
[CVPR 2025 Highlight🔥] Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
Experience lightning-fast (~1s) and accurate drag-based image editing
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Official repository for "CFG++: manifold-constrained classifier free guidance for diffusion models" (ICLR 2025)
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[ICML 2024] LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
Easily train a good VC model with voice data <= 10 mins!
[ICLR 2025] 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
MambaOut: Do We Really Need Mamba for Vision? (CVPR 2025)
[ICRA 2024] ASAP: Automated Sequence Planning for Complex Robotic Assembly with Physical Feasibility
[TPAMI 2025🔥] MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
CoTracker is a model for tracking any point (pixel) on a video.
When do we not need larger vision models?
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
[CVPR 2024 Highlight] Official code for DragDiffusion
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" (ICLR 2024)