Stanford University
- https://cs.stanford.edu/~zhzheng
- @ZhuoZheng2
Stars
Interactive visualizations of the geometric intuition behind diffusion models.
MAGI-1: Autoregressive Video Generation at Scale
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
[IEEE TIP 2025] Multi-Axis Feature Diversity Enhancement for Remote Sensing Video Super-Resolution
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
Official implementation of OneDiffusion paper (CVPR 2025)
Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning (Best open-source multimodal reasoning model)
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
This repository contains the "superproject" wrapper for the "Classic" configuration of the GEOS-Chem model of atmospheric chemistry and composition.
Official Implementation of Rectified Flow (ICLR 2023 Spotlight)
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
Diffusion Model-Based Image Editing: A Survey (TPAMI 2025)
Janus-Series: Unified Multimodal Understanding and Generation Models
Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
Wan: Open and Advanced Large-Scale Video Generative Models
Official Repo for Open-Reasoner-Zero
[CVPR 2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
Orthogonalize a polygon in Python by making all its angles 90 or 180 degrees
Fully open reproduction of DeepSeek-R1
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding cap…
Official implementation of the WACV 2025 paper "Pix2Poly: A Sequence Prediction Method for End-to-end Polygonal Building Footprint Extraction from Remote Sensing Imagery".
[IEEE GRSS DFC 2025 Track II] BRIGHT: A globally distributed multimodal VHR dataset for all-weather disaster response
[ICLR 2025] Official Implementation of Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.