ShanghaiTech
- Shanghai
- https://ymzhong66.github.io
Stars
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
A generative world for general-purpose robotics & embodied AI learning.
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
[ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge
[ICML 2025] Official code for the paper 'DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space'
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
[IROS 2025] Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies
[RSS25] Official implementation of DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning
Official PyTorch Implementation of Unified Video Action Model (RSS 2025)
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Frequency Autoregressive Image Generation with Continuous Tokens
Official implementation of SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
[IROS 2025] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[ICLR 2024] DiffTactile: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
A curated list of resources on diffusion models in RL (continually updated)
A list of papers I have read: robotics, learning, vision.