
MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction

Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li,
Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang


🔆 News

🔥🔥 (2025.03) Check out our other recent works on generative world models: UniScene, DiST-4D, HERMES.

🔥🔥 (2025.03) The data processing code is released!

🔥🔥 (2025.03) The training and inference code of Multi-modal Diffusion is available NOW!!!

🔥🔥 (2025.03) Paper is on arXiv: MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction

📝 TODO List

  • Release data processing code.
  • Release the pretrained model.
  • Release training / inference code.

👀 Abstract

Recent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving. Nevertheless, critical limitations persist: reconstruction-based methods exhibit substantial performance deterioration under significant viewpoint deviations from training trajectories, while generation-based techniques struggle with temporal coherence and precise scene controllability. To overcome these challenges, we present MuDG, an innovative framework that integrates Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction. MuDG leverages aggregated LiDAR point clouds with RGB and geometric priors to condition a multi-modal video diffusion model, synthesizing photorealistic RGB, depth, and semantic outputs for novel viewpoints. This synthesis pipeline enables feed-forward NVS without computationally intensive per-scene optimization, providing comprehensive supervision signals to refine 3DGS representations for rendering robustness enhancement under extreme viewpoint changes. Experiments on the Open Waymo Dataset demonstrate that MuDG outperforms existing methods in both reconstruction and synthesis quality.

🧰 Models

| Model   | Resolution | Checkpoint   |
|---------|------------|--------------|
| MDM1024 | 576x1024   | Hugging Face |
| MDM512  | 320x512    | Hugging Face |
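Assuming the checkpoints are hosted on Hugging Face as linked in the table, they can be fetched with `huggingface-cli`; `<hf-repo-id>` below is a placeholder for the linked repository, and the target path matches the layout the inference scripts expect (see Inference):

```bash
# <hf-repo-id> is a placeholder for the Hugging Face repository linked above.
huggingface-cli download <hf-repo-id> model.ckpt --local-dir checkpoints/512_mdm/
mv checkpoints/512_mdm/model.ckpt checkpoints/512_mdm/512-mdm-checkpoint.ckpt
```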

⚙️ Setup

Install Environment via Anaconda (Recommended)

```bash
conda create -n mudg python=3.8.5
conda activate mudg
pip install -r requirements.txt
```
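`requirements.txt` should pull in PyTorch for the diffusion model; as an optional sanity check (our suggestion, assuming a CUDA build of PyTorch), verify GPU visibility after installing:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```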

💫 Inference for Novel Viewpoints

1. Sparse Conditional Generation

We project the fused point clouds onto novel viewpoints to generate sparse color and depth maps.

Note: The detailed data processing steps can be found in the Data Processing section.

For your convenience, we have also provided pre-processed data. You can access it via this link.
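The repository's own projection utilities ship with the data processing code; purely as an illustration of this step, here is a minimal NumPy sketch (all names such as `project_points` and `w2c` are ours, not the repo's API) that splats a fused, colored point cloud into a novel pinhole view to produce sparse color and depth maps:

```python
import numpy as np

def project_points(points, colors, K, w2c, h, w):
    """Project a fused, colored point cloud into a novel view, producing
    sparse color and depth maps (zeros where no point lands).

    points: (N, 3) world-space XYZ; colors: (N, 3) RGB in [0, 1]
    K: (3, 3) camera intrinsics; w2c: (4, 4) world-to-camera extrinsics
    """
    # Transform to camera coordinates and keep points in front of the camera.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (w2c @ pts_h.T).T[:, :3]
    front = cam[:, 2] > 1e-3
    cam, colors = cam[front], colors[front]

    # Pinhole projection to pixel coordinates.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v, z = uv[:, 0].astype(int), uv[:, 1].astype(int), cam[:, 2]

    # Keep points that fall inside the image bounds.
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, colors = u[ok], v[ok], z[ok], colors[ok]

    # Z-buffer: write far points first so nearer points overwrite them.
    order = np.argsort(-z)
    sparse_rgb = np.zeros((h, w, 3), dtype=np.float32)
    sparse_depth = np.zeros((h, w), dtype=np.float32)
    sparse_rgb[v[order], u[order]] = colors[order]
    sparse_depth[v[order], u[order]] = z[order]
    return sparse_rgb, sparse_depth
```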

2. Generate the item list

```bash
python virtual_render/generate_virtual_item.py
```

3. Multi-modal Diffusion

  1. Download the pretrained models and place the model.ckpt for the required resolution at checkpoints/[1024|512]_mdm/[1024|512]-mdm-checkpoint.ckpt (see the layout sketch below).
  2. Run the command for your device and needs in a terminal:

```bash
sh virtual_render/scripts/render.sh 15365
```

Here 15365 is the item id; you can change it to any id from the item list.
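For reference, this is the checkpoint layout implied by the paths above (only the resolution you plan to run is needed):

```
checkpoints/
├── 1024_mdm/
│   └── 1024-mdm-checkpoint.ckpt
└── 512_mdm/
    └── 512-mdm-checkpoint.ckpt
```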

💥 Training

Novel View Generation

  1. Process the data and generate the item list.
  2. Generate the train data list:

```bash
python data/create_data_infos.py
```

  3. Download the pretrained model DynamiCrafter512 and put its model.ckpt at checkpoints/512_mdm/512-mdm-checkpoint.ckpt.
  4. Train the 320x512 model:

```bash
sh configs/stage1-512_mdm_waymo/run-512.sh
```

  5. Then train the 576x1024 model (stage 2):

```bash
sh configs/stage2-1024_mdm_waymo/run-1024.sh
```
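Putting the steps together, the full training flow is just the commands listed above, assuming step 1's data processing is done and the DynamiCrafter512 checkpoint is in place:

```bash
# Assumes data processing is complete and DynamiCrafter512's model.ckpt
# sits at checkpoints/512_mdm/512-mdm-checkpoint.ckpt.
python data/create_data_infos.py              # build the train data list
sh configs/stage1-512_mdm_waymo/run-512.sh    # stage 1: 320x512 model
sh configs/stage2-1024_mdm_waymo/run-1024.sh  # stage 2: 576x1024 model
```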

📜 License

This repository is released under the Apache 2.0 license.

😉 Citation

Please consider citing our paper if our code is useful:

```bibtex
@article{zou2025mudg,
  title={MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction},
  author={Zou, Yingshuang and Ding, Yikang and Zhang, Chuanrui and Guo, Jiazhe and Li, Bohan and Lyu, Xiaoyang and Tan, Feiyang and Qi, Xiaojuan and Wang, Haoqian},
  journal={arXiv preprint arXiv:2503.10604},
  year={2025}
}
```

🙏 Acknowledgements

We would like to thank the contributors of the repositories this project builds on, including DynamiCrafter, for their valuable contributions to the community.
