MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li,
Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang
🔥🔥 (2025.03) Check out our other latest works on generative world models: UniScene, DiST-4D, HERMES.
🔥🔥 (2025.03) The data processing code is released!
🔥🔥 (2025.03) The training and inference code of Multi-modal Diffusion is available NOW!!!
🔥🔥 (2025.03) Paper is on arXiv: MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction
- Release data processing code.
- Release the pretrained model.
- Release training / inference code.
Recent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving. Nevertheless, critical limitations persist: reconstruction-based methods exhibit substantial performance deterioration under large viewpoint deviations from training trajectories, while generation-based techniques struggle with temporal coherence and precise scene controllability. To overcome these challenges, we present MuDG, an innovative framework that integrates a Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction. MuDG leverages aggregated LiDAR point clouds with RGB and geometric priors to condition a multi-modal video diffusion model, synthesizing photorealistic RGB, depth, and semantic outputs for novel viewpoints. This synthesis pipeline enables feed-forward NVS without computationally intensive per-scene optimization, and provides comprehensive supervision signals to refine 3DGS representations, enhancing rendering robustness under extreme viewpoint changes. Experiments on the Waymo Open Dataset demonstrate that MuDG outperforms existing methods in both reconstruction and synthesis quality.
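For orientation, the skeleton below spells out the data flow described above: fused LiDAR is projected into sparse priors, the multi-modal diffusion model densifies them into RGB, depth, and semantics, and those outputs supervise a 3DGS model. Every function is an illustrative stub, not part of this repository's API.

```python
# Illustrative skeleton of the MuDG data flow; every function here is a stub
# standing in for a pipeline stage, not the repository's actual API.
def aggregate_lidar(frames):        # fuse per-frame LiDAR sweeps into one point cloud
    return frames

def project_priors(cloud, camera):  # render sparse RGB/depth priors for a novel view
    return {"sparse_rgb": None, "sparse_depth": None}

def multimodal_diffusion(priors):   # synthesize dense RGB, depth, and semantics
    return {"rgb": None, "depth": None, "semantics": None}

def refine_3dgs(gaussians, maps):   # use the synthesized maps as extra supervision
    return gaussians

cloud = aggregate_lidar(frames=[])
priors = project_priors(cloud, camera=None)
maps = multimodal_diffusion(priors)
gaussians = refine_3dgs(gaussians=None, maps=maps)
```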
| Model | Resolution | Checkpoint |
|---|---|---|
| MDM1024 | 576x1024 | Hugging Face |
| MDM512 | 320x512 | Hugging Face |
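If you prefer to script the download, the snippet below is a minimal sketch using `huggingface_hub`; the `repo_id` and `filename` are placeholders, so substitute the actual values from the Hugging Face links in the table above.

```python
# Minimal download sketch with huggingface_hub; repo_id and filename are
# placeholders -- take the real values from the Hugging Face links above.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<org>/<mudg-checkpoints>",   # placeholder repo id
    filename="512-mdm-checkpoint.ckpt",   # placeholder filename
    local_dir="checkpoints/512_mdm",
)
print("Saved to:", ckpt_path)
```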
conda create -n mudg python=3.8.5
conda activate mudg
pip install -r requirements.txt
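Assuming PyTorch is among the packages installed from `requirements.txt`, a quick sanity check like this confirms the environment sees your GPU:

```python
# Environment sanity check (assumes PyTorch is installed via requirements.txt).
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```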
We project the fused point clouds onto novel viewpoints to generate sparse color and depth maps.
Note: The detailed data processing steps can be found in the Data Processing section.
For your convenience, we have also provided pre-processed data. You can access it via this link.
python virtual_render/generate_virtual_item.py
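For intuition, here is a minimal sketch of the kind of projection this step performs: colored LiDAR points are transformed into a novel camera frame, projected with a pinhole model, and z-buffered into sparse color and depth maps. Variable names and conventions are illustrative and may differ from `generate_virtual_item.py`.

```python
# Minimal sketch of projecting a colored point cloud into a novel camera view
# to obtain sparse color and depth maps. The pinhole model and conventions
# are illustrative; the repository's generate_virtual_item.py may differ.
import numpy as np

def project_points(points_xyz, colors, K, w2c, height, width):
    """points_xyz: (N, 3) world coords; colors: (N, 3); K: (3, 3) intrinsics;
    w2c: (4, 4) world-to-camera extrinsics."""
    # Transform points into the camera frame.
    pts_h = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]
    # Keep points in front of the camera.
    valid = cam[:, 2] > 1e-3
    cam, colors = cam[valid], colors[valid]
    # Pinhole projection to pixel coordinates.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, cam, colors = u[inside], v[inside], cam[inside], colors[inside]
    # Z-buffer: write far-to-near so the nearest point per pixel wins.
    order = np.argsort(-cam[:, 2])
    depth = np.zeros((height, width), dtype=np.float32)
    color = np.zeros((height, width, 3), dtype=np.float32)
    depth[v[order], u[order]] = cam[order, 2]
    color[v[order], u[order]] = colors[order]
    return color, depth  # sparse maps; unfilled pixels stay zero
```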
- Download the pretrained models and place `model.ckpt` with the required resolution at `checkpoints/[1024|512]_mdm/[1024|512]-mdm-checkpoint.ckpt`.
- Run the commands below in your terminal, depending on your device and needs.
sh virtual_render/scripts/render.sh 15365
`15365` is the item ID, and you can change it to any item ID from the item list.
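To render several items in one go, a small driver like the one below works; the item IDs are placeholders, so take them from your generated item list.

```python
# Hypothetical batch driver: render several items in sequence.
# The item IDs below are placeholders -- use IDs from your item list.
import subprocess

for item_id in ["15365", "15366"]:
    subprocess.run(["sh", "virtual_render/scripts/render.sh", item_id], check=True)
```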
- Process the data and generate the item list.
- Generate the training data list:
python data/create_data_infos.py
- Download the pretrained model DynamiCrafter512 and place `model.ckpt` at `checkpoints/512_mdm/512-mdm-checkpoint.ckpt`.
- Train the 320x512 model with the following command:
sh configs/stage1-512_mdm_waymo/run-512.sh
- Then train the 576x1024 model with the following command:
sh configs/stage2-1024_mdm_waymo/run-1024.sh
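Before kicking off either stage, an optional check like the following (path copied from the download step above) catches a misplaced checkpoint early:

```python
# Optional pre-flight check: confirm the DynamiCrafter512 weights are where
# the step above placed them (path copied from this section).
from pathlib import Path

ckpt = Path("checkpoints/512_mdm/512-mdm-checkpoint.ckpt")
if ckpt.is_file():
    print(f"Found {ckpt} ({ckpt.stat().st_size / 1e9:.2f} GB)")
else:
    raise FileNotFoundError(f"Missing checkpoint: {ckpt}")
```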
This repository is released under the Apache 2.0 license.
Please consider citing our paper if you find our code useful:
@article{zou2025mudg,
title={MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction},
author={Zou, Yingshuang and Ding, Yikang and Zhang, Chuanrui and Guo, Jiazhe and Li, Bohan and Lyu, Xiaoyang and Tan, Feiyang and Qi, Xiaojuan and Wang, Haoqian},
journal={arXiv preprint arXiv:2503.10604},
year={2025}
}
We would like to thank the authors of the following repositories for their valuable contributions to the community: