This repository provides an implementation of the paper:
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
Haoxiang You, Yilang Liu, and Ian Abraham
paper / project page
If you use this repo in your research, please consider citing the paper as follows:
@misc{you2025acceleratingvisualpolicylearningparallel,
title={Accelerating Visual-Policy Learning through Parallel Differentiable Simulation},
author={Haoxiang You and Yilang Liu and Ian Abraham},
year={2025},
eprint={2505.10646},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.10646},
}
This codebase is built on top of various open-source implementations, which we list at the end of this document.
- Clone the repository:
  git clone https://github.com/HaoxiangYou/D.VA --recursive
- In the project folder, create a virtual environment in Anaconda:
  conda env create -f dva_conda.yml
  conda activate dva
- Build dflex:
  cd dflex
  pip install -e .
- Build pytorch3d (only required for the SHAC baseline):
  # Use a prebuilt version of PyTorch3D compatible with PyTorch 2.5.1 and CUDA 12.4
  pip install pytorch3d==0.7.8+pt2.5.1cu124 --extra-index-url https://miropsota.github.io/torch_packages_builder
A test example can be found in the examples folder.
python test_env.py --env AntEnv
If the last line of console output is Finish Successfully, the installation was successful.
Running the following command in the examples folder trains Hopper with D.VA:
python train_dva.py --cfg ./cfg/dva/hopper.yaml --logdir ./logs/Hopper/dva
Evaluation videos will be saved to a directory following the pattern logs/Hopper/dva/$DATE/eval. They are saved every $save_interval training episodes, as specified in the corresponding YAML configuration file, e.g., cfg/dva/hopper.yaml.
To run curl or state2visDagger, use the following command:
python train_$method.py --cfg ./cfg/$method/hopper.yaml --logdir ./logs/Hopper/$method
where $method is the desired baseline name.
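For example, with $method set to curl, the command becomes:
python train_curl.py --cfg ./cfg/curl/hopper.yaml --logdir ./logs/Hopper/curl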
To run drqv2, use the following command:
python train_drqv2.py task=hopper
To run dreamerv3, use the following command:
python --configs dflex_vision --task dflex_$env --logdir ./logs/dreamerv3/$env
where $env is the name of the environment (e.g., hopper, humanoid, etc.).
To run SHAC with differentiable rendering, use the following command:
python train_shac.py --cfg ./cfg/shac/hopper_vis.yaml --logdir ./logs/Hopper/shac
Note that the SHAC baseline requires environments with differentiable rendering, unlike the other methods, which use ManiSkill as the default rendering engine.
To run our method (D.VA) with the same differentiable rendering, simply replace the config file cfg/dva/hopper.yaml with cfg/dva/diff_render_hopper.yaml:
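For example (the log directory below is only an illustrative choice):
python train_dva.py --cfg ./cfg/dva/diff_render_hopper.yaml --logdir ./logs/Hopper/dva_diff_render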
D.VA is a quasi-analytical policy gradient algorithm for learning image-based continuous control tasks. It extends SHAC to operate directly from pixels by decoupling the rendering process from the gradient computation pipeline (a conceptual sketch of this decoupling follows the list below).
This decoupling brings several key benefits:
- 📉 Reduced memory usage: Jacobians from the rendering process are dropped, significantly lowering memory requirements.
- ⚡ Faster backward pass: Omitting large Jacobian matrices during rendering results in a 2–3× speedup in gradient computation.
- 🎯 Smoother optimization: Gradient norms are better normalized, leading to more stable and efficient training.
- 🔄 No need for external differentiable renderers: D.VA avoids dependence on additional differentiable rendering software.
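The following is a minimal conceptual sketch of this decoupling, not the repository's actual training loop; env, render, and policy are hypothetical stand-ins for a differentiable simulator, a (non-differentiable) renderer, and a visual policy. Images are produced under torch.no_grad(), so no rendering Jacobian is ever stored, while gradients still flow through the simulator dynamics.

```python
import torch

def rollout_loss(policy, env, render, horizon):
    """Minimal sketch: a short-horizon rollout whose loss is differentiable
    through the simulator states but not through the renderer."""
    state = env.reset()                   # differentiable state tensor
    total_reward = 0.0
    for _ in range(horizon):
        with torch.no_grad():             # keep rendering out of the autograd graph
            pixels = render(state)        # image observation; no Jacobian stored
        action = policy(pixels)           # policy acts from pixels only
        state, reward = env.step(action)  # gradients flow through the dynamics
        total_reward = total_reward + reward
    return -total_reward                  # negative return, to be minimized
```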
Although D.VA does not require any additional differentiable rendering software, we also provide optional differentiable rendering modules as part of this project for advanced use cases and experimentation.
Most implementations are located in the viewer/torch3d_robo directory.
A standalone version of this differentiable rendering library is also available in a separate repository.
We provide example videos demonstrating how D.VA learns to control using only pixel-based observations.
The experiments are conducted on a single RTX 4080 GPU.
Iter 0 (initial policy) | Iter 400 (4 minutes) | Iter 8000 (1 hour) | Iter 17600 (2.5 hours)
Iter 0 (initial policy) | Iter 4400 (2 hours) | Iter 9600 (4 hours) | Iter 36000 (15 hours)
We present a comparison of training curves between our method and existing visual-policy learning baselines.
Our approach significantly improves both wall-clock training efficiency and final performance across a range of challenging control tasks.
- RuntimeError: Error building extension 'kernels' for the dflex environment
  This is caused by the CUDA compute capability setting. Change line 1861 in adjoint.py,
  cuda_flags = ['-gencode=arch=compute_86,code=compute_86']
  so that it matches your GPU's compute capability. For more information, please refer to this issue.
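  For example, for an RTX 40-series GPU (compute capability 8.9), the line would become:
  cuda_flags = ['-gencode=arch=compute_89,code=compute_89']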
- Installing pytorch3d
We found the official installation process for PyTorch3D to be challenging. As a workaround, we recommend using a third-party installation approach, as outlined in this discussion thread.
All redistributed code from DiffRL retains its original license.
XML files from dm_control are licensed under the Apache 2.0 License.
All other code in this repository is licensed under the MIT License.
- Our codebase is built on top of SHAC by Jie Xu (NVIDIA).
- The CURL implementation is based on the original repository by Michael Laskin.
- The DrQv2 implementation is based on the original repository by Denis Yarats (Facebook Research).
- The DreamerV3 implementation is based on the PyTorch reimplementation by Naoki Morihira.
- We use pytorch_kinematics, developed by the Autonomous Robotic Manipulation Lab at the University of Michigan, Ann Arbor, to construct the forward kinematics tree used in our differentiable rendering pipeline. We have made several modifications to support floating-base systems and multiple joint definitions under a single link.