This repository provides an implementation of the paper:
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
Haoxiang You, Yilang Liu, and Ian Abraham
paper / project page
If you use this repo in your research, please consider citing the paper as follows:
@misc{you2025acceleratingvisualpolicylearningparallel,
title={Accelerating Visual-Policy Learning through Parallel Differentiable Simulation},
author={Haoxiang You and Yilang Liu and Ian Abraham},
year={2025},
eprint={2505.10646},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.10646},
}
This codebase is built on top of various open-source implementations, which we list at the end of this document.
- Clone the repository:
  git clone https://github.com/HaoxiangYou/D.VA --recursive
- In the project folder, create a virtual environment in Anaconda:
  conda env create -f dva_conda.yml
  conda activate dva
- Build dflex:
  cd dflex
  pip install -e .
- Build pytorch3d (only required for the SHAC baseline):
  # Use a prebuilt version of PyTorch3D compatible with PyTorch 2.5.1 and CUDA 12.4
  pip install pytorch3d==0.7.8+pt2.5.1cu124 --extra-index-url https://miropsota.github.io/torch_packages_builder
A test example can be found in the examples folder.
python test_env.py --env AntEnv
If the last line of console output is Finish Successfully, the installation was successful.
Running the following command in the examples folder trains Hopper with D.VA:
python train_dva.py --cfg ./cfg/dva/hopper.yaml --logdir ./logs/Hopper/dva
Evaluation videos will be saved to a directory following the pattern logs/Hopper/dva/$DATE/eval. They are saved every $save_interval training episodes, as specified in the corresponding YAML configuration file, e.g., cfg/dva/hopper.yaml.
To run curl or state2visDagger, use the following command:
python train_$method.py --cfg ./cfg/$method/hopper.yaml --logdir ./logs/Hopper/$method
where $method is the desired baseline name.
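For example, with $method set to curl, the command becomes:
python train_curl.py --cfg ./cfg/curl/hopper.yaml --logdir ./logs/Hopper/curl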
To run drqv2, use the following command:
python train_drqv2.py task=hopper
To run dreamerv3, use the following command:
python --configs dflex_vision --task dflex_$env --logdir ./logs/dreamerv3/$env
where $env is the name of the environment (e.g., hopper, humanoid, etc.).
To run SHAC with differentiable rendering, use the following command:
python train_shac.py --cfg ./cfg/shac/hopper_vis.yaml --logdir ./logs/Hopper/shac
Note that the SHAC baseline requires environments with differentiable rendering, unlike the other methods, which use ManiSkill as the default rendering engine.
To run our method (D.VA) with the same differentiable rendering, simply replace the config file cfg/dva/hopper.yaml with cfg/dva/diff_render_hopper.yaml:
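For example (the log directory below is only an illustrative choice):
python train_dva.py --cfg ./cfg/dva/diff_render_hopper.yaml --logdir ./logs/Hopper/dva_diff_render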
D.VA is a quasi-analytical policy gradient algorithm for learning image-based continuous control tasks. It extends SHAC to operate directly from pixels by decoupling the rendering process from the gradient computation pipeline (a conceptual sketch of this decoupling follows the list below).
This decoupling brings several key benefits:
- 📉 Reduced memory usage: Jacobians from the rendering process are dropped, significantly lowering memory requirements.
- ⚡ Faster backward pass: Omitting large Jacobian matrices during rendering results in a 2–3× speedup in gradient computation.
- 🎯 Smoother optimization: Gradient norms are better normalized, leading to more stable and efficient training.
- 🔄 No need for external differentiable renderers: D.VA avoids dependence on additional differentiable rendering software.
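The following is a minimal conceptual sketch of this decoupling, not the repository's actual training loop; env, render, and policy are hypothetical stand-ins for a differentiable simulator, a (non-differentiable) renderer, and a visual policy. Images are produced under torch.no_grad(), so no rendering Jacobian is ever stored, while gradients still flow through the simulator dynamics.

```python
import torch

def rollout_loss(policy, env, render, horizon):
    """Minimal sketch: a short-horizon rollout whose loss is differentiable
    through the simulator states but not through the renderer."""
    state = env.reset()                   # differentiable state tensor
    total_reward = 0.0
    for _ in range(horizon):
        with torch.no_grad():             # keep rendering out of the autograd graph
            pixels = render(state)        # image observation; no Jacobian stored
        action = policy(pixels)           # policy acts from pixels only
        state, reward = env.step(action)  # gradients flow through the dynamics
        total_reward = total_reward + reward
    return -total_reward                  # negative return, to be minimized
```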
Although D.VA does not require any additional differentiable rendering software, we also provide optional differentiable rendering modules as part of this project for advanced use cases and experimentation.
Most implementations are located in the viewer/torch3d_robo directory.
A standalone version of this differentiable rendering library is also available in a separate repository.
We provide example videos demonstrating how D.VA learns to control using only pixel-based observations.
The experiments are conducted on a single RTX 4080 GPU.
Iter 0 (initial policy) | Iter 400 (4 minutes) | Iter 8000 (1 hour) | Iter 17600 (2.5 hours)
Iter 0 (initial policy) | Iter 4400 (2 hours) | Iter 9600 (4 hours) | Iter 36000 (15 hours)
We present a comparison of training curves between our method and existing visual-policy learning baselines.
Our approach significantly improves both wall-clock training efficiency and final performance across a range of challenging control tasks.
- RuntimeError: Error building extension 'kernels' for the dflex environment
  This is caused by the CUDA compute capability setting. Change line 1861 in adjoint.py,
  cuda_flags = ['-gencode=arch=compute_86,code=compute_86']
  so that it matches your GPU's compute capability. For more information, please refer to this issue.
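  For example, for an RTX 40-series GPU (compute capability 8.9), the line would become:
  cuda_flags = ['-gencode=arch=compute_89,code=compute_89']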
- Installing pytorch3d
We found the official installation process for PyTorch3D to be challenging. As a workaround, we recommend using a third-party installation approach, as outlined in this discussion thread.
All redistributed code from DiffRL retains its original license.
XML files from dm_control are licensed under the Apache 2.0 License.
All other code in this repository is licensed under the MIT License.
- Our codebase is built on top of SHAC by Jie Xu (NVIDIA).
- The CURL implementation is based on the original repository by Michael Laskin.
- The DrQv2 implementation is based on the original repository by Denis Yarats (Facebook Research).
- The DreamerV3 implementation is based on the PyTorch reimplementation by Naoki Morihira.
- We use pytorch_kinematics, developed by the Autonomous Robotic Manipulation Lab at the University of Michigan, Ann Arbor, to construct the forward kinematics tree used in our differentiable rendering pipeline. We have made several modifications to support floating-base systems and multiple joint definitions under a single link.