WACV 2025
This repository contains the code for the paper:
Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits
conda create -n taksie python=3.9
conda activate taksie
pip install -r requirements.txt
Run the example inference script to test the subgoal generation:
python example_inference.py
Follow the instructions here to install the CALVIN environment.
Install JAX and jaxrl_m for the goal-conditioned policy:
# Follow the instructions here to install jax and jaxrl_m:
https://github.com/rail-berkeley/bridge_data_v2
Download the DGBC checkpoint from the provided link. Update the cfg/cfg.yaml
file by replacing the parent directory path with the path to the downloaded checkpoint.
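If you prefer to script the edit, here is a minimal sketch that rewrites the checkpoint path in cfg/cfg.yaml with PyYAML. The key name parent_dir is an assumption, so match it to the actual field name in the config file.

```python
# Sketch: point cfg/cfg.yaml at the downloaded DGBC checkpoint.
# "parent_dir" is an assumed key name; check cfg/cfg.yaml for the real field.
import yaml

cfg_path = "cfg/cfg.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

cfg["parent_dir"] = "/path/to/downloaded/dgbc_checkpoint"  # replace with your local path

with open(cfg_path, "w") as f:
    yaml.safe_dump(cfg, f)
```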
Follow the instructions here to install the LIV framework. Additionally, download the fine-tuned LIV checkpoint for the CALVIN dataset:
- Download the fine-tuned checkpoint from Hugging Face.
- Replace the default checkpoint in ~/.liv/resnet50 with the downloaded fine-tuned checkpoint.
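A minimal sketch of swapping the checkpoint in place, assuming the default file inside ~/.liv/resnet50 is named model.pt (check your local directory for the actual file name):

```python
# Sketch: back up the default LIV checkpoint and drop in the fine-tuned one.
# "model.pt" is an assumed file name; match it to what is actually in ~/.liv/resnet50.
import shutil
from pathlib import Path

liv_dir = Path.home() / ".liv" / "resnet50"
default_ckpt = liv_dir / "model.pt"                               # assumed name
finetuned_ckpt = Path("/path/to/downloaded/liv_calvin_checkpoint.pt")  # your download

if default_ckpt.exists():
    shutil.move(str(default_ckpt), str(default_ckpt) + ".bak")   # keep a backup
shutil.copy(str(finetuned_ckpt), str(default_ckpt))
```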
Run the evaluation script with the appropriate configuration file:
python evaluate_calvin.py --running_config=cfg/cfg.yaml
Use the sample in example/example_trajectory, which contains a single CALVIN trajectory. Replace it with the full CALVIN task D dataset to train on the whole data.
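To see what the sample contains, you can peek at its language annotations. The sketch below assumes the standard CALVIN auto_lang_ann.npy layout (a pickled dict); the exact keys may differ across CALVIN versions.

```python
# Sketch: inspect the example trajectory's language annotations.
# auto_lang_ann.npy is assumed to store a pickled dict, as in standard CALVIN data.
import numpy as np

ann = np.load(
    "example/example_trajectory/lang_annotations/auto_lang_ann.npy",
    allow_pickle=True,
).item()
print(ann.keys())
```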
mkdir -p cache_features/clip
mkdir cache_features/r3m
python scripts/generate_clip_feature.py --dataset_path example/example_trajectory --output_dir_path cache_features/clip
python scripts/generate_r3m_feature.py --dataset_path example/example_trajectory --output_dir_path cache_features/r3m
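A quick sanity check that the cached features were written, assuming the scripts save .npy files (adjust the glob pattern if the actual output format differs):

```python
# Sketch: list a few cached R3M feature files and print their shapes.
# The .npy extension and per-file layout are assumptions about the caching scripts.
import glob
import numpy as np

for feat_file in sorted(glob.glob("cache_features/r3m/*.npy"))[:3]:
    feats = np.load(feat_file, allow_pickle=True)
    print(feat_file, getattr(feats, "shape", type(feats)))
```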
python scripts/select_keyframe.py --r3m_feature_dir cache_features/r3m --lang_annotations_path example/example_trajectory/lang_annotations/auto_lang_ann.npy --data_path example/example_trajectory --output_dir .
The selected keyframes are saved to output_dir/selected_keyframes.npy.
python scripts/generate_keyframe_segment.py --input_npy_path selected_keyframes.npy --output_npy_path keyframe_segment.npy
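Before training, you can confirm both outputs with a rough check like the one below; the internal structure of the arrays is specific to the scripts, so the printout stays generic.

```python
# Sketch: load the keyframe outputs and print their types/shapes.
# allow_pickle=True because the scripts may store dicts or object arrays.
import numpy as np

keyframes = np.load("selected_keyframes.npy", allow_pickle=True)
segments = np.load("keyframe_segment.npy", allow_pickle=True)
print("keyframes:", getattr(keyframes, "shape", type(keyframes)))
print("segments:", getattr(segments, "shape", type(segments)))
```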
Use accelerate for multi-GPU training. Adjust batch_size and gradient_accumulation_steps based on GPU memory. validation_step is set to 100 to monitor training closely; use a larger value for faster runs.
bash train_taksie.bash
Check wandb for the training logs.
Note on Training ControlNet:
Training an unconditional diffusion model first can help ControlNet converge faster and achieve better results. You can use the original training script from the official Diffusers repository, then train on our dataset by setting dataset_name to ShuaKang/calvin_d.
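To preview that dataset before training, a minimal sketch using the Hugging Face datasets library is shown below; the train split name and column layout are assumptions, so print them to confirm.

```python
# Sketch: pull the CALVIN task D dataset from the Hugging Face Hub and inspect it.
# The "train" split name is an assumption; print the dataset to see what is available.
from datasets import load_dataset

ds = load_dataset("ShuaKang/calvin_d", split="train")
print(ds)                # number of rows
print(ds.column_names)   # available fields
```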
@INPROCEEDINGS{10943942,
author={Kang, Xuhui and Kuo, Yen-Ling},
booktitle={2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
title={Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits},
year={2025},
volume={},
number={},
pages={7490-7499},
doi={10.1109/WACV61041.2025.00728}}
Thank you for these excellent works!