Honglak Lee 2,3 · Jinwoo Shin 1 · Joseph J. Lim 1* · Kimin Lee 1*
1 KAIST   2 University of Michigan   3 LG AI Research
*Equal Advising
[project page] [arXiv]
Summary: We propose REDS: Reward learning from Demonstration with Segmentations, a new reward learning framework that leverages action-free videos with minimal supervision by treating segmented video demonstrations as ground-truth rewards.
conda create -n reds python=3.10
conda activate reds
cd ../
git clone https://github.com/csmile-1006/REDS_reward_learning.git
cd REDS_reward_learning
pip install -r requirements.txt
pip install -e .
cd ../REDS_agent
pip install -r requirements.txt
pip install pre-commit
pre-commit install
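To confirm the setup, a quick sanity check can help (illustrative only; it assumes the requirements install the JAX-based DreamerV3 stack and that a GPU is visible):

python -c "import jax; print(jax.devices())"   # should list your GPU device(s)
pre-commit run --all-files                     # optional: check that the hooks run cleanly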
Download the Meta-world expert dataset from the following link.
First, install RLBench by following the instructions in the repository.
Next, download the RLBench expert dataset from the following link.
Organize the folder structure as follows:
{BASE_PATH}
├── {TASK_TYPE:=metaworld or rlbench}_data
├── REDS_agent
└── REDS_reward_learning
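One way to set up this layout (an illustrative sketch; the workspace path and dataset location are placeholders to adapt):

export BASE_PATH=/path/to/workspace              # choose any working directory
mkdir -p ${BASE_PATH}/metaworld_data             # or rlbench_data, depending on the benchmark
mv /path/to/downloaded_dataset/* ${BASE_PATH}/metaworld_data/
ls ${BASE_PATH}   # expect: metaworld_data (or rlbench_data), REDS_agent, REDS_reward_learning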
First, train the REDS reward model with the following command:
bash scripts/train_reds_metaworld.sh {TASK_NAME} {ITERATION} {DEVICE} {REWARD_TRAINING_STEPS} {DREAMER_TRAINING_STEPS} {NUM_DEMOS} {NUM_FAILURE_DEMOS} {BASE_PATH}
bash scripts/train_reds_rlbench.sh {TASK_NAME} {ITERATION} {DEVICE} {REWARD_TRAINING_STEPS} {DREAMER_TRAINING_STEPS} {NUM_DEMOS} {NUM_FAILURE_DEMOS} {BASE_PATH}
# Default parameters:
# ITERATION=2
# REWARD_TRAINING_STEPS=3000
# DREAMER_TRAINING_STEPS=100000
# NUM_DEMOS=50 (for metaworld) or 100 (for rlbench)
# NUM_FAILURE_DEMOS=50 (for metaworld) or 100 (for rlbench)
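For example, training the reward model for the Meta-world door-open task with the default parameters above would look like this (a sketch; adjust the device index and BASE_PATH to your setup):

bash scripts/train_reds_metaworld.sh door-open 2 0 3000 100000 50 50 ${BASE_PATH}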
After reward model training, the folder structure is as follows:
{BASE_PATH}
├── {TASK_TYPE:=metaworld or rlbench}_data
├── REDS_agent
├── REDS_reward_learning
├── pretrain_dreamerv3
└── reds_logdir
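Next, train a DreamerV3 agent with the learned REDS reward. Before launching, it can help to confirm that the checkpoint expected by --reward_model_path is in place; the check below is illustrative and uses the door-open Meta-world task from the example command that follows:

ls ${BASE_PATH}/reds_logdir/REDS/metaworld-door-open/door-open-phase2/s0/last_model.pkl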
DEVICE_ID=0 TASK_NAME=door-open SEED=0 && XLA_PYTHON_CLIENT_PREALLOCATE=false LD_PRELOAD="" CUDA_VISIBLE_DEVICES=${DEVICE_ID} python scripts/train_dreamer.py \
--configs=reds_prior_rb metaworld \
--reward_model_path=${BASE_PATH}/reds_logdir/REDS/metaworld-${TASK_NAME}/${TASK_NAME}-phase2/s0/last_model.pkl \
--logdir=${BASE_PATH}/exp_local/${TASK_NAME}_reds_seed${SEED} \
--task=metaworld_${TASK_NAME} \
--env.metaworld.reward_type=sparse \
--seed=${SEED}
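To run multiple seeds of the same Meta-world task, the command can be wrapped in a simple loop (an illustrative sketch; the seed values are arbitrary):

DEVICE_ID=0 TASK_NAME=door-open
for SEED in 0 1 2; do
  XLA_PYTHON_CLIENT_PREALLOCATE=false LD_PRELOAD="" CUDA_VISIBLE_DEVICES=${DEVICE_ID} python scripts/train_dreamer.py \
    --configs=reds_prior_rb metaworld \
    --reward_model_path=${BASE_PATH}/reds_logdir/REDS/metaworld-${TASK_NAME}/${TASK_NAME}-phase2/s0/last_model.pkl \
    --logdir=${BASE_PATH}/exp_local/${TASK_NAME}_reds_seed${SEED} \
    --task=metaworld_${TASK_NAME} \
    --env.metaworld.reward_type=sparse \
    --seed=${SEED}
done

For RLBench, use the command below; it follows the same pattern but additionally sets --num_demos, the action-bound path, and a reward scale.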
DEVICE_ID=0 TASK_NAME=take_umbrella_out_of_umbrella_stand SEED=0 && XLA_PYTHON_CLIENT_PREALLOCATE=false DISPLAY=:0.${DEVICE_ID} CUDA_VISIBLE_DEVICES=${DEVICE_ID} python scripts/train_dreamer.py \
--configs=reds_prior_rb rlbench \
--reward_model_path=${BASE_PATH}/reds_logdir/REDS/rlbench-${TASK_NAME}/${TASK_NAME}-phase2/s0/last_model.pkl \
--logdir=${BASE_PATH}/exp_local/${TASK_NAME}_reds_seed${SEED} \
--task=rlbench_${TASK_NAME} \
--env.rlbench.reward_type=sparse \
--seed=${SEED} \
--num_demos=0 \
--env.rlbench.actions_min_max_path=${BASE_PATH}/rlbench_data/ \
--reward_model_scale=0.005
@inproceedings{kim2025subtask,
title={Subtask-Aware Visual Reward Learning from Segmented Demonstrations},
author={Kim, Changyeon and Heo, Minho and Lee, Doohyun and Shin, Jinwoo and Lee, Honglak and Lim, Joseph J. and Lee, Kimin},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025},
}
Our code is based on the implementation of VIPER.