Honglak Lee 2,3 · Jinwoo Shin 1 · Joseph J. Lim 1* · Kimin Lee 1*
1 KAIST   2 University of Michigan   3 LG AI Research
*Equal Advising
[project page] [arXiv]
Summary: We propose REDS: Reward learning from Demonstration with Segmentations, a new reward learning framework that leverages action-free videos with minimal supervision by treating segmented video demonstrations as ground-truth rewards.
conda create -n reds python=3.10
conda activate reds
cd ../
git clone https://github.com/csmile-1006/REDS_reward_learning.git
cd REDS_reward_learning
pip install -r requirements.txt
pip install -e .
cd ../REDS_agent
pip install -r requirements.txt
pip install pre-commit
pre-commit install
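To confirm the setup, a quick sanity check can help (illustrative only; it assumes the requirements install the JAX-based DreamerV3 stack and that a GPU is visible):

python -c "import jax; print(jax.devices())"   # should list your GPU device(s)
pre-commit run --all-files                     # optional: check that the hooks run cleanly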
Download the Meta-world expert dataset from the following link.
First, install RLBench by following the instructions in the repository.
Next, download the RLBench expert dataset from the following link.
Organize the folder structure as follows:
{BASE_PATH}
├── {TASK_TYPE:=metaworld or rlbench}_data
├── REDS_agent
└── REDS_reward_learning
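One way to set up this layout (an illustrative sketch; the workspace path and dataset location are placeholders to adapt):

export BASE_PATH=/path/to/workspace              # choose any working directory
mkdir -p ${BASE_PATH}/metaworld_data             # or rlbench_data, depending on the benchmark
mv /path/to/downloaded_dataset/* ${BASE_PATH}/metaworld_data/
ls ${BASE_PATH}   # expect: metaworld_data (or rlbench_data), REDS_agent, REDS_reward_learning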
First, train the REDS reward model with the following command:
bash scripts/train_reds_metaworld.sh {TASK_NAME} {ITERATION} {DEVICE} {REWARD_TRAINING_STEPS} {DREAMER_TRAINING_STEPS} {NUM_DEMOS} {NUM_FAILURE_DEMOS} {BASE_PATH}
bash scripts/train_reds_rlbench.sh {TASK_NAME} {ITERATION} {DEVICE} {REWARD_TRAINING_STEPS} {DREAMER_TRAINING_STEPS} {NUM_DEMOS} {NUM_FAILURE_DEMOS} {BASE_PATH}
# Default parameters:
# ITERATION=2
# REWARD_TRAINING_STEPS=3000
# DREAMER_TRAINING_STEPS=100000
# NUM_DEMOS=50 (for metaworld) or 100 (for rlbench)
# NUM_FAILURE_DEMOS=50 (for metaworld) or 100 (for rlbench)
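For example, training the reward model for the Meta-world door-open task with the default parameters above would look like this (a sketch; adjust the device index and BASE_PATH to your setup):

bash scripts/train_reds_metaworld.sh door-open 2 0 3000 100000 50 50 ${BASE_PATH}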
After reward model training, the folder structure is as follows:
{BASE_PATH}
├── {TASK_TYPE:=metaworld or rlbench}_data
├── REDS_agent
├── REDS_reward_learning
├── pretrain_dreamerv3
└── reds_logdir
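Next, train a DreamerV3 agent with the learned REDS reward. Before launching, it can help to confirm that the checkpoint expected by --reward_model_path is in place; the check below is illustrative and uses the door-open Meta-world task from the example command that follows:

ls ${BASE_PATH}/reds_logdir/REDS/metaworld-door-open/door-open-phase2/s0/last_model.pkl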
DEVICE_ID=0 TASK_NAME=door-open SEED=0 && XLA_PYTHON_CLIENT_PREALLOCATE=false LD_PRELOAD="" CUDA_VISIBLE_DEVICES=${DEVICE_ID} python scripts/train_dreamer.py \
--configs=reds_prior_rb metaworld \
--reward_model_path=${BASE_PATH}/reds_logdir/REDS/metaworld-${TASK_NAME}/${TASK_NAME}-phase2/s0/last_model.pkl \
--logdir=${BASE_PATH}/exp_local/${TASK_NAME}_reds_seed${SEED} \
--task=metaworld_${TASK_NAME} \
--env.metaworld.reward_type=sparse \
--seed=${SEED}
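To run multiple seeds of the same Meta-world task, the command can be wrapped in a simple loop (an illustrative sketch; the seed values are arbitrary):

DEVICE_ID=0 TASK_NAME=door-open
for SEED in 0 1 2; do
  XLA_PYTHON_CLIENT_PREALLOCATE=false LD_PRELOAD="" CUDA_VISIBLE_DEVICES=${DEVICE_ID} python scripts/train_dreamer.py \
    --configs=reds_prior_rb metaworld \
    --reward_model_path=${BASE_PATH}/reds_logdir/REDS/metaworld-${TASK_NAME}/${TASK_NAME}-phase2/s0/last_model.pkl \
    --logdir=${BASE_PATH}/exp_local/${TASK_NAME}_reds_seed${SEED} \
    --task=metaworld_${TASK_NAME} \
    --env.metaworld.reward_type=sparse \
    --seed=${SEED}
done

For RLBench, use the command below; it follows the same pattern but additionally sets --num_demos, the action-bound path, and a reward scale.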
DEVICE_ID=0 TASK_NAME=take_umbrella_out_of_umbrella_stand SEED=0 && XLA_PYTHON_CLIENT_PREALLOCATE=false DISPLAY=:0.${DEVICE_ID} CUDA_VISIBLE_DEVICES=${DEVICE_ID} python scripts/train_dreamer.py \
--configs=reds_prior_rb rlbench \
--reward_model_path=${BASE_PATH}/reds_logdir/REDS/rlbench-${TASK_NAME}/${TASK_NAME}-phase2/s0/last_model.pkl \
--logdir=${BASE_PATH}/exp_local/${TASK_NAME}_reds_seed${SEED} \
--task=rlbench_${TASK_NAME} \
--env.rlbench.reward_type=sparse \
--seed=${SEED} \
--num_demos=0 \
--env.rlbench.actions_min_max_path=${BASE_PATH}/rlbench_data/ \
--reward_model_scale=0.005
@inproceedings{kim2025subtask,
title={Subtask-Aware Visual Reward Learning from Segmented Demonstrations},
author={Kim, Changyeon and Heo, Minho and Lee, Doohyun and Shin, Jinwoo and Lee, Honglak and Lim, Joseph J. and Lee, Kimin},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025},
}
Our code is based on the implementation of VIPER.