Webpage: https://serl-robot.github.io/
# SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning

SERL provides a set of libraries, env wrappers, and examples to train RL policies for robotic manipulation tasks. The following sections describe how to use SERL, illustrated with examples.

## Installation
1. Setup Conda environment: create an environment with

   ```bash
   conda create -n serl python=3.10
   ```

2. Install Jax as follows:

   - For CPU (not recommended):

     ```bash
     pip install --upgrade "jax[cpu]"
     ```

   - For GPU (change `cuda12` to `cuda11` if you are using older driver versions):

     ```bash
     pip install --upgrade "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
     ```

   - For TPU:

     ```bash
     pip install --upgrade "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
     ```

   - See the Jax GitHub page for more details on installing Jax.

3. Install `serl_launcher`:

   ```bash
   cd serl_launcher
   pip install -e .
   pip install -r requirements.txt
   ```

4. Install the Franka Sim library (optional):

   ```bash
   cd franka_sim
   pip install -e .
   pip install -r requirements.txt
   ```

   Check that `franka_sim` is running via `python franka_sim/franka_sim/test/test_gym_env_human.py`. Check out the quick start with the Franka arm in sim below for more details.
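As an additional sanity check, you can instantiate the sim environment directly from Python. The sketch below is illustrative: the env id `PandaPickCube-v0` is an assumption, so check which ids `franka_sim` actually registers.

```python
import gymnasium as gym
import franka_sim  # noqa: F401  (importing registers the Panda envs)

# "PandaPickCube-v0" is an assumed id; use whatever franka_sim registers.
env = gym.make("PandaPickCube-v0")
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print("one env step OK, reward =", reward)
```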
## Overview and Code Structure

SERL provides a set of common libraries for users to train RL policies for robotic manipulation tasks. The main structure of running the RL experiments involves having an actor node and a learner node, both of which interact with the robot gym environment. Both nodes run asynchronously, with data being sent from the actor to the learner node over the network using agentlace. The learner periodically synchronizes the policy with the actor. This design provides flexibility for parallel training and inference.
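At a pseudocode level, the two nodes look roughly like the sketch below. This is a conceptual illustration only: the networking helpers (`send_transitions`, `recv_transitions`, `publish_params`, `receive_params`) stand in for what agentlace provides, and the sync period is an arbitrary assumption.

```python
def actor_loop(env, policy, send_transitions, receive_params):
    """Roll out the current policy and stream transitions to the learner."""
    obs, _ = env.reset()
    while True:
        action = policy.sample(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        send_transitions([(obs, action, reward, next_obs, terminated)])
        obs = env.reset()[0] if (terminated or truncated) else next_obs
        params = receive_params()   # non-blocking; None if no update yet
        if params is not None:
            policy = policy.replace(params=params)

def learner_loop(agent, replay_buffer, recv_transitions, publish_params):
    """Consume transitions, update the agent, and periodically push weights."""
    step = 0
    while True:
        replay_buffer.extend(recv_transitions())
        agent = agent.update(replay_buffer.sample(batch_size=256))
        step += 1
        if step % 100 == 0:         # sync period: an assumption
            publish_params(agent.params)
```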
Code structure:

| Code Directory | Description |
| --- | --- |
| serl_launcher | Main code for SERL |
| serl_launcher.agents | Agent policies (e.g. DRQ, SAC, BC) |
| serl_launcher.wrappers | Gym env wrappers |
| serl_launcher.data | Replay buffer and data store |
| serl_launcher.vision | Vision-related models and utils |
| franka_sim | Franka MuJoCo simulation gym environment |
| serl_robot_infra | Robot infra for running with real robots |
| serl_robot_infra.robot_servers | Flask server for sending commands to the robot via ROS |
| serl_robot_infra.franka_env | Gym env for real Franka robot |
## Quick Start with SERL in Sim

Before beginning, please make sure that the simulation environment with `franka_sim` is working.

Note: set `MUJOCO_GL` to `egl` if you are doing off-screen rendering. You can do so with `export MUJOCO_GL=egl`; remember to also set the rendering argument to `False` in the script. If you receive a `Cannot initialize a EGL device display` error due to `GLIBCXX` not being found, try running `conda install -c conda-forge libstdcxx-ng` (ref).
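If you prefer setting the variable from Python rather than the shell, it just needs to be in the environment before MuJoCo is imported; a minimal sketch:

```python
import os

# Must be set before mujoco / franka_sim are imported, otherwise the
# default GL backend has already been selected.
os.environ["MUJOCO_GL"] = "egl"

import franka_sim  # noqa: E402,F401
```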
### Training from state observations (SAC)

One-liner launcher (requires `tmux`; `sudo apt install tmux`):

```bash
bash examples/async_sac_state_sim/tmux_launch.sh
```

To kill the tmux session, run `tmux kill-session -t serl_session`.
Detailed commands:

```bash
cd examples/async_sac_state_sim
```

Run the learner node:

```bash
bash run_learner.sh
```

Run the actor node with a rendering window:

```bash
# add --ip x.x.x.x if running on a different machine
bash run_actor.sh
```
You can optionally launch the learner and actor on separate machines. For example, if the learner node is running on a PC with `ip=x.x.x.x`, you can launch the actor node on a different machine with network access to `ip=x.x.x.x` and add `--ip x.x.x.x` to the commands in `run_actor.sh`.

Remove the `--debug` flag in `run_learner.sh` to upload training stats to `wandb`.
### Training from image observations (DRQ)

One-liner launcher (requires `tmux`; `sudo apt install tmux`):

```bash
bash examples/async_drq_sim/tmux_launch.sh
```
Detailed commands:

```bash
cd examples/async_drq_sim

# to use pre-trained ResNet weights, please download:
wget https://github.com/rail-berkeley/serl/releases/download/resnet10/resnet10_params.pkl
```

Run the learner node:

```bash
bash run_learner.sh
```

Run the actor node with a rendering window:

```bash
# add --ip x.x.x.x if running on a different machine
bash run_actor.sh
```
### Training from image observations with 20 demo trajectories (RLPD + DRQ)

One-liner launcher (requires `tmux`):

```bash
bash examples/async_rlpd_drq_sim/tmux_launch.sh
```

Detailed commands:

```bash
cd examples/async_rlpd_drq_sim

# to use pre-trained ResNet weights, please download:
# (manual download is only needed for now; once the repo is public, auto-download will work)
wget https://github.com/rail-berkeley/serl/releases/download/resnet10/resnet10_params.pkl

# download 20 demo trajectories
wget https://github.com/rail-berkeley/serl/releases/download/franka_sim_lift_cube_demos/franka_lift_cube_image_20_trajs.pkl
```

Run the learner node:

```bash
bash run_learner.sh
```

Run the actor node with a rendering window:

```bash
# add --ip x.x.x.x if running on a different machine
bash run_actor.sh
```
## Run with Franka Arm on Real Robot

We demonstrate how to use SERL with real robot manipulators on 4 different tasks: Peg Insertion, PCB Component Insertion, Cable Routing, and Bin Relocation. We provide detailed instructions on how to reproduce the Peg Insertion task as a setup test for the entire SERL package.

When running with a real robot, a separate gym env is needed. For our examples, we isolated the gym env as a client of a robot server. The robot server is a Flask server that sends commands to the robot via ROS, and the gym env communicates with it via HTTP POST requests.
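The pattern looks roughly like the sketch below; the server address, endpoint names, and payload format are illustrative assumptions rather than the exact `franka_env` API.

```python
import numpy as np
import requests

ROBOT_SERVER_URL = "http://127.0.0.1:5000/"  # assumed address of the Flask server

def send_pose_command(pose: np.ndarray) -> None:
    """POST a target end-effector pose to the robot server (hypothetical endpoint)."""
    requests.post(ROBOT_SERVER_URL + "pose", json={"arr": pose.tolist()})

def get_robot_state() -> dict:
    """Fetch proprioceptive state from the robot server (hypothetical endpoint)."""
    return requests.post(ROBOT_SERVER_URL + "getstate").json()
```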
This requires the installation of the following packages:

- `serl_franka_controller`
- `serl_robot_infra` (see its readme)

Follow the README in `serl_robot_infra` for basic robot operation instructions.
*NOTE: The following code will not run as-is, since it requires custom data, checkpoints, and a robot env. We provide the code as a reference for how to use SERL with real robots. Work through this section in order, starting from the first task (peg insertion) to the last (bin relocation), and modify the code according to your needs.*
### 1. Peg Insertion

The example is located in `examples/async_peg_insert_drq/`. The env and default config are located in `serl_robot_infra/franka_env/envs/peg_env/`.

The `franka_env.envs.wrappers.SpacemouseIntervention` gym wrapper provides the ability to intervene on the robot with a SpaceMouse.
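The wrapper is applied on top of a base env like any other gym wrapper; in this sketch the env id is a placeholder for whatever `franka_env` registers for your setup:

```python
import gymnasium as gym
from franka_env.envs.wrappers import SpacemouseIntervention

env = gym.make("FrankaPegInsert-Vision-v0")  # placeholder env id

# While the SpaceMouse is being moved, its input overrides the policy's
# action, letting a human correct the robot during training.
env = SpacemouseIntervention(env)
```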
The peg insertion task is the best place to start running SERL on a real robot. Since the policy should converge and achieve a 100% success rate within 30 minutes on a single GPU in the simplest case, this task is great for troubleshooting the setup quickly. The procedure below assumes you have a Franka arm with a Robotiq Hand-E gripper and 2 RealSense D405 cameras.
1. 3D-print (1) Assembly Object of choice and (1) corresponding Assembly Board from the Single-Object Manipulation Objects section of FMB. Fix the board to the workspace and grasp the peg with the gripper.
2. 3D-print (2) wrist camera mounts for the RealSense D405 and install them onto the threads on the Robotiq gripper. Update the camera serial numbers in `REALSENSE_CAMERAS` located in `peg_env/config.py`.
3. The reward is given by checking whether the end-effector pose matches a fixed target pose (see the sketch after this list). Manually move the arm into a pose where the peg is inserted into the board, and update `TARGET_POSE` in `peg_env/config.py` with the measured end-effector pose.
4. Set `RANDOM_RESET` to `False` inside the config file to speed up training. Note that the policy will only generalize to arbitrary board poses when this is set to `True`, but only try this after the basic task works.
5. Record 20 demo trajectories with the SpaceMouse:

   ```bash
   python record_demo.py
   ```

   The trajectories are saved in `examples/async_peg_insert_drq/peg_insertion_20_trajs_{UUID}.pkl`.
6. Train the RL agent with the collected demos by running both learner and actor nodes:

   ```bash
   bash run_learner.sh
   bash run_actor.sh
   ```

7. If nothing went wrong, the policy should converge with a 100% success rate within 30 minutes without `RANDOM_RESET`, and within 60 minutes with `RANDOM_RESET`.
8. The checkpoints are automatically saved and can be evaluated with:

   ```bash
   bash run_actor.sh
   ```

   If the policy was trained with `RANDOM_RESET`, it should be able to insert the peg even when you move the board at test time.
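For reference, the pose-match reward from step 3 boils down to a thresholded comparison with `TARGET_POSE`. A minimal sketch (the tolerances and the 6-DoF `[x, y, z, roll, pitch, yaw]` layout are assumptions; the real check lives in the env under `peg_env/`):

```python
import numpy as np

def pose_match_reward(current_pose: np.ndarray, target_pose: np.ndarray,
                      xyz_tol: float = 0.005, rot_tol: float = 0.1) -> float:
    """Binary reward: 1.0 if the end-effector is within tolerance of the
    target pose, else 0.0. Tolerances here are illustrative."""
    pos_ok = np.all(np.abs(current_pose[:3] - target_pose[:3]) < xyz_tol)
    rot_ok = np.all(np.abs(current_pose[3:] - target_pose[3:]) < rot_tol)
    return float(pos_ok and rot_ok)
```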
### 2. PCB Component Insertion

The example is located in `examples/async_pcb_insert_drq/`. The env and default config are located in `serl_robot_infra/franka_env/envs/pcb_env/`.
Similar to peg insertion, we record demo trajectories with the robot, then run the learner and actor nodes:

```bash
# record demo trajectories
python record_demo.py

# run learner and actor nodes
bash run_learner.sh
bash run_actor.sh
```
A baseline using BC as the policy is also provided. To train BC, run:

```bash
python3 examples/bc_policy.py ....TODO_ADD_ARGS.....
```

To run the BC policy:

```bash
bash run_bc.sh
```
### 3. Cable Routing

The example is located in `examples/async_cable_routing_drq/`. The env and default config are located in `serl_robot_infra/franka_env/envs/cable_env/`.

In this cable routing task, we provide an example of a learned reward classifier. It replaces the hardcoded reward check, which depends on the known `TARGET_POSE` defined in `config.py`. The reward classifier is an image-based classifier (a pretrained ResNet) that classifies whether the cable is routed successfully or not, and it is trained with demo trajectories of successful and failed samples:
```bash
# NOTE: custom paths are used in this script
python train_reward_classifier.py
```

The reward classifier is used as a gym wrapper, `franka_env.envs.wrapper.BinaryRewardClassifier`. The wrapper classifies the current observation and returns a reward of 1 if the observation is classified as successful, and 0 otherwise.
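Schematically, the wrapper does something like the following (a simplified sketch of `BinaryRewardClassifier`'s behavior, not its actual implementation; the 0.5 threshold is an assumption):

```python
import gymnasium as gym

class BinaryRewardClassifierSketch(gym.Wrapper):
    """Replace the env's reward with the output of an image-success classifier."""

    def __init__(self, env, classifier_fn):
        super().__init__(env)
        self.classifier_fn = classifier_fn  # maps an observation to P(success)

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = float(self.classifier_fn(obs) > 0.5)  # 1 if success, else 0
        return obs, reward, terminated, truncated, info
```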
The reward classifier is then used by the BC policy and the DRQ policy on the actor node; the checkpoint path is provided via the `--reward_classifier_ckpt_path` argument in `run_bc.sh` and `run_actor.sh`.
### 4. Bin Relocation

The example is located in `examples/async_bin_relocation_fwbw_drq/`. The env and default config are located in `serl_robot_infra/franka_env/envs/bin_env/`.

This bin relocation example demonstrates the use of forward and backward policies, which is helpful for RL tasks that require the robot to "reset". In this case, the robot moves an object from one bin to another: the forward policy moves the object from the right bin to the left bin, and the backward policy moves it back from the left bin to the right bin.
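Schematically, the actor alternates between the two policies. The sketch below is illustrative only: the single shared env, the per-direction success classifiers, the episode length, and the switching rule are all simplifying assumptions.

```python
def fwbw_actor_loop(env, fwd_policy, bwd_policy, fwd_success, bwd_success,
                    max_steps=100):
    """Alternate forward/backward policies so each one resets the other."""
    going_forward = True
    while True:
        policy = fwd_policy if going_forward else bwd_policy
        success_fn = fwd_success if going_forward else bwd_success
        obs, _ = env.reset()
        for _ in range(max_steps):
            obs, _, terminated, truncated, _ = env.step(policy.sample(obs))
            if success_fn(obs) > 0.5 or terminated or truncated:
                break
        # a successful forward episode leaves the object where the backward
        # task starts, and vice versa, so no manual reset is needed
        going_forward = not going_forward
```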
- Record demo trajectories

  Multiple utility scripts are provided for recording demo trajectories (e.g. `record_demo.py` for RLPD, `record_transitions.py` for the reward classifier, `reward_bc_demos.py` for the BC policy). Note that the forward and backward policies require different demo trajectories.
- Reward classifiers

  Similar to the cable routing example, we need to train two reward classifiers, one for each of the forward and backward policies, as shown in `train_fwd_reward_classifier.sh` and `train_bwd_reward_classifier.sh`. The reward classifiers are then used by the BC and DRQ policies on the actor node; the checkpoint path is provided via the `--reward_classifier_ckpt_path` argument in `run_bc.sh` and `run_actor.sh`.
- Run 2 learners and 1 actor with 2 policies

  Finally, 2 learner nodes learn the forward and backward policies respectively. The actor node switches between running the forward and backward policies, each with its respective reward classifier, during the RL training process.

  ```bash
  # run the actor
  bash run_actor.sh

  # run 2 learners
  bash run_fw_learner.sh
  bash run_bw_learner.sh
  ```
## Contribution

We welcome contributions to this repository! Fork and submit a PR if you have any improvements to the codebase. Before submitting a PR, please run `pre-commit run --all-files` to ensure that the codebase is formatted correctly.
## Citation

If you use this code for your research, please cite our paper:
```bibtex
@misc{luo2024serl,
    title={SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning},
    author={Jianlan Luo and Zheyuan Hu and Charles Xu and You Liang Tan and Jacob Berg and Archit Sharma and Stefan Schaal and Chelsea Finn and Abhishek Gupta and Sergey Levine},
    year={2024},
    eprint={2401.16013},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}
```