In this repository, we try different ways to make reinforcement learning environments from Mujoco Gym and dm_control deterministic. We strive to ensure that the environments have the following important properties:

- the function `reset()` always returns the same initial state;
- the function `step(action)` is such that a sequence of `actions` uniquely determines the resulting `states` and `rewards`;
- the function `virtual_step(state, action)` uniquely determines `next_state` and `reward`.
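As an illustration, a determinism check built on these three functions could look like the hypothetical sketch below. It assumes the wrappers expose the usual gym interface (`reset`, `step`, `action_space`), that `virtual_step` returns `(next_state, reward)`, and that it does not change the environment's own state.

```python
import numpy as np

def check_determinism(make_env, n_steps=100, seed=0):
    """Hypothetical sketch: run two copies of an environment on the same
    action sequence and verify the three determinism properties."""
    env1, env2 = make_env(), make_env()
    state1, state2 = env1.reset(), env2.reset()
    assert np.allclose(state1, state2), "reset() is not deterministic"

    rng = np.random.RandomState(seed)
    for _ in range(n_steps):
        action = rng.uniform(env1.action_space.low, env1.action_space.high)

        # virtual_step(state, action) should determine next_state and reward.
        virtual_next, virtual_reward = env1.virtual_step(state1, action)

        # The same action sequence must produce the same states and rewards.
        next1, reward1, done, _ = env1.step(action)
        next2, reward2, _, _ = env2.step(action)
        assert np.allclose(next1, next2) and np.isclose(reward1, reward2), "step() mismatch"
        assert np.allclose(virtual_next, next1) and np.isclose(virtual_reward, reward1), \
            "virtual_step() mismatch"

        state1, state2 = next1, next2
        if done:
            state1, state2 = env1.reset(), env2.reset()

# Example (hypothetical):
# check_determinism(lambda: DMControlEnvWithPhysics('cartpole', 'balance'))
```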
We implemented a wrapper for the environments that aims to fulfill these points, but unfortunately it does not yet work for all environments. The main reason is that Mujoco has internal variables and structures to which there is no obvious access. For example, in many environments `next_state` and `reward` depend not only on the current `state`, but also on the "internal physics". Nonetheless, we strive to take such things into account. Our results are described in the tables below.
Environments | reset() | step() | virtual_step() |
---|---|---|---|
DMControlEnv('acrobot', 'swingup') | 0 | 0 | 0 |
DMControlEnv('acrobot', 'swingup_sparse') | 0 | 0 | 0 |
DMControlEnv('ball_in_cup', 'catch') | 0 | ||
DMControlEnvWithPhysics('ball_in_cup', 'catch') | 0 | 0 | 0 |
DMControlEnv('cartpole', 'balance') | 0 | ||
DMControlEnvWithPhysics('cartpole', 'balance') | 0 | 0 | 0 |
DMControlEnv('cartpole', 'balance_sparse') | 0 | ||
DMControlEnvWithPhysics('cartpole', 'balance_sparse') | 0 | 0 | 0 |
DMControlEnv('cartpole', 'swingup') | 0 | ||
DMControlEnvWithPhysics('cartpole', 'swingup') | 0 | 0 | 0 |
DMControlEnv('cartpole', 'swingup_sparse') | 0 | ||
DMControlEnvWithPhysics('cartpole', 'swingup_sparse') | 0 | 0 | 0 |
DMControlEnv('cheetah', 'run') | 0 | ||
DMControlEnvWithPhysics('cheetah', 'run') | 0 | 0 | 0 |
DMControlEnv('finger', 'spin') | 0 | ||
DMControlEnvWithPhysics('finger', 'spin') | 0 | 0 | 0 |
DMControlEnv('finger', 'turn_easy') | 0 | ||
DMControlEnvWithPhysics('finger', 'turn_easy') | 0 | 0 | 0 |
DMControlEnv('finger', 'turn_hard') | 0 | ||
DMControlEnvWithPhysics('finger', 'turn_hard') | 0 | 0 | 0 |
DMControlEnv('fish', 'upright') | 0 | ||
DMControlEnvWithPhysics('fish', 'upright') | 0 | 0 | 0 |
DMControlEnv('fish', 'swim') | |||
DMControlEnvWithPhysics('fish', 'swim') | |||
DMControlEnv('hopper', 'stand') | 0 | ||
DMControlEnvWithPhysics('hopper', 'stand') | 0 | 0 | 0 |
DMControlEnv('hopper', 'hop') | 0 | ||
DMControlEnvWithPhysics('hopper', 'hop') | 0 | 0 | 0 |
DMControlEnv('humanoid', 'stand') | 0 | ||
DMControlEnvWithPhysics('humanoid', 'stand') | 0 | 0 | 0 |
DMControlEnv('humanoid', 'walk') | 0 | ||
DMControlEnvWithPhysics('humanoid', 'walk') | 0 | 0 | 0 |
DMControlEnv('humanoid', 'run') | 0 | ||
DMControlEnvWithPhysics('humanoid', 'run') | 0 | 0 | 0 |
DMControlEnv('manipulator', 'bring_ball') | |||
DMControlEnvWithPhysics('manipulator', 'bring_ball') | |||
DMControlEnv('pendulum', 'swingup') | 0 | ||
DMControlEnvWithPhysics('pendulum', 'swingup') | 0 | 0 | 0 |
DMControlEnv('point_mass', 'easy') | 0 | ||
DMControlEnvWithPhysics('point_mass', 'easy') | 0 | 0 | 0 |
DMControlEnv('reacher', 'easy') | |||
DMControlEnvWithPhysics('reacher', 'easy') | |||
DMControlEnv('reacher', 'hard') | 0 | ||
DMControlEnvWithPhysics('reacher', 'hard') | 0 | ||
DMControlEnv('swimmer', 'swimmer6') | |||
DMControlEnvWithPhysics('swimmer', 'swimmer6') | |||
DMControlEnv('swimmer', 'swimmer15') | |||
DMControlEnvWithPhysics('swimmer', 'swimmer15') | |||
DMControlEnv('walker', 'stand') | 0 | ||
DMControlEnvWithPhysics('walker', 'stand') | 0 | 0 | 0 |
DMControlEnv('walker', 'walk') | 0 | ||
DMControlEnvWithPhysics('walker', 'walk') | 0 | 0 | 0 |
DMControlEnv('walker', 'run') | 0 | ||
DMControlEnvWithPhysics('walker', 'run') | 0 | 0 | 0 |
Here `DMControlEnv(domain_name, task_name)` is a simple wrapper that exposes the usual gym interface; `DMControlEnvWithPhysics(domain_name, task_name)` is a special wrapper which implicitly saves the "internal physics" of the environment as an additional attribute of the numpy array `state`.
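One way such a physics-carrying state can be represented (a minimal sketch, not necessarily the exact approach used in this repository) is a thin `numpy.ndarray` subclass that carries the physics payload alongside the observation:

```python
import numpy as np

class StateWithPhysics(np.ndarray):
    """Hypothetical sketch: a numpy array that also carries a physics payload.

    A plain ndarray cannot hold custom attributes, so a small subclass is one
    way to keep the "internal physics" attached to the observation.
    """

    def __new__(cls, observation, physics=None):
        obj = np.asarray(observation).view(cls)
        obj.physics = physics
        return obj

    def __array_finalize__(self, obj):
        if obj is not None:
            self.physics = getattr(obj, "physics", None)

# Usage sketch: dm_control's Physics exposes get_state() / set_state(), which a
# wrapper could use to capture and later restore the simulator state, e.g.
# state = StateWithPhysics(observation, physics=env.physics.get_state().copy())
```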
Environments | reset() | step() | virtual_step() |
---|---|---|---|
GymEnv('Ant-v3') | 0 | ||
GymEnv('HalfCheetah-v3') | 0 | ||
GymEnv('Hopper-v3') | 0 | ||
GymEnv('Humanoid-v3') | 0 | ||
GymEnv('HumanoidStandup-v2') | |||
GymEnv('InvertedDoublePendulum-v2') | 0 | 0 | 0 |
GymEnv('InvertedPendulum-v2') | 0 | 0 | 0 |
GymEnv('Reacher-v2') | |||
GymEnv('Swimmer-v3') | 0 | ||
GymEnv('Walker2d-v3') | 0 |
Here `GymEnv(name)` is a special wrapper which implicitly saves the `MjSimState` object as an additional attribute of the numpy array `state`.
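This relies on mujoco-py's ability to snapshot and restore the full simulator state. Roughly (a sketch, assuming the wrapped environment exposes the underlying `sim` object, as Gym's MuJoCo environments do):

```python
import gym

env = gym.make("InvertedPendulum-v2")
env.reset()

# MjSimState is a snapshot of the full mujoco-py simulator state.
saved = env.sim.get_state()

# Step the environment arbitrarily...
env.step(env.action_space.sample())

# ...then rewind the simulator to the saved snapshot. Restoring this state is
# what makes a deterministic virtual_step(state, action) possible.
env.sim.set_state(saved)
env.sim.forward()  # recompute derived quantities after setting the state
```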
Support for Windows has been dropped in newer versions of mujoco-py. The latest working version is 1.50.1.68, and even it requires a few manual workarounds:
- Install Microsoft Visual C++ 14.0 or greater: https://visualstudio.microsoft.com/visual-cpp-build-tools/
- Download the binaries http://www.roboti.us/download.html/mjpro150_win64.zip and the activation key: http://www.roboti.us/license.html
- Create the directory `%userprofile%/.mujoco/`.
- Unzip the binaries and move the key into the created directory.
- Add the full path to the directory `%userprofile%/.mujoco/mjpro150/bin` to the PATH environment variable.
- Download mujoco-py version 1.50.1.68: https://files.pythonhosted.org/packages/cf/8c/64e0630b3d450244feef0688d90eab2448631e40ba6bdbd90a70b84898e7/mujoco-py-1.50.1.68.tar.gz
- Unzip the downloaded archive to an arbitrary directory, navigate to this directory in the terminal, and install mujoco-py with the command `python setup.py install`.
- Before each use, you must execute the following commands:

  ```python
  import os
  # Make the MuJoCo binaries visible to the Python process (Python 3.8+ on Windows).
  os.add_dll_directory(os.path.join(os.path.expanduser('~'), ".mujoco", "mjpro150", "bin"))

  # Create an offscreen OpenGL context so that rendering works.
  from mujoco_py import GlfwContext
  GlfwContext(True)
  ```
- Environments are loaded via gym:

  ```python
  import gym

  env = gym.make("Ant-v3")
  ```
- After that, the environments are used in the usual way:

  ```python
  import gym
  import matplotlib.pyplot as plt

  env = gym.make("Ant-v3")
  state = env.reset()

  # Render the current frame as an RGB array and display it.
  pixels = env.render("rgb_array")
  plt.imshow(pixels)
  plt.show()
  ```
Follow the instructions