
MBARC (Atari)

Official repository of the Model-Based Aware Reward Classification (MBARC) thesis (Atari 100k benchmark version).

[Animation: world model predictions on Freeway]

MBARC predicting 50 frames into the future from 4 initial frames on Freeway.
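For orientation, here is a minimal sketch of how such an autoregressive rollout works: the model repeatedly consumes its last four frames and appends its own prediction to the context. `world_model`, the tensor shapes, and the function name are placeholders, not the repo's actual API.

```python
import torch

def rollout(world_model, frames, horizon=50):
    """Autoregressive video rollout (sketch).

    frames: (1, 4, C, H, W) initial context frames.
    Returns the `horizon` predicted frames, shape (1, horizon, C, H, W).
    """
    context = list(frames.unbind(dim=1))       # split into single frames
    predictions = []
    for _ in range(horizon):
        x = torch.stack(context[-4:], dim=1)   # last 4 frames as model input
        next_frame = world_model(x)            # assumed to return (1, C, H, W)
        predictions.append(next_frame)
        context.append(next_frame)             # feed the prediction back in
    return torch.stack(predictions, dim=1)
```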

Installation

This program uses Python 3.7 and CUDA 10.2 (when enabled), and was tested on Ubuntu 20.04.1.

Run the following commands to install the dependencies:

pip install torch==1.7.1 gym==0.15.7 gym[atari] opencv-python==4.4.0.42 numpy==1.16.4 tqdm

git clone https://github.com/openai/baselines.git
pip install -e baselines
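As a quick sanity check that the pinned versions installed correctly (a sketch, not part of the repo; the environment id is just an example):

```python
import torch
import gym

print(torch.__version__)           # expect 1.7.1
print(gym.__version__)             # expect 0.15.7
print(torch.cuda.is_available())   # True if the CUDA 10.2 build is usable

env = gym.make("FreewayDeterministic-v4")  # any Atari id works once gym[atari] is installed
obs = env.reset()
print(obs.shape)                   # raw Atari frames are (210, 160, 3)
env.close()
```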

Install wandb (optional)

You can use wandb to track your experiments:

pip install wandb

To use wandb, pass the flag --use-wandb when running the program. See How to use for more details about flags.
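For reference, experiment tracking with wandb typically looks like the following. This is a generic sketch, not the repo's code; the project name and metric are made up.

```python
import wandb

wandb.init(project="mbarc-atari",                      # hypothetical project name
           config={"env_name": "Freeway", "strategy": "mbarc"})
for step in range(100):
    wandb.log({"cumulative_reward": 0.0}, step=step)   # placeholder metric
wandb.finish()
```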

How to use

CUDA is enabled by default; see the Disable CUDA section below to disable it.

To run the program, run the following command from the folder containing the mbarc package:

python -m mbarc

Disable CUDA

To disable CUDA, pass the flag --device cpu to the command line. See the next section for more information about flags.

Flags

You can pass multiple flags on the command line; a summary is printed at launch time. The most useful flags are described in the following table (a combined sketch of all flags follows the second table below):

| Flag | Value | Default | Description |
| --- | --- | --- | --- |
| --agents | Any positive integer | 16 | The number of parallel environments to train the PPO agent on |
| --device | Any string accepted by torch.device | cuda | Sets PyTorch's device |
| --env-name | Any game name (without the suffixes) as depicted here | Freeway | Sets the gym environment |
| --strategy | "online", "class_balanced", "square_root", "progressively_balanced" or "mbarc" | mbarc | The training strategy to use |
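The strategy names other than "mbarc" suggest standard class re-balancing schemes for the reward classifier. Purely as an illustration of what those names usually mean (none of this is taken from the repo), per-class weights could look like:

```python
from collections import Counter

reward_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]   # dummy per-sample reward classes
counts = Counter(reward_labels)

class_balanced = {c: 1.0 / n for c, n in counts.items()}      # inverse frequency
square_root = {c: 1.0 / n ** 0.5 for c, n in counts.items()}  # softened inverse frequency

print(class_balanced)  # rare reward classes get the largest weights
```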

The following boolean flags are set to False if not passed to the command line:

| Flag | Description |
| --- | --- |
| --render-evaluation | Renders the environments during evaluation |
| --render-training | Renders the environments during training |
| --use-modified-model | Replaces MBARC's proposed reward prediction head with SimPLe's |
| --use-wandb | Enables wandb to track the experiment |
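As a sketch of how these flags fit together (names and defaults are taken from the tables above; the parser itself is hypothetical, not the repo's source):

```python
import argparse

parser = argparse.ArgumentParser(description="MBARC (Atari)")
parser.add_argument("--agents", type=int, default=16)
parser.add_argument("--device", type=str, default="cuda")
parser.add_argument("--env-name", type=str, default="Freeway")
parser.add_argument("--strategy", default="mbarc",
                    choices=["online", "class_balanced", "square_root",
                             "progressively_balanced", "mbarc"])
parser.add_argument("--render-evaluation", action="store_true")
parser.add_argument("--render-training", action="store_true")
parser.add_argument("--use-modified-model", action="store_true")
parser.add_argument("--use-wandb", action="store_true")

args = parser.parse_args(["--device", "cpu", "--render-training"])
print(args.device, args.render_training)  # cpu True
```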

For example, to run the program without CUDA and to render the environments during training, run:

python -m mbarc --device cpu --render-training

Per-environment performance

The scores* obtained with this implementation are detailed in the following table:

| Environment | Score |
| --- | --- |
| Alien | 380 |
| Amidar | 56.8 |
| Assault | 582.3 |
| Asterix | 718.3 |
| BankHeist | 40.3 |
| ChopperCommand | 1020 |
| DemonAttack | 547.7 |
| Freeway | 23.8 |
| Frostbite | 218.7 |
| Gopher | 520 |
| Hero | 2323.5 |
| Jamesbond | 71.7 |
| Kangaroo | 380 |
| Krull | 3332.4 |
| Qbert | 356.7 |
| Seaquest | 248 |
| UpNDown | 2905 |

*Scores obtained from a single full training run per environment. Each score is the maximum average cumulative reward obtained in the real environment.
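A small sketch of how I read that metric (an assumed interpretation, not code from the repo): average the cumulative rewards across evaluation episodes at each evaluation point, then take the maximum over the run.

```python
import numpy as np

# returns[i][j] = cumulative reward of evaluation episode j at checkpoint i (dummy data)
returns = np.array([
    [18.0, 20.0, 19.0],
    [22.0, 24.0, 23.0],
    [21.0, 25.0, 22.0],
])
score = returns.mean(axis=1).max()  # max over checkpoints of per-checkpoint averages
print(score)                        # 23.0
```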
