Famous successes with AlphaZero have come in games such as Chess and Go, where there is a clean reward signal and the state space is discrete. These results don't transfer well to real-world control environments, which have continuous action spaces and noise.
Can AlphaZero-like approaches learn optimal low-level controls?
This repo implements an AlphaZero agent for continuous control in ~250 lines of code.
Training takes ~6 min on my laptop running parallelized MCTS over 3M simulation steps.
Average rollout reward is ~-7.11 for vanilla MCTS vs. ~-3.741 with A0C (guiding the search with the actor and value networks helps!).
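For context, standard AlphaZero search assumes a discrete, enumerable action set; A0C-style approaches typically handle continuous actions by sampling candidate actions from the actor network and growing each node's action set via progressive widening. The snippet below is a minimal illustrative sketch of that idea, not the repo's actual code; the constants and the `actor(state)` callable are assumptions.

```python
# Minimal sketch of MCTS action selection with progressive widening for a
# continuous action space. All names and constants here are illustrative.
import math

C_PW, ALPHA_PW, C_UCT = 1.0, 0.5, 1.5  # assumed progressive-widening / UCT constants

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action tuple -> child Node (None until expanded)
        self.N = {}          # per-action visit counts
        self.Q = {}          # per-action mean returns
        self.total_visits = 0

    def select_action(self, actor):
        # Progressive widening: only add a newly sampled action while the number
        # of children is below C_PW * total_visits ** ALPHA_PW.
        if len(self.children) < C_PW * max(1, self.total_visits) ** ALPHA_PW:
            a = tuple(actor(self.state))  # sample a continuous action from the actor net
            self.children.setdefault(a, None)
            self.N.setdefault(a, 0)
            self.Q.setdefault(a, 0.0)
            return a
        # Otherwise choose among existing actions with a UCT score.
        return max(
            self.children,
            key=lambda a: self.Q[a]
            + C_UCT * math.sqrt(math.log(self.total_visits + 1) / (self.N[a] + 1)),
        )

    def backup(self, action, value):
        # Running-mean update of Q for the chosen action.
        self.N[action] += 1
        self.total_visits += 1
        self.Q[action] += (value - self.Q[action]) / self.N[action]
```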
To train AlphaZero Continuous, evaluate the learned policy, and generate plots:
python train.py
To run the online MCTS planner (useful for debugging search):
python run_mcts.py
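As a mental model, an online planner of this kind re-plans from the current state at every environment step and executes only the first action. The loop below is a rough sketch under that assumption, not the actual contents of run_mcts.py; the Gymnasium env, the `plan_with_mcts` stub, and its signature are placeholders.

```python
# Rough sketch of a receding-horizon MCTS control loop (illustrative only).
import gymnasium as gym

def plan_with_mcts(env, obs, n_simulations=200):
    # Placeholder for the repo's search: a real planner would run
    # n_simulations MCTS rollouts from `obs` and return the best root action.
    return env.action_space.sample()

env = gym.make("Pendulum-v1")  # stand-in task, not necessarily the repo's env
obs, _ = env.reset()
episode_reward, done = 0.0, False
while not done:
    action = plan_with_mcts(env, obs)  # re-plan from the current state
    obs, reward, terminated, truncated, _ = env.step(action)
    episode_reward += reward
    done = terminated or truncated
print(f"episode reward: {episode_reward:.2f}")
```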
Planned next steps:
- increase lag
- keep track of state history and run on the controls challenge
- support general Gym environments (see the sketch below)
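On the last item: the main obstacle to supporting arbitrary Gym environments is that MCTS needs to branch the simulator from intermediate states. One simple (if slow) way to get that is to fork the environment before each simulated branch. The wrapper below is hypothetical and assumes a pure-Python env whose state survives deepcopy.

```python
# Hypothetical wrapper showing one way MCTS could fork a generic Gym env for
# lookahead: deep-copy the environment so simulated branches don't disturb it.
import copy
import gymnasium as gym

class ForkableEnv:
    def __init__(self, env):
        self.env = env

    def fork(self):
        # Only valid for envs whose full state survives copy.deepcopy
        # (true for many pure-Python envs, not for external simulators).
        return ForkableEnv(copy.deepcopy(self.env))

    def step(self, action):
        obs, reward, terminated, truncated, _ = self.env.step(action)
        return obs, reward, terminated or truncated

# Usage: sim = ForkableEnv(gym.make("Pendulum-v1")); branch = sim.fork()
```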