# AlphaZero for Continuous Control

AlphaZero's most famous successes have been in games such as Chess and Go, where there is a clean reward signal and the state space is discrete. Yet these results don't transfer well to real-world control environments, which have continuous action spaces and noisy dynamics.

Can AlphaZero-like approaches learn optimal low-level controls?

This repo implements an AlphaZero agent for continuous control in ~250 lines of code.
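
Vanilla MCTS enumerates a finite set of actions at each node, which breaks down when actions are continuous. One standard fix, which the A0C line of work builds on, is progressive widening: a node is only allowed a number of children that grows sublinearly with its visit count, and new actions are sampled (e.g. from the actor net) only when that budget allows. Below is a minimal sketch of the idea, with illustrative names rather than this repo's actual API:

```python
import math

class Node:
    """One search-tree node; children are keyed by continuous actions."""
    def __init__(self):
        self.children = {}   # action (float) -> Node
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_action(node, state, sample_action, c_pw=1.0, alpha=0.5, c_ucb=1.4):
    """Progressive widening: add a newly sampled action only while the child
    count is below c_pw * visits^alpha; otherwise run UCB over existing ones."""
    if len(node.children) < c_pw * max(node.visits, 1) ** alpha:
        a = sample_action(state)  # e.g. draw a continuous action from the actor net
        node.children[a] = Node()
        return a

    def ucb(a):
        child = node.children[a]
        if child.visits == 0:
            return float("inf")
        return child.q() + c_ucb * math.sqrt(math.log(node.visits) / child.visits)

    return max(node.children, key=ucb)
```

This keeps the branching factor finite while still letting the tree refine its action set where it spends the most visits.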

## Results

Training takes ~6 min on my laptop, running parallelized MCTS over 3M simulation steps.


Average rollout reward is ~-7.11 for vanilla MCTS vs. -3.741 with A0C (guiding the search with the value and actor nets helps!).
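
That gap is consistent with how the two nets enter the search: the actor net provides a prior that biases which actions get explored first, and the value net scores leaf states instead of running random rollouts to termination. A rough sketch of the AlphaZero-style PUCT selection rule (the constant and exact form are illustrative; the repo's formula may differ):

```python
import math

def puct_score(child_q, child_visits, parent_visits, prior, c_puct=1.5):
    """Mean value estimate plus a prior-weighted exploration bonus.
    `prior` comes from the actor net; high-prior actions are tried earlier,
    and the 1 + child_visits denominator decays the bonus as they get visited."""
    return child_q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
```

Compared with vanilla UCT plus random rollouts, both changes cut variance: the prior focuses simulations on plausible actions, and the value net replaces a noisy Monte Carlo return with a learned estimate.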

## How to use

To train AlphaZero Continuous, evaluate the learned policy, and generate plots:

```bash
python train.py
```

To run the online MCTS planner (useful for debugging search):

```bash
python run_mcts.py
```
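
An online planner of this kind typically re-plans from the current state at every step and executes only the first action, MPC-style. A minimal sketch of that loop, assuming a Gym-style interface (`mcts_plan` and `Pendulum-v1` are illustrative stand-ins, not this repo's actual entry points):

```python
import gymnasium as gym

env = gym.make("Pendulum-v1")  # illustrative continuous-control task

def mcts_plan(obs, n_simulations=100):
    """Hypothetical stand-in for the search in run_mcts.py; a real planner
    would run n_simulations tree-search rollouts from `obs`. Random action
    here just so the sketch runs."""
    return env.action_space.sample()

obs, _ = env.reset()
done = False
while not done:
    action = mcts_plan(obs)  # re-plan from scratch at the current state
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```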

## TODO

- increase lag
- keep track of state history; run on the controls challenge
- make it work for general Gym environments
