Value-Decomposition Actor-Critics (VDACs)

VDACs is a fork of PyMARL. We added 4 more actor-critics:

  • IAC: Independent Actor-Critics
  • Naive Critic: Actor-Critic with a Centralized Critic
  • VDAC-sum: Proposed Actor-Critic (sums local state-values into the global state-value)
  • VDAC-mix: Proposed Actor-Critic (mixes local state-values through a monotonic mixing network)

VDACs is written in PyTorch and uses SMAC as its environment.

Installation instructions

The original PyMARL repository uses Docker to manage its virtual environment. We use conda in our implementation:

Clone the repository, then create and activate the conda virtual environment:

git clone git@github.com:hahayonghuming/VDACs.git
cd VDACs
conda create --name pymarl python=3.5
source activate pymarl

Install required packages:

pip install -r requirements.txt

Set up StarCraft II and SMAC:

bash install_sc2.sh

Proposed Algorithms

Value-decomposition actor-critic (VDAC) follows an actor-critic approach and is based on three main ideas:

  • It is compatible with A2C, which is adopted to improve RL training efficiency
  • Similar to QMIX, VDAC enforces a monotonic relationship between the global state-value and the local state-values, which is related to difference rewards
  • VDAC utilizes a simple temporal-difference (TD) advantage policy gradient (see the sketch after this list). Both the COMA advantage gradient and the TD advantage gradient are unbiased estimates of the vanilla multi-agent policy gradient; however, our StarCraft testbed results (comparing the naive critic with COMA) favor the TD advantage over the COMA advantage
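
As a rough illustration, the TD advantage replaces COMA's counterfactual baseline with a one-step bootstrap from the state-value critic. The following is a minimal PyTorch sketch under assumed tensor names and shapes; it is not the repository's actual interface:

import torch

# Illustrative shapes: rewards, values, next_values are [batch];
# log_probs is [batch, n_agents] holding log pi_a(u_a | tau_a) per agent.
def td_advantage_pg_loss(rewards, values, next_values, log_probs, gamma=0.99):
    # One-step TD advantage: A(s, u) = r + gamma * V(s') - V(s).
    # Detached so that only the actor receives this gradient.
    advantage = (rewards + gamma * next_values - values).detach()
    # Every agent's log-probability is weighted by the shared advantage.
    return -(advantage.unsqueeze(-1) * log_probs).mean()

Here V is the global state-value, which VDAC builds from the local state-values as described next.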

Two VDAC algorithms are proposed:

  • VDAC-sum simply assumes the global state-value is a summation of the local state-values. It does not take advantage of extra state information and shares a similar structure with IAC. (figure: VDAC-sum architecture)
  • VDAC-mix utilizes a non-negative mixing network as a non-linear function approximator to represent a broader class of functions. The parameters of the mixing network are produced by a set of hypernetworks that take the global state as input, so VDAC-mix is capable of incorporating extra state information; a sketch of such a mixer follows this list. (figure: VDAC-mix architecture)
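
The sketch below is a minimal QMIX-style non-negative mixer applied to per-agent state-values. Dimensions and names are illustrative assumptions, not the repository's exact module:

import torch
import torch.nn as nn

class VMixer(nn.Module):
    """Monotonically mixes per-agent state-values into a global value."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        # Hypernetworks: mixing weights are generated from the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_values, state):
        # agent_values: [batch, n_agents]; state: [batch, state_dim].
        # abs() keeps weights non-negative, so dV_tot/dV_a >= 0 (monotonicity).
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_values.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(-1)  # V_tot: [batch]

VDAC-sum corresponds to replacing this mixer with a plain sum, agent_values.sum(dim=-1), which requires no access to the global state.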

Run the Proposed Algorithms

Run VDAC-mix

python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z

Run VDAC-sum

python3 src/main.py --config=vdn_a2c --env-config=sc2 with env_args.map_name=2s3z

Run comparison experiments

Run Naive Critic

python3 src/main.py --config=central_critic --env-config=sc2 with env_args.map_name=2s3z

Run original QMIX

python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z

Run QMIX with the A2C training paradigm

python3 src/main.py --config=qmix_a2c --env-config=sc2 with env_args.map_name=2s3z

Run COMA

python3 src/main.py --config=coma --env-config=sc2 with env_args.map_name=2s3z

The config files act as defaults for an algorithm or environment.

They are all located in src/config. --config refers to the config files in src/config/algs, and --env-config refers to the config files in src/config/envs.
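
Because the experiments are driven by sacred, any value in these config files can also be overridden on the command line after the with keyword, just like the map name in the examples above. For instance (parameter names are assumed to match those defined in src/config):

python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=3s5z vf_coef=0.1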

All results will be stored in the Results folder.

Training Results

Five independent experiments are conducted for each algorithm on each map. Colored solid lines denote the median win rate and shaded regions represent the 25-75th percentiles. The black dashed line represents the win rate of a heuristic AI. (Our experiments are conducted on an RTX 2080 Ti GPU.)

Note: We find that VDACs are sensitive to vf_coef, located in src/config, which weights the critic loss. In our original implementation we set vf_coef=0.5; we later found that vf_coef=0.1 yields better performance. The sketch below shows how this coefficient enters the loss.
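
To make the role of vf_coef concrete, an A2C-style total loss typically combines the actor and critic losses as below. This is a generic sketch with illustrative names, not necessarily the repository's exact code:

def a2c_total_loss(pg_loss, value_loss, entropy, vf_coef=0.1, ent_coef=0.01):
    # vf_coef scales the critic loss (e.g. squared TD error); lowering it
    # from 0.5 to 0.1 keeps the critic term from dominating the actor's
    # policy-gradient term.
    return pg_loss + vf_coef * value_loss - ent_coef * entropy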

1c3s5z

(figure: win-rate curves on 1c3s5z)

3s5z

(figure: win-rate curves on 3s5z)

2s3z

(figure: win-rate curves on 2s3z)

8m

(figure: win-rate curves on 8m)

bane_vs_bane

(figure: win-rate curves on bane_vs_bane)

2s_vs_1sc

(figure: win-rate curves on 2s_vs_1sc)

Saving and loading learnt models

Saving models

You can save the learnt models to disk by setting save_model = True, which is set to False by default. The frequency of saving models can be adjusted using the save_model_interval configuration. Models will be saved in the results directory, under a folder called models; an example command follows.
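
For example, the following trains VDAC-mix with periodic checkpointing (assuming the PyMARL-style sacred overrides for these parameters):

python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z save_model=True save_model_interval=200000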

Loading models

Learnt models can be loaded using the checkpoint_path parameter, after which the learning will proceed from the corresponding timestep.
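
For example (the model folder below is a placeholder for an actual directory under results/models):

python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z checkpoint_path=results/models/<your_model_folder>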

Watching StarCraft II replays

The save_replay option allows saving replays of models which are loaded using checkpoint_path. Once the model is successfully loaded, test_nepisode episodes are run in test mode and a .SC2Replay file is saved in the Replay directory of StarCraft II. Please make sure to use the episode runner if you wish to save a replay, i.e., runner=episode; an example command follows. The name of the saved replay file starts with the given env_args.save_replay_prefix (map_name if empty), followed by the current timestamp. The saved replays can be watched by simply double-clicking on them.
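
For example, extending the loading command above (the model folder is again a placeholder):

python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z checkpoint_path=results/models/<your_model_folder> save_replay=True runner=episode test_nepisode=1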

Note: Replays cannot be watched using the Linux version of StarCraft II. Please use either the Mac or Windows version of the StarCraft II client. Windows users who have problems opening replay files may need to install the free StarCraft II trial under C:\Program Files (x86)\StarCraft II.

Description: Red units are controlled by VDAC-mix and blue units are controlled by the built-in AI, which is set to difficulty level 7. Strategies such as focusing fire on enemies and zealots preferentially attacking stalkers can be spotted in the replays.

1c3s5z

(replay clip: VDAC-mix on 1c3s5z)

3s5z

(replay clip: VDAC-mix on 3s5z)

2s3z

(replay clip: VDAC-mix on 2s3z)

8m

(replay clip: VDAC-mix on 8m)

bane_vs_bane

(replay clip: VDAC-mix on bane_vs_bane)

Documentation

This repository is still under development. If you have any questions or concerns, please email js9wv@virginia.edu.

Citation

If you find this repository useful, please cite the following papers:

VDAC paper.

@article{su2020value,
  title={Value-Decomposition Multi-Agent Actor-Critics},
  author={Su, Jianyu and Adams, Stephen and Beling, Peter A},
  journal={arXiv preprint arXiv:2007.12306},
  year={2020}
}

SMAC paper.

@article{samvelyan19smac,
  title = {{The} {StarCraft} {Multi}-{Agent} {Challenge}},
  author = {Mikayel Samvelyan and Tabish Rashid and Christian Schroeder de Witt and Gregory Farquhar and Nantas Nardelli and Tim G. J. Rudner and Chia-Man Hung and Philip H. S. Torr and Jakob Foerster and Shimon Whiteson},
  journal = {CoRR},
  volume = {abs/1902.04043},
  year = {2019},
}
