VDACs is a fork of PyMARL. We add four more actor-critic algorithms:
- IAC: Independent Actor-Critics
- Naive Critic: Actor-Critic with a Centralized Critic
- VDAC-sum: Proposed Actor-Critic
- VDAC-mix: Proposed Actor-Critic
VDACs is written in PyTorch and uses SMAC as its environment.
The original repository, PyMARL, uses Docker to manage the virtual environment. We use conda in our implementation:
Clone the repository, then create and activate the conda virtual environment:

```bash
git clone git@github.com:hahayonghuming/VDACs.git
cd VDACs
conda create --name pymarl python=3.5
source activate pymarl
```
Install the required packages:

```bash
pip install -r requirements.txt
```
Set up StarCraft II and SMAC:

```bash
bash install_sc2.sh
```
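The script should take care of downloading StarCraft II and the SMAC maps (as in the original PyMARL setup). If you already have StarCraft II installed elsewhere, SMAC (via pysc2) conventionally locates it through the `SC2PATH` environment variable, which you can set instead; the path below is a placeholder:

```bash
# Optional: point SMAC/pysc2 at an existing StarCraft II installation
export SC2PATH=/path/to/StarCraftII
```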
Value-decomposition actor-critic (VDAC) follows an actor-critic approach and is based on three main ideas:
- It is compatible with A2C, which is adopted to improve RL training efficiency
- Similar to QMIX, VDAC enforces a monotonic relationship between the global state-value and the local state-values, which is related to difference rewards
- VDAC utilizes a simple temporal difference (TD) advantage policy gradient (see the sketch after this list). Both the COMA advantage gradient and the TD advantage gradient are unbiased estimates of the vanilla multi-agent policy gradient. However, our StarCraft testbed results (comparing the naive critic with COMA) favor the TD advantage over the COMA advantage
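For reference, a minimal sketch of the one-step TD advantage described above; the function and argument names are illustrative and not the exact ones used in this repo:

```python
import torch

def td_advantage(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD advantage: A_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    # rewards, values, next_values, dones: tensors of shape [batch, time]
    targets = rewards + gamma * next_values * (1.0 - dones)
    # The advantage is treated as a constant in the policy-gradient loss
    return (targets - values).detach()
```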
Two VDAC algorithms are proposed:
- VDAC-sum simply assumes that the global state-value is a summation of the local state-values. VDAC-sum does not take advantage of extra state information and shares a similar structure with IAC
- VDAC-mix utilizes a non-negative mixing network as a non-linear function approximator to represent a broader class of functions. The parameters of the mixing network are produced by a set of hypernetworks that take the global state as input. Therefore, VDAC-mix is capable of incorporating extra state information (a sketch of such a mixer follows this list)
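A minimal sketch of the non-negative mixing-network idea, assuming a QMIX-style architecture; class, variable, and dimension names are illustrative, see the source under src/ for the actual implementation:

```python
import torch
import torch.nn as nn

class VMixer(nn.Module):
    """Illustrative QMIX-style mixer: combines per-agent state-values into a
    global state-value using non-negative weights produced by hypernetworks."""

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks map the global state to mixing weights and biases
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_values, state):
        # agent_values: [batch, n_agents], state: [batch, state_dim]
        bs = agent_values.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_values.view(bs, 1, self.n_agents), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        # Non-negative w1, w2 enforce monotonicity of the global value
        # in each agent's local value
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)
```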
Run an experiment on an SMAC map (e.g., 2s3z):

```bash
python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z
python3 src/main.py --config=vdn_a2c --env-config=sc2 with env_args.map_name=2s3z
python3 src/main.py --config=central_critic --env-config=sc2 with env_args.map_name=2s3z
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z
python3 src/main.py --config=qmix_a2c --env-config=sc2 with env_args.map_name=2s3z
python3 src/main.py --config=coma --env-config=sc2 with env_args.map_name=2s3z
```
The config files act as defaults for an algorithm or environment. They are all located in `src/config`. `--config` refers to the config files in `src/config/algs`, and `--env-config` refers to the config files in `src/config/envs`. All results will be stored in the `Results` folder.
Five independent experiments are conducted for each algorithm on each map. Colored solid lines denote the median win rate, and the shaded regions represent the 25-75th percentiles. The black dashed line represents the win rate of a heuristic AI. (My experiments were conducted on an RTX 2080 Ti GPU.)
Note: We find that VDACs are sensitive to `vf_coef`, located in `src/config`. This value weights the critic loss. In our original implementation we set `vf_coef=0.5`; however, we later found that `vf_coef=0.1` yields better performance.
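Assuming `vf_coef` is part of the loaded configuration (as the note above suggests), it can likely be overridden from the command line with the same `with` syntax used for the other parameters, without editing the config files:

```bash
python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z vf_coef=0.1
```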
You can save the learnt models to disk by setting `save_model = True`, which is set to `False` by default. The frequency of saving models can be adjusted using the `save_model_interval` configuration. Models will be saved in the result directory, under a folder called `models`.

Learnt models can be loaded using the `checkpoint_path` parameter, after which the learning will proceed from the corresponding timestep.
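As a sketch, saving and resuming might look like the following; the interval value and the checkpoint path are placeholders, not values shipped with the repo:

```bash
# Save models periodically during training (interval is illustrative)
python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z save_model=True save_model_interval=200000

# Resume training from a saved model (path is a placeholder)
python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z checkpoint_path="<path/to/models>"
```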
The `save_replay` option allows saving replays of models which are loaded using `checkpoint_path`. Once the model is successfully loaded, `test_nepisode` episodes are run in test mode and a .SC2Replay file is saved in the Replay directory of StarCraft II. Please make sure to use the episode runner if you wish to save a replay, i.e., `runner=episode`. The name of the saved replay file starts with the given `env_args.save_replay_prefix` (`map_name` if empty), followed by the current timestamp. The saved replays can be watched by simply double-clicking on them.
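Putting these options together, a replay-saving run might look like the following sketch; the checkpoint path is a placeholder and the episode count is illustrative:

```bash
python3 src/main.py --config=vmix_a2c --env-config=sc2 with env_args.map_name=2s3z checkpoint_path="<path/to/models>" save_replay=True test_nepisode=10 runner=episode
```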
Note: Replays cannot be watched using the Linux version of StarCraft II. Please use either the Mac or Windows version of the StarCraft II client. Windows users who have problems opening replay files might need to download a free-trial version of StarCraft II into the directory C:\Program Files (x86)\StarCraft II.
Description: Red units are controlled by VDAC-mix and blue units are controlled by the built-in AI, which is set to difficulty level 7. Strategies such as focusing fire on enemies and zealots tending to attack stalkers can be spotted in the replays.
This repo is still under development. If you have any questions or concerns, please email js9wv@virginia.edu
If you find this repository useful, please cite the following papers:
```
@article{su2020value,
  title={Value-Decomposition Multi-Agent Actor-Critics},
  author={Su, Jianyu and Adams, Stephen and Beling, Peter A},
  journal={arXiv preprint arXiv:2007.12306},
  year={2020}
}

@article{samvelyan19smac,
  title = {{The} {StarCraft} {Multi}-{Agent} {Challenge}},
  author = {Mikayel Samvelyan and Tabish Rashid and Christian Schroeder de Witt and Gregory Farquhar and Nantas Nardelli and Tim G. J. Rudner and Chia-Man Hung and Philip H. S. Torr and Jakob Foerster and Shimon Whiteson},
  journal = {CoRR},
  volume = {abs/1902.04043},
  year = {2019},
}
```