akaanirban/MLL-SGD
Multi-Level Local SGD simulation code

Corresponding paper: T. Castiglia, A. Das, and S. Patterson, "Multi-Level Local SGD: Distributed SGD for Heterogeneous Hierarchical Networks," in ICLR, 2021.

This code simulates a multi-level network with heterogeneous workers and runs MLL-SGD to train a model on a dataset. The code is NOT intended for deployment, and there is much room for optimization.
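The structure of the algorithm can be illustrated with a toy sketch. This is NOT the repository's implementation, only a minimal illustration of the loop structure described above and in the paper: each worker takes tau local SGD steps (each step taken with some probability, modeling heterogeneous worker speeds), each hub averages its workers' models, and every q sub-network iterations the hubs average globally. All names, parameters, and the quadratic toy loss are hypothetical.

```python
import numpy as np

def mll_sgd_sketch(num_hubs=2, workers_per_hub=3, tau=5, q=2,
                   global_rounds=20, lr=0.1, p=0.8, dim=4, seed=0):
    """Toy MLL-SGD loop on the quadratic loss f(w) = ||w||^2 / 2."""
    rng = np.random.default_rng(seed)
    # models[h][i] is worker i's model under hub h
    models = [[np.ones(dim) for _ in range(workers_per_hub)]
              for _ in range(num_hubs)]
    for _ in range(global_rounds):
        for _ in range(q):                       # sub-network iterations
            for h in range(num_hubs):
                for i in range(workers_per_hub):
                    w = models[h][i]
                    for _ in range(tau):         # local iterations
                        if rng.random() < p:     # slow workers skip steps
                            grad = w             # gradient of ||w||^2 / 2
                            w = w - lr * grad
                    models[h][i] = w
                # hub averages its workers and broadcasts the result
                hub_avg = np.mean(models[h], axis=0)
                models[h] = [hub_avg.copy() for _ in range(workers_per_hub)]
        # global averaging across hubs (workers in a hub are synced)
        glob = np.mean([models[h][0] for h in range(num_hubs)], axis=0)
        models = [[glob.copy() for _ in range(workers_per_hub)]
                  for _ in range(num_hubs)]
    return glob

w = mll_sgd_sketch()
print(np.linalg.norm(w))  # decays toward 0 on this quadratic
```

The sketch also makes the roles of the two key parameters concrete: tau controls how much local computation happens between hub averages, while q controls how many hub rounds pass between the more expensive global averages.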

One can install our environment with Anaconda: conda env create -f flearn.yml

Figure results seen in the paper are under the "images" folder. Raw data results can be found under the "results" folder.

To rerun our experiments, run: ./run_exps.sh. The bash script runs all experiments sequentially; they can be run in parallel depending on your machine's available memory.

To plot the results: python plot_exps.py

One can also run MLL-SGD with custom parameters. The mll_sgd.py help output, with extra comments:

usage: mll_sgd.py [-h] [--data [DATA]] [--model [MODEL]] [--hubs [HUBS]] [--workers [WORKERS]] [--tau [TAU]] [--q [Q]] [--graph [GRAPH]] [--epochs [EPOCHS]] [--batch [BATCH]] [--prob [PROB]] [--fed [FED]] [--chance [CHANCE]]

Run Multi-Level Local SGD.

optional arguments:
  -h, --help           show this help message and exit
  --data [DATA]        dataset to use in training.
                         0 = MNIST, 1 = EMNIST, 2 = CIFAR-10
  --model [MODEL]      model to use in training.
                         0 = logistic regression, 1 = CNN model for EMNIST,
                         2 = CIFARNet CNN model, 3 = ResNet-18 model
  --hubs [HUBS]        number of hubs in the system.
  --workers [WORKERS]  number of workers per hub.
  --tau [TAU]          number of local iterations per worker.
  --q [Q]              number of sub-network iterations before global
                         averaging.
  --graph [GRAPH]      graph file ID to use for the hub network.
                         1-4 = graphs in the "graphs" folder,
                         5 = complete graph, 6 = line graph
  --epochs [EPOCHS]    number of epochs/global iterations to train for.
  --batch [BATCH]      batch size to use in mini-batch SGD.
  --prob [PROB]        which probability distribution to use for workers.
                         0 = all worker probabilities are 1
                         1 = fixed probability given by the "chance" input
                         2 = uniform distribution from 0.1 to 1
                         3 = 10% of workers with probability 0.1, rest 0.6
                         4 = 10% of workers with probability 1, rest 0.5
                         5 = 10% of workers with probability 0.6, rest 0.9
  --fed [FED]          whether worker datasets should have different sizes.
                         False = all workers get equal-sized datasets
                         True = sub-networks receive 5%, 10%, 20%, 25%, or
                         40% of the total dataset
  --chance [CHANCE]    fixed probability of taking a gradient step.
                         Only active when prob = 1.
