$ git clone git@github.com:tomoino/PyTorch-Project-Template.git
$ cd PyTorch-Project-Template
$ sh docker/build.sh
$ sh docker/run.sh
$ sh docker/exec.sh
- Add a YAML config file for the new project to ./configs/project/
$ vi ./configs/project/new_project.yaml
- Run train.py with the -cn (or --config-name) flag to specify the project (a sketch of the Hydra wiring follows the command below)
$ python train.py -cn new_project
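The -cn flag is standard Hydra behavior: it selects which top-level config file Hydra composes. Below is a minimal sketch of how train.py plausibly wires this up; the decorator arguments and config structure are assumptions for illustration, not the template's exact code:

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# Hydra resolves -cn/--config-name against config_path; with no flag it
# falls back to config_name ("default" here matches configs/project/default.yaml).
@hydra.main(config_path="configs/project", config_name="default")
def main(cfg: DictConfig) -> None:
    # cfg holds the composed project config (data, model, train, ...).
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```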
With no flags, train.py trains using the default configuration.
$ python train.py
You can run train.py with multiple configurations in one sweep via Hydra's multirun flag (-m); Hydra launches one run per combination of the comma-separated values:
$ python train.py -m \
train.batch_size=16,32 \
train.optimizer.lr=0.01,0.001
To evaluate a trained model, set train.eval=True and point model.initial_ckpt at a saved checkpoint:
$ python train.py train.eval=True model.initial_ckpt=best_ckpt.pth
You can use MLflow to check the results of your experiments: access http://localhost:5000/ from your browser. If necessary, edit docker/env.sh to change the port.
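For reference, results reach the tracking server through the standard mlflow client API. A hedged sketch of what such logging looks like; the experiment name, parameter, and metric names are placeholders, not the template's actual logging code:

```python
import mlflow

# Point the client at the tracking server started by the docker setup
# (adjust the port if you changed it in docker/env.sh).
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("new_project")

with mlflow.start_run():
    mlflow.log_param("lr", 0.001)  # a hyperparameter
    for epoch in range(3):
        mlflow.log_metric("train_loss", 1.0 / (epoch + 1), step=epoch)
```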
You can experiment in JupyterLab.
$ jupyterlab
To add a new dataset (a hypothetical sketch follows this list):
- Add a module to data/dataset/ (inherit the BaseDataset module)
- Edit data/dataset/__init__.py (import the module and add it to SUPPORTED_DATASET)
- Add a config yaml file to configs/project/data/dataset/
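A hypothetical dataset module, sketched with torch.utils.data.Dataset as a stand-in so it runs on its own; in the template you would inherit BaseDataset from data/dataset/base_dataset.py instead, whose exact interface may differ:

```python
# data/dataset/my_dataset.py -- hypothetical example
import torch
from torch.utils.data import Dataset  # stand-in for the template's BaseDataset

class MyDataset(Dataset):
    def __init__(self):
        # Placeholder samples: (image tensor, label) pairs.
        self.samples = [(torch.randn(3, 32, 32), 0), (torch.randn(3, 32, 32), 1)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```

Registration then amounts to importing MyDataset in data/dataset/__init__.py and adding it under a config-visible name in SUPPORTED_DATASET (assuming that is a name-to-class mapping).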
To add a new sampler (sketch below):
- Add a module to data/sampler/ (inherit the BaseSampler module)
- Edit data/sampler/__init__.py (import the module and add it to SUPPORTED_SAMPLER)
- Add a config yaml file to configs/project/data/sampler/
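A hypothetical sampler, again using the plain torch.utils.data.Sampler as a stand-in for the template's BaseSampler:

```python
# data/sampler/my_sampler.py -- hypothetical example
import random
from torch.utils.data import Sampler  # stand-in for the template's BaseSampler

class MyShuffleSampler(Sampler):
    """Yields dataset indices in random order each epoch."""
    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        indices = list(range(len(self.data_source)))
        random.shuffle(indices)
        return iter(indices)

    def __len__(self):
        return len(self.data_source)
```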
To add a new model (sketch below):
- Add a module to models/networks/ (inherit the BaseModel module)
- Edit models/__init__.py (import the module and add it to SUPPORTED_MODEL)
- Add a config yaml file to configs/project/model/
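A hypothetical network module in the spirit of simple_cnn.py, written against plain nn.Module so it runs standalone; in the template it would inherit BaseModel:

```python
# models/networks/my_net.py -- hypothetical example
import torch.nn as nn

class MyNet(nn.Module):
    """A tiny CNN: one conv block followed by a linear classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)  # (N, 16)
        return self.classifier(x)        # (N, num_classes)
```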
To add a new optimizer (sketch below):
- Edit trainers/optimizer/__init__.py (add the optimizer to SUPPORTED_OPTIMIZER)
- Add a config yaml file to configs/project/train/optimizer/
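No new module is needed here because optimizers come from torch.optim; registration is a one-line edit, assuming SUPPORTED_OPTIMIZER is a name-to-class mapping (the "adam" entry is implied by configs/project/train/optimizer/adam.yaml; "sgd" is the hypothetical addition):

```python
# trainers/optimizer/__init__.py (excerpt) -- sketch, not the file's actual contents
import torch.optim as optim

SUPPORTED_OPTIMIZER = {
    "adam": optim.Adam,  # existing entry
    "sgd": optim.SGD,    # hypothetical new entry
}
```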
To add a new criterion (sketch below):
- Edit trainers/criterion/__init__.py (add the criterion to SUPPORTED_CRITERION)
- Add a config yaml file to configs/project/train/criterion/
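Criteria likewise come from torch.nn, so the sketch mirrors the optimizer case, under the same mapping assumption:

```python
# trainers/criterion/__init__.py (excerpt) -- sketch, not the file's actual contents
import torch.nn as nn

SUPPORTED_CRITERION = {
    "cross_entropy": nn.CrossEntropyLoss,  # existing entry
    "mse": nn.MSELoss,                     # hypothetical new entry
}
```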
To add new metrics (sketch below):
- Add a module to trainers/metrics/ (inherit the BaseMetrics module)
- Edit trainers/metrics/__init__.py (import the module and add it to SUPPORTED_METRICS)
- Add a config yaml file to configs/project/train/metrics/
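A hypothetical metrics module; the update/compute split below is a common pattern, not necessarily the interface BaseMetrics defines:

```python
# trainers/metrics/my_metrics.py -- hypothetical example
import torch

class AccuracyMetrics:
    """Running classification accuracy, reset once per epoch."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, logits: torch.Tensor, targets: torch.Tensor) -> None:
        preds = logits.argmax(dim=1)
        self.correct += (preds == targets).sum().item()
        self.total += targets.numel()

    def compute(self) -> float:
        return self.correct / max(self.total, 1)
```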
To add a new trainer (sketch below):
- Add a module to trainers/ (inherit the BaseTrainer module)
- Edit trainers/__init__.py (import the module and add it to SUPPORTED_TRAINER)
- Add a config yaml file to configs/project/train/trainer/
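A hypothetical trainer skeleton showing the shape a custom trainer usually takes; BaseTrainer's real constructor and hooks may differ:

```python
# trainers/my_trainer.py -- hypothetical example
class MyTrainer:
    """Minimal one-epoch training loop."""
    def __init__(self, model, optimizer, criterion, dataloader, device="cpu"):
        self.model = model.to(device)
        self.optimizer = optimizer
        self.criterion = criterion
        self.dataloader = dataloader
        self.device = device

    def train_epoch(self) -> float:
        self.model.train()
        total_loss = 0.0
        for inputs, targets in self.dataloader:
            inputs, targets = inputs.to(self.device), targets.to(self.device)
            self.optimizer.zero_grad()
            loss = self.criterion(self.model(inputs), targets)
            loss.backward()
            self.optimizer.step()
            total_loss += loss.item()
        return total_loss / max(len(self.dataloader), 1)
```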
$ tree -I "datasets|mlruns|__pycache__|outputs|multirun"
.
├── README.md
├── configs
│   └── project
│       ├── data
│       │   ├── dataset
│       │   │   └── cifar10.yaml
│       │   └── sampler
│       │       ├── balanced_batch_sampler.yaml
│       │       └── shuffle_sampler.yaml
│       ├── default.yaml
│       ├── hydra
│       │   └── job_logging
│       │       └── custom.yaml
│       ├── model
│       │   ├── resnet18.yaml
│       │   └── simple_cnn.yaml
│       └── train
│           ├── criterion
│           │   └── cross_entropy.yaml
│           ├── metrics
│           │   ├── classification.yaml
│           │   └── default.yaml
│           ├── optimizer
│           │   └── adam.yaml
│           └── trainer
│               └── default.yaml
├── data
│   ├── __init__.py
│   ├── dataloader
│   │   └── __init__.py
│   ├── dataset
│   │   ├── __init__.py
│   │   ├── base_dataset.py
│   │   ├── cifar10.py
│   │   └── helper.py
│   └── sampler
│       ├── __init__.py
│       ├── balanced_batch_sampler.py
│       ├── base_sampler.py
│       └── shuffle_sampler.py
├── docker
│   ├── Dockerfile
│   ├── build.sh
│   ├── env.sh
│   ├── env_dev.sh
│   ├── exec.sh
│   ├── init.sh
│   ├── requirements.txt
│   └── run.sh
├── models
│   ├── __init__.py
│   ├── base_model.py
│   └── networks
│       ├── resnet18.py
│       └── simple_cnn.py
├── train.py
└── trainers
    ├── __init__.py
    ├── base_trainer.py
    ├── criterion
    │   └── __init__.py
    ├── default_trainer.py
    ├── metrics
    │   ├── __init__.py
    │   ├── base_metrics.py
    │   ├── classification_metrics.py
    │   └── default_metrics.py
    └── optimizer
        └── __init__.py
TODO:
- Optuna (hyperparameter tuning)
- learning-rate scheduler
- flake8 (linting)
- error handling
- clear-cache command
- assertions
- notifications
- FP16 training (apex)
- classmethod / staticmethod
- value error handling
- usage as a template
- multi-GPU training
- nohup
- docker-compose
- pytorch-lightning