Paper: An end-to-end and well-generalized weakly supervised whole slide image analysis for lung cancer classification using deep learning
Pretrained weights | Description |
---|---|
Fold1 | Fold1 model weights |
Fold2 | Fold2 model weights |
Fold3 | Fold3 model weights |
Fold4 | Fold4 model weights |
Fold5 | Fold5 model weights |
- Weakly supervised learning (No expensive annotations)
- End-to-end pipeline (Iterative sampling and feature aggregation module)
- Data efficiency (<200 images, AUCs of >0.9)
- Promising generalizability (six heterogenous external datasets with AUCs of 0.93-0.97)
- Overperform state-of-the-art methods
System: Linux(CentOS) Pytorch (version 1.7.0): model training and validation. OpenSlide (version 3.4.1): processing whole slide images. lmdb (version 1.3.0): building lmdb dataset for faster data loading. staintools: following https://github.com/Peter554/StainTools.
$ git clone https://github.com/DeepLaboratory/AlphaPatho.git # install AlphaPatho
$ cd AlphaPatho
Directory description:
├─ data.py // dataset code
├─ mean_pixel.pkl // mean value of pixel value for data augmentation
├─ model.py // model architecture
├─ run.py // main code for training and validation
├─ run.sh // run.py launcher
├─ test.py // main code for inference or evaluation
├─ run_test.sh // test.py launcher
├─ utils.py // tool code
├─ tmp.py // LMDB dataset building code
-
Slide tiling: see methods in our paper. When tiling is done, we should have files like below:
├─ slide1_folder │ ├─ tile1.png │ ├─ tile2.png │ ├─ ... ├─ slide2_folder │ ├─ tile1.png │ ├─ tile2.png │ ├─ ... ├─ ...
-
csv for LMDB dataset generation: after slide tiling, we need to generate a csv used for LMDB dataset generation. The csv content is as follow:
slides_name label xxx/slide1_folder 1 xxx/slide2_folder 0 ... ... ... -
Building LMDB dataset
tmp.py is the main code. We need to modify the csv path and lmdb dataset save directory. Using tmp.py, we can generate the LMDB dataset.
-
Dataset preparation: preparing csv files for 5-fold validation, the format of each csv file is lik 5AFB e follows:
slides_name label xxx/slide1_folder 1 xxx/slide2_folder 0 ... ... ... After cvs preparation, we will get train.csv, val.csv, and test.csv for each fold.
-
Model training: run run.sh and start training.
Preparing datasets for testing, modifying the parameters in the run_test.sh, and runing run_test.sh for testing!
- [raycaohmu]https://github.com/raycaohmu)