This repository contains source code generated by Luminide. It may be used to train, validate and tune deep learning models for image classification. The following directory structure is assumed:
```
├── code (source code)
├── input (dataset)
└── output (working directory)
```
The dataset should have images inside a directory named `train_images` and a CSV file named `train.csv`. An example is shown below:
```
input
├── train.csv
└── train_images
    ├── 800113bb65efe69e.jpg
    ├── 8002cb321f8bfcdf.jpg
    ├── 80070f7fb5e2ccaa.jpg
```
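Before training, it can be useful to verify that a downloaded dataset matches this layout. A minimal sketch (the `check_dataset` helper is hypothetical, not part of the generated code; it assumes the `input` structure shown above):

```python
from pathlib import Path

def check_dataset(root="input"):
    """Verify the expected layout: a train.csv file plus a train_images directory."""
    root = Path(root)
    csv_path = root / "train.csv"
    img_dir = root / "train_images"
    if not csv_path.is_file():
        raise FileNotFoundError(f"missing {csv_path}")
    if not img_dir.is_dir():
        raise FileNotFoundError(f"missing {img_dir}/")
    images = sorted(img_dir.glob("*.jpg"))
    print(f"found {len(images)} images")
    return images
```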
The CSV file is expected to have labels under a column named `labels`, as in the example below:
```
image,labels
800113bb65efe69e.jpg,healthy
8002cb321f8bfcdf.jpg,scab frog_eye_leaf_spot complex
80070f7fb5e2ccaa.jpg,scab
```
If an item has multiple labels, they should be separated by a space character as shown.
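Space-separated multi-labels of this kind are typically split and one-hot encoded before training. A sketch using pandas and scikit-learn (both are assumptions here; the generated training code may handle this differently):

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Example rows matching the CSV format above.
df = pd.DataFrame({
    "image": ["800113bb65efe69e.jpg", "8002cb321f8bfcdf.jpg", "80070f7fb5e2ccaa.jpg"],
    "labels": ["healthy", "scab frog_eye_leaf_spot complex", "scab"],
})

# Split each space-separated label string into a list of labels.
label_lists = df["labels"].str.split(" ")

# One-hot encode: one column per distinct label, in sorted order.
mlb = MultiLabelBinarizer()
targets = mlb.fit_transform(label_lists)
print(mlb.classes_)   # ['complex' 'frog_eye_leaf_spot' 'healthy' 'scab']
print(targets.shape)  # (3, 4)
```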
- Attach a Compute Server and download a dataset. An example dataset is available at `gs://luminide-example-plant-pathology`.
- For exploratory analysis, run `eda.ipynb`.
- To train, use the *Run Experiment* menu.
- To monitor training progress, use the *Experiment Visualization* menu.
- To generate a report on the most recent training session, run `report.sh` from the *Run Experiment* tab. Make sure *Track Experiment* is checked. The results will be copied back to a file called `report.html`.
- To tune the hyperparameters, edit `sweep.yaml` as desired and launch a sweep from the *Run Experiment* tab. Tuned values will be copied back to a file called `config-tuned.yaml`, along with visualizations in `sweep-results.html`.
- After an experiment is complete, use the file browser on the IDE interface to access the results on the IDE Server.
- Use the *Experiment Tracking* menu to track experiments.
To use this repo for a Kaggle code competition:
- Configure your Kaggle API token on the *Import Data* tab.
- Run `kaggle.sh` as a custom experiment to upload the code to Kaggle.
- To create a submission, copy `kaggle.ipynb` to a new Kaggle notebook.
- Add the notebook output of https://www.kaggle.com/luminide/wheels1 as Data.
- Add your dataset at https://www.kaggle.com/<kaggle_username>/kagglecode as Data.
- Add the relevant competition dataset as Data.
- Run the notebook after turning off the *Internet* setting.
Note: As configured, the code trains on 50% of the data. To train on the entire dataset, edit `full.sh` and `fast.sh` to remove the `--subset` command-line parameter so that the default value of 100 is used.
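The `--subset` parameter described above acts as a percentage of the training data, defaulting to 100. A hypothetical argparse sketch of how such a flag is commonly wired up (the actual training script may define it differently):

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="training entry point (sketch)")
    # Percentage of the training data to use; 100 means the full dataset.
    parser.add_argument("--subset", type=int, default=100,
                        help="percent of training data to use (default: 100)")
    return parser.parse_args(argv)

# Passing --subset 50 trains on half the data; omitting the flag uses the full dataset.
```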
For more details on usage, see the Luminide documentation.