This repository contains source code generated by Luminide. It may be used to train, validate and tune deep learning models for image classification. The following directory structure is assumed:
├── code (source code)
├── input (dataset)
└── output (working directory)
The dataset should have images inside a directory named train_images and a CSV file named train.csv. An example is shown below:
input
├── train.csv
└── train_images
    ├── 800113bb65efe69e.jpg
    ├── 8002cb321f8bfcdf.jpg
    ├── 80070f7fb5e2ccaa.jpg
The CSV file is expected to have labels under a column named labels as in the example below:
image,labels
800113bb65efe69e.jpg,healthy
8002cb321f8bfcdf.jpg,scab frog_eye_leaf_spot complex
80070f7fb5e2ccaa.jpg,scab
If an item has multiple labels, they should be separated by a space character as shown.
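As a rough illustration of the format described above, the sketch below parses the example CSV and turns the space-separated labels into multi-hot vectors. The function name parse_labels and the alphabetical class ordering are assumptions made for this example; the generated code may encode labels differently.

```python
import csv
import io


def parse_labels(csv_text):
    """Return (class_names, {image: multi-hot list}) from CSV text.

    Hypothetical helper: assumes the columns are named "image" and "labels",
    as in the example above.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Multiple labels for one image are separated by spaces.
    per_image = {row["image"]: row["labels"].split() for row in rows}
    # Assumption: classes are ordered alphabetically.
    classes = sorted({label for labels in per_image.values() for label in labels})
    index = {name: i for i, name in enumerate(classes)}
    encoded = {}
    for image, labels in per_image.items():
        vector = [0] * len(classes)
        for label in labels:
            vector[index[label]] = 1
        encoded[image] = vector
    return classes, encoded


sample = """image,labels
800113bb65efe69e.jpg,healthy
8002cb321f8bfcdf.jpg,scab frog_eye_leaf_spot complex
80070f7fb5e2ccaa.jpg,scab
"""

classes, encoded = parse_labels(sample)
print(classes)
print(encoded["8002cb321f8bfcdf.jpg"])
```

With this encoding, an image carrying several labels simply has several entries set to 1.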
- Attach a Compute Server and download a dataset. An example dataset is available at gs://luminide-example-plant-pathology.
- For exploratory analysis, run eda.ipynb.
- To train, use the Run Experiment menu.
- To monitor training progress, use the Experiment Visualization menu.
- To generate a report on the most recent training session, run report.sh from the Run Experiment tab. Make sure Track Experiment is checked. The results will be copied back to a file called report.html.
- To tune the hyperparameters, edit sweep.yaml as desired and launch a sweep from the Run Experiment tab. Tuned values will be copied back to a file called config-tuned.yaml along with visualizations in sweep-results.html.
- After an experiment is complete, use the file browser on the IDE interface to access the results on the IDE Server.
- Use the Experiment Tracking menu to track experiments.
Note: As configured, the code trains on 50% of the data. To train on the entire dataset, edit full.sh and fast.sh to remove the --subset command line parameter so that the default value of 100 is used.
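To make the effect of the --subset parameter concrete, here is a minimal sketch of how a percentage-based subset option could behave. Only the flag name and its default of 100 come from the note above; the helper names build_parser and take_subset, the random sampling, and the fixed seed are assumptions for illustration, not the repository's actual implementation.

```python
import argparse
import random


def build_parser():
    parser = argparse.ArgumentParser()
    # Percentage of the training data to use; 100 means the whole dataset.
    parser.add_argument("--subset", type=float, default=100)
    return parser


def take_subset(items, percent, seed=0):
    """Return a random sample containing `percent` percent of `items`.

    Hypothetical helper; the generated code may select its subset differently.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    if not items:
        return []
    count = max(1, round(len(items) * percent / 100))
    return random.Random(seed).sample(items, count)


files = [f"img_{i}.jpg" for i in range(10)]

# With the flag, as in the scripts as configured.
half_args = build_parser().parse_args(["--subset", "50"])
half = take_subset(files, half_args.subset)

# Without the flag, the default of 100 selects everything.
full_args = build_parser().parse_args([])
full = take_subset(files, full_args.subset)

print(len(half), len(full))
```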
For more details on usage, see the Luminide documentation.
The picture below shows a cluster of galaxies that illustrates a phenomenon known as gravitational lensing.
In general, galaxies are not expected to prefer a specific orientation. The halo pattern in this image is believed to be caused by the gravitational pull that dark matter exerts on light emitted by the galaxies behind it.
The dataset is available in a storage bucket at gs://luminide-example-darkmatter. It contains images of simulated galaxy clusters. A patch of sky with no dark matter will have the galaxies oriented randomly as shown in this image.
Here is an example sky with an instance of the lensing effect.
Each sky in the dataset contains either 0 or 1 lens object. The task is to build a model that can detect the presence of a lens.
Name | Description |
---|---|
train_images | directory containing images for training |
test_images | directory containing images for validation |
train.csv | training labels (counts of lens objects) |
train-lenses.csv | location of lens objects within the training images |
test.csv | validation labels (counts of lens objects) |
test-lenses.csv | location of lens objects within the validation images |
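After downloading and extracting the data, it can be useful to confirm that the input directory contains everything listed in the table. The sketch below is one way to do that; the helper name missing_entries is hypothetical, and the training image directory is assumed to be named train_images.

```python
import tempfile
from pathlib import Path

# Entries listed in the table above. "train_images" is assumed to be the
# name of the training image directory.
EXPECTED_ENTRIES = {
    "train_images": "dir",
    "test_images": "dir",
    "train.csv": "file",
    "train-lenses.csv": "file",
    "test.csv": "file",
    "test-lenses.csv": "file",
}


def missing_entries(input_dir):
    """Return the names of expected files or directories that are absent."""
    root = Path(input_dir)
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        path = root / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return missing


# Demo on a throwaway directory that contains only part of the dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "train_images").mkdir()
    (demo_root / "train.csv").write_text("")
    demo_missing = missing_entries(demo_root)

print(demo_missing)
```

In practice, the function would be called on the real input directory, for example missing_entries("../input") when run from the output directory.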
Instructions for using the sample code outside of Luminide are given below:
- Create directory structure

      mkdir darkmatter
      cd darkmatter
      mkdir code input output

- Download source code

      cd code
      git clone git@github.com:luminide/example-darkmatter.git .

- Download data

      cd ../input
      gsutil -m rsync -r gs://luminide-example-darkmatter .
      dtrx train.zip test.zip

- Train and validate a model

      cd ../output
      ../code/full.sh
The validation accuracy is calculated and displayed for every epoch. Once the script finishes, there should be an image similar to the one shown below that shows the predictions and their confidence levels for a sample of images from the validation set.
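For readers who want to see what validation accuracy and per-image confidence mean for this binary task, here is a small sketch. It assumes the model outputs a probability that a lens is present and that a threshold of 0.5 is used; the function name evaluate and the sample numbers are made up for illustration and are not results from the actual model.

```python
def evaluate(probabilities, labels, threshold=0.5):
    """Compute accuracy, hard predictions and confidences.

    Hypothetical helper: `probabilities` holds the predicted probability that
    a lens is present, and `labels` holds 0 (no lens) or 1 (lens).
    """
    if len(probabilities) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    # Confidence is the probability assigned to the predicted class.
    confidences = [
        p if pred == 1 else 1 - p
        for p, pred in zip(probabilities, predictions)
    ]
    correct = sum(int(pred == y) for pred, y in zip(predictions, labels))
    return correct / len(labels), predictions, confidences


# Made-up probabilities and labels for four validation images.
probs = [0.9, 0.2, 0.6, 0.4]
truth = [1, 0, 0, 0]

accuracy, predictions, confidences = evaluate(probs, truth)
print(f"accuracy: {accuracy:.2f}")
for pred, conf in zip(predictions, confidences):
    print(f"predicted {pred} with confidence {conf:.2f}")
```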