This repository contains the source code for the three laboratory assignments completed during the Deep Learning Applications course taught by Professor Andrew David Bagdanov (@bagdanov on GitHub). The labs cover a variety of topics related to deep learning, including convolutional neural networks, large language models, and reinforcement learning.
In Exercise 1.1 we implement a simple Multilayer Perceptron to classify the 10 digits of MNIST, writing our own training pipeline, training the model to convergence, and monitoring the loss and accuracy on the training and validation sets at every epoch.
TensorBoard has been used for performance monitoring.
The source code for this lab can be found in the `lab1/` directory.
`trainer.py` implements a `Trainer` class. `Trainer` provides a `train()` method and a `test()` method that can be used not only for the MLP, but also for convolutional networks, with or without residual connections.
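For illustration, a minimal sketch of what such a model-agnostic trainer could look like (names and arguments beyond `train()`/`test()` are illustrative, not the exact code in `trainer.py`):

```python
import torch
from torch.utils.tensorboard import SummaryWriter


class Trainer:
    """Minimal model-agnostic trainer: the same loop works for MLPs, CNNs and residual CNNs."""

    def __init__(self, model, optimizer, criterion, device=None, log_dir="runs/example"):
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.model = model.to(self.device)
        self.optimizer = optimizer
        self.criterion = criterion
        self.writer = SummaryWriter(log_dir)

    def train(self, loader, epoch):
        self.model.train()
        total_loss, correct, n = 0.0, 0, 0
        for x, y in loader:
            x, y = x.to(self.device), y.to(self.device)
            self.optimizer.zero_grad()
            out = self.model(x)
            loss = self.criterion(out, y)
            loss.backward()
            self.optimizer.step()
            total_loss += loss.item() * y.size(0)
            correct += (out.argmax(1) == y).sum().item()
            n += y.size(0)
        # log per-epoch loss and accuracy to TensorBoard
        self.writer.add_scalar("train/loss", total_loss / n, epoch)
        self.writer.add_scalar("train/acc", correct / n, epoch)

    @torch.no_grad()
    def test(self, loader, epoch):
        self.model.eval()
        total_loss, correct, n = 0.0, 0, 0
        for x, y in loader:
            x, y = x.to(self.device), y.to(self.device)
            out = self.model(x)
            total_loss += self.criterion(out, y).item() * y.size(0)
            correct += (out.argmax(1) == y).sum().item()
            n += y.size(0)
        self.writer.add_scalar("test/loss", total_loss / n, epoch)
        self.writer.add_scalar("test/acc", correct / n, epoch)
```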
`models.py` implements the three models used in this laboratory: the MLP, the CNN, and the ResCNN.
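As a reference for the residual variant, a basic residual block along these lines might look like the following (a sketch, not the exact block defined in `models.py`):

```python
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """3x3 conv -> BN -> ReLU -> 3x3 conv -> BN, plus an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # the skip connection is what distinguishes the ResCNN from the plain CNN
        return F.relu(out + x)
```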
TensorBoard logs can be found in `lab1/model` together with the saved models.
The results below show the performance of the best MLP trained: hidden-layer sizes [128, 64, 10], 15 epochs, Adam with learning rate 1e-4, batch size 2048.
from left to right: MLP Train Loss (ep/loss), MLP Test Loss (ep/loss), MLP Test Accuracy (ep/acc)
In Exercise 1.2 we repeat the verification from Exercise 1.1, but with Convolutional Neural Networks, showing that deeper CNNs without residual connections do not always work better, whereas even deeper ones with residual connections do. This time we use CIFAR-10, since MNIST is very easy.
The same `Trainer` used for the MLP has been used to train the CNNs and the ResCNNs. This time different depths have been evaluated to validate the hypothesis: 20 and 56 layers for the CNN, 10 and 20 layers for the ResCNN. Legend: darker is deeper!
Legends:
CNN Test Loss (ep/loss), CNN Test Accuracy (ep/acc)
Looking at the plots, and bearing in mind that training was not run to convergence for lack of time, it can be observed that the CNN does not always benefit from increased depth. In fact, the 20-layer CNN trains more smoothly and performs better than the 56-layer CNN. Note that the 56-layer CNN starts overfitting halfway through training.
Legends:
ResCNN 21 layers, 30 epochs, lr 4e-4, Adam optimizer
ResCNN 11 layers, 30 epochs, lr 4e-4, Adam optimizer
from left to right: ResCNN Train Loss (ep/loss), ResCNN Test Loss (ep/loss), ResCNN Test Accuracy (ep/acc)
This time, looking at the plots, it can be seen that increasing depth consistently improves the ResCNN.
In Exercise 2.1 we use our CNNs (with and without residual connections) to study and quantify why the residual versions learn more effectively. I wrote a simple training loop that trains a CNN and a ResCNN of the same depth simultaneously for just 150 batch iterations. During training I log to the SummaryWriter the mean of the absolute values of the gradients passing through the last layer of each model (the dense layer) during backpropagation. Several depths were compared, and all show the same behaviour: the gradient magnitudes of the plain CNN tend to zero, showing the vanishing-gradient problem, while the ResCNN suffers neither vanishing nor exploding gradients, even at the largest depth evaluated.
Legends:
Mean of the absolute values of the gradients during backprop in the dense layer for the first 150 batch iterations
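A sketch of the gradient-logging loop described above (attribute and variable names such as `dense` are placeholders, not the exact ones used in the lab):

```python
import torch
from torch.utils.tensorboard import SummaryWriter


def compare_gradients(cnn, rescnn, loader, criterion, steps=150, lr=4e-4):
    """Train a CNN and a ResCNN side by side, logging the mean |grad| of their last dense layer.

    Assumes both models expose their final fully connected layer as `.dense`
    (a placeholder name; adapt it to the actual attribute in models.py).
    """
    writer = SummaryWriter("runs/grad_comparison")
    models = {"CNN": cnn, "ResCNN": rescnn}
    optimizers = {tag: torch.optim.Adam(m.parameters(), lr=lr) for tag, m in models.items()}

    for step, (x, y) in enumerate(loader):
        if step >= steps:
            break
        for tag, model in models.items():
            optimizers[tag].zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            # mean absolute gradient reaching the last dense layer during backprop
            grad = model.dense.weight.grad
            writer.add_scalar(f"{tag}/mean_abs_grad", grad.abs().mean().item(), step)
            optimizers[tag].step()
    writer.close()
```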
Exercise 2.3 asks to take the CNN model trained in Exercise 1.2 and implement Class Activation Maps:
B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. *Learning Deep Features for Discriminative Localization*. CVPR 2016 (arXiv:1512.04150).
Instead of implementing the Class Activation Maps mechanism from scratch, I enjoyed using the original source code of the technique: "zhoubolei/CAM".
The original code required only a few modifications to work with my custom ResCNN; I only had to change the transforms pipeline.
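For context, the core CAM computation is small: the map for a class is a weighted sum of the last convolutional feature maps, using that class's weights in the final linear classifier. A possible sketch (tensor shapes and names are assumptions):

```python
import torch
import torch.nn.functional as F


def compute_cam(feature_maps, fc_weights, class_idx, out_size=(32, 32)):
    """feature_maps: (C, H, W) activations of the last conv layer for one image.
    fc_weights:   (num_classes, C) weight matrix of the final linear classifier.
    Returns the class activation map upsampled to out_size and scaled to [0, 1]."""
    weights = fc_weights[class_idx]                          # (C,)
    cam = torch.einsum("c,chw->hw", weights, feature_maps)   # weighted sum over channels
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)
    cam = F.interpolate(cam[None, None], size=out_size, mode="bilinear", align_corners=False)
    return cam[0, 0].detach().cpu().numpy()
```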
from left to right: CIFAR10 Ship, CAM CIFAR10 Ship, Truck from internet, CAM Truck from internet
Above we can see the CAMs obtained for a Ship image from CIFAR-10 and for a Truck image taken from the internet. Ship from CIFAR-10, prediction logits: 0.975 → ship, 0.022 → truck, 0.001 → else. Truck image from the internet, prediction logits: 0.998 → ship, 0.001 → automobile, 0.001 → else.
The second lab explores large language models using the Hugging Face Transformers library. The source code for this lab can be found in the `lab2/` directory.
In this first exercise we trained a small autoregressive GPT model for character-level generation (the one used by Karpathy in his video) to generate text in the style of Dante Alighieri, using this file, which contains the entire text of Dante's Inferno.
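A sketch of the character-level setup such a model relies on (the file name `inferno.txt` is an assumption):

```python
import torch

# Build a character-level vocabulary from the raw text.
text = open("inferno.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)


def get_batch(data, block_size=256, batch_size=64):
    """Sample random (input, target) pairs of contiguous character sequences,
    where the target is the input shifted by one character."""
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x, y
```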
In this exercise we sample text from a GPT-2 model: we instantiated a pre-trained `GPT2LMHeadModel` and used the `generate()` method to generate text from a prompt.
prompt input:
Halfway down the road of life...
output:
Halfway down the road of life, I feel like this is pretty serious stuff. I guess I'd say it would be pretty serious if it were made so many years before someone decided they were ready to make the kind of movie it would be like
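A minimal sketch of this sampling step with the Transformers API (generation hyperparameters are illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Halfway down the road of life"
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=True gives stochastic continuations; greedy decoding tends to repeat itself.
output_ids = model.generate(
    **inputs,
    max_length=60,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```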
number of characters of *Divina Commedia*:
186001 characters
length of the tokenized *Divina Commedia*:
78825 tokens
ratio:
≈ 0.42 tokens per character (the tokenized text is about 42% as long as the raw character sequence)
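The counts above can be reproduced roughly as follows (the file name is an assumption):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
text = open("inferno.txt", encoding="utf-8").read()  # file name is an assumption

n_chars = len(text)
n_tokens = len(tokenizer.encode(text))
print(n_chars, n_tokens, n_tokens / n_chars)  # roughly 186001, 78825, 0.42
```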
In Exercise 3.1 we have to peruse the text-classification datasets on Hugging Face, choose a moderately sized one, and use an LLM to train a classifier to solve the problem.
Note: A good first baseline for this problem was to use an LLM exclusively as a feature extractor and then train a shallow model... and that's what I've done!
I have chosen to use the AG News dataset, sourced from Hugging Face. The AG News dataset is a collection of news articles categorized into four classes:
- Number of Classes: 4
- Classes:
- 1: World
- 2: Sports
- 3: Business
- 4: Sci/Tech
- Total Number of Samples: 120,000 train, 7,600 test
I chose to use DistilBERT only as a feature extractor on the AG News dataset, and to train a one-vs-rest Logistic Regression on these embeddings.
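A sketch of this feature-extraction plus shallow-classifier pipeline (checkpoint name, pooling choice, and subsampling are assumptions; the lab code may differ in details):

```python
import torch
from datasets import load_dataset
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased").to(device).eval()


@torch.no_grad()
def extract_features(texts, batch_size=64):
    """Use the first-token ([CLS]) hidden state of DistilBERT as a fixed feature vector."""
    feats = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True, truncation=True,
                          max_length=128, return_tensors="pt").to(device)
        feats.append(encoder(**batch).last_hidden_state[:, 0].cpu())
    return torch.cat(feats).numpy()


ds = load_dataset("ag_news")
train = ds["train"].shuffle(seed=0).select(range(20000))  # subsample for a quicker run
test = ds["test"]

X_train, y_train = extract_features(train["text"]), train["label"]
X_test, y_test = extract_features(test["text"]), test["label"]

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```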
t-SNE plot of train features, t-SNE plot of test features
It is remarkable to see how well the LLM separates the embedding representations of the 4 classes. This allows the simplest Logistic Regression to work well even on a benchmark text-classification task.
The third and final lab covers reinforcement learning, specifically Deep Q-Learning and Proximal Policy Optimization (PPO). The source code for this lab can be found in the `lab3/` directory.
I chose to refactor the original repository, so in `lab3/` you can find:
- `main.py`: the main script; it starts training or evaluation of the agent
- `Parser.py`: implements a `Parser` class that allows the user to set hyperparameters and execution parameters from the terminal
- `DQLN.py`: contains the old implementation of the DQLN
- `Trainer.py`: implements a `Trainer` class that sets up the environment and trains/evaluates the agent
A minimal PyTorch implementation of Proximal Policy Optimization (PPO) with clipped objective for Gymnasium environments has been added as requested. You can find the implementation in `PPO.py`
(source code from: "nikhilbarhate99/PPO-PyTorch")
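At its core, PPO optimizes a clipped surrogate objective; a generic sketch of that loss (not the exact code in `PPO.py`) is:

```python
import torch


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Clipped surrogate objective: penalize policy updates that push the
    probability ratio r = pi_new / pi_old outside [1 - eps, 1 + eps]."""
    ratios = torch.exp(new_logprobs - old_logprobs)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # negate because optimizers minimize, while the surrogate is maximized
    return -torch.min(surr1, surr2).mean()
```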
To run the code in this repository, you will need to have Python 3 installed, as well as several deep learning libraries including PyTorch and Hugging Face Transformers.
To get started, clone this repository to your local machine and navigate to the directory of the lab you wish to run. From there, you can run each exercise separately.
This repository was created by Marco Mistretta. If you have any questions or concerns, please contact marco.mistretta@stud.unifi.it.
We would like to thank Professor Andrew David Bagdanov for teaching the Deep Learning Applications course and providing guidance on these labs.