Knowing who is using your product and in what context is power! By using Computer Vision we can acquire organic customer insights by detecting the right images on social media. This repository showcases models trained to detect 15 distinct brand logos in pictures/videos online!
This project was developed as part of my Data Science degree coursework, with the goal of maximizing accuracy. After brief experimentation with R-CNN, YOLOv5 was identified as the best-performing model for the job. Additionally, thanks to its state-of-the-art architecture, I was able to develop two distinct models to choose from:
- yolo1000_s_cust - based on YOLOv5_s architecture
- yolo1000_x - based on YOLOv5_x architecture
The different architectures allow the user a tradeoff between speed and accuracy. Here is a more detailed comparison between the two architectures (statistics based on pretrained checkpoints):
Model | size (pixels) | mAPval 0.5:0.95 | mAPval 0.5 | Speed CPU b1 (ms) | Speed V100 b1 (ms) | Speed V100 b32 (ms) | params (M) | FLOPs @640 (B) |
---|---|---|---|---|---|---|---|---|
YOLOv5s | 640 | 37.2 | 56.0 | 98 | 6.4 | 0.9 | 7.2 | 16.5 |
YOLOv5x | 640 | 50.7 | 68.9 | 766 | 12.1 | 4.8 | 86.7 | 205.7 |
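
To make the speed side of the tradeoff more tangible, the per-image latencies in the table can be converted to approximate throughput. This is a small arithmetic sketch using only the batch-size-1 numbers listed above; those numbers describe the pretrained checkpoints on the listed hardware, so the fine-tuned logo models may behave differently on your machine.

```python
# Batch-size-1 latencies (ms per image) copied from the comparison table above.
latency_ms = {
    "YOLOv5s": {"CPU b1": 98.0, "V100 b1": 6.4},
    "YOLOv5x": {"CPU b1": 766.0, "V100 b1": 12.1},
}


def frames_per_second(ms_per_image):
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_image


for model, speeds in latency_ms.items():
    for device, ms in speeds.items():
        print(f"{model} {device}: {frames_per_second(ms):.1f} FPS")
```

On CPU this works out to roughly 10 images per second for YOLOv5s versus roughly 1.3 for YOLOv5x, which is why the smaller architecture is the natural candidate for video.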
The main model development environment was an Azure instance with an NVIDIA K80 with 12 GB of VRAM. Additionally, I used Google Colab with GPU acceleration enabled in order to test different hyperparameter configurations.
Finally, in order to make the reproducibility and improvement of this repository as straightforward as possible, I used Git Large File Storage. This allowed for a simple cloning that includes all relevant training data, as well as model weights.
The raw dataset provided 17 brand logos. After initial inspection, Intimissimi and Ralph Lauren were dropped due to a large amount of mislabeled data. This left me with 15 logos: Adidas, Apple Inc., Chanel, Coca-Cola, Emirates, Hard Rock Cafe, Mercedes-Benz, NFL, Nike, Pepsi, Puma, Starbucks, The North Face, Toyota, and Under Armour.
In order to convert the dataset to YOLOv5 PyTorch TXT format, I used Roboflow. For preprocessing, I applied auto-orient and image resize (to the model input size of 640x640). Since YOLOv5 already applies data augmentation in its training script, which can be fine-tuned through hyperparameters, no data augmentation steps were taken in Roboflow.
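For readers unfamiliar with the YOLOv5 TXT format: each image gets a text file with one line per object, `class x_center y_center width height`, with coordinates normalized to the image size. The conversion itself was done with Roboflow; the function below is only an illustrative sketch of what such a conversion computes, and `to_yolo_line` is a hypothetical helper, not part of this repository.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box to one YOLO TXT label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 320x160 pixel box centered in a 640x640 image, with an arbitrary class id of 3.
print(to_yolo_line(3, 160, 240, 480, 400, 640, 640))
# prints: 3 0.500000 0.500000 0.500000 0.250000
```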
After applying additional preprocessing steps described in the following section, the final dataset can be found under the name yolo1000, which has the following statistics:
For model training, transfer learning was used. Instead of starting from a randomized set of weights, I used the pretrained checkpoints provided by the YOLOv5 repo, which are already trained on the COCO dataset.
To establish baseline performance, I first trained a model using an unaltered version of the full dataset (around 38K images) using YOLOv5l. Following that, I took the following steps to improve performance:
- Improved label consistency by adding missing bounding boxes -- original data had only one box per image
- Improved label accuracy by resizing bounding boxes with inaccurate position
- Balanced the class distribution by including a maximum of 1000 images per class
- Added 2% background images (218 images) and removed wrong bounding boxes in order to reduce false positives
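The class-balancing step can be sketched roughly as follows. This is a minimal illustration, not the script used to build yolo1000: it assumes each image is assigned a single class (the real data may contain several logos per image after the relabeling), and `cap_per_class` is a hypothetical helper.

```python
import random
from collections import defaultdict


def cap_per_class(image_labels, max_per_class=1000, seed=0):
    """Keep at most max_per_class images for every class.

    image_labels maps an image name to its class name (assumption: one class per image).
    """
    by_class = defaultdict(list)
    for image, cls in sorted(image_labels.items()):
        by_class[cls].append(image)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for cls, images in by_class.items():
        if len(images) > max_per_class:
            images = rng.sample(images, max_per_class)
        kept.extend(images)
    return sorted(kept)


# Toy example: 5 Nike images and 2 Pepsi images, capped at 3 per class.
demo = {f"nike_{i}.jpg": "Nike" for i in range(5)}
demo.update({f"pepsi_{i}.jpg": "Pepsi" for i in range(2)})
subset = cap_per_class(demo, max_per_class=3)
print(len(subset))  # 5 (3 Nike + 2 Pepsi)
```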
By applying these steps, not only did the much smaller YOLOv5s model reach the baseline performance, but there was also a significant reduction in false positives (example below).
YOLOv5l - baseline | YOLOv5s - v1 |
---|---|
Finally, through experimentation with hyperparameters, YOLOv5s - v1 was further improved, and the following data augmentation steps were applied: HSV (Hue/Saturation/Value), Translation, Scale, Flip (horizontal), Shear, Mosaic, and Mixup.
Both final models, YOLOv5s - v2 and YOLOv5x - final, were trained using those finetuned hyperparameters.
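In YOLOv5, these augmentations are controlled by keys in the hyperparameter YAML file, and an augmentation is effectively switched off when its value is 0. The sketch below maps the augmentation names used in this README to those keys and lists which ones are active. The values in `example_hyp` are placeholders for illustration only, not the tuned values from this repository (those live in the hyp.yaml of the corresponding training run), and `active_augmentations` is a hypothetical helper.

```python
# Augmentation names from this README mapped to YOLOv5 hyperparameter keys.
AUGMENTATION_KEYS = {
    "HSV hue": "hsv_h",
    "HSV saturation": "hsv_s",
    "HSV value": "hsv_v",
    "Translation": "translate",
    "Scale": "scale",
    "Flip (horizontal)": "fliplr",
    "Shear": "shear",
    "Mosaic": "mosaic",
    "Mixup": "mixup",
}


def active_augmentations(hyp):
    """Return the augmentation names whose hyperparameter value is greater than zero."""
    return [
        name
        for name, key in AUGMENTATION_KEYS.items()
        if float(hyp.get(key, 0.0)) > 0.0
    ]


# Placeholder values, NOT the tuned hyperparameters of this project.
example_hyp = {
    "hsv_h": 0.015, "hsv_s": 0.7, "hsv_v": 0.4,
    "translate": 0.1, "scale": 0.5, "fliplr": 0.5,
    "shear": 0.0, "mosaic": 1.0, "mixup": 0.0,
}
print(active_augmentations(example_hyp))
```

With these placeholder values, Shear and Mixup are reported as inactive; setting them above zero in hyp.yaml is what enables them during training.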
For model evaluation the main metric used was mAP. Additionally, the IoU (Intersection over Union) was calculated at bounding box confidence of 50% in order to exclude "lucky guesses".
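As a reminder of what IoU measures, here is a minimal sketch for two boxes in the normalized `(x_center, y_center, width, height)` layout used by YOLO label files. It is a generic implementation written for illustration; it is not taken from utils/calculate_IoU.py and may differ from it in details.

```python
def iou_xywh(box_a, box_b):
    """Intersection over Union of two (x_center, y_center, width, height) boxes."""

    def corners(box):
        x, y, w, h = box
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2

    ax1, ay1, ax2, ay2 = corners(box_a)
    bx1, by1, bx2, by2 = corners(box_b)

    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two equally sized boxes, the second shifted right by a quarter of its width.
print(round(iou_xywh((0.5, 0.5, 0.4, 0.4), (0.6, 0.5, 0.4, 0.4)), 3))  # 0.6
```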
Here are the average results for each model:
Model | mAPval 0.5 | mAPval 0.5:0.95 |
---|---|---|
YOLOv5l - baseline | 0.870 | 0.701 |
YOLOv5s - v1 | 0.866 | 0.699 |
YOLOv5s - v2 | 0.873 | 0.711 |
YOLOv5x - final | 0.881 | 0.722 |
We can see that the performance of the two final models is very close, while the speed of the X model is significantly lower. Therefore, if the user wants absolute optimal performance over a set of images, YOLOv5x - final should be used, while YOLOv5s - v2 is better suited for realtime video detection.
Here are the final results for YOLOv5x - final:
Logo | mAP 0.5 | mAP 0.5:0.95 | IoU |
---|---|---|---|
Adidas | 0.837 | 0.713 | 0.921 |
Apple Inc. | 0.936 | 0.764 | 0.924 |
Chanel | 0.830 | 0.644 | 0.869 |
Coca Cola | 0.889 | 0.648 | 0.867 |
Emirates | 0.786 | 0.688 | 0.908 |
Hard Rock Cafè | 0.953 | 0.813 | 0.906 |
Mercedes Benz | 0.908 | 0.778 | 0.913 |
NFL | 0.921 | 0.744 | 0.902 |
Nike | 0.781 | 0.630 | 0.905 |
Pepsi | 0.721 | 0.530 | 0.819 |
Puma | 0.869 | 0.679 | 0.889 |
Starbucks | 0.964 | 0.869 | 0.941 |
The North Face | 0.958 | 0.818 | 0.925 |
Toyota | 0.907 | 0.749 | 0.925 |
Under Armour | 0.963 | 0.771 | 0.880 |
Running the models requires a Python environment with the latest version of PyTorch and CUDA 11.3. I have opted for Python 3.7.11 as it provides for an easier integration with Google Colab.
The easiest and cleanest way to go about this is to leverage Anaconda and set up a virtual environment with the required dependencies. Here is how to do so in three steps (assuming you already have Anaconda installed):
- First you need Git LFS in order to be able to properly clone the repository
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
- Next you clone the repository including submodules (YOLOv5)
git clone --recurse-submodules https://github.com/dvdimitrov13/Logo_Detection.git yolo
- Finally, create a virtual environment based on the provided environment.yml (the environment will be named "pytorch")
cd yolo
conda env create -f environment.yml
N.B. If you prefer to install everything manually, you can set up the conda environment like so:
conda create -n pytorch python=3.7
conda activate pytorch
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
And then follow the tutorial on ultralytics/yolov5 - Training on Custom Data
In order to use the models provided, first you need to create a directory called detect under yolo/runs/:
mkdir runs/detect
Afterwards, cd into yolov5/ and run python detect.py. In the example below, we run the command on the test set, but you can specify any image, video, or camera feed you are interested in!
cd yolov5
python detect.py --weights ../runs/train/yolo1000_x/weights/best.pt --source ../formatted_data/yolo1000/test/images --agnostic-nms --augment --project ../runs/detect --name output_{model_name} --save-txt --save-conf
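The other available models can be used by replacing yolo1000_x in the weights path with one of:
- yolo1000_s (yolov5s - v1)
- yolo1000_s_cust (yolov5s - v2)
- yolo_full_l (yolov5l - baseline)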
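With --save-txt and --save-conf, detect.py writes one label file per image, where each line holds `class x_center y_center width height confidence`. The sketch below shows one way to read such content and keep only detections at or above a confidence threshold. `parse_detections` is a hypothetical helper and the sample lines are made up for illustration; they are not real model output.

```python
def parse_detections(label_text, min_conf=0.5):
    """Parse label lines written with --save-txt --save-conf and filter by confidence."""
    detections = []
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        x, y, w, h, conf = (float(value) for value in parts[1:])
        if conf >= min_conf:
            detections.append({"class_id": class_id, "box": (x, y, w, h), "conf": conf})
    return detections


# Made-up example content of a single label file.
sample = "12 0.5 0.5 0.2 0.1 0.91\n8 0.3 0.7 0.1 0.1 0.32\n"
for det in parse_detections(sample):
    print(det)
```

Only the first sample line passes the 0.5 threshold here. Class ids refer to the class order defined in the dataset's YAML file.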
Once we have run detect.py on the test set, we can calculate the IoU using utils/calculate_IoU.py:
python utils/calculate_IoU.py --model_name yolo1000_x
Other values accepted for --model_name:
- yolo1000_s (yolov5s - v1)
- yolo1000_s_cust (yolov5s - v2)
- yolo_full_l (yolov5l - baseline)
To train, use the following command:
python train.py --img 640 --batch 32 --epochs 50 --data ../formatted_data/yolo1000/yolo1000.yaml --weights yolov5s.pt --name yolo1000_s_new --hyp ../runs/train/yolo1000_s_cust/hyp.yaml
- If you want to train on custom data, change your data.yaml file according to the official tutorial
- If you want to experiment with different hyperparameters, change the hyp.yaml file
Distributed under the MIT License. See LICENSE.txt for more information.
Dimitar Dimitrov
Email - dvdimitrov13@gmail.com
LinkedIn - https://www.linkedin.com/in/dimitarvalentindimitrov/
This project would not have been possible without the support of my Computer Vision professors at Bocconi University, who provided the two key ingredients -- guidance and data!