GitHub - RayanGarg/Logo_Detection: In this repo I present a CNN trained to detect different brand logos in pictures/videos. The network is based on the yolov5 architecture and takes advanatage of a couple different model sizes that offer a speed/accuracy tradeoff

Knowing who is using your product and in what context is power! By using Computer Vision we can acquire organic customer insights by detecting the right images on social media. This repository showcases models trained to detect 15 distinct brand logos in pictures/videos online!

About the project

This project was developed as part of my Data Science degree coursework, with the goal of maximizing accuracy. After a brief experimentation with R-CNN, YOLOv5 was identified as the best performing model for the job. Additionally, thanks to its state of the art architecture, I was able to develop two distinct models that one can choose to implement:

yolo1000_s_cust - based on YOLOv5_s architecture
yolo1000_x - based on YOLOv5_x architecture

The different architectures allow the user a tradeoff between speed and accuracy. Here is a more detailed comparison between the two architectures (statistics based on pretrained checkpoints):

Model	size ^(pixels)	mAP^val 0.5:0.95	mAP^val 0.5	Speed ^{CPU b1 (ms)}	Speed ^{V100 b1 (ms)}	Speed ^{V100 b32 (ms)}	params ^(M)	FLOPs ^{@640 (B)}
YOLOv5s	640	37.2	56.0	98	6.4	0.9	7.2	16.5
YOLOv5x	640	50.7	68.9	766	12.1	4.8	86.7	205.7

(back to top)

Environment

The main model development environment was an Azure instance with an NVIDIA K80 with 12 GB of vram. Additionally, I used Google Colab with GPU acceleration enabled in order to test different hyperparameter configurations.

Finally, in order to make the reproducibility and improvement of this repository as straightforward as possible, I used Git Large File Storage. This allowed for a simple cloning that includes all relevant training data, as well as model weights.

(back to top)

Development

1. Dataset

The raw dataset provided 17 brand logos. After initial inspection, Intimissimi and Ralph Lauren were dropped due to a large amount of mislabeled data. This left me with 15 logos: Adidas, Apple Inc., Chanel, Coca-Cola, Emirates, Hard Rock Cafe, Mercedes-Benz, NFL, Nike, Pepsi, Puma, Starbucks, The North Face, Toyota, and Under Armour.

In order to convert the dataset to YOLOv5 PyTorch TXT format, I used Roboflow. As far as preprocessing, I applied auto-orient and image resize (to the correct model size 640x640). Since YOLOv5 already applies data augmention in its training script which can be finutened through hyperparameters, no data augmentation steps were taken in Roboflow.

After applying additional preprocessing steps described in the following section, the final dataset can be found under the name yolo1000, which has the following statistics:

2. Training

For model training, transfer learning was used. Instead of starting from a randomized set of weights, I used the pretrained checkpoints provided by YOLOv5 repo, which are already trained on the COCO dataset.

To establish baseline performance, I first trained a model using an unaltered version of the full dataset (around 38K images) using YOLOv5l. Following that, I took the following steps to improve performance:

Improved label consistency by adding missing bounding boxes -- original data had only one box per image
Improved label accuracy by resizing bounding boxes with inaccurate position
Balanced the class distribution by including a maximum of 1000 images per class
Added 2% background images (218 images) and removed wrong bounding boxes in order to reduce false positives

By applying those, not only did the performance of the much smaller YOLOv5s model reach the baseline performance, but there was also a significant improvement with instances of false positves (example below).

YOLOv5l - baseline	YOLOv5s - v1

Finally, through experimentation with hyperparameters the YOLOv5s - v1 was further improved and the following data augmentation steps were applied: HSV -Hue/Saturation/Value, Translation, Scale, Flip (horizontal), Shear, Mosaic and Mixup.

Both final models, YOLOv5s - v2 and YOLOv5x - final, were trained using those finetuned hyperparameters.

Results

For model evaluation the main metric used was mAP. Additionally, the IoU (Intersection over Union) was calculated at bounding box confidence of 50% in order to exclude "lucky guesses".

Here are the average results for each model:

Model	mAP^val 0.5	mAP^val 0.5:0.95
YOLOv5l - baseline	0.870	0.701
YOLOv5s - v1	0.866	0.699
YOLOv5s - v2	0.873	0.711
YOLOv5x - final	0.881	0.722

We can see that the performance of the two final models is very close, while the speed of the X model is significantly lower. Therefore, if the user wants absolute optimal performance over a set of images, YOLOv5x - final should be used, while YOLOv5s - v2 is better suited for realtime video detection.

Here are the final results for YOLOv5x - final:

Logo	mAP 0.5	mAP 0.5:0.95	IoU
Adidas	0.837	0.713	0.921
Apple Inc.	0.936	0.764	0.924
Chanel	0.830	0.644	0.869
Coca Cola	0.889	0.648	0.867
Emirates	0.786	0.688	0.908
Hard Rock Cafè	0.953	0.813	0.906
Mercedes Benz	0.908	0.778	0.913
NFL	0.921	0.744	0.902
Nike	0.781	0.630	0.905
Pepsi	0.721	0.530	0.819
Puma	0.869	0.679	0.889
Starbucks	0.964	0.869	0.941
The North Face	0.958	0.818	0.925
Toyota	0.907	0.749	0.925
Under Armour	0.963	0.771	0.880

User manual

You can find a full implementation of the user manual below, including reproduction of evaluation and training results, in the Colab notebook above!

Installation

Running the models requires a Python environment with the latest version of PyTorch and CUDA 11.3. I have opted for Python 3.7.11 as it provides for an easier integration with Google Colab.

The easiest and cleanest way to go about this is to levarage Anaconda and set up a virtual environment with the required dependencies. Here is how to do so in three steps (assuming you already have Anaconda installed):

First you need Git LFS in order to be able to properly clone the repository

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install

Next you clone the repository including submodules (YOLOv5)

git clone --recurse-submodules https://github.com/dvdimitrov13/Logo_Detection.git yolo

Finally, create a virtual environment based on the provided environment.yaml (the name will be "pytorch")

cd yolo
conda env create -f environment.yml

N.B! If you want to install everything manually you can set the conda environment like so:

conda create -n pytorch python=3.7
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

And then follow the tutorial on ultralytics/yolov5 - Training on Custom Data

Detection and evaluation

In order to use the models provided, first you need to create a directory called detect under yolo/runs/:

mkdir runs/detect

Afterwards, you cd in yolov5/ and run python detect.py. In the example below, we will run the command on the test set, but you can specify any image/video/camera feed you are interested in!

cd yolov5
python detect.py --weight ../runs/train/yolo1000_x/weights/best.pt --source ../formatted_data/yolo1000/test/images --agnostic-nms --augment --project ../runs/detect --name output_{model_name} --save-txt --save-conf
										yolo1000_s 	(yolov5s - v1)
										yolo1000_s_cust (yolov5s - v2)
										yolo_full_l 	(yolov5l - baseline)

Once we have run the detect.py on the test set, we can calculate the IoU using utils/calculate_IoU.py

python utils/calculate_IoU.py --model_name yolo1000_x
										   yolo1000_s 		(yolov5s - v1)
 										   yolo1000_s_cust 	(yolov5s - v2)
										   yolo_full_l 		(yolov5l - baseline)

Training

To train, use the following command:

python train.py --img 640 --batch 32 --epochs 50 --data ../formatted_data/yolo1000/yolo1000.yaml --weights yolov5s.pt --name yolo1000_s_new --hyp ../runs/train/yolo1000_s_cust/hyp.yaml

If you want to train on custom data, change you data.yaml file according to the official tutorial
If you want to experiment with different hyperparameters, change the hyp.yaml file

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Dimitar Dimitrov
Email - dvdimitrov13@gmail.com
LinkedIn - https://www.linkedin.com/in/dimitarvalentindimitrov/

(back to top)

Acknowledgments

This project would not have been possible without the support of my Computer Vision professors at Bocconi University who provided the two key ingedients -- guidance and data!

(back to top)

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
formatted_data/yolo1000		formatted_data/yolo1000
images		images
runs/train		runs/train
utils		utils
yolov5 @ 8efe977		yolov5 @ 8efe977
.gitattributes		.gitattributes
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
linux_commands.txt		linux_commands.txt
test_command.txt		test_command.txt
train_commands.txt		train_commands.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Table of Contents

About the project

Environment

Development

1. Dataset

2. Training

Results

User manual

Installation

Detection and evaluation

Training

License

Contact

Acknowledgments

About

Uh oh!

Releases

Packages

Languages

License

RayanGarg/Logo_Detection

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

About the project

Environment

Development

1. Dataset

2. Training

Results

User manual

Installation

Detection and evaluation

Training

License

Contact

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages