The Indian scene text detection model is developed as part of the work towards Indian Signboard Translation Project by AI4Bharat. I worked on this project under the mentorship of Mitesh Khapra and Pratyush Kumar from IIT Madras.
Indian Signboard Translation involves 4 modular tasks:

- T1: Detection: Detecting bounding boxes containing text in the images
- T2: Classification: Classifying the language of the text in the bounding box identified by T1
- T3: Recognition: Getting the text from the crop detected by T1, using the recognition model for the language classified by T2
- T4: Translation: Translating the text from T3 from one Indian language to another Indian language

Note: T2: Classification is not updated in the above picture
The Indian Scene Text Detection Dataset (D1-Big + D1-English) is used for training and evaluating the detection model. Text boxes are represented as axis-aligned bounding boxes.
The score map for an image marks the region within the shrunk bounding box. The geometry map at a point inside a bounding box holds the distances from that point to the left, top, right, and bottom box boundaries respectively.
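The construction of these targets can be sketched as follows. This is a minimal NumPy illustration for axis-aligned boxes, not the project's actual data-loading code; the shrink ratio of 0.3 follows the EAST paper and may differ from the value used here.

```python
import numpy as np

def build_maps(boxes, h, w, shrink=0.3):
    """Build per-pixel score and geometry targets for axis-aligned boxes.

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixel coordinates.
    Returns a score map of shape (h, w) and a geometry map of shape
    (h, w, 4) holding distances to the left, top, right, bottom edges.
    """
    score = np.zeros((h, w), dtype=np.float32)
    geo = np.zeros((h, w, 4), dtype=np.float32)
    for x_min, y_min, x_max, y_max in boxes:
        # Shrink the box so only its core region counts as positive.
        dx = shrink * (x_max - x_min)
        dy = shrink * (y_max - y_min)
        sx0, sx1 = int(x_min + dx), int(np.ceil(x_max - dx))
        sy0, sy1 = int(y_min + dy), int(np.ceil(y_max - dy))
        ys, xs = np.mgrid[sy0:sy1, sx0:sx1]
        score[sy0:sy1, sx0:sx1] = 1.0
        geo[sy0:sy1, sx0:sx1, 0] = xs - x_min   # distance to left edge
        geo[sy0:sy1, sx0:sx1, 1] = ys - y_min   # distance to top edge
        geo[sy0:sy1, sx0:sx1, 2] = x_max - xs   # distance to right edge
        geo[sy0:sy1, sx0:sx1, 3] = y_max - ys   # distance to bottom edge
    return score, geo
```

At inference time the process is inverted: each positive pixel plus its four distances reconstructs one candidate box.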
The fully convolutional neural network proposed in the paper "EAST: An Efficient and Accurate Scene Text Detector" is used to predict word instance regions and their geometries. Two variants of the model were experimented with:

M1: Pretrained VGG-16 as the feature extractor. It produces output at dimensions reduced by a factor of 4.
- Input Image Shape: [320, 320, 3]
- Output Score Map Shape: [80, 80, 1]
- Output Geometry Map Shape: [80, 80, 4]

M2: U-Net for feature extraction and merging. It produces per-pixel predictions of text regions and geometries.
- Input Image Shape: [320, 320, 3]
- Output Score Map Shape: [320, 320, 1]
- Output Geometry Map Shape: [320, 320, 4]
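The shared output head of both variants can be sketched in PyTorch as below. This is a hypothetical minimal head, not the architecture in model.py; the channel count and the distance upper bound (the 320-pixel input size) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """EAST-style output head: a 1-channel score map squashed by a
    sigmoid, and a 4-channel geometry map scaled to pixel distances."""

    def __init__(self, in_channels=32, max_dist=320.0):
        super().__init__()
        self.score_conv = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.geo_conv = nn.Conv2d(in_channels, 4, kernel_size=1)
        self.max_dist = max_dist  # upper bound on boundary distances

    def forward(self, features):
        score = torch.sigmoid(self.score_conv(features))             # (N, 1, H, W)
        geo = torch.sigmoid(self.geo_conv(features)) * self.max_dist  # (N, 4, H, W)
        return score, geo

# For M2 the merged feature map is full resolution, so H = W = 320:
head = DetectionHead()
score, geo = head(torch.randn(1, 32, 320, 320))
```

For M1 the same head would sit on the VGG-16 features at 80 x 80 resolution instead.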
Non-Maximal Suppression (NMS) is performed to remove overlapping bounding boxes, with a maximum permitted IoU threshold of 0.1.
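The NMS step for axis-aligned boxes can be sketched as a simple greedy loop. This is an illustrative implementation, not the project's actual code:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.1):
    """Greedy NMS: keep boxes in decreasing score order, dropping any
    box that overlaps an already-kept box beyond the IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The low threshold of 0.1 is aggressive: any pair of predictions sharing more than 10% overlap collapses to the higher-scoring one.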
For detailed model architecture, check the file model.py
Sample Input-Output
M1 and M2 converged to similar score and geometry losses after training for the same number of epochs. As M1 is significantly more efficient in memory and computation, it is selected over M2. The detection model is trained for 30 epochs. The model weights are saved every 3 epochs and can be found in the Models directory.
The final hyperparameters can be accessed in config.yaml
The lowest validation loss is observed at epoch 12. Hence, the model Models/EAST-Detector-e12.pth is used to evaluate the detection performance. In the NMS stage, the minimum score threshold is set to 0.85 and the maximum permitted IoU threshold to 0.2.
The minimum IoU threshold for a predicted bounding box to be counted as correct is set to 0.70.
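The precision/recall/F1 computation under this matching rule can be sketched as greedy one-to-one IoU matching. This is an assumed formulation for illustration; the repository's evaluation notebook may match boxes differently:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def evaluate(pred_boxes, gt_boxes, iou_threshold=0.70):
    """Match each prediction to at most one unmatched ground-truth box
    with IoU >= threshold, then derive precision, recall, and F1."""
    matched, tp = set(), 0
    for p in pred_boxes:
        for k, g in enumerate(gt_boxes):
            if k not in matched and iou(p, g) >= iou_threshold:
                matched.add(k)
                tp += 1
                break
    precision = tp / len(pred_boxes) if pred_boxes else 0.0
    recall = tp / len(gt_boxes) if gt_boxes else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A prediction counts as a true positive only if it covers a ground-truth box with IoU of at least 0.70; unmatched predictions lower precision, unmatched ground-truth boxes lower recall.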
| Dataset  | Precision | Recall   | F1-Score |
|----------|-----------|----------|----------|
| Trainset | 0.311847  | 0.360114 | 0.426558 |
| Valset   | 0.331797  | 0.384548 | 0.446315 |
| Testset  | 0.267891  | 0.343183 | 0.343183 |
Sample Detections:
- Model: model.py
- Merging English Data: 0-Merge-English-Data.ipynb
- Training: 1-Indian-Scene-Text-Detection-Training
- Training Visualisation: 2-MLFlow-Training-Visualisation
- Prediction: 3-Indian-Scene-Text-Detection-Prediction
- Evaluation: 4-Indian-Scene-Text-Detection-Evaluation
- Indian Signboard Translation Project
- Indian Scene Text Dataset
- Indian Scene Text Detection
- Indian Scene Text Classification
- Indian Scene Text Recognition
- EAST: An Efficient and Accurate Scene Text Detector (CVPR 2017): https://openaccess.thecvf.com/content_cvpr_2017/papers/Zhou_EAST_An_Efficient_CVPR_2017_paper.pdf
- U-Net: Convolutional Networks for Biomedical Image Segmentation: https://arxiv.org/pdf/1505.04597.pdf
- https://github.com/liushuchun/EAST.pytorch
- https://github.com/GokulKarthik/EAST.pytorch
- PyImageSearch EAST text detection tutorial: https://www.pyimagesearch.com/2018/08/20/opencv-text-detection-east-text-detector/