In this repository we share the BIR research database for identifying typewritten emphasis in list-like historical documents.
The repository contains:
- The original page images.
- Ground truth: Word bounding boxes together with the style of the words (regular, italic, bold).
- A visual representation of the ground truth.
- Training, validation, and test sets of the benchmark experiments described in the following paper: “Anna Scius-Bertrand, Simon Gabay, Ljudmila Petković, Juliette Janes, Caroline Corbières, and Thibault Clérice. 2020. The BIR database – Identifying typographic emphasis in list-like historical documents. In HIP’21: 6th International Workshop on Historical Document Imaging and Processing, September 05–07, 2021, Lausanne, CH. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3423603.3424002”
- Source code of the classification experiment for style detection (bold, italic, regular) in individual word images.