Computer Science > Computer Vision and Pattern Recognition

arXiv:2208.07682 (cs)

[Submitted on 16 Aug 2022]

Title: The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition

Authors: Silvia Cascianelli, Vittorio Pippi, Martin Maarand, Marcella Cornia, Lorenzo Baraldi, Christopher Kermorvant, Rita Cucchiara


Abstract: Handwritten Text Recognition (HTR) is an open problem at the intersection of Computer Vision and Natural Language Processing. When dealing with historical manuscripts, the main challenges are the state of preservation of the paper support, the variability of the handwriting (even of the same author over a wide time span), and the scarcity of data from ancient, poorly represented languages. To foster research on this topic, in this paper we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic splitting and a date-based splitting, which takes into account the age of the author. The first setting is intended to study HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which training data are not available. For both configurations, we analyze quantitative and qualitative characteristics, also with respect to other line-level HTR benchmarks, and present the recognition performance of state-of-the-art HTR architectures. The dataset is available for download at this https URL.

Comments:	Accepted at ICPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Digital Libraries (cs.DL)
Cite as:	arXiv:2208.07682 [cs.CV]
	(or arXiv:2208.07682v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.07682

Submission history

From: Silvia Cascianelli PhD
[v1] Tue, 16 Aug 2022 11:44:16 UTC (18,446 KB)
