Abstract
Transcribing handwritten documents makes their contents accessible to the general public. So far, however, automatic transcription of historical documents has mostly focused on producing diplomatic transcripts, even though such transcripts are often understandable only by experts. The main difficulties come from the heavy use of extremely abridged and tangled abbreviations and of archaic or outdated word forms. Here we study different approaches to training optical models that recognize historical document images containing archaic and abbreviated handwritten text and produce modernized transcripts with expanded abbreviations. Experiments comparing the performance of the proposed approaches are carried out on a document collection related to Spanish naval commerce during the 15th–19th centuries, which includes extremely difficult handwritten text images.
Work partially supported by the BBVA Foundation through the 2017–2018 Digital Humanities research grant “Carabela”, by Ministerio de Ciencia/AEI/FEDER/EU through the MIRANDA-DocTIUM project (RTI2018-095645-B-C22), and by the EU JPICH project “HOME – History Of Medieval Europe” (Spanish PEICTI Ref. PCI2018-093122).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Romero, V., Toselli, A.H., Vidal, E., Sánchez, J.A., Alonso, C., Marqués, L. (2019). Modern vs Diplomatic Transcripts for Historical Handwritten Text Recognition. In: Cristani, M., Prati, A., Lanz, O., Messelodi, S., Sebe, N. (eds) New Trends in Image Analysis and Processing – ICIAP 2019. ICIAP 2019. Lecture Notes in Computer Science, vol 11808. Springer, Cham. https://doi.org/10.1007/978-3-030-30754-7_11
DOI: https://doi.org/10.1007/978-3-030-30754-7_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30753-0
Online ISBN: 978-3-030-30754-7
eBook Packages: Computer Science, Computer Science (R0)