Abstract
We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited set of visual elements called sprites. Taking as input a set of text lines with a similar font or handwriting, our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real text images, and that they can be trained with weak supervision, constitutes significant progress. Second, we show the potential of our method for new applications, more specifically in the field of palaeography, which studies the history and variations of handwriting, and for cipher analysis. We demonstrate our approach on four very different datasets: a printed volume of the Google1000 dataset [19, 48], the Copiale cipher [2, 27], a large-scale multi-font benchmark [41], and historical handwritten charters from the 12th and early 13th centuries [6].
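To make the sprite-based idea above concrete, here is a minimal, hypothetical PyTorch sketch of reconstructing a text-line image by softly compositing learnable character sprites at fixed horizontal slots and training with a reconstruction loss. It illustrates the general principle only, not the paper's architecture: the class name, the slot-based placement, the tiny encoder, and the CTC suggestion in the closing comment are assumptions introduced for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpriteLineSketch(nn.Module):
    """Toy sprite-compositing model (illustrative sketch, not the paper's model):
    reconstruct a text line as a left-to-right sequence of soft-selected sprites."""

    def __init__(self, n_sprites=80, sprite_h=32, sprite_w=16, n_slots=20):
        super().__init__()
        # Learnable prototypes: one grayscale channel + one alpha (opacity) channel.
        self.sprites = nn.Parameter(torch.rand(n_sprites, 2, sprite_h, sprite_w))
        self.n_slots = n_slots
        # Tiny encoder predicting, per horizontal slot, a distribution over sprites.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, n_slots)),
        )
        self.to_logits = nn.Linear(32, n_sprites)

    def forward(self, lines):  # lines: (B, 1, H, W), values in [0, 1]
        B, _, H, W = lines.shape
        feats = self.encoder(lines).squeeze(2).transpose(1, 2)    # (B, n_slots, 32)
        probs = F.softmax(self.to_logits(feats), dim=-1)          # (B, n_slots, n_sprites)
        # Differentiable soft selection of one sprite per slot.
        sel = torch.einsum('bsk,kchw->bschw', probs, torch.sigmoid(self.sprites))
        color, alpha = sel[:, :, :1], sel[:, :, 1:]               # (B, n_slots, 1, h, w)
        # Paste each slot on a white background and concatenate along the width.
        slot_w = max(W // self.n_slots, 1)
        cols = []
        for s in range(self.n_slots):
            c = F.interpolate(color[:, s], size=(H, slot_w))
            a = F.interpolate(alpha[:, s], size=(H, slot_w))
            bg = torch.ones_like(c)
            cols.append(a * c + (1 - a) * bg)
        recon = F.interpolate(torch.cat(cols, dim=3), size=(H, W))
        return recon, probs


# Usage: optimise a reconstruction loss; when line-level transcriptions are available,
# a weak-supervision term could be added (e.g. CTC over per-slot sprite probabilities).
model = SpriteLineSketch()
lines = torch.rand(4, 1, 64, 320)
recon, probs = model(lines)
loss = F.mse_loss(recon, lines)
loss.backward()
```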
References
Baird, H.S.: Model-directed document image analysis. In: Proceedings of the Symposium on Document Image Understanding Technology (1999)
Baró, A., Chen, J., Fornés, A., Megyesi, B.: Towards a generic unsupervised method for transcription of encoded manuscripts. In: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage (2019)
Berg-Kirkpatrick, T., Durrett, G., Klein, D.: Unsupervised transcription of historical documents. In: ACL (2013)
Bluche, T., Messina, R.: Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 646–651. IEEE (2017)
Burgess, C.P., Matthey, L., et al.: MONet: unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390 (2019)
Camps, J.B., Vidal-Gorène, C., Stutzmann, D., Vernet, M., Pinche, A.: Data diversity in handwritten text recognition: challenge or opportunity? Digital Humanities (2022)
Crawford, E., Pineau, J.: Spatially invariant unsupervised object detection with convolutional neural networks. In: AAAI (2019)
Deng, F., Zhi, Z., Lee, D., Ahn, S.: Generative scene graph networks. In: ICLR (2020)
Emami, P., He, P., Ranka, S., Rangarajan, A.: Efficient iterative amortized inference for learning symmetric and disentangled multi-object representations. In: ICML (2021)
Eslami, S.M.A., et al.: Attend, infer, repeat: fast scene understanding with generative models. In: NeurIPS (2016)
Fogel, S., Averbuch-Elor, H., Cohen, S., Mazor, S., Litman, R.: ScrabbleGAN: Semi-supervised varying length handwritten text generation. In: CVPR (2020)
Garrette, D., Alpert-Abrams, H., Berg-Kirkpatrick, T., Klein, D.: Unsupervised code-switching for multilingual historical document transcription. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2015)
Goodfellow, I., et al.: Generative adversarial nets. In: NeurIPS (2014)
Goyal, K., Dyer, C., Warren, C., G’Sell, M., Berg-Kirkpatrick, T.: A probabilistic generative model for typographical analysis of early modern printing. In: ACL (2020)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML (2006)
Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: NeurIPS (2008)
Greff, K., et al.: Multi-object representation learning with iterative variational inference. In: ICML (2019)
Greff, K., Van Steenkiste, S., Schmidhuber, J.: Neural expectation maximization. In: NeurIPS (2017)
Gupta, A., Vedaldi, A., Zisserman, A.: Learning to read by spelling: Towards unsupervised text recognition. arXiv:1809.08675 [cs] (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Hochberg, J., Kelly, P., Thomas, T., Kerns, L.: Automatic script identification from document images using cluster-based templates. IEEE Trans. Pattern Anal. Mach. Intell. (1997)
Jaderberg, M., Simonyan, K., Zisserman, A.: Spatial transformer networks. In: NeurIPS (2015)
Jiang, J., Ahn, S.: Generative neurosymbolic machines. In: NeurIPS (2020)
Kahle, P., Colutto, S., Hackl, G., Mühlberger, G.: Transkribus-a service platform for transcription, recognition and retrieval of historical documents. In: ICDAR (2017)
Kang, L., Riba, P., Rusiñol, M., Fornés, A., Villegas, M.: Pay attention to what you read: Non-recurrent handwritten text-line recognition. Pattern Recogn. (2022)
Karazija, L., Laina, I., Rupprecht, C.: ClevrTex: a texture-rich benchmark for unsupervised multi-object segmentation. In: NeurIPS Datasets and Benchmarks (2021)
Knight, K., Megyesi, B., Schaefer, C.: The Copiale Cipher. In: Proceedings of the ACL Workshop on Building and Using Comparable Corpora (2011)
Kopec, G.E., Lomelin, M.: Document-specific character template estimation. In: Document Recognition III (1996)
Kopec, G.E., Lomelin, M.: Supervised template estimation for document image decoding. IEEE Trans. Pattern Anal. Mach. Intell. (1997)
Kopec, G.E., Said, M.R., Popat, K.: N-gram language models for document image decoding. In: Document Recognition and Retrieval IX (2001)
LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. (1989)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE (1998)
Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Monnier, T., Aubry, M.: docExtractor: An off-the-shelf historical document element extraction. In: ICFHR (2020)
Monnier, T., Groueix, T., Aubry, M.: Deep transformation-invariant clustering. In: NeurIPS (2020)
Monnier, T., Vincent, E., Ponce, J., Aubry, M.: Unsupervised layered image decomposition into object prototypes. In: ICCV (2021)
Nolan, J.C., Filippini, R.: Method and apparatus for creating a high-fidelity glyph prototype from low-resolution glyph images. US Patent 7,702,182 (2010)
Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: ICDAR (2017)
Reddy, P., Guerrero, P., Mitra, N.J.: Search for concepts: Discovering visual concepts using direct optimization. arXiv preprint arXiv:2210.14808 (2022)
Seuret, M., et al.: Combining OCR models for reading early modern books. In: ICDAR (2023)
Smirnov, D., Gharbi, M., Fisher, M., Guizilini, V., Efros, A.A., Solomon, J.: MarioNette: self-supervised sprite learning. In: NeurIPS (2021)
Souibgui, M.A., Fornés, A., Kessentini, Y., Tudor, C.: A few-shot learning approach for historical ciphered manuscript recognition. CoRR (2020)
de Sousa Neto, A.F., Bezerra, B.L.D., Toselli, A.H., Lima, E.B.: HTR-Flor: a deep learning system for offline handwritten text recognition. In: SIBGRAPI (2020)
Srivatsan, N., Vega, J., Skelton, C., Berg-Kirkpatrick, T.: Neural representation learning for scribal hands of Linear B. In: ICDAR 2021 Workshops (2021)
Srivatsan, N., Wu, S., Barron, J., Berg-Kirkpatrick, T.: Scalable font reconstruction with dual latent manifolds. In: EMNLP (2021)
Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
Vincent, L.: Google book search: document understanding on a massive scale. In: ICDAR (2007)
Xu, Y., Nagy, G.: Prototype extraction and adaptive OCR. IEEE Trans. Pattern Anal. Mach. Intell. (1999)
Yang, Y., Chen, Y., Soatto, S.: Learning to manipulate individual objects in an image. In: CVPR (2020)
Ye, V., Li, Z., Tucker, R., Kanazawa, A., Snavely, N.: Deformable sprites for unsupervised video decomposition. In: CVPR (2022)
Zhang, C., Gupta, A., Zisserman, A.: Adaptive text recognition through visual matching. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 51–67. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58517-4_4
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Siglidis, I., Gonthier, N., Gaubil, J., Monnier, T., Aubry, M. (2024). The Learnable Typewriter: A Generative Approach to Text Analysis. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14805. Springer, Cham. https://doi.org/10.1007/978-3-031-70536-6_18
Print ISBN: 978-3-031-70535-9
Online ISBN: 978-3-031-70536-6