Abstract
We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited set of visual elements called sprites. Taking as input a set of text lines with a similar font or handwriting, our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real text images, and that they can be trained with weak supervision, constitutes significant progress. Second, we show the potential of our method for new applications, more specifically in the field of palaeography, which studies the history and variations of handwriting, and for cipher analysis. We demonstrate our approach on four very different datasets: a printed volume of the Google1000 dataset [19, 48], the Copiale cipher [2, 27], a large-scale multi-font benchmark [41], and historical handwritten charters from the 12th and early 13th centuries [6].
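To make the sprite-based idea above concrete, here is a minimal, hypothetical PyTorch sketch of reconstructing a text-line image by softly compositing learnable character sprites at fixed horizontal slots and training with a reconstruction loss. It illustrates the general principle only, not the paper's architecture: the class name, the slot-based placement, the tiny encoder, and the CTC suggestion in the closing comment are assumptions introduced for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpriteLineSketch(nn.Module):
    """Toy sprite-compositing model (illustrative sketch, not the paper's model):
    reconstruct a text line as a left-to-right sequence of soft-selected sprites."""

    def __init__(self, n_sprites=80, sprite_h=32, sprite_w=16, n_slots=20):
        super().__init__()
        # Learnable prototypes: one grayscale channel + one alpha (opacity) channel.
        self.sprites = nn.Parameter(torch.rand(n_sprites, 2, sprite_h, sprite_w))
        self.n_slots = n_slots
        # Tiny encoder predicting, per horizontal slot, a distribution over sprites.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, n_slots)),
        )
        self.to_logits = nn.Linear(32, n_sprites)

    def forward(self, lines):  # lines: (B, 1, H, W), values in [0, 1]
        B, _, H, W = lines.shape
        feats = self.encoder(lines).squeeze(2).transpose(1, 2)    # (B, n_slots, 32)
        probs = F.softmax(self.to_logits(feats), dim=-1)          # (B, n_slots, n_sprites)
        # Differentiable soft selection of one sprite per slot.
        sel = torch.einsum('bsk,kchw->bschw', probs, torch.sigmoid(self.sprites))
        color, alpha = sel[:, :, :1], sel[:, :, 1:]               # (B, n_slots, 1, h, w)
        # Paste each slot on a white background and concatenate along the width.
        slot_w = max(W // self.n_slots, 1)
        cols = []
        for s in range(self.n_slots):
            c = F.interpolate(color[:, s], size=(H, slot_w))
            a = F.interpolate(alpha[:, s], size=(H, slot_w))
            bg = torch.ones_like(c)
            cols.append(a * c + (1 - a) * bg)
        recon = F.interpolate(torch.cat(cols, dim=3), size=(H, W))
        return recon, probs


# Usage: optimise a reconstruction loss; when line-level transcriptions are available,
# a weak-supervision term could be added (e.g. CTC over per-slot sprite probabilities).
model = SpriteLineSketch()
lines = torch.rand(4, 1, 64, 320)
recon, probs = model(lines)
loss = F.mse_loss(recon, lines)
loss.backward()
```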
References
Baird, H.S.: Model-directed document image analysis. In: Proceedings of the Symposium on Document Image Understanding Technology (1999)
Baró, A., Chen, J., Fornés, A., Megyesi, B.: Towards a generic unsupervised method for transcription of encoded manuscripts. In: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage (2019)
Berg-Kirkpatrick, T., Durrett, G., Klein, D.: Unsupervised transcription of historical documents. In: ACL (2013)
Bluche, T., Messina, R.: Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 646–651. IEEE (2017)
Burgess, C.P., Matthey, L., et al.: MONet: unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390 (2019)
Camps, J.B., Vidal-Gorène, C., Stutzmann, D., Vernet, M., Pinche, A.: Data diversity in handwritten text recognition: challenge or opportunity? Digital Humanities (2022)
Crawford, E., Pineau, J.: Spatially invariant unsupervised object detection with convolutional neural networks. In: AAAI (2019)
Deng, F., Zhi, Z., Lee, D., Ahn, S.: Generative scene graph networks. In: ICLR (2020)
Emami, P., He, P., Ranka, S., Rangarajan, A.: Efficient iterative amortized inference for learning symmetric and disentangled multi-object representations. In: ICML (2021)
Eslami, S.M.A., et al.: Attend, infer, repeat: fast scene understanding with generative models. In: NeurIPS (2016)
Fogel, S., Averbuch-Elor, H., Cohen, S., Mazor, S., Litman, R.: ScrabbleGAN: Semi-supervised varying length handwritten text generation. In: CVPR (2020)
Garrette, D., Alpert-Abrams, H., Berg-Kirkpatrick, T., Klein, D.: Unsupervised code-switching for multilingual historical document transcription. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2015)
Goodfellow, I., et al.: Generative adversarial nets. In: NeurIPS (2014)
Goyal, K., Dyer, C., Warren, C., G’Sell, M., Berg-Kirkpatrick, T.: A probabilistic generative model for typographical analysis of early modern printing. In: ACL (2020)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML (2006)
Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: NeurIPS (2008)
Greff, K., et al.: Multi-object representation learning with iterative variational inference. In: ICML (2019)
Greff, K., Van Steenkiste, S., Schmidhuber, J.: Neural expectation maximization. In: NeurIPS (2017)
Gupta, A., Vedaldi, A., Zisserman, A.: Learning to read by spelling: Towards unsupervised text recognition. arXiv:1809.08675 [cs] (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Hochberg, J., Kelly, P., Thomas, T., Kerns, L.: Automatic script identification from document images using cluster-based templates. IEEE Trans. Pattern Anal. Mach. Intell. (1997)
Jaderberg, M., Simonyan, K., Zisserman, A.: Spatial transformer networks. In: NeurIPS (2015)
Jiang, J., Ahn, S.: Generative neurosymbolic machines. In: NeurIPS (2020)
Kahle, P., Colutto, S., Hackl, G., Mühlberger, G.: Transkribus-a service platform for transcription, recognition and retrieval of historical documents. In: ICDAR (2017)
Kang, L., Riba, P., Rusiñol, M., Fornés, A., Villegas, M.: Pay attention to what you read: Non-recurrent handwritten text-line recognition. Pattern Recogn. (2022)
Karazija, L., Laina, I., Rupprecht, C.: ClevrTex: a texture-rich benchmark for unsupervised multi-object segmentation. In: NeurIPS Datasets and Benchmarks (2021)
Knight, K., Megyesi, B., Schaefer, C.: The Copiale Cipher. In: Proceedings of the ACL Workshop on Building and Using Comparable Corpora (2011)
Kopec, G.E., Lomelin, M.: Document-specific character template estimation. In: Document Recognition III (1996)
Kopec, G.E., Lomelin, M.: Supervised template estimation for document image decoding. IEEE Trans. Pattern Anal. Mach. Intell. (1997)
Kopec, G.E., Said, M.R., Popat, K.: N-gram language models for document image decoding. In: Document Recognition and Retrieval IX (2001)
LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. (1989)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE (1998)
Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)
Monnier, T., Aubry, M.: docExtractor: An off-the-shelf historical document element extraction. In: ICFHR (2020)
Monnier, T., Groueix, T., Aubry, M.: Deep transformation-invariant clustering. In: NeurIPS (2020)
Monnier, T., Vincent, E., Ponce, J., Aubry, M.: Unsupervised layered image decomposition into object prototypes. In: ICCV (2021)
Nolan, J.C., Filippini, R.: Method and apparatus for creating a high-fidelity glyph prototype from low-resolution glyph images. US Patent 7,702,182 (2010)
Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: ICDAR (2017)
Reddy, P., Guerrero, P., Mitra, N.J.: Search for concepts: Discovering visual concepts using direct optimization. arXiv preprint arXiv:2210.14808 (2022)
Seuret, M., et al.: Combining OCR models for reading early modern books. In: ICDAR (2023)
Smirnov, D., Gharbi, M., Fisher, M., Guizilini, V., Efros, A.A., Solomon, J.: MarioNette: self-supervised sprite learning. In: NeurIPS (2021)
Souibgui, M.A., Fornés, A., Kessentini, Y., Tudor, C.: A few-shot learning approach for historical ciphered manuscript recognition. CoRR (2020)
de Sousa Neto, A.F., Bezerra, B.L.D., Toselli, A.H., Lima, E.B.: HTR-Flor: a deep learning system for offline handwritten text recognition. In: SIBGRAPI (2020)
Srivatsan, N., Vega, J., Skelton, C., Berg-Kirkpatrick, T.: Neural representation learning for scribal hands of Linear B. In: ICDAR 2021 Workshops (2021)
Srivatsan, N., Wu, S., Barron, J., Berg-Kirkpatrick, T.: Scalable font reconstruction with dual latent manifolds. In: EMNLP (2021)
Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)
Vincent, L.: Google book search: document understanding on a massive scale. In: ICDAR (2007)
Xu, Y., Nagy, G.: Prototype extraction and adaptive OCR. IEEE Trans. Pattern Anal. Mach. Intell. (1999)
Yang, Y., Chen, Y., Soatto, S.: Learning to manipulate individual objects in an image. In: CVPR (2020)
Ye, V., Li, Z., Tucker, R., Kanazawa, A., Snavely, N.: Deformable sprites for unsupervised video decomposition. In: CVPR (2022)
Zhang, C., Gupta, A., Zisserman, A.: Adaptive text recognition through visual matching. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 51–67. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58517-4_4
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Siglidis, I., Gonthier, N., Gaubil, J., Monnier, T., Aubry, M. (2024). The Learnable Typewriter: A Generative Approach to Text Analysis. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14805. Springer, Cham. https://doi.org/10.1007/978-3-031-70536-6_18
Print ISBN: 978-3-031-70535-9
Online ISBN: 978-3-031-70536-6