The Learnable Typewriter: A Generative Approach to Text Analysis

  • Conference paper
Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

We present a generative, document-specific approach to character analysis and recognition in text lines. Our main idea is to build on unsupervised multi-object segmentation methods, in particular those that reconstruct images from a limited number of visual elements, called sprites. Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters and leverage line-level annotations when available. Our contribution is twofold. First, we provide the first adaptation and evaluation of a deep unsupervised multi-object segmentation approach for text line analysis. Since these methods have mainly been evaluated on synthetic data in a completely unsupervised setting, demonstrating that they can be adapted to and quantitatively evaluated on real images of text, and that they can be trained using weak supervision, is significant progress. Second, we show the potential of our method for new applications, more specifically in the field of palaeography, which studies the history and variations of handwriting, and for cipher analysis. We demonstrate our approach on four very different datasets: a printed volume of the Google1000 dataset [19, 48], the Copiale cipher [2, 27], a large-scale multi-font benchmark [41], and historical handwritten charters from the 12th and early 13th centuries [6].
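
The idea of reconstructing a text line from a small set of sprites can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `compose_line` and `reconstruction_error`, the `layout` argument, and the toy sprites are all hypothetical, and the sprites and their positions are fixed by hand here, whereas the abstract describes them as learned. The sketch only shows the compositing step and a mean squared reconstruction error, assuming grayscale images stored as NumPy arrays.

```python
import numpy as np

def compose_line(background, sprites, masks, layout):
    """Alpha-composite sprites onto a background, in the given order.

    background: (H, W) float array, the empty line image.
    sprites:    (K, H, w) float array, one grayscale glyph per sprite.
    masks:      (K, H, w) float array in [0, 1], per-pixel sprite opacity.
    layout:     list of (sprite_index, x_offset) pairs; each sprite must
                fit inside the canvas (x_offset + w <= W).
    """
    canvas = background.astype(float).copy()
    w = sprites.shape[2]
    for k, x in layout:
        region = canvas[:, x:x + w]
        # "Over" compositing: the sprite replaces the canvas where its mask is 1.
        canvas[:, x:x + w] = masks[k] * sprites[k] + (1.0 - masks[k]) * region
    return canvas

def reconstruction_error(target, reconstruction):
    """Mean squared error between a target line and its reconstruction."""
    return float(np.mean((target - reconstruction) ** 2))

# Toy example (hypothetical data): white background, two black-ink sprites.
H, w, W = 4, 3, 8
background = np.ones((H, W))
sprites = np.zeros((2, H, w))      # ink colour is 0 (black) everywhere
masks = np.zeros((2, H, w))
masks[0, :, 1] = 1.0               # sprite 0: a vertical bar
masks[1, 1, :] = 1.0               # sprite 1: a horizontal bar
layout = [(0, 0), (1, 3), (0, 5)]  # bar, dash, bar from left to right
line = compose_line(background, sprites, masks, layout)
```

In a learned model, the sprites, masks, and layout would be the quantities optimised to make `reconstruction_error` small on real text lines; here they are constants chosen only to make the compositing visible.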

References

  1. Baird, H.S.: Model-directed document image analysis. In: Proceedings of the Symposium on Document Image Understanding Technology (1999)

  2. Baró, A., Chen, J., Fornés, A., Megyesi, B.: Towards a generic unsupervised method for transcription of encoded manuscripts. In: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage (2019)

  3. Berg-Kirkpatrick, T., Durrett, G., Klein, D.: Unsupervised Transcription of Historical Documents. ACL (2013)

  4. Bluche, T., Messina, R.: Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 646–651. IEEE (2017)

  5. Burgess, C.P., Matthey, L., et al.: MONet: Unsupervised scene decomposition and representation. arXiv preprint arXiv:1901.11390 (2019)

  6. Camps, J.B., Vidal-Gorène, C., Stutzmann, D., Vernet, M., Pinche, A.: Data diversity in handwritten text recognition: challenge or opportunity? Digital Humanities (2022)

  7. Crawford, E., Pineau, J.: Spatially invariant unsupervised object detection with convolutional neural networks. In: AAAI (2019)

  8. Deng, F., Zhi, Z., Lee, D., Ahn, S.: Generative scene graph networks. In: ICLR (2020)

  9. Emami, P., He, P., Ranka, S., Rangarajan, A.: Efficient iterative amortized inference for learning symmetric and disentangled multi-object representations. In: ICML (2021)

  10. Eslami, S.M.A., et al.: Attend, Infer, Repeat: Fast Scene Understanding with Generative Models. Advances in Neural Information Processing Systems (2016)

  11. Fogel, S., Averbuch-Elor, H., Cohen, S., Mazor, S., Litman, R.: ScrabbleGAN: Semi-supervised varying length handwritten text generation. In: CVPR (2020)

  12. Garrette, D., Alpert-Abrams, H., Berg-Kirkpatrick, T., Klein, D.: Unsupervised code-switching for multilingual historical document transcription. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2015)

  13. Goodfellow, I., et al.: Generative adversarial nets. In: NeurIPS (2014)

  14. Goyal, K., Dyer, C., Warren, C., G’Sell, M., Berg-Kirkpatrick, T.: A probabilistic generative model for typographical analysis of early modern printing. In: ACL (2020)

  15. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In: ICML (2006)

  16. Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: NeurIPS (2008)

  17. Greff, K., et al.: Multi-object representation learning with iterative variational inference. In: ICML (2019)

  18. Greff, K., Van Steenkiste, S., Schmidhuber, J.: Neural expectation maximization. In: NeurIPS (2017)

  19. Gupta, A., Vedaldi, A., Zisserman, A.: Learning to read by spelling: Towards unsupervised text recognition. arXiv:1809.08675 [cs] (2018)

  20. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  21. Hochberg, J., Kelly, P., Thomas, T., Kerns, L.: Automatic script identification from document images using cluster-based templates. IEEE Trans. Pattern Anal. Mach. Intell. (1997)

  22. Jaderberg, M., Simonyan, K., Zisserman, A.: Spatial Transformer Networks. In: NeurIPS (2015)

  23. Jiang, J., Ahn, S.: Generative neurosymbolic machines. In: NeurIPS (2020)

  24. Kahle, P., Colutto, S., Hackl, G., Mühlberger, G.: Transkribus - a service platform for transcription, recognition and retrieval of historical documents. In: ICDAR (2017)

  25. Kang, L., Riba, P., Rusiñol, M., Fornés, A., Villegas, M.: Pay attention to what you read: Non-recurrent handwritten text-line recognition. Pattern Recogn. (2022)

  26. Karazija, L., Laina, I., Rupprecht, C.: ClevrTex: a texture-rich benchmark for unsupervised multi-object segmentation. In: NeurIPS Datasets and Benchmarks (2021)

  27. Knight, K., Megyesi, B., Schaefer, C.: The Copiale Cipher. In: Proceedings of the ACL Workshop on Building and Using Comparable Corpora (2011)

  28. Kopec, G.E., Lomelin, M.: Document-specific character template estimation. In: Document Recognition III (1996)

  29. Kopec, G.E., Lomelin, M.: Supervised template estimation for document image decoding. IEEE Trans. Pattern Anal. Mach. Intell. (1997)

  30. Kopec, G.E., Said, M.R., Popat, K.: N-gram language models for document image decoding. In: Document Recognition and Retrieval IX (2001)

  31. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. (1989)

  32. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE (1998)

  33. Li, M., et al.: TrOCR: Transformer-based optical character recognition with pre-trained models. arXiv preprint arXiv:2109.10282 (2021)

  34. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: ICLR (2019)

  35. Monnier, T., Aubry, M.: docExtractor: An off-the-shelf historical document element extraction. In: ICFHR (2020)

  36. Monnier, T., Groueix, T., Aubry, M.: Deep transformation-invariant clustering. In: NeurIPS (2020)

  37. Monnier, T., Vincent, E., Ponce, J., Aubry, M.: Unsupervised Layered Image Decomposition into Object Prototypes. In: ICCV (2021)

  38. Nolan, J.C., Filippini, R.: Method and apparatus for creating a high-fidelity glyph prototype from low-resolution glyph images. US Patent 7,702,182 (2010)

  39. Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: ICDAR (2017)

  40. Reddy, P., Guerrero, P., Mitra, N.J.: Search for concepts: Discovering visual concepts using direct optimization. arXiv preprint arXiv:2210.14808 (2022)

  41. Seuret, M., et al.: Combining OCR models for reading early modern books. In: ICDAR (2023)

  42. Smirnov, D., Gharbi, M., Fisher, M., Guizilini, V., Efros, A.A., Solomon, J.: MarioNette: self-supervised sprite learning. In: NeurIPS (2021)

  43. Souibgui, M.A., Fornés, A., Kessentini, Y., Tudor, C.: A few-shot learning approach for historical ciphered manuscript recognition. CoRR (2020)

  44. de Sousa Neto, A.F., Bezerra, B.L.D., Toselli, A.H., Lima, E.B.: HTR-Flor: a deep learning system for offline handwritten text recognition. In: SIBGRAPI (2020)

  45. Srivatsan, N., Vega, J., Skelton, C., Berg-Kirkpatrick, T.: Neural representation learning for scribal hands of Linear B. In: ICDAR 2021 Workshops (2021)

  46. Srivatsan, N., Wu, S., Barron, J., Berg-Kirkpatrick, T.: Scalable font reconstruction with dual latent manifolds. In: EMNLP (2021)

  47. Vaswani, A., et al.: Attention is all you need. In: NeurIPS (2017)

  48. Vincent, L.: Google book search: document understanding on a massive scale. In: ICDAR (2007)

  49. Xu, Y., Nagy, G.: Prototype extraction and adaptive OCR. IEEE Trans. Pattern Analysis Mach. Intell. (1999)

  50. Yang, Y., Chen, Y., Soatto, S.: Learning to manipulate individual objects in an image. In: CVPR (2020)

  51. Ye, V., Li, Z., Tucker, R., Kanazawa, A., Snavely, N.: Deformable sprites for unsupervised video decomposition. In: CVPR (2022)

  52. Zhang, C., Gupta, A., Zisserman, A.: Adaptive text recognition through visual matching. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12361, pp. 51–67. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58517-4_4


Author information

Corresponding author

Correspondence to Ioannis Siglidis.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Siglidis, I., Gonthier, N., Gaubil, J., Monnier, T., Aubry, M. (2024). The Learnable Typewriter: A Generative Approach to Text Analysis. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14805. Springer, Cham. https://doi.org/10.1007/978-3-031-70536-6_18

  • DOI: https://doi.org/10.1007/978-3-031-70536-6_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70535-9

  • Online ISBN: 978-3-031-70536-6

  • eBook Packages: Computer Science, Computer Science (R0)
