Abstract
Multilingual text recognition is increasingly widely used in computer vision. In practical applications, however, modeling each language independently cannot exploit the information shared across languages and is costly in hardware resources, which makes unified modeling of multiple languages necessary. A natural approach to unified multilingual modeling is to merge the modeling units (characters, subwords, or words) of all languages into one large vocabulary and then train a sequence-to-sequence model over it. However, this vocabulary is often very large, which makes modeling difficult. In this paper, we propose a byte-based multilingual text recognition method that reduces the vocabulary size to only 256, effectively solving the unified modeling problem. Experiments show that our method effectively exploits the information shared across languages and outperforms the independent-modeling baseline by a large margin.
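For intuition, the minimal sketch below (our illustration, not the authors' implementation) shows the property the byte-based approach relies on: UTF-8 encoding maps text in any script to a sequence of integers in [0, 255], so the output vocabulary of the recognizer is fixed at 256 regardless of how many languages are covered.

```python
# Minimal sketch: byte-level target labels for multilingual text recognition.
# Any string, in any language, becomes a sequence of UTF-8 bytes, i.e. integers
# in [0, 255], so the model's output vocabulary never exceeds 256 symbols.

def text_to_byte_targets(text: str) -> list[int]:
    """Encode a transcription into byte-level target labels (vocab size 256)."""
    return list(text.encode("utf-8"))

def byte_targets_to_text(targets: list[int]) -> str:
    """Decode predicted byte labels back into text; malformed sequences are replaced."""
    return bytes(targets).decode("utf-8", errors="replace")

# Chinese, Korean, and Latin text all map into the same 256-symbol label space.
for sample in ["识别", "인식", "recognition"]:
    ids = text_to_byte_targets(sample)
    assert max(ids) < 256
    print(sample, "->", ids, "->", byte_targets_to_text(ids))
```

A character- or subword-based multilingual vocabulary, by contrast, grows with every added script (tens of thousands of entries once CJK characters are included), which is the modeling difficulty the byte-based formulation avoids.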
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, J., Zhao, K., Yang, Z., Yin, B., Liu, C., Dai, L. (2023). End-to-End Multilingual Text Recognition Based on Byte Modeling. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol. 14357. Springer, Cham. https://doi.org/10.1007/978-3-031-46311-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46310-5
Online ISBN: 978-3-031-46311-2
eBook Packages: Computer Science (R0)