Abstract
Multilingual text recognition is increasingly widely used in computer vision. In practical applications, however, modeling each language independently cannot exploit the information shared across languages and is costly in hardware resources, which makes unified modeling of multiple languages necessary. A natural approach to unified multilingual modeling is to merge the modeling units (characters, subwords, or words) of all languages into one large vocabulary and then train a sequence-to-sequence model over it. However, this vocabulary is often very large, which makes modeling difficult. In this paper, we propose a byte-based multilingual text recognition method that reduces the vocabulary size to only 256, effectively solving the unified modeling problem. Experiments show that our method effectively exploits the information shared across languages and outperforms the independent-modeling baseline by a large margin.
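For intuition, the minimal sketch below (our illustration, not the authors' implementation) shows the property the byte-based approach relies on: UTF-8 encoding maps text in any script to a sequence of integers in [0, 255], so the output vocabulary of the recognizer is fixed at 256 regardless of how many languages are covered.

```python
# Minimal sketch: byte-level target labels for multilingual text recognition.
# Any string, in any language, becomes a sequence of UTF-8 bytes, i.e. integers
# in [0, 255], so the model's output vocabulary never exceeds 256 symbols.

def text_to_byte_targets(text: str) -> list[int]:
    """Encode a transcription into byte-level target labels (vocab size 256)."""
    return list(text.encode("utf-8"))

def byte_targets_to_text(targets: list[int]) -> str:
    """Decode predicted byte labels back into text; malformed sequences are replaced."""
    return bytes(targets).decode("utf-8", errors="replace")

# Chinese, Korean, and Latin text all map into the same 256-symbol label space.
for sample in ["识别", "인식", "recognition"]:
    ids = text_to_byte_targets(sample)
    assert max(ids) < 256
    print(sample, "->", ids, "->", byte_targets_to_text(ids))
```

A character- or subword-based multilingual vocabulary, by contrast, grows with every added script (tens of thousands of entries once CJK characters are included), which is the modeling difficulty the byte-based formulation avoids.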
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, J., Zhao, K., Yang, Z., Yin, B., Liu, C., Dai, L. (2023). End-to-End Multilingual Text Recognition Based on Byte Modeling. In: Lu, H., et al. Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol. 14357. Springer, Cham. https://doi.org/10.1007/978-3-031-46311-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46310-5
Online ISBN: 978-3-031-46311-2
eBook Packages: Computer Science (R0)