DOI: 10.1007/978-3-031-09037-0_27
Article

Scene Text Recognition: An Overview

Published: 01 June 2022

Abstract

Recent years have witnessed increasing interest, in both academia and industry, in recognizing text in natural scenes, owing to the rich semantic information that text carries. With the rapid development of deep learning, text recognition in natural scenes, also known as scene text recognition (STR), has made breakthrough progress. However, noise interference in natural scenes, such as extreme illumination and occlusion, along with other factors, still poses major challenges. Recent research has shown promising results in terms of accuracy and efficiency. To present a complete picture of the field of STR, this paper: 1) summarizes the fundamental problems of STR and the progress of representative STR algorithms in recent years; 2) analyzes and compares their advantages and disadvantages; 3) points out directions for future work to inspire further research.

      Published In

      Pattern Recognition and Artificial Intelligence: Third International Conference, ICPRAI 2022, Paris, France, June 1–3, 2022, Proceedings, Part I
      Jun 2022
      718 pages
      ISBN:978-3-031-09036-3
      DOI:10.1007/978-3-031-09037-0

      Publisher

      Springer-Verlag

      Berlin, Heidelberg

      Author Tags

      1. Deep learning
      2. Scene text recognition
      3. Scene text detection
