Real-time localization of multi-oriented text in natural scene images using a linear spatial filter

Abstract

This paper proposes a method for localizing multi-oriented text in natural images that is suitable for real-time processing of high-definition video on portable and mobile devices. The method follows the connected components (CC) approach. First, CCs are isolated by convolving a multi-scale pyramid with a specifically designed linear spatial filter, followed by hysteresis thresholding. Next, non-textual CCs are pruned by a local classifier consisting of a cascade of multilayer perceptrons (MLPs) fed with increasingly extended feature vectors. The stroke width feature is estimated in linear time by computing the maximal inscribed squares in each CC. Candidate CCs and their neighbors are then checked by a more context-aware neural network classifier that takes into account the target CC and its vicinity. Finally, text sequences are extracted at all pyramid levels and fused using dynamic programming. The main contribution of this work is execution speed: the CPU-only parallel implementation of the proposed method can process 1080p HD video at nearly 30 frames per second on a standard laptop. Furthermore, when benchmarked on the ICDAR 2013 Robust Reading and ICDAR 2015 Incidental Scene Text data sets, our system runs more than twice as fast as the state of the art while still delivering competitive precision and recall.
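
To illustrate how maximal inscribed squares can be found in time linear in the number of pixels, the sketch below uses the classic dynamic-programming recurrence for the largest all-foreground square ending at each pixel. It is not the authors' implementation: the function names, the list-of-lists mask format, and the choice of the largest side as the stroke width estimate are all assumptions made for this example, and the abstract does not say how the squares are turned into a feature.

```python
def inscribed_square_sides(mask):
    """For each pixel, return the side of the largest all-foreground
    axis-aligned square whose bottom-right corner is that pixel.

    `mask` is a list of rows of 0/1 values (hypothetical input format).
    Each pixel is visited once, so the cost is linear in the pixel count.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    sides = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # background pixels cannot end a square
            if r == 0 or c == 0:
                sides[r][c] = 1  # border pixels only fit a 1x1 square
            else:
                # A square can grow only if the squares ending above,
                # to the left, and diagonally up-left all allow it.
                sides[r][c] = 1 + min(
                    sides[r - 1][c], sides[r][c - 1], sides[r - 1][c - 1]
                )
    return sides


def estimate_stroke_width(mask):
    """Assumed summary statistic: the largest inscribed square side."""
    sides = inscribed_square_sides(mask)
    return max((s for row in sides for s in row), default=0)


# A horizontal bar 3 pixels thick and 7 pixels long, with a 1-pixel margin.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
print(estimate_stroke_width(bar))
```

For an axis-aligned stroke, the largest square that fits is limited by the stroke's thickness rather than its length, which is why the square side can serve as a thickness estimate in this simplified setting.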

Notes

Physics constants have been removed.
2D vectors in the cross product extended to 3D by setting z to 0.
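
As a small illustration of the second note: when two 2D vectors are embedded in 3D with z = 0, their cross product has only a z component, equal to x1*y2 - y1*x2. The sketch below computes that component; the function name `cross_z` is hypothetical and not taken from the article.

```python
def cross_z(a, b):
    """z component of (ax, ay, 0) x (bx, by, 0).

    The x and y components of this cross product are always zero,
    so the single returned scalar carries all the information.
    """
    ax, ay = a
    bx, by = b
    return ax * by - ay * bx


print(cross_z((1, 0), (0, 1)))  # counter-clockwise turn: positive
print(cross_z((0, 1), (1, 0)))  # clockwise turn: negative
print(cross_z((2, 3), (4, 6)))  # parallel vectors: zero
```

The sign of the result tells whether the second vector lies counter-clockwise or clockwise from the first, and a zero result means the two vectors are parallel.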

Acknowledgements

Work supported by the Spanish government under Grant TIN2016-80250-R.

Author information

Authors and Affiliations

Universitat Rovira i Virgili, Tarragona, Spain
Xavier Gironés & Carme Julià

Corresponding author

Correspondence to Xavier Gironés.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gironés, X., Julià, C. Real-time localization of multi-oriented text in natural scene images using a linear spatial filter. J Real-Time Image Proc 17, 1505–1525 (2020). https://doi.org/10.1007/s11554-019-00911-9

Received: 26 April 2019
Accepted: 05 September 2019
Published: 17 September 2019
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11554-019-00911-9
