Abstract
A general method for text localization and recognition in real-world images is presented. The proposed method is novel in that it (i) departs from a strict feed-forward pipeline and replaces it with a hypotheses-verification framework that simultaneously processes multiple text line hypotheses, (ii) uses synthetic fonts to train the algorithm, eliminating the need for time-consuming acquisition and labeling of real-world training data, and (iii) exploits Maximally Stable Extremal Regions (MSERs), which provide robustness to geometric and illumination conditions.
The performance of the method is evaluated on two standard datasets. On the Char74k dataset, a recognition rate of 72% is achieved, 18% higher than the state of the art. The paper is the first to report both text detection and recognition results on the standard and rather challenging ICDAR 2003 dataset. The text localization works for a number of alphabets, and the method is easily adapted to the recognition of other scripts, e.g. Cyrillic.
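The extremal-region idea behind MSERs can be illustrated with a small sketch. The code below is not the authors' implementation: it is a simplified, assumed version that follows a single dark region around one seed pixel as the intensity threshold rises and picks the threshold at which the region's area changes least, using the usual stability measure (area at t + delta minus area at t - delta, divided by area at t). The function names, the toy image, and the threshold grid are all hypothetical. A real MSER detector tracks every region in a component tree and reports all local stability minima.

```python
from collections import deque


def component_area(img, seed, threshold):
    """Area of the 4-connected region of pixels <= threshold that contains seed."""
    rows, cols = len(img), len(img[0])
    r0, c0 = seed
    if img[r0][c0] > threshold:
        return 0  # the seed itself is not part of any region at this threshold
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen
                    and img[nr][nc] <= threshold):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def most_stable_threshold(img, seed, thresholds, delta=1):
    """Return (threshold, stability) with the smallest relative area change.

    Stability at index i is (area[i + delta] - area[i - delta]) / area[i];
    regions are nested as the threshold grows, so the difference is >= 0.
    """
    areas = [component_area(img, seed, t) for t in thresholds]
    best = None
    for i in range(delta, len(thresholds) - delta):
        if areas[i] == 0:
            continue
        q = (areas[i + delta] - areas[i - delta]) / areas[i]
        if best is None or q < best[1]:
            best = (thresholds[i], q)
    return best


# Toy image (assumed data): a dark 2x2 blob on a bright background.
IMG = [
    [200, 200, 200, 200, 200],
    [200,  10,  10, 200, 200],
    [200,  10,  10, 200, 200],
    [200, 200, 200, 200, 200],
    [200, 200, 200, 200, 200],
]

print(most_stable_threshold(IMG, (1, 1), list(range(0, 256, 50))))  # (100, 0.0)
```

In the toy image the blob keeps an area of 4 pixels for thresholds 50 through 150 and only merges with the background at 200, so the region is maximally stable around threshold 100. This insensitivity to the exact threshold is what makes such regions tolerant of illumination changes.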
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neumann, L., Matas, J. (2011). A Method for Text Localization and Recognition in Real-World Images. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19318-7_60
DOI: https://doi.org/10.1007/978-3-642-19318-7_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19317-0
Online ISBN: 978-3-642-19318-7
eBook Packages: Computer Science, Computer Science (R0)