Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees

Weilin Huang, Yu Qiao & Xiaoou Tang

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8692)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Maximally Stable Extremal Regions (MSERs) have achieved great success in scene text detection. However, as a low-level pixel operation, the MSER detector is inherently limited in handling complex text information efficiently (e.g. connections between text or background components), which makes it difficult to distinguish text from background components. In this paper, we propose a novel framework that tackles this problem by leveraging the high capability of convolutional neural networks (CNNs). In contrast to recent methods that use a set of low-level heuristic features, the CNN is capable of learning high-level features to robustly separate text components from text-like outliers (e.g. bikes, windows, or leaves). Our approach takes advantage of both MSERs and sliding-window based methods: the MSER operator dramatically reduces the number of windows scanned and enhances detection of low-quality text, while the sliding window with the CNN is applied to correctly separate the connections of multiple characters in components. The proposed system achieves strong robustness against a number of extreme text variations and serious real-world problems. It was evaluated on the ICDAR 2011 benchmark dataset and achieved over 78% in F-measure, which is significantly higher than previous methods.
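The idea of sliding a classifier across a component to separate connected characters can be illustrated with a minimal sketch. This page does not give the window size, the CNN architecture, or the suppression rule used in the paper, so everything below is an assumption for illustration: the `(x, y, width, height)` box layout, the helper names `slide_windows`, `local_maxima` and `split_component`, and the stand-in scorer `toy_score`, which takes the place of a trained CNN.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical box layout: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def slide_windows(box: Box, stride: int = 1) -> List[Box]:
    """Sweep square windows (side = box height) left to right inside `box`.

    A box that is not wider than it is tall is returned unchanged,
    since there is no room to slide a square window across it.
    """
    x, y, w, h = box
    if w <= h:
        return [box]
    return [(x + dx, y, h, h) for dx in range(0, w - h + 1, stride)]


def local_maxima(scores: Sequence[float], radius: int, threshold: float) -> List[int]:
    """Simple 1-D non-maximum suppression.

    Keeps index i when its score reaches `threshold` and equals the
    maximum within `radius` positions on either side. Exact ties inside
    one neighbourhood are all kept.
    """
    keep = []
    for i, s in enumerate(scores):
        lo, hi = max(0, i - radius), min(len(scores), i + radius + 1)
        if s >= threshold and s == max(scores[lo:hi]):
            keep.append(i)
    return keep


def split_component(
    box: Box,
    score_fn: Callable[[Box], float],
    stride: int = 1,
    radius: int = 2,
    threshold: float = 0.5,
) -> List[Box]:
    """Return the windows whose classifier score is a local peak."""
    windows = slide_windows(box, stride)
    scores = [score_fn(win) for win in windows]
    return [windows[i] for i in local_maxima(scores, radius, threshold)]


def toy_score(win: Box) -> float:
    """Stand-in for a CNN: confidence peaks at x = 0, 10 and 20 (made-up values)."""
    x = win[0]
    return 1.0 - min(abs(x - c) for c in (0, 10, 20)) / 10.0


# A 30x10 component that merges three 10x10 characters.
parts = split_component((0, 0, 30, 10), toy_score)
print(parts)
```

With the toy scorer, the merged 30x10 component is split into three 10x10 windows. In a real system the scorer would be a trained text/non-text classifier, and restricting the windows to candidate components is what keeps the number of evaluations small.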

Author information

Authors and Affiliations

Shenzhen Key Lab of Comp. Vis and Pat. Rec., Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
Weilin Huang, Yu Qiao & Xiaoou Tang
Department of Information Engineering, The Chinese University of Hong Kong, China
Weilin Huang & Xiaoou Tang

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
KU Leuven, ESAT - PSI, iMinds, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Huang, W., Qiao, Y., Tang, X. (2014). Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8692. Springer, Cham. https://doi.org/10.1007/978-3-319-10593-2_33

DOI: https://doi.org/10.1007/978-3-319-10593-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10592-5
Online ISBN: 978-3-319-10593-2
eBook Packages: Computer Science, Computer Science (R0)
