Abstract
Maximally Stable Extremal Regions (MSERs) have achieved great success in scene text detection. However, this low-level pixel operation inherently limits its capability for handling complex text information efficiently (e. g. connections between text or background components), leading to the difficulty in distinguishing texts from background components. In this paper, we propose a novel framework to tackle this problem by leveraging the high capability of convolutional neural network (CNN). In contrast to recent methods using a set of low-level heuristic features, the CNN network is capable of learning high-level features to robustly identify text components from text-like outliers (e.g. bikes, windows, or leaves). Our approach takes advantages of both MSERs and sliding-window based methods. The MSERs operator dramatically reduces the number of windows scanned and enhances detection of the low-quality texts. While the sliding-window with CNN is applied to correctly separate the connections of multiple characters in components. The proposed system achieved strong robustness against a number of extreme text variations and serious real-world problems. It was evaluated on the ICDAR 2011 benchmark dataset, and achieved over 78% in F-measure, which is significantly higher than previous methods.
Chapter PDF
Similar content being viewed by others
Keywords
References
Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: Photoocr: reading text in uncontralled conditions. In: ICCV (2013)
Chen, H., Tsai, S., Schronth, G., Chen, D., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: ICIP (2012)
Chen, X., Yuille, A.: Detecting and reading text in natural scenes. In: CVPR (2004)
Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., Wu, D.J., Ng, A.Y.: Text detection and character recognition in scene images with unsupervised feature learning. In: ICDAR (2011)
Coates, A., Lee, H., Ng, A.Y.: An analysis of single-layer networks in unsupervised feature learning. In: AISTATS (2011)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR (2010)
González, A., Bergasa, L., Yebes, J., Bronte, S.: Text location in complex images. In: ICPR (2012)
Hanif, S., Prevost, L.: Text detection and localization in complex scene images using constrained adaboost algorithm. In: ICDAR (2009)
Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: ICCV (2013)
Kim, K., Jung, K., Kim, J.: Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm. IEEE Trans. Pattern Analysis and Machine Intelligence 25, 1631–1639 (2003)
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W.: Handwritten digit recognition with a back-propagation network. In: NIPS (1989)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324 (1998)
Lucas, S.: Icdar 2005 text locating competition results. In: ICDAR (2005)
Lucas, S., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: Icdar 2003 robust reading competitions. In: ICDAR (2003)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal region. In: BMVC (2002)
Minetto, R., Thome, N., Cord, M., Fabrizio, J., Marcotegui, B.: Snoopertext: A multiresolution system for text detection in complex visual scenes. In: ICIP (2010)
Mishra, A., Alahari, K., Jawahar, C.V.: Top-down and bottom-up cues for scene text recognition. In: CVPR (2012)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning
Neubeck, A., Gool, L.: Efficient non-maximum suppression. In: ICPR (2006)
Neumann, L., Matas, J.: On combining multiple segmentations in scene text recognition. In: ICDAR (2013)
Neumann, L., Matas, J.: Scene text localization and recognition with oriented stroke detection. In: ICCV (2013)
Neumann, L., Matas, K.: Text localization in real-world images using eficiently pruned exhaustive search. In: ICDAR (2011)
Neumann, L., Matas, K.: Real-time scene text localization and recognition. In: CVPR (2012)
Nistér, D., Stewénius, H.: Linear time maximally stable extremal regions. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part II. LNCS, vol. 5303, pp. 183–196. Springer, Heidelberg (2008)
Shahab, A., Shafait, F., Dengel, A.: Icdar 2011 robust reading competition challenge 2: Reading text in scene images. In: ICDAR (2011)
Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S.: Scene text detection using graph model built upon maximally stable extremal regions. Pattern Recognition 34, 107–116 (2013)
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV (2011)
Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural network. In: ICPR (2012)
Wolf, C., Jolion, J.-M.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. International Journal on Document Analysis and Recognition 8, 280–296 (2006)
Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: CVPR (2012)
Yi, C., Tian, Y.: Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. Image Processing 20, 2594–2605 (2011)
Yi, C., Tian, Y.: Text extraction from scene images by character appearance and structure modeling. Computer Vision and Image Understanding 117, 182–194 (2013)
Yin, X.C., Yin, X., Huang, K., Hao, H.W.: Robust text detection in natural scene images. IEEE Trans. Pattern Analysis and Machine Intelligence (to appear)
Zhang, J., Kasturi, R.: Character energy and link energybased text extraction in scene images. In: ACCV (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Huang, W., Qiao, Y., Tang, X. (2014). Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8692. Springer, Cham. https://doi.org/10.1007/978-3-319-10593-2_33
Download citation
DOI: https://doi.org/10.1007/978-3-319-10593-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10592-5
Online ISBN: 978-3-319-10593-2
eBook Packages: Computer ScienceComputer Science (R0)