Scene word recognition from pieces to whole

Anna Zhu¹ & Seiichi Uchida²

Abstract

Convolutional neural networks (CNNs) have been highly successful at object classification. For character classification, we found that training and testing CNNs on accurately segmented character regions yielded higher accuracy than using roughly segmented regions. We therefore aim to extract complete character regions from scene images. Text in natural scene images usually contrasts clearly with its surroundings, and many methods attempt to extract characters through different segmentation techniques. However, for blurred, occluded, or complex-background cases, those methods may produce adjoined or over-segmented characters. In this paper, we propose a scene word recognition model that, after cluster-based segmentation, assembles words from small pieces into wholes. The segmented connected components are classified into four types: background, individual character proposals, adjoined characters, and stroke proposals. Individual character proposals are fed directly to a CNN trained on accurately segmented character images. A sliding window strategy is applied to adjoined character regions. Stroke proposals are treated as fragments of entire characters, whose locations are estimated by a stroke spatial distribution system. The characters estimated from adjoined characters and stroke proposals are then classified by a CNN trained on roughly segmented character images. Finally, a lexicon-driven integration method produces the final word recognition results. Our method achieves performance comparable to other word recognition methods on the Street View Text, ICDAR 2003, and ICDAR 2013 benchmark databases. Moreover, it can recognize occluded and improperly segmented text images.
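The abstract does not give implementation details, so the following Python sketch is only an illustration of two of the steps it names: sliding windows over an adjoined-character region and lexicon-driven integration of character scores. It is not the authors' method. The function names (`sliding_windows`, `word_score`, `recognize`), the fixed window width and stride, the per-candidate score dictionaries standing in for CNN outputs, and the `skip_penalty` and `floor` parameters are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sliding_windows(region_width: int, window_width: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) column spans of fixed-width windows across a region."""
    if window_width <= 0 or stride <= 0:
        raise ValueError("window_width and stride must be positive")
    if region_width <= window_width:
        return [(0, region_width)]
    starts = list(range(0, region_width - window_width + 1, stride))
    # Add one last window so the right edge of the region is always covered.
    if starts[-1] + window_width < region_width:
        starts.append(region_width - window_width)
    return [(s, s + window_width) for s in starts]


def word_score(
    word: str,
    candidates: Sequence[Dict[str, float]],
    skip_penalty: float = math.log(0.05),  # assumed cost of ignoring a candidate
    floor: float = 1e-6,                   # assumed lower bound on a character score
) -> float:
    """Best log-score for aligning `word` to left-to-right character candidates.

    Each candidate is a dict mapping a character to a classifier score in (0, 1].
    Letters must be matched in order; unmatched candidates pay `skip_penalty`.
    """
    n, m = len(word), len(candidates)
    neg_inf = float("-inf")
    if n > m:
        return neg_inf
    # best[i][j]: best score for the first i letters using the first j candidates.
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        best[0][j] = j * skip_penalty
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            skip = best[i][j - 1] + skip_penalty
            score = max(candidates[j - 1].get(word[i - 1], 0.0), floor)
            use = best[i - 1][j - 1] + math.log(score)
            best[i][j] = max(skip, use)
    return best[n][m]


def recognize(lexicon: Sequence[str], candidates: Sequence[Dict[str, float]]) -> Tuple[str, float]:
    """Return the lexicon word with the highest alignment score, and that score."""
    scored = [(word, word_score(word, candidates)) for word in lexicon]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical classifier outputs for three candidate character regions.
    demo = [{"c": 0.9, "e": 0.1}, {"a": 0.8, "o": 0.2}, {"t": 0.7, "f": 0.3}]
    print(sliding_windows(10, 4, 3))
    print(recognize(["cot", "cat", "eat", "at"], demo))
```

In this sketch the lexicon constrains the output to known words, so a weak or spurious candidate can be skipped at a cost instead of forcing a wrong letter into the result. How the paper actually scores and integrates candidates is described only in the full text.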

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant No. 61703316), and in part by the Human Interface Lab of Kyushu University, Japan.

Author information

Authors and Affiliations

SCST, Wuhan University of Technology, Wuhan, 430000, China
Anna Zhu
ISEE-AIT, Kyushu University, Fukuoka, 819-0395, Japan
Seiichi Uchida

Corresponding author

Correspondence to Anna Zhu.

Additional information

Anna Zhu received her BS and PhD degrees from Huazhong University of Science and Technology, China, in 2011 and 2016, respectively. She was previously a research fellow at the Human Interface Laboratory, Kyushu University, Japan. Her research interests include text detection, image processing, pattern recognition, and machine learning.

Seiichi Uchida received the BS, MS, and PhD degrees from Kyushu University, Japan in 1990, 1992, and 1999, respectively. From 1992 to 1996, he was with SECOM Co., Ltd., Japan. Currently, he is a professor at Kyushu University. His research interests include pattern recognition and image processing. Dr. Uchida is a member of IEEE and IPSJ.

Electronic supplementary material

Supplementary material, approximately 276 KB.

About this article

Cite this article

Zhu, A., Uchida, S. Scene word recognition from pieces to whole. Front. Comput. Sci. 13, 292–301 (2019). https://doi.org/10.1007/s11704-017-6420-2

Received: 22 August 2016
Accepted: 12 March 2017
Published: 11 April 2019
Issue Date: April 2019
DOI: https://doi.org/10.1007/s11704-017-6420-2
