
Text Recognition in the Wild: A Survey

Published: 05 March 2021

Abstract

The history of text can be traced back over thousands of years. The rich and precise semantic information carried by text is important in a wide range of vision-based application scenarios. Therefore, text recognition in natural scenes has been an active research topic in computer vision and pattern recognition. In recent years, with the rise and development of deep learning, numerous methods have shown promising results in terms of innovation, practicality, and efficiency. This article aims to (1) summarize the fundamental problems and the state of the art in scene text recognition, (2) introduce new insights and ideas, (3) provide a comprehensive review of publicly available resources, and (4) point out directions for future work. In summary, this literature review attempts to present a complete picture of the field of scene text recognition. It provides a comprehensive reference for people entering this field and could help inspire future research. Related resources are available at our GitHub repository: https://github.com/HCIILAB/Scene-Text-Recognition.

[181]
Ranjith Unnikrishnan and Ray Smith. 2009. Combined script and page orientation estimation using the tesseract OCR engine. In Proceedings of the ICDAR. 6.
[182]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of NIPS. 5998--6008.
[183]
Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie. 2016. Coco-text: Dataset and benchmark for text detection and recognition in natural images. CoRR abs/1601.07140.
[184]
Luis Von Ahn, Benjamin Maurer, Colin McMillen, David Abraham, and Manuel Blum. 2008. Recaptcha: Human-based character recognition via web security measures. Science 321, 5895 (2008), 1465--1468.
[185]
Zhaoyi Wan, Mingling He, Haoran Chen, Xiang Bai, and Cong Yao. 2020. TextScanner: Reading characters in order for robust scene text recognition. In Proceedings of AAAI.
[186]
Zhaoyi Wan, Fengming Xie, Yibo Liu, Xiang Bai, and Cong Yao. 2019. 2D-CTC for scene text recognition. CoRR abs/1907.09705.
[187]
Cong Wang, Fei Yin, and Cheng-Lin Liu. 2018. Memory-augmented attention model for scene text recognition. In Proceedings of ICFHR. 62--67.
[188]
Hao Wang, Pu Lu, Hui Zhang, Mingkun Yang, Xiang Bai, Yongchao Xu, Mengchao He, Yongpan Wang, and Wenyu Liu. 2020. All you need is boundary: Toward arbitrary-shaped text spotting. In Proceedings of AAAI.
[189]
Jianfeng Wang and Xiaolin Hu. 2017. Gated recurrent convolution neural network for OCR. In Proceedings of NIPS. 335--344.
[190]
Kai Wang, Boris Babenko, and Serge Belongie. 2011. End-to-end scene text recognition. In Proceedings of ICCV. 1457--1464.
[191]
Kai Wang and Serge Belongie. 2010. Word spotting in the wild. In Proceedings of ECCV. 591--604.
[192]
Peng Wang, Lu Yang, Hui Li, Yuyan Deng, Chunhua Shen, and Yanning Zhang. 2019. A simple and robust convolutional-attention network for irregular text recognition. CoRR abs/1904.01375.
[193]
Qingqing Wang, Wenjing Jia, Xiangjian He, Yue Lu, Michael Blumenstein, Ye Huang, and Shujing Lyu. 2019. ReELFA: A scene text recognizer with encoded location and focused attention. In Proceedings of ICDAR. 71--76.
[194]
Qi Wang, Shaoteng Liu, Jocelyn Chanussot, and Xuelong Li. 2018. Scene classification with recurrent attention of VHR remote sensing images. IEEE Trans. Geosci. Remote Sens. 57, 2 (2018), 1155--1167.
[195]
Siwei Wang, Yongtao Wang, Xiaoran Qin, Qijie Zhao, and Zhi Tang. 2019. Scene text recognition via gated cascade attention. In Proceedings of ICME. 1018--1023.
[196]
Tao Wang, David J. Wu, Adam Coates, and Andrew Y. Ng. 2012. End-to-end text recognition with convolutional neural networks. In Proceedings of ICPR. 3304--3308.
[197]
Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, and Mingxiang Cai. 2020. Decoupled attention network for text recognition. In Proceedings of AAAI.
[198]
Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, and Ping Luo. 2019. TextSR: Content-aware text super-resolution guided by recognition. CoRR abs/1909.07113.
[199]
Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton van den Hengel, and Liangwei Wang. 2020. On the general value of evidence, and bilingual scene-text visual question answering. In Proceedings of CVPR.
[200]
Yuyang Wang, Feng Su, and Ye Qian. 2019. Text-attentional conditional generative adversarial network for super-resolution of text images. In Proceedings of ICME. 1024--1029.
[201]
Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Youliang Tian, Zilong Fu, and Yongdong Zhang. 2021. R-Net: A relationship network for efficient and accurate scene text detection. IEEE Trans. Multimedia (2021).
[202]
Fred L. Bookstein Principal Warps. 1989. Thin-plate splines and the decompositions of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11, 6 (1989).
[203]
Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, and Xiang Bai. 2019. Editing text in the wild. In Proceedings of ACMICMR. 1500--1508.
[204]
Yue Wu and Prem Natarajan. 2017. Self-organized text detection with minimal post-processing via border learning. In Proceedings of ICCV. 5000--5009.
[205]
Hongtao Xie, Shancheng Fang, Zheng-Jun Zha, Yating Yang, Yan Li, and Yongdong Zhang. 2019. Convolutional attention networks for scene text recognition. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1s (2019), 3.
[206]
Lele Xie, Tasweer Ahmad, Lianwen Jin, Yuliang Liu, and Sheng Zhang. 2018. A new CNN-based method for multi-directional car license plate detection. IEEE Trans. Intell. Transport. Syst. 19, 2 (2018), 507--517.
[207]
Lele Xie, Yuliang Liu, Lianwen Jin, and Zecheng Xie. 2019. DeRPN: Taking a further step toward more general object detection. In Proceedings of AAAI. 9046--9053.
[208]
Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, and Lele Xie. 2019. Aggregation cross-entropy for sequence recognition. In Proceedings of CVPR. 6538--6547.
[209]
Zecheng Xie, Zenghui Sun, Lianwen Jin, Hao Ni, and Terry Lyons. 2017. Learning spatial-semantic context with fully convolutional recurrent network for online handwritten chinese text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40, 8 (2017), 1903--1917.
[210]
Linjie Xing, Zhi Tian, Weilin Huang, and Matthew R. Scott. 2019. Convolutional character networks. In Proceedings of ICCV. 9125--9135.
[211]
Wenhui Xing, Junsheng Qi, Xiaohui Yuan, Lin Li, Xiaoyu Zhang, Yuhua Fu, Shengwu Xiong, Lun Hu, and Jing Peng. 2018. A gene--phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach. Bioinformatics 34, 13 (2018), i386--i394.
[212]
Li Xu and Jiaya Jia. 2010. Two-phase kernel estimation for robust motion deblurring. In Proceedings of ECCV. 157--170.
[213]
Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, and Xiang Bai. 2019. TextField: Learning a deep direction field for irregular scene text detection. IEEE Trans. Image Process. 28, 11 (2019), 5566--5579.
[214]
Chenggang Yan, Hongtao Xie, Jianjun Chen, Zhengjun Zha, Xinhong Hao, Yongdong Zhang, and Qionghai Dai. 2018. A fast Uyghur text detector for complex background images. IEEE Trans. Multimedia 20, 12 (2018), 3389--3398.
[215]
Fan Yang, Lianwen Jin, Songxuan Lai, Xue Gao, and Zhaohai Li. 2019. Fully convolutional sequence recognition network for water meter number reading. IEEE Access 7 (2019), 11679--11687.
[216]
Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, and Xiang Bai. 2019. Symmetry-constrained rectification network for scene text recognition. In Proceedings of ICCV. 9147--9156.
[217]
Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, and C. Lee Giles. 2017. Learning to read irregular text with attention mechanisms. In Proceedings of IJCAI. 3280--3286.
[218]
Cong Yao, Xiang Bai, and Wenyu Liu. 2014. A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process. 23, 11 (2014), 4737--4749.
[219]
Cong Yao, Xiang Bai, Wenyu Liu, Yi Ma, and Zhuowen Tu. 2012. Detecting texts of arbitrary orientations in natural images. In Proceedings of CVPR. 1083--1090.
[220]
Cong Yao, Xiang Bai, Baoguang Shi, and Wenyu Liu. 2014. Strokelets: A learned multi-scale representation for scene text recognition. In Proceedings of CVPR. 4042--4049.
[221]
Cong Yao, Xin Zhang, Xiang Bai, Wenyu Liu, Yi Ma, and Zhuowen Tu. 2013. Rotation-invariant features for multi-oriented text detection in natural images. PloS One 8, 8 (2013), e70173.
[222]
Qixiang Ye and David Doermann. 2014. Text detection and recognition in imagery: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 37, 7 (2014), 1480--1500.
[223]
Qixiang Ye, Wen Gao, Weiqiang Wang, and Wei Zeng. 2003. A robust text detection algorithm in images and video frames. In Proceedings of ICASSP. IEEE, 802--806.
[224]
Qixiang Ye, Qingming Huang, Wen Gao, and Debin Zhao. 2005. Fast and robust text detection in images and video frames. Image Vision Comput. 23, 6 (2005), 565--576.
[225]
Chucai Yi and YingLi Tian. 2011. Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. Image Process. 20, 9 (2011), 2594--2605.
[226]
Fang Yin, Rui Wu, Xiaoyang Yu, and Guanglu Sun. 2019. Video text localization based on adaboost. Multimedia Tools Appl. 78, 5 (2019), 5345--5354.
[227]
Fei Yin, Yi-Chao Wu, Xu-Yao Zhang, and Cheng-Lin Liu. 2017. Scene text recognition with sliding convolutional character models. In Proceedings of ICCV.
[228]
Xu-Cheng Yin, Ze-Yu Zuo, Shu Tian, and Cheng-Lin Liu. 2016. Text detection, tracking and recognition in video: A comprehensive survey. IEEE Trans. Image Process. 25, 6 (2016), 2752--2773.
[229]
Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, and Errui Ding. 2020. Towards accurate scene text recognition with semantic reasoning networks. In Proceedings of CVPR.
[230]
Tai-Ling Yuan, Zhe Zhu, Kun Xu, Cheng-Jun Li, and Shi-Min Hu. 2018. Chinese text in the wild. CoRR abs/1803.00085.
[231]
Liu Yuliang, Jin Lianwen, Zhang Shuaitao, and Zhang Sheng. 2017. Detecting curve text in the wild: New dataset and new solution. CoRR abs/1712.02170.
[232]
Razieh Nokhbeh Zaeem, Rachel L. German, and K. Suzanne Barber. 2018. PrivacyCheck: Automatic summarization of privacy policies using data mining. ACM Trans. Internet Technol. 18, 4 (2018), 53.
[233]
Fangneng Zhan and Shijian Lu. 2019. ESIR: End-to-end scene text recognition via iterative image rectification. In Proceedings of CVPR. 2059--2068.
[234]
Fangneng Zhan, Shijian Lu, and Chuhui Xue. 2018. Verisimilar image synthesis for accurate detection and recognition of texts in scenes. In Proceedings of ECCV. 249--266.
[235]
Fangneng Zhan, Hongyuan Zhu, and Shijian Lu. 2019. Spatial fusion gan for image synthesis. In Proceedings of CVPR. 3653--3662.
[236]
Honggang Zhang, Kaili Zhao, Yi-Zhe Song, and Jun Guo. 2013. Text extraction from natural scene image: A survey. Neurocomputing 122 (2013), 310--323.
[237]
Sheng Zhang, Yuliang Liu, Lianwen Jin, and Canjie Luo. 2018. Feature enhancement network: A refined scene text detector. In Proceedings of AAAI. 2612--2619.
[238]
Yaping Zhang, Shuai Nie, Wenju Liu, Xing Xu, Dongxiang Zhang, and Heng Tao Shen. 2019. Sequence-to-sequence domain adaptation network for robust text image recognition. In Proceedings of CVPR. 2740--2749.
[239]
Xu Zhao, Kai-Hsiang Lin, Yun Fu, Yuxiao Hu, Yuncai Liu, and Thomas S. Huang. 2010. Text from corners: A novel approach to detect text and caption in videos. IEEE Trans. Image Process. 20, 3 (2010), 790--799.
[240]
Yu Zhong, Hongjiang Zhang, and Anil K. Jain. 2000. Automatic caption localization in compressed video. IEEE Trans. Pattern Anal. Mach. Intell. 22, 4 (2000), 385--392.
[241]
Yu Zhou, Shuang Liu, Yongzheng Zhang, Yipeng Wang, and Weiyao Lin. 2014. Perspective scene text recognition with feature compression and ranking. In Proceedings of ACCV. 181--195.
[242]
Yiwei Zhu, Shilin Wang, Zheng Huang, and Kai Chen. 2019. Text recognition in images based on transformer with hierarchical attention. In Proceedings of ICIP. 1945--1949.
[243]
Yingying Zhu, Cong Yao, and Xiang Bai. 2016. Scene text detection and recognition: Recent advances and future trends. Front. Comput. Sci. 10, 1 (2016), 19--36.


Published In

ACM Computing Surveys, Volume 54, Issue 2
March 2022
800 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3450359
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 March 2021
Accepted: 01 December 2020
Received: 01 July 2019
Published in CSUR Volume 54, Issue 2


Author Tags

  1. Scene text recognition
  2. deep learning
  3. end-to-end systems

Qualifiers

  • Research-article
  • Research
  • Refereed
