research-article

NASTER: Non-local Attentional Scene Text Recognizer

Authors:

Richang Hong

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

Pages 331 - 338

https://doi.org/10.1145/3460426.3463623

Published: 01 September 2021

Abstract

Scene text recognition has been widely investigated in computer vision. In the literature, the encoder-decoder framework, which first encodes an image into a feature map and then decodes that map into the corresponding text sequence, has achieved great success. However, this approach fails on low-quality images, because the local visual features extracted from curved or blurred images are difficult to decode into the corresponding text. To address this issue, we propose a new framework for Scene Text Recognition (STR), named Non-Local Attentional Scene Text Recognizer (NASTER). We use a ResNet with Global Context blocks (GC blocks) to extract global visual features. The global context information is then captured in parallel by a self-attention module and finally decoded by a multi-layer attention decoder with an intermediate supervision module. The proposed method achieves state-of-the-art performance on seven benchmark datasets, demonstrating the effectiveness of our approach.
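To make the idea of a GC block more concrete, the following is a minimal pure-Python sketch of global context pooling in the general style of GCNet: one attention weight per spatial position, a weighted sum into a single context vector, and a broadcast addition back onto every position. It is an illustration under stated assumptions, not the implementation used in NASTER. The function name `global_context_block` and the parameters `w_key` and `w_transform` are hypothetical, and the single linear transform stands in for the bottleneck (with normalization and nonlinearity) that a full GC block would typically use.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_context_block(features, w_key, w_transform):
    """Simplified global context block (illustrative assumption, not the paper's code).

    features:    list of N position vectors, each of length C (a flattened feature map)
    w_key:       length-C vector producing one attention logit per position (hypothetical)
    w_transform: C x C matrix applied to the pooled context (hypothetical stand-in
                 for the bottleneck transform of a full GC block)
    """
    channels = len(features[0])

    # Context modeling: one logit per position, normalized over all positions.
    logits = [sum(w * x for w, x in zip(w_key, pos)) for pos in features]
    weights = softmax(logits)

    # Attention pooling: weighted sum of position vectors gives one global context vector.
    context = [
        sum(wt * pos[c] for wt, pos in zip(weights, features))
        for c in range(channels)
    ]

    # Transform: a single linear map on the context vector.
    transformed = [
        sum(w_transform[i][j] * context[j] for j in range(channels))
        for i in range(channels)
    ]

    # Fusion: add the same transformed context to every position.
    return [[x + t for x, t in zip(pos, transformed)] for pos in features]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]   # two positions, two channels
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Zero key weights give uniform attention, so the context is the mean feature.
    print(global_context_block(feats, [0.0, 0.0], identity))  # [[1.5, 0.5], [0.5, 1.5]]
```

Because the same context vector is added at every position, each local feature gains access to image-wide information, which is the property the abstract relies on for curved or blurred text.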

Cited By

Tan Y, Kong A, Kim J. (2022). Pure Transformer with Integrated Experts for Scene Text Recognition. Computer Vision – ECCV 2022, 481-497. Online publication date: 23-Oct-2022.
https://dl.acm.org/doi/10.1007/978-3-031-19815-1_28

Index Terms

NASTER: Non-local Attentional Scene Text Recognizer
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition

Published In

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

August 2021

715 pages

ISBN:9781450384636

DOI:10.1145/3460426

General Chairs:
Wen-Huang Cheng, National Yang Ming Chiao Tung University, Taiwan
Mohan Kankanhalli, National University of Singapore, Singapore
Meng Wang, Hefei University of Technology, China

Program Chairs:
Wei-Ta Chu, National Cheng Kung University, Taiwan
Jiaying Liu, Peking University, China
Marcel Worring, University of Amsterdam, Netherlands

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICMR '21: International Conference on Multimedia Retrieval

August 21 - 24, 2021

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Total Citations: 1
Total Downloads: 135
Downloads (Last 12 months): 10
Downloads (Last 6 weeks): 1

Reflects downloads up to 30 Dec 2024
