
TSINIT: A Two-Stage Inpainting Network for Incomplete Text

Published: 07 July 2022

Abstract

Although there are many studies on scene text recognition, few focus on recognizing incomplete text. The performance of existing text recognition algorithms on incomplete text falls far short of expectations, and recognizing incomplete text remains challenging. In this paper, an end-to-end Two-Stage Inpainting Network for Incomplete Text (TSINIT) is proposed to reconstruct incomplete text into complete text, even when the text appears in various styles and against various backgrounds, so that the reconstructed text can be recognized correctly by existing text recognition algorithms. The proposed TSINIT is divided into a text extraction module (TEM) and a text reconstruction module (TRM) so that the inpainting focuses only on the text. TEM separates the incomplete text from the background and from character-like regions at the pixel level, which reduces the ambiguity in text reconstruction caused by the background. TRM reconstructs the incomplete text toward the most probable text by considering the abstract and semantic structures of the text. Furthermore, we build a synthetic incomplete text dataset (SITD), which contains contaminated and abraded text images. SITD is divided into six incompleteness levels according to the number of pixels in the incomplete regions and the ratio of incomplete characters to all characters. Experimental results show that the proposed method inpaints incomplete text better than traditional image inpainting algorithms on both SITD and real images. When the same text recognition method is used, the proposed TSINIT improves the recognition accuracy of incomplete text on SITD far more than traditional image inpainting methods do.
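The abstract specifies only the two-stage decomposition (TEM produces a pixel-level text mask; TRM inpaints the text conditioned on it), not the modules' internals. The PyTorch sketch below illustrates that pipeline shape; the layer choices, channel widths, and the 32x128 crop size are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of the two-stage TSINIT pipeline described in the
# abstract. Module internals are NOT given in the paper's abstract;
# everything below is an illustrative assumption.
import torch
import torch.nn as nn


class TextExtractionModule(nn.Module):
    """TEM: separates incomplete text from the background and
    character-like regions at the pixel level (assumed conv stack)."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid(),  # per-pixel text probability
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.net(image)  # (B, 1, H, W) soft text mask


class TextReconstructionModule(nn.Module):
    """TRM: reconstructs the incomplete text toward the most probable
    complete text, conditioned on the extracted mask."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, channels, 3, padding=1), nn.ReLU(),  # 3 image + 1 mask channels
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 1), nn.Tanh(),  # reconstructed text image
        )

    def forward(self, image: torch.Tensor, text_mask: torch.Tensor) -> torch.Tensor:
        x = torch.cat([image, text_mask], dim=1)  # condition inpainting on the mask
        return self.net(x)


class TSINIT(nn.Module):
    """End-to-end two-stage pipeline: TEM -> TRM."""
    def __init__(self):
        super().__init__()
        self.tem = TextExtractionModule()
        self.trm = TextReconstructionModule()

    def forward(self, image: torch.Tensor):
        mask = self.tem(image)               # stage 1: extract text pixels
        return self.trm(image, mask), mask   # stage 2: inpaint the text


# Usage sketch: reconstruct a batch of incomplete text-line crops.
model = TSINIT()
images = torch.randn(2, 3, 32, 128)  # crop size assumed for illustration
reconstructed, mask = model(images)
```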


Cited By

• (2024) "Reproducing the Past: A Dataset for Benchmarking Inscription Restoration," in Proceedings of the 32nd ACM International Conference on Multimedia, pp. 7714–7723. DOI: 10.1145/3664647.3680587. Online publication date: 28-Oct-2024.
• (2024) "Learning Feature Semantic Matching for Spatio-Temporal Video Grounding," IEEE Transactions on Multimedia, vol. 26, pp. 9268–9279. DOI: 10.1109/TMM.2024.3387696. Online publication date: 1-Jan-2024.
• (2024) "MMGInpainting: Multi-Modality Guided Image Inpainting Based on Diffusion Models," IEEE Transactions on Multimedia, vol. 26, pp. 8811–8823. DOI: 10.1109/TMM.2024.3382484. Online publication date: 1-Jan-2024.


Published In

IEEE Transactions on Multimedia, Volume 25, 2023, 8932 pages

Publisher

IEEE Press


Qualifiers

• Research-article

