
TransCNN-HAE: Transformer-CNN Hybrid AutoEncoder for Blind Image Inpainting

Published: 10 October 2022

Abstract

Blind image inpainting is extremely challenging due to the unknown and multi-property complexity of contamination in different contaminated images. Current mainstream work decomposes blind image inpainting into two stages: mask estimation from the contaminated image and image inpainting based on the estimated mask. This two-stage solution involves two CNN-based encoder-decoder architectures for estimation and inpainting separately. In this work, we propose a novel one-stage Transformer-CNN Hybrid AutoEncoder (TransCNN-HAE) for blind image inpainting, which intuitively follows the inpainting-then-reconstructing pipeline by leveraging the global long-range contextual modeling of Transformers to repair contaminated regions and the local short-range contextual modeling of CNNs to reconstruct the repaired image. Moreover, a Cross-layer Dissimilarity Prompt (CDP) is devised to accelerate the identification and inpainting of contaminated regions. Ablation studies validate the efficacy of both TransCNN-HAE and CDP, and extensive experiments on various datasets with multi-property contaminations show that our method achieves state-of-the-art performance with much lower computational cost on blind image inpainting. Our code is available at https://github.com/zhenglab/TransCNN-HAE.
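The one-stage pipeline described above can be illustrated with a minimal, hypothetical NumPy sketch: patch tokens pass through a self-attention step (global long-range context, standing in for the Transformer repair stage) and the result is folded back and smoothed by a small convolution (local short-range context, standing in for the CNN reconstruction stage). All names (`hybrid_autoencode`, the patch size, the random weights) are illustrative assumptions; this is not the paper's actual architecture and omits the CDP module entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patchify(img, p):
    """Split an HxW image into non-overlapping pxp patches, one token per patch."""
    H, W = img.shape
    return np.array([img[i:i + p, j:j + p].ravel()
                     for i in range(0, H, p) for j in range(0, W, p)])

def unpatchify(tokens, p, H, W):
    """Inverse of patchify: fold patch tokens back into an HxW image."""
    out = np.zeros((H, W))
    k = 0
    for i in range(0, H, p):
        for j in range(0, W, p):
            out[i:i + p, j:j + p] = tokens[k].reshape(p, p)
            k += 1
    return out

def self_attention(tokens, Wq, Wk, Wv):
    """Global long-range context: every patch token attends to all others."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return weights @ V

def conv3x3(img, kernel):
    """Local short-range context: a single zero-padded 3x3 convolution."""
    H, W = img.shape
    padded = np.pad(img, 1)
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + 3, j:j + 3] * kernel).sum()
    return out

def hybrid_autoencode(img, p=4, seed=0):
    """Toy one-stage pipeline: attention-based repair, then convolutional reconstruction."""
    rng = np.random.default_rng(seed)
    tokens = patchify(img, p)
    d = tokens.shape[1]
    Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
    repaired = tokens + self_attention(tokens, Wq, Wk, Wv)  # residual repair step
    coarse = unpatchify(repaired, p, *img.shape)
    return conv3x3(coarse, np.full((3, 3), 1.0 / 9.0))      # smooth local reconstruction

# A toy 8x8 "contaminated" image: ones with a zeroed-out block.
img = np.ones((8, 8))
img[2:4, 2:4] = 0.0
out = hybrid_autoencode(img)
print(out.shape)  # (8, 8)
```

The sketch only shows why the two modeling scales are complementary: attention lets every patch draw on the whole image when filling the contaminated block, while the convolution enforces local smoothness in the reconstruction.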

Supplementary Material

MP4 File (MM22-fp0432.mp4)
Presentation video for "TransCNN-HAE: Transformer-CNN Hybrid AutoEncoder for Blind Image Inpainting".




Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CNN
  2. autoencoder
  3. blind image inpainting
  4. transformer

Qualifiers

  • Research-article

Funding Sources

  • Fundamental Research Funds for the Central Universities
  • National Natural Science Foundation of China

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (last 12 months): 228
  • Downloads (last 6 weeks): 37
Reflects downloads up to 18 Dec 2024

Cited By

  • (2024) Hierarchical Vector-Quantized Variational Autoencoder and Vector Credibility Mechanism for High-Quality Image Inpainting. Electronics 13, 10 (1852). DOI: 10.3390/electronics13101852. Online publication date: 9 May 2024.
  • (2024) SemID: Blind Image Inpainting with Semantic Inconsistency Detection. Tsinghua Science and Technology 29, 4 (1053-1068). DOI: 10.26599/TST.2023.9010079. Online publication date: August 2024.
  • (2024) Reproducing the Past: A Dataset for Benchmarking Inscription Restoration. In Proceedings of the 32nd ACM International Conference on Multimedia (7714-7723). DOI: 10.1145/3664647.3680587. Online publication date: 28 October 2024.
  • (2024) Weakly-Supervised Pavement Surface Crack Segmentation Based on Dual Separation and Domain Generalization. IEEE Transactions on Intelligent Transportation Systems 25, 12 (19729-19743). DOI: 10.1109/TITS.2024.3464528. Online publication date: December 2024.
  • (2024) Explicitly-Decoupled Text Transfer With Minimized Background Reconstruction for Scene Text Editing. IEEE Transactions on Image Processing 33 (5921-5935). DOI: 10.1109/TIP.2024.3477355. Online publication date: 2024.
  • (2024) Facial Feature Priors Guided Blind Face Inpainting. In 2024 IEEE 19th Conference on Industrial Electronics and Applications (ICIEA) (1-6). DOI: 10.1109/ICIEA61579.2024.10664817. Online publication date: 5 August 2024.
  • (2024) Multi-stage image inpainting using improved partial convolutions. IET Image Processing 18, 12 (3343-3355). DOI: 10.1049/ipr2.13178. Online publication date: 30 July 2024.
  • (2024) An end-to-end repair-based joint training framework for weakly supervised pavement crack segmentation. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19691-x. Online publication date: 27 June 2024.
  • (2023) Decontamination Transformer for Blind Image Inpainting. In ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing (1-5). DOI: 10.1109/ICASSP49357.2023.10094950. Online publication date: 4 June 2023.
  • (2023) Blind Image Inpainting via Omni-dimensional Gated Attention and Wavelet Queries. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (1251-1260). DOI: 10.1109/CVPRW59228.2023.00132. Online publication date: June 2023.
