
TransCNN-HAE: Transformer-CNN Hybrid AutoEncoder for Blind Image Inpainting

Published: 10 October 2022

Abstract

Blind image inpainting is extremely challenging due to the unknown and multi-property complexity of contamination in different contaminated images. Current mainstream work decomposes blind image inpainting into two stages: mask estimation from the contaminated image and image inpainting based on the estimated mask. This two-stage solution involves two CNN-based encoder-decoder architectures for estimation and inpainting separately. In this work, we propose a novel one-stage Transformer-CNN Hybrid AutoEncoder (TransCNN-HAE) for blind image inpainting, which intuitively follows the inpainting-then-reconstructing pipeline by leveraging the global long-range contextual modeling of Transformers to repair contaminated regions and the local short-range contextual modeling of CNNs to reconstruct the repaired image. Moreover, a Cross-layer Dissimilarity Prompt (CDP) is devised to accelerate the identification and inpainting of contaminated regions. Ablation studies validate the efficacy of both TransCNN-HAE and CDP, and extensive experiments on various datasets with multi-property contaminations show that our method achieves state-of-the-art performance with much lower computational cost on blind image inpainting. Our code is available at https://github.com/zhenglab/TransCNN-HAE.
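The one-stage pipeline described above can be illustrated with a minimal, hypothetical NumPy sketch: patch tokens pass through a self-attention step (global long-range context, standing in for the Transformer repair stage) and the result is folded back and smoothed by a small convolution (local short-range context, standing in for the CNN reconstruction stage). All names (`hybrid_autoencode`, the patch size, the random weights) are illustrative assumptions; this is not the paper's actual architecture and omits the CDP module entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patchify(img, p):
    """Split an HxW image into non-overlapping pxp patches, one token per patch."""
    H, W = img.shape
    return np.array([img[i:i + p, j:j + p].ravel()
                     for i in range(0, H, p) for j in range(0, W, p)])

def unpatchify(tokens, p, H, W):
    """Inverse of patchify: fold patch tokens back into an HxW image."""
    out = np.zeros((H, W))
    k = 0
    for i in range(0, H, p):
        for j in range(0, W, p):
            out[i:i + p, j:j + p] = tokens[k].reshape(p, p)
            k += 1
    return out

def self_attention(tokens, Wq, Wk, Wv):
    """Global long-range context: every patch token attends to all others."""
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return weights @ V

def conv3x3(img, kernel):
    """Local short-range context: a single zero-padded 3x3 convolution."""
    H, W = img.shape
    padded = np.pad(img, 1)
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + 3, j:j + 3] * kernel).sum()
    return out

def hybrid_autoencode(img, p=4, seed=0):
    """Toy one-stage pipeline: attention-based repair, then convolutional reconstruction."""
    rng = np.random.default_rng(seed)
    tokens = patchify(img, p)
    d = tokens.shape[1]
    Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
    repaired = tokens + self_attention(tokens, Wq, Wk, Wv)  # residual repair step
    coarse = unpatchify(repaired, p, *img.shape)
    return conv3x3(coarse, np.full((3, 3), 1.0 / 9.0))      # smooth local reconstruction

# A toy 8x8 "contaminated" image: ones with a zeroed-out block.
img = np.ones((8, 8))
img[2:4, 2:4] = 0.0
out = hybrid_autoencode(img)
print(out.shape)  # (8, 8)
```

The sketch only shows why the two modeling scales are complementary: attention lets every patch draw on the whole image when filling the contaminated block, while the convolution enforces local smoothness in the reconstruction.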

Supplementary Material

MP4 File (MM22-fp0432.mp4)
Presentation video for "TransCNN-HAE: Transformer-CNN Hybrid AutoEncoder for Blind Image Inpainting".




Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CNN
  2. autoencoder
  3. blind image inpainting
  4. transformer

Qualifiers

  • Research-article

Funding Sources

  • Fundamental Research Funds for the Central Universities
  • National Natural Science Foundation of China

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (last 12 months): 228
  • Downloads (last 6 weeks): 37
Reflects downloads up to 18 Dec 2024

Cited By

  • (2024) Hierarchical Vector-Quantized Variational Autoencoder and Vector Credibility Mechanism for High-Quality Image Inpainting. Electronics 13, 10 (1852). DOI: 10.3390/electronics13101852. Online publication date: 9 May 2024.
  • (2024) SemID: Blind Image Inpainting with Semantic Inconsistency Detection. Tsinghua Science and Technology 29, 4 (1053-1068). DOI: 10.26599/TST.2023.9010079. Online publication date: August 2024.
  • (2024) Reproducing the Past: A Dataset for Benchmarking Inscription Restoration. In Proceedings of the 32nd ACM International Conference on Multimedia (7714-7723). DOI: 10.1145/3664647.3680587. Online publication date: 28 October 2024.
  • (2024) Weakly-Supervised Pavement Surface Crack Segmentation Based on Dual Separation and Domain Generalization. IEEE Transactions on Intelligent Transportation Systems 25, 12 (19729-19743). DOI: 10.1109/TITS.2024.3464528. Online publication date: December 2024.
  • (2024) Explicitly-Decoupled Text Transfer With Minimized Background Reconstruction for Scene Text Editing. IEEE Transactions on Image Processing 33 (5921-5935). DOI: 10.1109/TIP.2024.3477355. Online publication date: 2024.
  • (2024) Facial Feature Priors Guided Blind Face Inpainting. In 2024 IEEE 19th Conference on Industrial Electronics and Applications (ICIEA) (1-6). DOI: 10.1109/ICIEA61579.2024.10664817. Online publication date: 5 August 2024.
  • (2024) Multi-stage image inpainting using improved partial convolutions. IET Image Processing 18, 12 (3343-3355). DOI: 10.1049/ipr2.13178. Online publication date: 30 July 2024.
  • (2024) An end-to-end repair-based joint training framework for weakly supervised pavement crack segmentation. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19691-x. Online publication date: 27 June 2024.
  • (2023) Decontamination Transformer for Blind Image Inpainting. In ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing (1-5). DOI: 10.1109/ICASSP49357.2023.10094950. Online publication date: 4 June 2023.
  • (2023) Blind Image Inpainting via Omni-dimensional Gated Attention and Wavelet Queries. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (1251-1260). DOI: 10.1109/CVPRW59228.2023.00132. Online publication date: June 2023.
