Research article · Open access

Barbershop: GAN-based image compositing using segmentation masks

Published: 10 December 2021

Abstract

Seamlessly blending features from multiple images is extremely challenging because of complex relationships in lighting, geometry, and partial occlusion, which couple different parts of the image. Even though recent work on GANs enables the synthesis of realistic hair or faces, it remains difficult to combine them into a single, coherent, and plausible image rather than a disjointed set of image patches. We present a novel solution to image blending, particularly for the problem of hairstyle transfer, based on GAN inversion. We propose a novel latent space for image blending that better preserves detail and encodes spatial information, and a new GAN-embedding algorithm that can slightly modify images to conform to a common segmentation mask. Our representation enables the transfer of visual properties from multiple reference images, including specific details such as moles and wrinkles, and because blending is performed in a latent space, we are able to synthesize coherent images. Our approach avoids the blending artifacts present in other approaches and finds a globally consistent image. In a user study, our results demonstrate a significant improvement over the current state of the art, with users preferring our blending solution over 95 percent of the time. Source code for the new approach is available at https://zpdesu.github.io/Barbershop.
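The core idea the abstract describes — embedding each reference image into a spatially structured latent space and compositing the latent codes under a shared segmentation mask — can be sketched schematically. This is a minimal illustration, not the paper's implementation: `blend_latents`, the tensor shapes, and the toy mask are all hypothetical stand-ins for the actual GAN-inversion machinery.

```python
import numpy as np

def blend_latents(latent_a, latent_b, mask):
    """Composite two spatial latent tensors under a soft segmentation mask.

    latent_a, latent_b: arrays of shape (C, H, W) standing in for the
    spatial latent codes of two GAN-embedded images.
    mask: array of shape (H, W) with values in [0, 1]; 1 selects latent_a
    (e.g. the hair region), 0 selects latent_b (e.g. the face region).
    """
    m = mask[None, :, :]  # add a channel axis so the mask broadcasts
    return m * latent_a + (1.0 - m) * latent_b

# Toy example: 4-channel, 8x8 spatial latents.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8, 8))
b = rng.normal(size=(4, 8, 8))
mask = np.zeros((8, 8))
mask[:4, :] = 1.0  # the "hair" region occupies the top half

blended = blend_latents(a, b, mask)
assert blended.shape == (4, 8, 8)
assert np.allclose(blended[:, :4, :], a[:, :4, :])  # mask==1 keeps a
assert np.allclose(blended[:, 4:, :], b[:, 4:, :])  # mask==0 keeps b
```

In the actual method, blending happens on latent codes rather than pixels, so the generator resolves lighting and occlusion globally when it decodes the composite — which is why this avoids the seams of patch-based compositing.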

Supplementary Material

  • ZIP File (a215-zhu.zip) — supplemental files
  • MP4 File (a215-zhu.mp4)
  • MP4 File (3478513.3480537.mp4) — presentation




Published In

ACM Transactions on Graphics, Volume 40, Issue 6
December 2021, 1351 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3478513
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in TOG Volume 40, Issue 6


Author Tags

  1. GAN embedding
  2. StyleGAN
  3. image compositing
  4. image editing



Cited By

  • (2024) Image Hash Layer Triggered CNN Framework for Wafer Map Failure Pattern Retrieval and Classification. ACM Transactions on Knowledge Discovery from Data 18(4), 1-26. DOI: 10.1145/3638053
  • (2024) ETBHD-HMF: A Hierarchical Multimodal Fusion Architecture for Enhanced Text-Based Hair Design. Computer Graphics Forum 43(6). DOI: 10.1111/cgf.15194
  • (2024) Revisiting Latent Space of GAN Inversion for Robust Real Image Editing. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5301-5310. DOI: 10.1109/WACV57701.2024.00523
  • (2024) HairStyle Editing via Parametric Controllable Strokes. IEEE Transactions on Visualization and Computer Graphics 30(7), 3857-3870. DOI: 10.1109/TVCG.2023.3241894
  • (2024) Bilinear Models of Parts and Appearances in Generative Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 8568-8579. DOI: 10.1109/TPAMI.2024.3415506
  • (2024) A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing. IEEE International Joint Conference on Biometrics (IJCB), 1-10. DOI: 10.1109/IJCB62174.2024.10744438
  • (2024) StyleEditorGAN: Transformer-Based Image Inversion and Realistic Facial Editing. International Conference on Dependable Systems and Their Applications (DSA), 374-380. DOI: 10.1109/DSA63982.2024.00057
  • (2024) Diffusion-Driven GAN Inversion for Multi-Modal Face Image Generation. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10403-10412. DOI: 10.1109/CVPR52733.2024.00990
  • (2024) The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9337-9346. DOI: 10.1109/CVPR52733.2024.00892
  • (2024) Privacy-Preserving Face and Hair Swapping in Real-Time With a GAN-Generated Face Image. IEEE Access 12, 179265-179280. DOI: 10.1109/ACCESS.2024.3420452
