Abstract
Regional facial image synthesis conditioned on a semantic mask has attracted considerable attention in computational visual media. However, the appearances of different regions may become inconsistent with each other after regional editing. In this paper, we focus on harmonized regional style transfer for facial images. We propose a multi-scale encoder for accurate style-code extraction. The key component of our work is a multi-region style attention module, which adapts multiple regional style embeddings from a reference image to a target image to generate a harmonious result. We also propose style mapping networks for multi-modal style synthesis, and further employ an invertible flow model, serving as a mapping network, to fine-tune the style code by inverting it to the latent space. We evaluated our model by transferring regional facial appearance between three widely used face datasets. The results show that our model reliably performs style transfer and multi-modal manipulation, generating output comparable to the state of the art.
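The core idea of the multi-region style attention module can be illustrated with a minimal sketch: each target region's style code attends over the reference image's regional style codes, and the softmax-weighted blend yields a harmonized code. This is a hedged, dependency-free illustration of the general mechanism, not the paper's implementation; the function and variable names (`region_style_attention`, `target_styles`, `reference_styles`) are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def region_style_attention(target_styles, reference_styles):
    """For each target region's style code (query), attend over all
    reference regions' style codes (keys/values) and return a blended,
    harmonized code per target region."""
    d = len(reference_styles[0])  # style-code dimensionality
    out = []
    for q in target_styles:
        # Scaled dot-product similarity between target and reference regions
        scores = [dot(q, k) / math.sqrt(d) for k in reference_styles]
        weights = softmax(scores)
        # Weighted blend of reference style codes
        blended = [sum(w * v[j] for w, v in zip(weights, reference_styles))
                   for j in range(d)]
        out.append(blended)
    return out
```

With a single reference region the attention weight is 1, so the output simply copies that region's style code; with several regions, the blend lets neighboring regional styles influence each other, which is the intuition behind harmonization.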
Acknowledgements
This work was partly supported by the National Key R&D Program of China (No. 2020YFA0714100) and the National Natural Science Foundation of China (Nos. 61872162, 62102162, 61832016, U20B2070).
Author information
Declaration of competing interest
The authors declare that they have no competing interests relevant to the content of this article.
Cong Wang is a Ph.D. candidate in the School of Mathematics at Jilin University. His research interests include image processing, computer vision, and deep learning.
Fan Tang is an assistant professor in the School of Artificial Intelligence, Jilin University. He received his Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, in 2019. His research interests include computer graphics, computer vision, and machine learning.
Yong Zhang received his Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences in 2018. From 2015 to 2017, he was a visiting scholar with the Rensselaer Polytechnic Institute. He is currently with the Tencent AI Lab. His research interests include computer vision and machine learning.
Tieru Wu is a professor in the Institute of Mathematics at Jilin University. His research interests include techniques of computer graphics, geometry processing, and machine learning. His research is supported in part by the National Natural Science Foundation of China.
Weiming Dong is a professor in the Sino-European Lab in Computer Science, Automation and Applied Mathematics (LIAMA) and the National Laboratory of Pattern Recognition (NLPR) at the Institute of Automation, Chinese Academy of Sciences. He received his B.Sc. and M.Sc. degrees in computer science in 2001 and 2004 respectively, both from Tsinghua University. He received his Ph.D. degree in computer science from the University of Lorraine, France, in 2007. His research interests include image synthesis and image recognition. He is a member of the ACM and IEEE.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, C., Tang, F., Zhang, Y. et al. Towards harmonized regional style transfer and manipulation for facial images. Comp. Visual Media 9, 351–366 (2023). https://doi.org/10.1007/s41095-022-0284-6