Generated Image Editing Method Based on Global-Local Jacobi Disentanglement for Machine Learning
Figure 1. Generated human face image editing on the semantic attributes of hairstyle, age, pose, and gender, respectively.
Figure 2. Entanglement of gender attributes with other semantic attributes in SefaGAN.
Figure 3. The overall architecture of our proposed generated image editing method.
Figure 4. (a) Training time of the Jacobi orthogonal regularization search with different initialization methods. (b) Proportion of effective directions obtained by the Jacobi orthogonal regularization search method.
Figure 5. (a) Comparison of training time for loss convergence. (b) Comparison of the proportion of effective semantic editing directions.
Figure 6. Entanglement of glasses attributes and age attributes in face editing.
Figure 7. Local and non-local area examples for the mouth, eye, and hair semantic attributes.
Figure 8. Result of editing the glasses semantic attribute without local contrast regularization.
Figure 9. Result of editing the glasses semantic attribute with local contrast regularization.
Figure 10. FFHQ dataset gender semantic attribute editing comparison results.
Figure 11. FFHQ dataset age semantic attribute editing comparison results.
Figure 12. LSUNCat dataset rotation semantic attribute editing comparison results.
Figure 13. LSUNCat dataset coat color semantic attribute editing comparison results.
Figure 14. Local Jacobi disentangled method editing results on the FFHQ dataset.
Figure 15. Local Jacobi disentangled method editing results on the LSUNCat dataset.
Abstract
1. Introduction
- A new initialization method for the semantic direction set of the global Jacobi orthogonal regularization search is designed: the semantic direction vectors obtained from the eigendecomposition of the generator weight matrix serve as initial vectors, which speeds up the search and reduces the proportion of ineffective search directions.
- A local Jacobi disentangled method is proposed that discovers more accurate image editing directions by limiting the search area and designing a contrast-regularized loss function.
- Experiments on the FFHQ and LSUNCat datasets show that our method achieves the best results on semantic attribute disentanglement metrics compared with existing unsupervised generated image editing methods, and also discovers more accurate image editing directions.
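The eigendecomposition-based initialization in the first contribution can be sketched as follows. This is a minimal illustration under assumed details: a SeFa-style closed-form factorization of a single generator affine weight matrix, with the top-k eigenvectors of WᵀW taken as initial directions; the function name `init_directions` and the matrix shapes are ours, not the authors' exact implementation.

```python
import numpy as np

def init_directions(weight: np.ndarray, k: int = 40) -> np.ndarray:
    """Initialize k semantic direction vectors from the eigendecomposition
    of W^T W (SeFa-style closed-form factorization).

    weight: generator affine weight matrix of shape (out_dim, latent_dim).
    Returns: (k, latent_dim) matrix of unit-norm initial directions.
    """
    # Eigenvectors of W^T W with the largest eigenvalues correspond to the
    # latent directions along which the layer output changes most strongly.
    gram = weight.T @ weight                 # (latent_dim, latent_dim), symmetric
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]].T  # (k, latent_dim)
    # Normalize each direction before the regularized search refines it.
    return top / np.linalg.norm(top, axis=1, keepdims=True)

# Example: 40 initial directions from a random 1024x512 weight matrix.
dirs = init_directions(np.random.randn(1024, 512), k=40)
print(dirs.shape)  # (40, 512)
```

Because the eigenvectors of a symmetric matrix are orthonormal, the initial direction set already satisfies the orthogonality the Jacobi regularization search enforces, which is consistent with the reported reduction in search time.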
2. Materials and Methods
2.1. Global Jacobi Disentangled Method
2.1.1. Motivation of the Method
2.1.2. Method Feasibility Validation
2.1.3. Principle of Global Jacobi Disentangled Method
2.2. Local Jacobi Disentangled Method
2.2.1. Local Jacobi Orthogonal Regularization Search Algorithm
2.2.2. Local Contrast Regularized Loss Function
3. Results
3.1. Experimental Details
- Datasets. We use the mainstream face dataset FFHQ and the cat dataset LSUNCat, both primary evaluation datasets for most generated image attribute editing methods. The FFHQ dataset contains a large number of face images with clean backgrounds. Due to the performance limitations of our computing platform, the FFHQ face dataset is down-sampled from 1024 × 1024 to 512 × 512 resolution, with 70 K images. The LSUNCat dataset has 256 × 256 resolution, also with 70 K images.
- Parameter setting. The total number of iterations for Jacobi orthogonal regularization training is 5 × 10⁴ for the FFHQ dataset and 4 × 10⁴ for the LSUNCat dataset. The number of column vectors of the semantic attribute direction initialization matrix is 40, and the local and non-local regularization training balance parameters are set to 0.6 and 0.4, respectively.
- Experimental environment. The code is executed on Ubuntu 18.04 with an Intel(R) Core(TM) i7-7820X CPU @ 3.60 GHz and two GeForce RTX 2080 Ti GPUs. The deep learning framework is PyTorch.
- Evaluation metrics. The perceptual path length (PPL) [40] is used to measure the performance of semantic attribute disentanglement. This metric describes how drastically the image changes when the intermediate latent code is interpolated along a given direction; a small value indicates a relatively smooth latent space and low entanglement. Following STIA-WO [22], the PPL value is calculated for an intermediate latent code moved a small distance along its orthogonal semantic attribute direction, instead of randomly sampling two latent codes, as

  PPL = E_w [ (1/ε²) · d(G(w), G(w + εn)) ]

- where ε = 10⁻⁴ is the range moved during editing, d(·,·) is the perceptual distance between the two generated images [41], and G is the generator. The two images correspond to the sampling points w, the intermediate latent code, and w + εn, its shift along the unit attribute direction n.
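The per-direction PPL evaluation described above can be sketched as follows. The generator and the perceptual distance are stand-ins for the paper's actual models (a StyleGAN generator and the LPIPS metric [41]), so `generator` and `perceptual_distance` here are hypothetical placeholders that only keep the example self-contained.

```python
import numpy as np

EPS = 1e-4  # editing step epsilon, as in the paper

def perceptual_distance(img_a: np.ndarray, img_b: np.ndarray) -> float:
    # Placeholder for LPIPS [41]; a mean-squared distance stands in so
    # the sketch runs without a pretrained perceptual network.
    return float(np.mean((img_a - img_b) ** 2))

def generator(w: np.ndarray) -> np.ndarray:
    # Placeholder for the generator G: maps a latent code to an "image".
    # A fixed random projection keeps the example deterministic.
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((w.shape[0], 64))
    return np.tanh(w @ proj)

def ppl_along_direction(w: np.ndarray, n: np.ndarray, eps: float = EPS) -> float:
    """PPL contribution of one sample w along a unit attribute direction n:
    d(G(w), G(w + eps * n)) / eps**2; the caller averages over samples w."""
    n = n / np.linalg.norm(n)  # ensure a unit direction
    return perceptual_distance(generator(w), generator(w + eps * n)) / eps**2

w = np.random.randn(512)
n = np.random.randn(512)
score = ppl_along_direction(w, n)  # non-negative; smaller = smoother path
```

Averaging `ppl_along_direction` over many sampled latent codes for a fixed semantic direction yields the per-attribute PPL values reported in the tables below.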
3.2. Experimental Results Comparison
4. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chen, C.; Wang, C.; Liu, B.; He, C.; Cong, L.; Wan, S. Edge Intelligence Empowered Vehicle Detection and Image Segmentation for Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 1–12. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, C.; Liu, L.; Lan, D.; Jiang, H.; Wan, S. Aerial Edge Computing on Orbit: A Task Offloading and Allocation Scheme. IEEE Trans. Netw. Sci. Eng. 2023, 10, 275–285. [Google Scholar] [CrossRef]
- Chen, C.; Yao, G.; Wang, C.; Goudos, S.; Wan, S. Enhancing the Robustness of Object Detection via 6G Vehicular Edge Computing. Digit. Commun. Netw. 2022, 8, 923–931. [Google Scholar] [CrossRef]
- Jia, F.; Li, J.; Chen, L.; Li, N. A BUS-aided RSU Access Scheme Based on SDN and Evolutionary Game in the Internet of Vehicle. Int. J. Commun. Syst. 2022, 35, e3932. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive Growing of Gans for Improved Quality, Stability, and Variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of Stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8110–8119. [Google Scholar]
- Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
- Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
- Shoshan, A.; Bhonker, N.; Kviatkovsky, I.; Medioni, G. Gan-Control: Explicitly Controllable Gans. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14083–14093. [Google Scholar]
- Shen, Y.; Gu, J.; Tang, X.; Zhou, B. Interpreting the Latent Space of Gans for Semantic Face Editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9243–9252. [Google Scholar]
- Suzuki, R.; Koyama, M.; Miyato, T.; Yonetsuji, T.; Zhu, H. Spatially Controllable Image Synthesis with Internal Representation Collaging. arXiv 2018, arXiv:1811.10153. [Google Scholar]
- Shi, Y.; Yang, X.; Wan, Y.; Shen, X. SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing. arXiv 2021, arXiv:2112.02236. [Google Scholar]
- Zhang, G.; Kan, M.; Shan, S.; Chen, X. Generative Adversarial Network with Spatial Attention for Face Attribute Editing. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 417–432. [Google Scholar]
- Jahanian, A.; Chai, L.; Isola, P. On the “Steerability” of Generative Adversarial Networks. arXiv 2019, arXiv:1907.07171. [Google Scholar]
- Plumerault, A.; Borgne, H.L.; Hudelot, C. Controlling Generative Models with Continuous Factors of Variations. arXiv 2020, arXiv:2001.10238. [Google Scholar]
- Härkönen, E.; Hertzmann, A.; Lehtinen, J.; Paris, S. Ganspace: Discovering Interpretable Gan Controls. Adv. Neural. Inf. Process. Syst. 2020, 33, 9841–9850. [Google Scholar]
- Abdi, H.; Williams, L.J. Principal Component Analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
- Shen, Y.; Zhou, B. Closed-Form Factorization of Latent Semantics in Gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1532–1540. [Google Scholar]
- Liu, M.; Wei, Y.; Wu, X.; Zuo, W.; Zhang, L. A Survey on Leveraging Pre-Trained Generative Adversarial Networks for Image Editing and Restoration. arXiv 2022, arXiv:2207.10309. [Google Scholar]
- Liu, K.; Cao, G.; Zhou, F.; Liu, B.; Duan, J.; Qiu, G. Towards Disentangling Latent Space for Unsupervised Semantic Face Editing. IEEE Trans. Image Process. 2022, 31, 1475–1489. [Google Scholar] [CrossRef] [PubMed]
- Chen, C.; Fu, R.; Ai, X.; Huang, C.; Cong, L.; Li, X.; Jiang, J.; Pei, Q. An Integrated Method for River Water Level Recognition from Surveillance Images Using Convolution Neural Networks. Remote Sens. 2022, 14, 6023. [Google Scholar] [CrossRef]
- Chen, C.; Jiang, J.; Liao, Z.; Zhou, Y.; Wang, H.; Pei, Q. A Short-Term Flood Prediction Based on Spatial Deep Learning Network: A Case Study for Xi County, China. J. Hydrol. 2022, 607, 127535. [Google Scholar] [CrossRef]
- Zhang, J.; Yu, X.; Wang, B.; Chen, C. Unsupervised Generated Image Editing Method Based on Multi-Scale Hierarchical Disentanglement. In Proceedings of the 2022 IEEE International Conference on Smart Internet of Things (SmartIoT), Suzhou, China, 19–21 August 2022; pp. 185–191. [Google Scholar]
- Zhu, J.; Feng, R.; Shen, Y.; Zhao, D.; Zha, Z.-J.; Zhou, J.; Chen, Q. Low-Rank Subspaces in Gans. Adv. Neural. Inf. Process. Syst. 2021, 34, 16648–16658. [Google Scholar]
- Peebles, W.; Peebles, J.; Zhu, J.-Y.; Efros, A.; Torralba, A. The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 581–597. [Google Scholar]
- Wei, Y.; Shi, Y.; Liu, X.; Ji, Z.; Gao, Y.; Wu, Z.; Zuo, W. Orthogonal Jacobian Regularization for Unsupervised Disentanglement in Image Generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6721–6730. [Google Scholar]
- Ramesh, A.; Choi, Y.; LeCun, Y. A Spectral Regularizer for Unsupervised Disentanglement. arXiv 2018, arXiv:1812.01161. [Google Scholar]
- Liu, Y.; Li, Q.; Deng, Q.; Sun, Z.; Yang, M.-H. GAN-Based Facial Attribute Manipulation. arXiv 2022, arXiv:2210.12683. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on Imagenet Classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
- Spruill, M. Asymptotic Distribution of Coordinates on High Dimensional Spheres. Electron. Commun. Probab. 2007, 12, 234–247. [Google Scholar] [CrossRef]
- Liu, B.; Zhu, Y.; Fu, Z.; de Melo, G.; Elgammal, A. Oogan: Disentangling Gan with One-Hot Sampling and Orthogonal Regularization. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 4836–4843. [Google Scholar]
- Liu, X.; Sanchez, P.; Thermos, S.; O’Neil, A.Q.; Tsaftaris, S.A. Learning Disentangled Representations in the Imaging Domain. Med. Image Anal. 2022, 80, 102516. [Google Scholar] [CrossRef]
- Lee, C.-H.; Liu, Z.; Wu, L.; Luo, P. Maskgan: Towards Diverse and Interactive Facial Image Manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5549–5558. [Google Scholar]
- Collins, E.; Bala, R.; Price, B.; Susstrunk, S. Editing in Style: Uncovering the Local Semantics of Gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5771–5780. [Google Scholar]
- Pajouheshgar, E.; Zhang, T.; Süsstrunk, S. Optimizing Latent Space Directions for Gan-Based Local Image Editing. In Proceedings of the ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 1740–1744. [Google Scholar]
- Zhu, J.; Shen, Y.; Zhao, D.; Zhou, B. In-Domain Gan Inversion for Real Image Editing. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 592–608. [Google Scholar]
- Chen, C.; Zeng, Y.; Li, H.; Liu, Y.; Wan, S. A Multi-Hop Task Offloading Decision Model in MEC-Enabled Internet of Vehicles. IEEE Internet Things J. 2022, 1. [Google Scholar] [CrossRef]
- Odena, A.; Buckman, J.; Olsson, C.; Brown, T.; Olah, C.; Raffel, C.; Goodfellow, I. Is Generator Conditioning Causally Related to GAN Performance? In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 3849–3858. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
- Wu, Z.; Lischinski, D.; Shechtman, E. StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Melnik, A.; Miasayedzenkau, M.; Makarovets, D.; Pirshtuk, D.; Akbulut, E.; Holzmann, D.; Renusch, T.; Reichert, G.; Ritter, H. Face Generation and Editing with StyleGAN: A Survey. arXiv 2022, arXiv:2212.09102. [Google Scholar]
| | Pose | Age | Gender | Hairstyle | Face Color |
|---|---|---|---|---|---|
| Cosine Similarity | 0.91 | 0.89 | 0.86 | 0.87 | 0.84 |
| Model | Pose | Age | Gender | Hairstyle | Face Color |
|---|---|---|---|---|---|
| SefaGAN | 0.84 | 1.01 | 0.98 | 0.94 | 0.90 |
| PCAGAN | 0.76 | 0.96 | 0.92 | 0.87 | 0.89 |
| OroJaRGAN | 0.70 | 0.85 | 0.84 | 0.80 | 0.84 |
| Our Method | 0.69 | 0.87 | 0.82 | 0.76 | 0.77 |
| Model | Rotate | Scale | Coat Color |
|---|---|---|---|
| SefaGAN | 0.82 | 0.73 | 0.71 |
| PCAGAN | 0.76 | 0.68 | 0.65 |
| OroJaRGAN | 0.57 | 0.49 | 0.41 |
| Our Method | 0.56 | 0.47 | 0.40 |
| Model | Glasses | Mouth | Hairstyle | Hair Color | Average |
|---|---|---|---|---|---|
| Non-regularization | 8.92 | 12.56 | 3.12 | 4.89 | 7.37 |
| Local contrast regularization | 0.83 | 0.72 | 0.76 | 0.69 | 0.75 |
| Model | Head Pose | Head Color | Abdominal Color | Average |
|---|---|---|---|---|
| Non-regularization | 6.46 | 8.56 | 9.12 | 8.05 |
| Local contrast regularization | 0.43 | 0.52 | 0.46 | 0.52 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, J.; Yu, X.; Wang, B.; Chen, C. Generated Image Editing Method Based on Global-Local Jacobi Disentanglement for Machine Learning. Sensors 2023, 23, 1815. https://doi.org/10.3390/s23041815