
CutGAN: dual-Branch generative adversarial network for paper-cut image generation

Published in: Multimedia Tools and Applications

Abstract

Chinese paper-cutting, an ancient folk art, is becoming difficult to preserve and pass down owing to a shortage of skilled paper-cut artists. Unlike other image generation tasks, paper-cut image generation requires not only symmetry and exaggeration but also a degree of resemblance to the subject's facial features. To address these issues, this paper proposes a dual-branch generative adversarial network model for automatically generating paper-cut images, referred to as CutGAN. Specifically, we first construct a paper-cut dataset of 891 pairs of facial images and handcrafted paper-cut images to train and evaluate CutGAN. During the pre-training phase, we train a fixed encoder on gender and eyeglasses recognition tasks. In the fine-tuning phase, we design a flexible encoder based on a modified U-Net structure without skip connections. Furthermore, we introduce an average face loss to increase the diversity and improve the quality of the generated paper-cut images. We conducted extensive qualitative, quantitative, and ablation experiments comparing CutGAN with state-of-the-art baseline models on the test set. The results indicate that CutGAN outperforms other image translation models, generating paper-cut images that more faithfully capture the essence of Chinese paper-cut art while closely resembling the original facial images.
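To make the described architecture concrete, the sketch below outlines the dual-branch generator in PyTorch. It is a minimal illustration assembled from the abstract alone: the layer widths, the channel-concatenation fusion of the two branches, the classification heads used in pre-training, and the exact form of the average face loss are all assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of the CutGAN generator as described in the abstract.
# Only the overall design (a fixed encoder pre-trained on gender/eyeglasses
# classification, a flexible U-Net-style encoder without skip connections, and
# an average-face loss term) follows the paper; all specifics are assumptions.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Downsampling block: a stride-2 convolution halves spatial resolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )


def deconv_block(in_ch, out_ch):
    # Upsampling block: a transposed convolution doubles spatial resolution.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class FixedEncoder(nn.Module):
    """Branch 1: pre-trained on gender and eyeglasses recognition, then frozen."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64), conv_block(64, 128), conv_block(128, 256)
        )
        # Two heads used only during the pre-training phase (assumed design).
        self.gender_head = nn.Linear(256, 2)
        self.glasses_head = nn.Linear(256, 2)

    def classify(self, x):
        h = self.features(x).mean(dim=(2, 3))  # global average pooling
        return self.gender_head(h), self.glasses_head(h)

    def forward(self, x):
        return self.features(x)


class FlexibleEncoder(nn.Module):
    """Branch 2: U-Net-like encoder *without* the usual skip connections."""

    def __init__(self):
        super().__init__()
        # Note there is no torch.cat with intermediate features anywhere below:
        # the skip connections of the standard U-Net are deliberately removed.
        self.down = nn.Sequential(
            conv_block(3, 64), conv_block(64, 128), conv_block(128, 256)
        )

    def forward(self, x):
        return self.down(x)


class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.fixed = FixedEncoder()
        for p in self.fixed.parameters():  # freeze branch 1 after pre-training
            p.requires_grad = False
        self.flexible = FlexibleEncoder()
        self.decoder = nn.Sequential(
            deconv_block(512, 128),
            deconv_block(128, 64),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, x):
        # Fuse the two branches by channel concatenation (an assumption).
        h = torch.cat([self.fixed(x), self.flexible(x)], dim=1)
        return self.decoder(h)


def average_face_loss(fake, avg_face):
    # Illustrative formulation only: push generated images away from the mean
    # paper-cut face to encourage diversity (the paper's exact loss may differ).
    return -torch.mean(torch.abs(fake - avg_face))
```

The structural points follow the abstract: branch one is frozen after pre-training on gender and eyeglasses recognition, and branch two omits the skip connections of a standard U-Net so the decoder cannot simply copy input detail.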



Data availability

The data used in this study are privately held and not publicly available.


Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61373004).

Author information


Corresponding authors

Correspondence to Lijun Yan or Yan Ma.

Ethics declarations

Ethical Approval

The use of facial images in this research has been conducted with the informed consent of the individuals depicted.

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liao, Y., Yan, L., Hou, Z. et al. CutGAN: dual-Branch generative adversarial network for paper-cut image generation. Multimed Tools Appl 83, 55867–55888 (2024). https://doi.org/10.1007/s11042-023-17746-z

