Abstract
Chinese paper-cutting is an ancient folk art whose preservation and transmission are threatened by a shortage of skilled paper-cut artists. Unlike other image generation tasks, paper-cut image generation must produce results that are symmetric and exaggerated while still bearing a clear resemblance to the subject's facial features. To address these challenges, this paper proposes CutGAN, a dual-branch generative adversarial network for automatically generating paper-cut images. Specifically, we first construct a paper-cut dataset of 891 pairs of facial images and handcrafted paper-cut images to train and evaluate CutGAN. During the pre-training phase, we train a fixed encoder on gender and eyeglasses recognition tasks. In the fine-tuning phase, we design a flexible encoder based on a modified U-Net structure without skip connections. Furthermore, we introduce an average face loss to increase the diversity and improve the quality of the generated paper-cut images. We conduct extensive qualitative, quantitative, and ablation experiments comparing CutGAN with state-of-the-art baseline models on the test set. The results show that CutGAN outperforms other image translation models, generating paper-cut images that better capture the essence of Chinese paper-cut art while more closely resembling the source facial images.
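To make the average face loss concrete, below is a minimal PyTorch sketch of one plausible formulation: an L1 penalty pulling each generated image toward a precomputed pixel-wise average of the handcrafted paper-cut training images. The class name, the choice of L1 distance, and the weighting scheme are illustrative assumptions, not the paper's exact definition.

```python
import torch
import torch.nn as nn

class AverageFaceLoss(nn.Module):
    """Hypothetical sketch of an average face loss: penalizes generated
    paper-cut images for drifting too far from a precomputed average
    paper-cut face, so outputs keep the shared structure of the style
    while the adversarial loss preserves per-identity detail."""

    def __init__(self, average_face: torch.Tensor, weight: float = 1.0):
        super().__init__()
        # average_face: a (C, H, W) tensor, e.g. the pixel-wise mean of
        # all handcrafted paper-cut images in the training set.
        self.register_buffer("average_face", average_face)
        self.weight = weight
        self.l1 = nn.L1Loss()

    def forward(self, generated: torch.Tensor) -> torch.Tensor:
        # generated: a (N, C, H, W) batch produced by the generator.
        target = self.average_face.unsqueeze(0).expand_as(generated)
        return self.weight * self.l1(generated, target)
```

In a training loop such a term would typically be added to the adversarial and reconstruction losses with a small weight, so it regularizes toward the style's common structure without collapsing all outputs to the average face.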
Data availability
The data used in this study are privately held and not publicly available.
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (Grant no. 61373004).
Ethics declarations
Ethical Approval
The use of facial images in this research has been conducted with the informed consent of the individuals depicted.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liao, Y., Yan, L., Hou, Z. et al. CutGAN: dual-branch generative adversarial network for paper-cut image generation. Multimed Tools Appl 83, 55867–55888 (2024). https://doi.org/10.1007/s11042-023-17746-z