Abstract: With the development of generative adversarial networks, synthesizing images from text descriptions has become an active research area. However, the text descriptions used for image generation are usually in English, and the generated objects are mostly faces, flowers, birds, etc. Few studies have addressed the generation of Chinese paintings from Chinese descriptions. The text-to-image task typically requires a large number of labeled image--text pairs, which are expensive and labor-intensive to collect. Advances in vision-language pre-training enable the image generation process to be guided in an optimization-based manner, which significantly reduces the demand for annotated datasets and computational resources. In this paper, a multi-domain VQGAN model is proposed to generate Chinese paintings in multiple domains. Furthermore, the vision-language pre-training model WenLan is used to compute a distance loss between the generated images and the text descriptions. Semantic consistency between images and text is achieved by optimizing the latent-space variables that serve as the input of the multi-domain VQGAN. An ablation study is conducted to compare different variants of our multi-domain VQGAN in terms of the FID and R-precision metrics, and a user study is conducted to further demonstrate the effectiveness of the proposed model. Extensive results demonstrate that the proposed multi-domain VQGAN model outperforms all competitors in terms of image quality and text-image semantic consistency.
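To make the optimization-based guidance concrete, the sketch below illustrates the general idea of steering a generator by optimizing its latent input against a text-image distance loss. It is a minimal, self-contained PyTorch example: `DummyDecoder` and `DummyImageEncoder` are hypothetical stand-ins for the multi-domain VQGAN decoder and the WenLan image encoder, and the text embedding is a random vector; none of this reflects the actual APIs of VQGAN, WenLan, or the method's exact loss.

```python
# Minimal sketch of text-guided latent optimization (illustrative only).
# DummyDecoder / DummyImageEncoder are placeholders, NOT the real
# multi-domain VQGAN or WenLan interfaces.
import torch
import torch.nn.functional as F


class DummyDecoder(torch.nn.Module):
    """Stands in for the multi-domain VQGAN decoder: latent -> image."""
    def __init__(self, latent_dim=256, image_pixels=3 * 64 * 64):
        super().__init__()
        self.net = torch.nn.Linear(latent_dim, image_pixels)

    def forward(self, z):
        return torch.tanh(self.net(z)).view(-1, 3, 64, 64)


class DummyImageEncoder(torch.nn.Module):
    """Stands in for the WenLan image encoder: image -> joint embedding."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = torch.nn.Linear(3 * 64 * 64, embed_dim)

    def forward(self, img):
        return self.net(img.flatten(1))


def optimize_latent(text_embedding, steps=200, latent_dim=256, lr=0.05):
    """Optimize the latent code so the decoded image moves toward the
    text embedding (1 - cosine similarity as the distance loss)."""
    decoder, image_encoder = DummyDecoder(latent_dim), DummyImageEncoder()
    z = torch.randn(1, latent_dim, requires_grad=True)  # latent to optimize
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        image = decoder(z)                  # generate an image from the latent
        img_emb = image_encoder(image)      # embed the image in the joint space
        loss = 1.0 - F.cosine_similarity(img_emb, text_embedding).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return decoder(z).detach()


# In practice the text embedding would come from the WenLan text encoder;
# a random vector is used here purely to keep the sketch executable.
generated = optimize_latent(torch.randn(1, 128))
```

The key design point the abstract alludes to is that only the latent input is updated during generation; the generator and the vision-language encoders stay frozen, which is why no paired image-text training data is required at this stage.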