CN115861747A - Image generation method, image generation device, electronic equipment and storage medium - Google Patents
- Publication number
- CN115861747A (application number CN202211457780.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- attribute
- description text
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides an image generation method, an image generation device, electronic equipment and a storage medium. The method comprises the following steps: obtaining a description text of a target to be drawn, and extracting keywords of preset attributes of the target to be drawn from the description text; generating an attribute image of the preset attribute based on the keywords; and generating a target image of the target to be drawn with the description text and the attribute image as constraints. Because the target image is generated under both constraints, it conforms to the description text as well as to the attribute image. This ensures that the preset attribute of the target in the generated image stays consistent with the keywords in the description text, that the image presents the preset attribute in line with common-sense expectations, and that odd, implausible results are avoided, improving the accuracy and reliability of the generated target image.
Description
Technical Field
The present invention relates to the field of image generation technologies, and in particular, to an image generation method and apparatus, an electronic device, and a storage medium.
Background
With the growing application demand for personalized virtual try-on (outfit changing), text-driven garment image generation has become an essential supporting technology.
In the prior art, image generation methods mainly include three types: an image generation method based on a Variational Auto-Encoder (VAE), an image generation method based on a GAN (Generative Adversarial Network), and an image generation method based on Diffusion Models.
However, images generated by these three methods can conflict with common-sense expectations. In particular, their control over style information when generating garment images is weak, so odd, implausible images are often produced.
Disclosure of Invention
The invention provides an image generation method, an image generation device, electronic equipment and a storage medium, which address the defect that images generated by prior-art methods may conflict with common-sense expectations.
The invention provides an image generation method, which comprises the following steps:
obtaining a description text of a target to be drawn, and extracting keywords of preset attributes of the target to be drawn from the description text;
generating an attribute image of the preset attribute based on the keyword;
and generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
According to an image generating method provided by the present invention, the generating a target image of the target to be drawn with the description text and the attribute image as constraints includes:
and fusing the semantic features of the description text and the image features of the attribute images to obtain fusion constraint features, and generating the target image based on the fusion constraint features.
According to an image generation method provided by the present invention, the fusing the semantic features of the description text and the image features of the attribute image to obtain a fusion constraint feature, and generating the target image based on the fusion constraint feature includes:
fusing the semantic features of the description text and the image features of the attribute images based on a constraint layer in an image generation model to obtain fused constraint features;
based on a generation layer in the image generation model, applying the fusion constraint characteristics to generate the target image;
the image generation model is obtained by training based on a sample description text, and a sample attribute image and a sample target image which respectively correspond to the sample description text.
According to an image generation method provided by the invention, the generation layer comprises a plurality of cascaded characteristic layers;
the generating the target image by applying the fusion constraint characteristic based on a generation layer in the image generation model comprises:
performing feature fusion on the fusion constraint feature and a previous baseline feature of a previous feature layer based on a current feature layer in the generation layer to obtain a current baseline feature;
taking a next feature layer of the current feature layer as the current feature layer, taking the current baseline feature as the previous baseline feature, and returning to execute feature fusion of the current feature layer until the current feature layer is the last feature layer in the generation layers;
and taking the baseline characteristic of the final characteristic layer as the target image.
According to an image generation method provided by the present invention, the extracting keywords of preset attributes of the target to be drawn from the description text includes:
extracting the keywords of the preset attributes of the target to be drawn from the description text based on a keyword library of the preset attributes;
the keyword library is constructed based on the word frequency of each sample word in the sample description text and the attribute screening corresponding to each sample word.
According to an image generation method provided by the present invention, the generating an attribute image of the preset attribute based on the keyword includes:
and generating the attribute image of the preset attribute by taking the semantic features of the keywords as constraints.
According to the image generation method provided by the invention, the target to be drawn is a garment, and the preset attribute is a garment style.
The present invention also provides an image generating apparatus comprising:
the acquisition unit is used for acquiring a description text of a target to be drawn and extracting keywords of preset attributes of the target to be drawn from the description text;
the generating unit is used for generating an attribute image of the preset attribute based on the keyword;
and the target image generating unit is used for generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image generation method as described in any of the above when executing the program.
The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image generation method as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the image generation method as described in any one of the above.
According to the image generation method, the image generation device, the electronic equipment and the storage medium, keywords of the preset attributes of the target to be drawn are extracted from the description text, an attribute image of the preset attribute is generated based on the keywords, and a target image of the target to be drawn is generated with the description text and the attribute image as constraints. The target image therefore conforms both to the description text and to the attribute image, which ensures that the preset attribute of the target in the generated image is consistent with the keywords in the description text, that the image presents the preset attribute in line with common-sense expectations, and that odd, implausible results are avoided, improving the accuracy and reliability of the generated target image.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is one of the flow diagrams of an image generation method provided by the present invention;
FIG. 2 is a schematic flow chart of step 131 in the image generation method provided by the present invention;
FIG. 3 is a schematic flow chart of step 1312 of the image generation method provided by the present invention;
FIG. 4 is a second schematic flowchart of the image generation method provided by the present invention;
FIG. 5 is a schematic diagram of an image generation apparatus provided by the present invention;
fig. 6 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the related art, image generation methods mainly fall into three types: methods based on a Variational Auto-Encoder (VAE), methods based on a GAN (Generative Adversarial Network), and methods based on Diffusion Models.
GAN is mainly composed of a Generator (Generator) and a Discriminator (Discriminator). At present, GAN-based image generation methods mainly include a multi-stage image generation method and a single-stage image generation method.
The multi-stage image generation method proceeds as follows: a piece of text description is obtained; a text feature extraction network (Text Encoder) extracts text features from it; the text features are concatenated with random noise features sampled from a Gaussian distribution; the concatenated features are fed into a first generator to produce a low-resolution image; the first generator's output image is then fed into a second generator, and so on, until the last generator outputs an image at the target resolution.
The single-stage image generation method comprises the following steps: firstly, a section of text description is obtained, then, a text feature extraction network is utilized to extract text features in the text description, then, random noise sampled from Gaussian distribution is input into a generator as an input feature to generate a target image, and the text features are added into each module Block of the generator as constraint conditions in the generation process of the target image, so that the final image generation result conforms to the pre-obtained text description.
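The single-stage pipeline above can be sketched in a few lines of toy Python. This is a minimal structural illustration only: `extract_text_features` is a hypothetical stand-in for a real text encoder, and concatenation stands in for however a real generator block injects the text constraint.

```python
import random

def extract_text_features(text, dim=4):
    # Hypothetical stand-in for a real text encoder:
    # fold the words of the description into a fixed-size vector.
    vec = [0.0] * dim
    for i, word in enumerate(text.split()):
        vec[i % dim] += len(word) / 10.0
    return vec

def generator_block(features, text_features):
    # Each block of the generator receives the text features as a
    # constraint condition, modelled here by simple concatenation.
    return features + text_features

def single_stage_generate(text, noise_dim=4, num_blocks=3):
    text_feat = extract_text_features(text)
    # Random noise sampled from a Gaussian distribution is the generator input.
    features = [random.gauss(0.0, 1.0) for _ in range(noise_dim)]
    for _ in range(num_blocks):
        features = generator_block(features, text_feat)
    return features  # stands in for the generated target image

out = single_stage_generate("red T-shirt")
```

Because every block sees the same text features, the final output remains conditioned on the description throughout the generation process, which is the point the single-stage design makes.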
However, among GAN-based image generation methods, the multi-stage approach requires training multiple generators and discriminators, which increases both the training difficulty and the risk of training failure. The single-stage approach trains more stably, but when the amount of training data is not large enough, the generated images may still conflict with common-sense expectations. For example, the image generation model may learn descriptive text such as color well while failing to learn style information, so that the generated image does not match common-sense expectations.
To solve this problem, the present invention provides an image generation method. FIG. 1 is one of the flow diagrams of the image generation method provided by the present invention; as shown in FIG. 1, the method includes:
Step 110, obtaining a description text of a target to be drawn, and extracting keywords of preset attributes of the target to be drawn from the description text.
Specifically, the target to be drawn is the target for which a target image needs to be generated subsequently; it may be, for example, a garment, an automobile, or a cup.
A description text of the target to be drawn may be obtained. The description text is descriptive text related to attributes of the target to be drawn; it may be input directly by a user, obtained by transcribing collected audio, or obtained by capturing an image with a device such as a scanner, mobile phone, camera, or tablet and performing OCR (Optical Character Recognition) on it. The embodiment of the present invention does not specifically limit this.
For example, when the object to be drawn is a garment, the description text of the object to be drawn may be a red T-shirt (T-shirt), a brown coat, a white shirt, or the like, and this is not particularly limited in this embodiment of the present invention.
After the description text of the target to be drawn is acquired, keywords of preset attributes of the target to be drawn can be extracted from the description text. A preset attribute is an attribute of the target to be drawn that is specified in advance and needs to be constrained or emphasized; there may be one or more preset attributes.
For example, when the object to be drawn is a garment, the preset attribute of the object to be drawn may be a garment style; when the target to be drawn is an automobile, the preset attribute of the target to be drawn can be the automobile style; when the object to be drawn is a cup, the preset attribute of the object to be drawn may be a cup style, and the like, which is not specifically limited in the embodiment of the present invention.
Correspondingly, the keyword refers to a word corresponding to the preset attribute of the target to be drawn in the description text, and the keyword may be understood as a description word for the attribute value of the preset attribute, and may be one or multiple keywords.
For example, when the target to be drawn is a garment and the preset attribute is color, the keywords extracted from the description text may be red, blue, green, white, or the like; when the preset attribute is pattern, the keywords may be leopard print, checks, gradient color, embroidery, or the like; when the preset attributes are color, pattern, and neckline style, the keywords may be black check round collar, white square collar, embroidered stand collar, leopard-print V collar, or the like. The embodiment of the present invention does not specifically limit this.
Step 120, generating an attribute image of the preset attribute based on the keyword.
Specifically, after extracting a keyword of a preset attribute of the target to be drawn from the description text, an attribute image of the preset attribute may be generated based on the keyword, where the attribute image refers to an image in which the preset attribute matches the description of the keyword. It is understood that the generation of the attribute image is intended to draw the preset attribute described by the keyword, and the characteristics of the other attributes of the target to be drawn except for the preset attribute can be ignored in the obtained attribute image.
For example, when the target to be drawn is a garment and the preset attribute of the target to be drawn is a garment style, the attribute image may be a garment style image, and for other attributes except the preset attribute, such as a color, the attribute image may be directly expressed as a default color; for another example, when the object to be drawn is an automobile and the preset attribute of the object to be drawn is an automobile style, the attribute image may be an automobile style image; for example, when the object to be drawn is a cup and the preset attribute of the object to be drawn is a cup style, the attribute image may be a cup style image or the like, which is not particularly limited in the embodiment of the present invention.
Here, to generate the attribute image of the preset attribute based on the keyword, an existing image generation model may be fine-tuned to obtain a model suited to generating attribute images of the preset attribute; the image generation model may, for example, use a parsing generation model.
Step 130, generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
Specifically, after the description text and the generated attribute image of the preset attribute are obtained, the description text and the attribute image may be used as constraints to generate a target image of the target to be drawn. For example, semantic features in the description text may be extracted, image features in the attribute image may be extracted, and then a target image of the object to be drawn may be generated based on the semantic features in the description text and the image features in the attribute image. The target image here is the image of the target to be drawn which is finally drawn.
For example, the target image may be an image of a red T-shirt, an image of a brown windbreaker, an image of a white shirt, and the like, which is not specifically limited in this embodiment of the present invention.
It can be understood that, with the description text and the attribute image as constraints, the generated target image conforms both to the description text and to the attribute image. This ensures that the preset attribute of the target in the generated image is consistent with the keywords in the description text, and that the image presents the preset attribute in line with common-sense expectations, so odd, implausible results do not occur.
According to the method provided by the embodiment of the invention, keywords of the preset attributes of the target to be drawn are extracted from the description text, an attribute image of the preset attribute is generated based on the keywords, and a target image of the target to be drawn is generated with the description text and the attribute image as constraints. The target image conforms both to the description text and to the attribute image, which ensures that the preset attribute of the target in the generated image is consistent with the keywords in the description text, presents the preset attribute in line with common-sense expectations, avoids odd, implausible results, and improves the accuracy and reliability of the generated target image.
Based on the above embodiment, step 130 includes:
and 131, fusing the semantic features of the description text and the image features of the attribute images to obtain fusion constraint features, and generating the target image based on the fusion constraint features.
Specifically, after the description text and the attribute image are obtained, the semantic features of the description text and the image features of the attribute image may be fused to obtain a fusion constraint feature, where the fusion constraint feature is a feature in which the semantic features of the description text and the image features of the attribute image are fused.
Here, the description text may be text-coded to obtain semantic features, where the text coding may use a multilayer Convolutional Neural Network (CNN) with a cascade structure, may also use a Deep Neural Network (DNN), may also use a combined structure of CNN and DNN, and the like, and this is not specifically limited in this embodiment of the present invention.
Here, the feature extraction may be performed on the attribute image to obtain an image feature, where the feature extraction may use an LBP (Local Binary Patterns) algorithm, an HOG (Histogram of Oriented gradients) algorithm, or the like, and this is not particularly limited in the embodiment of the present invention.
Here, the semantic features of the description text and the image features of the attribute images may be merged by splicing the semantic features of the description text and the image features of the attribute images, or by splicing the semantic features of the description text and the image features of the attribute images after weighting by using an attention mechanism, which is not specifically limited in the embodiment of the present invention.
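The two fusion options named above — plain splicing, or splicing after attention-based weighting — can be sketched as follows. The weighting scheme here (a softmax over each vector's mean activation) is a hypothetical toy stand-in for a real attention mechanism, used only to show the shape of the operation.

```python
import math

def fuse_features(semantic, image, use_attention=False):
    """Fuse text semantic features with attribute-image features.

    Plain splicing simply concatenates the two vectors; the attention
    variant first weights each vector by a softmax over its mean
    activation (hypothetical weighting, for illustration only).
    """
    if not use_attention:
        return semantic + image  # simple concatenation (splicing)
    means = [sum(semantic) / len(semantic), sum(image) / len(image)]
    exps = [math.exp(m) for m in means]
    total = sum(exps)
    w_sem, w_img = exps[0] / total, exps[1] / total
    return [w_sem * x for x in semantic] + [w_img * x for x in image]

fused = fuse_features([0.2, 0.8], [0.5, 0.1, 0.9])
```

Either variant yields a single fusion constraint feature combining both modalities, which is what the generation layer consumes next.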
After the fusion constraint feature is obtained, a target image may be generated based on the fusion constraint feature. The target image here is the image of the target to be drawn which is finally drawn.
For example, the target image may be generated by applying the fusion constraint feature based on the generation layer in the image generation model, or the target image may be generated by applying the fusion constraint feature and the random noise feature sampled from the gaussian distribution based on the generation layer in the image generation model.
According to the method provided by the embodiment of the invention, the semantic features of the description text and the image features of the attribute images are fused to obtain the fusion constraint features, and the target image is generated based on the fusion constraint features, so that the accuracy and reliability of the generated target image are ensured.
In the related art, most image generation models generate images by combining text features extracted from the text description with random noise features sampled from a Gaussian distribution. However, such models learn descriptive text such as color well but fail to learn style information, so the generated target images conflict with common-sense expectations, reducing their accuracy and reliability.
To solve this problem, the image generation model in the embodiment of the present invention includes a constraint layer, and based on the constraint layer in the image generation model, semantic features describing text and image features of attribute images may be fused.
Based on the foregoing embodiment, fig. 2 is a schematic flowchart of step 131 in the image generation method provided by the present invention. As shown in fig. 2, step 131 includes:
Step 1311, fusing the semantic features of the description text and the image features of the attribute image based on a constraint layer in an image generation model to obtain a fusion constraint feature;
Step 1312, applying the fusion constraint feature to generate the target image based on a generation layer in the image generation model;
the image generation model is obtained by training based on a sample description text, and a sample attribute image and a sample target image which respectively correspond to the sample description text.
Specifically, in order to achieve better enhancement of the image generation effect of the image generation model, before step 1311, the image generation model needs to be obtained by:
the sample description text, and the sample attribute image and the sample target image respectively corresponding to the sample description text may be collected in advance, and an initial image generation model may also be constructed in advance, where parameters of the initial image generation model may be preset or randomly generated, and the initial image generation model may be a generator in a generation countermeasure network, which is not specifically limited in this embodiment of the present invention.
In this process, the initial constrained layer and the initial generation layer may be used as an initial image generation model, that is, an initial model of a training image generation model.
After an initial image generation model including an initial constrained layer and an initial generation layer is obtained, a sample description text collected in advance, and a sample attribute image and a sample target image which respectively correspond to the sample description text can be applied to train the initial image generation model:
firstly, semantic features of a description text and image features of an attribute image can be extracted, then, a sample description text and sample attribute images respectively corresponding to the sample description text can be input into an initial constraint layer in an initial image generation model, and the initial constraint layer fuses the semantic features of the description text and the image features of the attribute image to obtain and output initial fusion constraint features.
Second, the initial fusion constraint feature may be input into the initial generation layer, and the initial fusion constraint feature is applied by the initial generation layer to generate the prediction target image.
After the prediction target image is obtained based on the initial generation layer, the prediction target image is compared with a sample target image collected in advance, a loss function value is calculated according to the difference degree between the prediction target image and the sample target image, parameter iteration is performed on the initial image generation model as a whole based on the loss function value, and the initial image generation model after the parameter iteration is completed is marked as an image generation model.
It is understood that the greater the degree of difference between the prediction target image and the sample target image collected in advance, the greater the loss function value; the smaller the degree of difference between the prediction target image and the sample target image collected in advance, the smaller the loss function value.
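The monotonic relationship described above — a larger difference between prediction and sample yields a larger loss — holds for common choices such as a pixel-wise mean squared error. The patent does not fix a specific loss function, so the following is only an illustrative stand-in:

```python
def generation_loss(predicted, sample):
    """Toy pixel-wise loss: mean squared difference between the predicted
    target image and the pre-collected sample target image. The larger the
    difference between the two, the larger the loss value, and vice versa.
    """
    assert len(predicted) == len(sample)
    n = len(predicted)
    return sum((p - s) ** 2 for p, s in zip(predicted, sample)) / n
```

Parameter iteration then updates the whole initial model (constraint layer and generation layer together) in the direction that decreases this value.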
That is, in the training process of the image generation model, the ability to generate the target image is learned.
In addition, the image generation model can be treated as a generator and trained adversarially against a discriminator. As the generator, the image generation model fuses the semantic features of the input description text with the image features of the attribute image to obtain the fusion constraint feature, and applies it to generate a target image; the discriminator judges each input image, i.e., distinguishes whether it is a target image produced by the generator or a real sample target image. In this process the two play a game: the generator tries to output images so close to the sample target images that the discriminator can hardly tell them apart, while the discriminator tries to output judgments consistent with the actual origin of each input image, thereby achieving a more accurate and reliable discrimination effect.
According to the method provided by the embodiment of the invention, the image generation model is obtained by training based on the sample description text, and the sample attribute image and the sample target image which respectively correspond to the sample description text, so that the reliability and the accuracy of the target image generated by the image generation model are ensured.
Based on the above embodiment, the generation layer includes a plurality of cascaded feature layers;
Fig. 3 is a schematic flow chart of step 1312 in the image generation method provided by the present invention. As shown in fig. 3, step 1312 includes:
Step 310, performing feature fusion on the fusion constraint feature and a previous baseline feature of a previous feature layer based on a current feature layer in the generation layer to obtain a current baseline feature;
Step 320, taking a next feature layer of the current feature layer as the current feature layer, taking the current baseline feature as the previous baseline feature, and returning to perform the feature fusion of the current feature layer until the current feature layer is the last feature layer in the generation layer;
Step 330, taking the baseline feature of the last feature layer as the target image.
In particular, the generation layers in the image generation model may include multiple cascaded layers of features. In the process of generating the target image, first, a feature layer arranged first in the generation layer may be taken as a current feature layer, and a flow of generating the target image may be performed:
feature fusion can be performed on a fusion constraint feature output by a constraint layer in an image generation model and a previous baseline feature of a previous feature layer to obtain a current baseline feature, where the previous feature layer can be FC (full Connected layers), and the previous baseline feature can be a random noise feature output by the previous feature layer, for example, random noise sampled from gaussian distribution can be input to the full connection layer, and the random noise feature output by the full connection layer is used as the previous baseline feature.
The feature fusion here may be performed by splicing the fusion constraint feature with the previous baseline feature, or may be performed by splicing the fusion constraint feature with the previous baseline feature after weighting the fusion constraint feature with an attention mechanism, which is not specifically limited in this embodiment of the present invention.
After completing the feature fusion of the current feature layer, the next feature layer of the current feature layer, that is, the feature layer arranged at the second position, is used as the current feature layer, the current baseline feature is used as the previous baseline feature, and the process of executing the feature fusion of the current feature layer is returned:
that is, after the current baseline feature output by the feature layer ranked at the second position is obtained, the current baseline feature may be used as the previous baseline feature and the fusion constraint feature to perform feature fusion, so as to obtain the current baseline feature.
The process of performing feature fusion by using the feature layer arranged at the third position as the current feature layer is similar to the process of performing feature fusion by using the feature layer arranged at the second position as the current feature layer, and is not repeated here.
This process is repeated by analogy until the current feature layer is the last feature layer in the generation layer.
After the last feature layer outputs its baseline feature, this baseline feature can be taken as the target image.
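The cascaded flow above can be sketched as follows, with toy linear layers standing in for the real generation layers. The dimensions, the tanh activation, and the random-weight layers are illustrative assumptions; only the control flow (FC noise layer, then repeated fuse-and-forward through the cascade) reflects the description.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_layer(noise, w):
    # The foremost layer: a fully connected (FC) layer mapping random
    # noise to the first "previous baseline feature".
    return noise @ w

def feature_layer(fused, w):
    # One cascaded feature layer: maps the fused feature to a new
    # baseline feature (a toy linear map + tanh stands in here).
    return np.tanh(fused @ w)

dim = 8
noise = rng.standard_normal(dim)       # sampled from a Gaussian
constraint = rng.standard_normal(dim)  # fusion constraint feature

prev_baseline = fc_layer(noise, rng.standard_normal((dim, dim)))
layer_weights = [rng.standard_normal((2 * dim, dim)) for _ in range(3)]

for w in layer_weights:
    # Fuse the constraint feature with the previous baseline feature,
    # pass through the current feature layer, and let its output become
    # the previous baseline feature for the next layer.
    fused = np.concatenate([constraint, prev_baseline])
    prev_baseline = feature_layer(fused, w)

target_image = prev_baseline  # baseline feature of the last feature layer
```

In a real model each `feature_layer` would also upsample spatially, so the final baseline feature has image shape rather than vector shape.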
According to the method provided by the embodiment of the present invention, the generation layer includes multiple cascaded feature layers, and the fusion constraint feature is fused with the baseline features across these cascaded layers, thereby improving the accuracy and reliability of the generated target image.
Based on the above embodiment, step 120 includes:
extracting the keyword of the preset attribute of the target to be drawn from the description text based on a keyword library of the preset attribute;
the keyword library is constructed based on the word frequency of each sample word in the sample description texts and the screened attribute corresponding to each sample word.
Specifically, the keyword of the preset attribute of the target to be drawn may be extracted from the description text based on a keyword library of the preset attribute, where the keyword library is constructed based on the word frequency of each sample word segmented from the sample description texts and the screened attribute corresponding to each sample word.
For example, jieba word segmentation may be applied to the sample description texts, and the resulting sample words may be screened to retain those with a higher word frequency; then, the attribute corresponding to each retained sample word is determined, and each high-frequency sample word is stored in the keyword library together with its attribute. The sample words may be, for instance, spring clothes, summer clothes, autumn clothes and winter clothes, and the attribute corresponding to a sample word may be a garment style, such as red, leopard print, wool or down.
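A minimal sketch of this library construction, assuming the sample description texts have already been segmented into words (so that no particular segmentation tool is required) and that the word-to-attribute mapping comes from the screening step; all names here are illustrative:

```python
from collections import Counter

def build_keyword_library(sample_tokens, attribute_of, min_freq=2):
    # sample_tokens: segmented words from the sample description texts.
    # attribute_of: hypothetical mapping from a word to its screened
    # attribute. Keep only words whose frequency clears the threshold,
    # then store each surviving word together with its attribute.
    freq = Counter(sample_tokens)
    return {
        word: attribute_of[word]
        for word, count in freq.items()
        if count >= min_freq and word in attribute_of
    }

tokens = ["T-shirt", "red", "T-shirt", "down", "down", "blue"]
attrs = {"T-shirt": "garment style", "down": "garment style", "red": "color"}
library = build_keyword_library(tokens, attrs)
# "blue" is dropped (no screened attribute); "red" is dropped (frequency 1)
```

The frequency threshold `min_freq` stands in for "higher word frequency"; in practice it would be tuned on the sample corpus.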
According to the method provided by the embodiment of the present invention, the keyword library is constructed based on the word frequency of each sample word in the sample description texts and the screened attribute corresponding to each sample word, which ensures that the keyword of the preset attribute extracted from the description text is a word with a higher frequency among the sample words, thereby improving the accuracy and reliability of the subsequently generated target image.
Based on the above embodiment, step 120 includes:
generating the attribute image of the preset attribute by taking the semantic features of the keyword as constraints.
Specifically, the semantic features of the keyword may be taken as constraints to generate the attribute image of the preset attribute, where the semantic features of the keyword are obtained by text-encoding the keyword.
For example, if the description text of the target to be drawn is "a red T-shirt", the keyword of the preset attribute extracted from the description text is "T-shirt", and the attribute image of the preset attribute may be generated based on the semantic features of "T-shirt".
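The extraction step in this example can be sketched as a lookup against the keyword library. The substring matching below is an illustrative assumption (a real system would match on segmented words), and the function name is hypothetical:

```python
def extract_keyword(description, library, preset_attribute):
    # Scan the keyword library and return the first library word of the
    # preset attribute that occurs in the description text.
    for word, attribute in library.items():
        if attribute == preset_attribute and word in description:
            return word
    return None  # no match: the caller may prompt "No Match"

library = {"T-shirt": "garment style", "down jacket": "garment style"}
keyword = extract_keyword("a red T-shirt", library, "garment style")
```

Only the keyword of the *preset* attribute is returned; other library words ("red" as a color, say) would be ignored for this attribute.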
In addition, if there is no matching keyword, a "No Match" prompt message may be given to remind checking whether the keyword is complete.
Based on the above embodiment, the target to be drawn is a garment, and the preset attribute is a garment style.
Specifically, the target to be drawn may be a garment, and the preset attribute of the target to be drawn may be a garment style. In other examples, the target to be drawn may be an automobile and the preset attribute an automobile style, or the target to be drawn may be a cup and the preset attribute a cup style, and so on.
Based on any of the above embodiments, the present invention provides an image generating method, and fig. 4 is a second schematic flow chart of the image generating method provided by the present invention, as shown in fig. 4, the method includes:
the first step may be to obtain a description text of the target to be drawn, and may be to extract a keyword of a preset attribute of the target to be drawn from the description text based on a keyword library of the preset attribute. The keyword library is constructed based on the word frequency of each sample participle in the sample description text and the attribute screening corresponding to each sample participle. The target to be drawn is a garment, and the preset attribute is a garment style.
In the second step, an attribute image of the preset attribute is generated based on the keyword.
In the third step, the semantic features of the description text and the image features of the attribute image are fused based on the constraint layer in the image generation model to obtain the fusion constraint feature;
and based on the generation layer in the image generation model, the fusion constraint feature is applied to generate the target image. The generation layer here includes multiple cascaded feature layers; that is, the fusion constraint feature may be feature-fused with the previous baseline feature of the previous feature layer based on the current feature layer in the generation layer to obtain the current baseline feature.
The next feature layer of the current feature layer is then taken as the current feature layer, the current baseline feature is taken as the previous baseline feature, and the feature fusion of the current feature layer is performed again, until the current feature layer is the last feature layer in the generation layer; the baseline feature of the last feature layer is taken as the target image.
The image generation model is trained based on sample description texts and the sample attribute images and sample target images respectively corresponding to them.
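The three steps of Fig. 4 can be tied together in one sketch, with each trained model stage passed in as a callable stub. Every name here is an illustrative assumption; the stubs only mark where the real keyword extraction, attribute-image generator, and constrained image generation model would plug in:

```python
def generate_image(description, library, attribute_generator, image_model):
    # 1. Extract the preset-attribute keyword from the description text
    #    (simple substring lookup against the keyword library).
    keyword = next((w for w in library if w in description), None)
    if keyword is None:
        return None  # a real system would prompt "No Match"
    # 2. Generate the attribute image constrained by the keyword.
    attribute_image = attribute_generator(keyword)
    # 3. Generate the target image constrained by both the description
    #    text and the attribute image.
    return image_model(description, attribute_image)

library = {"T-shirt": "garment style"}
result = generate_image(
    "a red T-shirt",
    library,
    attribute_generator=lambda kw: f"<attr image of {kw}>",
    image_model=lambda text, attr: f"<target image | {text} | {attr}>",
)
```

Passing the stages in as callables keeps the control flow of the method visible without committing to any particular model architecture.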
The image generating apparatus provided by the present invention is described below, and the image generating apparatus described below and the image generating method described above may be referred to in correspondence with each other.
Based on any of the embodiments described above, the present invention provides an image generating apparatus, and fig. 5 is a schematic structural diagram of the image generating apparatus provided by the present invention, and as shown in fig. 5, the apparatus includes:
an obtaining unit 510, configured to obtain a description text of a target to be drawn, and extract a keyword of a preset attribute of the target to be drawn from the description text;
a generating unit 520, configured to generate an attribute image of the preset attribute based on the keyword;
a target image generating unit 530, configured to generate a target image of the target to be drawn by taking the description text and the attribute image as constraints.
According to the apparatus provided by the embodiment of the present invention, the keyword of the preset attribute of the target to be drawn is extracted from the description text, the attribute image of the preset attribute is generated based on the keyword, and the target image of the target to be drawn is generated by taking the description text and the attribute image as constraints. The target image thus conforms to both the description text and the attribute image, which ensures that the preset attribute of the target in the generated target image is consistent with the keyword in the description text, that the target image conforms to common-sense cognition in its presentation of the preset attribute, and that no rare and strange image results are generated, thereby improving the accuracy and reliability of the generated target image.
Based on any of the above embodiments, the target image generating unit specifically includes:
a fusion unit, configured to fuse the semantic features of the description text and the image features of the attribute image to obtain a fusion constraint feature, and generate the target image based on the fusion constraint feature.
Based on any of the above embodiments, the fusion unit specifically includes:
a constraint unit, configured to fuse the semantic features of the description text and the image features of the attribute image based on a constraint layer in an image generation model to obtain a fusion constraint feature;
a generating subunit, configured to apply the fusion constraint feature to generate the target image based on a generation layer in the image generation model;
the image generation model is obtained by training based on a sample description text, and a sample attribute image and a sample target image which respectively correspond to the sample description text.
In any of the above embodiments, the generation layer comprises a plurality of cascaded feature layers;
the generating subunit is specifically configured to:
performing feature fusion on the fusion constraint feature and a previous baseline feature of a previous feature layer based on a current feature layer in the generation layer to obtain a current baseline feature;
taking a next feature layer of the current feature layer as the current feature layer, taking the current baseline feature as the previous baseline feature, and returning to perform the feature fusion of the current feature layer until the current feature layer is the last feature layer in the generation layer;
and taking the baseline feature of the last feature layer as the target image.
Based on any of the above embodiments, the obtaining unit is specifically configured to:
extract the keyword of the preset attribute of the target to be drawn from the description text based on a keyword library of the preset attribute;
the keyword library is constructed based on the word frequency of each sample word in the sample description texts and the screened attribute corresponding to each sample word.
Based on any of the above embodiments, the generating unit is specifically configured to:
generate the attribute image of the preset attribute by taking the semantic features of the keyword as constraints.
Based on any of the above embodiments, the target to be drawn is a garment, and the preset attribute is a garment style.
Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor) 610, a communication Interface 620, a memory (memory) 630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 complete communication with each other through the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform an image generation method comprising: obtaining a description text of a target to be drawn, and extracting keywords of preset attributes of the drawing target from the description text; generating an attribute image of the preset attribute based on the keyword; and generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the image generation method provided by the above methods, the method comprising: obtaining a description text of a target to be drawn, and extracting keywords of preset attributes of the drawing target from the description text; generating an attribute image of the preset attribute based on the keyword; and generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image generation method provided by performing the above methods, the method including: obtaining a description text of a target to be drawn, and extracting keywords of preset attributes of the drawing target from the description text; generating an attribute image of the preset attribute based on the keyword; and generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An image generation method, comprising:
obtaining a description text of a target to be drawn, and extracting keywords of preset attributes of the target to be drawn from the description text;
generating an attribute image of the preset attribute based on the keyword;
and generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
2. The image generation method according to claim 1, wherein generating the target image of the object to be drawn by using the description text and the attribute image as constraints comprises:
and fusing the semantic features of the description text and the image features of the attribute images to obtain fusion constraint features, and generating the target image based on the fusion constraint features.
3. The image generation method according to claim 2, wherein the fusing the semantic features of the description text and the image features of the attribute images to obtain a fusion constraint feature, and generating the target image based on the fusion constraint feature includes:
fusing the semantic features of the description text and the image features of the attribute images based on a constraint layer in an image generation model to obtain fused constraint features;
based on a generation layer in the image generation model, applying the fusion constraint characteristics to generate the target image;
the image generation model is obtained by training based on a sample description text, and a sample attribute image and a sample target image which respectively correspond to the sample description text.
4. The image generation method according to claim 3, wherein the generation layer includes a plurality of cascaded feature layers;
the generating the target image by applying the fusion constraint characteristic based on a generation layer in the image generation model comprises:
performing feature fusion on the fusion constraint feature and a previous baseline feature of a previous feature layer based on a current feature layer in the generation layer to obtain a current baseline feature;
taking a next feature layer of the current feature layer as the current feature layer, taking the current baseline feature as the previous baseline feature, and returning to perform feature fusion of the current feature layer until the current feature layer is the last feature layer in the generation layers;
and taking the baseline feature of the last feature layer as the target image.
5. The image generation method according to claim 1, wherein the extracting the keyword of the preset attribute of the target to be drawn from the description text comprises:
extracting the keyword of the preset attribute of the target to be drawn from the description text based on a keyword library of the preset attribute;
the keyword library is constructed based on the word frequency of each sample word in the sample description text and the attribute screening corresponding to each sample word.
6. The image generation method according to claim 1, wherein the generating of the attribute image of the preset attribute based on the keyword includes:
and generating the attribute image of the preset attribute by taking the semantic features of the keywords as constraints.
7. The image generation method according to any one of claims 1 to 6, wherein the object to be drawn is a garment, and the preset attribute is a garment style.
8. An image generation apparatus, comprising:
an obtaining unit, configured to obtain a description text of a target to be drawn, and extract a keyword of a preset attribute of the target to be drawn from the description text;
the generating unit is used for generating an attribute image of the preset attribute based on the keyword;
and the target image generating unit is used for generating a target image of the target to be drawn by taking the description text and the attribute image as constraints.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the image generation method according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when being executed by a processor, implementing the image generation method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211457780.3A CN115861747A (en) | 2022-11-21 | 2022-11-21 | Image generation method, image generation device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115861747A true CN115861747A (en) | 2023-03-28 |
Family
ID=85664483
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211457780.3A Pending CN115861747A (en) | 2022-11-21 | 2022-11-21 | Image generation method, image generation device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115861747A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116363262A (en) * | 2023-03-31 | 2023-06-30 | 北京百度网讯科技有限公司 | Image generation method and device and electronic equipment |
CN116363262B (en) * | 2023-03-31 | 2024-02-02 | 北京百度网讯科技有限公司 | Image generation method and device and electronic equipment |
CN117725922A (en) * | 2023-04-13 | 2024-03-19 | 书行科技(北京)有限公司 | Image generation method, device, computer equipment and storage medium |
WO2024216815A1 (en) * | 2023-04-20 | 2024-10-24 | 网易(杭州)网络有限公司 | Image generation method and apparatus, and electronic device |
CN116597039A (en) * | 2023-05-22 | 2023-08-15 | 阿里巴巴(中国)有限公司 | Image generation method and server |
CN116597039B (en) * | 2023-05-22 | 2023-12-26 | 阿里巴巴(中国)有限公司 | Image generation method and server |
CN116797684A (en) * | 2023-08-21 | 2023-09-22 | 腾讯科技(深圳)有限公司 | Image generation method, device, electronic equipment and storage medium |
CN116797684B (en) * | 2023-08-21 | 2024-01-05 | 腾讯科技(深圳)有限公司 | Image generation method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |