CN114399668A - Natural image generation method and device based on hand-drawn sketch and image sample constraint - Google Patents
- Publication number
- CN114399668A CN114399668A CN202111617371.0A CN202111617371A CN114399668A CN 114399668 A CN114399668 A CN 114399668A CN 202111617371 A CN202111617371 A CN 202111617371A CN 114399668 A CN114399668 A CN 114399668A
- Authority
- CN
- China
- Prior art keywords
- image
- content
- natural
- natural image
- training data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a natural image generation method and a device based on hand-drawn sketch and image sample constraint, wherein the method comprises the following steps: firstly, acquiring an original natural image and category information, and constructing a training data set; then, a natural image generation model is constructed, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating natural images in the training process, and the multi-task discriminator is used for judging whether the generated natural images are real or not in the training process and judging the category of the generated natural images; then training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model; and finally, inputting the target image sample and the target hand-drawn sketch into the target model to generate a natural image based on the target hand-drawn sketch and the image sample constraint. The invention improves the convenience and the controllability, and can be widely applied to the technical field of image processing.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a natural image generation method and device based on hand-drawn sketches and image sample constraints.
Background
With the rapid development of conditional generative adversarial networks, a variety of such networks based on different constraints have emerged, for example networks that use edge maps, natural images, hand-drawn sketches, or semantic segmentation maps as constraints. However, these networks can only control generated image content such as pose and shape; the controllability of the generated image is not high enough.
Methods that use key points, edge maps, or natural images alone to control the generated image content have recently emerged. However, key points are too abstract to express user intent well, and with an edge map or a natural image the user often cannot find a suitable input to express his or her intention, so these methods are not convenient enough.
Disclosure of Invention
In view of this, embodiments of the present invention provide a natural image generation method and apparatus based on hand-drawn sketches and image sample constraints, which improve both convenience and controllability.
The invention provides a natural image generation method based on hand-drawn sketch and image sample constraint, which comprises the following steps:
acquiring an original natural image and category information, and constructing a training data set; wherein the training data set comprises content images and image samples;
constructing a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating natural images in the training process, and the multi-task discriminator is used for judging whether the generated natural images are real or not in the training process and judging the category of the generated natural images;
training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model;
and inputting the target image sample and the target hand-drawn sketch into the target model, and generating a natural image based on the target hand-drawn sketch and the image sample constraint.
Optionally, the acquiring the original natural image and the category information to construct a training data set includes:
acquiring an edge map of the original natural image through an edge map extraction algorithm;
pairing each edge map with its corresponding natural image to form one kind of training data pair in the data set, and pairing each edge map with a random natural image to form the other kind of training data pair;
and constructing the training data set from the two kinds of training data pairs, wherein the edge map serves as the content image and the natural image serves as the image sample.
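As an illustrative sketch of this pairing step (not part of the patent text; the string identifiers are hypothetical placeholders standing in for image tensors), the two kinds of training data pairs can be assembled as follows:

```python
import random

def build_training_pairs(edge_maps, natural_images, seed=0):
    """Form the two kinds of training data pairs: each edge map paired with
    its corresponding natural image, and each edge map paired with a
    randomly chosen natural image serving as an image sample."""
    rng = random.Random(seed)
    matched = list(zip(edge_maps, natural_images))
    shuffled = natural_images[:]
    rng.shuffle(shuffled)
    mismatched = list(zip(edge_maps, shuffled))
    return matched + mismatched

# Hypothetical identifiers standing in for image tensors.
edges = ["edge_0", "edge_1", "edge_2"]
photos = ["img_0", "img_1", "img_2"]
dataset = build_training_pairs(edges, photos)
```

The matched pairs supervise reconstruction, while the randomly matched pairs expose the model to mismatched content/style combinations.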
Optionally, the constructing a natural image generation model includes:
constructing a content encoder, wherein the content encoder comprises five convolution modules and two residual modules, and is used for extracting content characteristics in input data;
constructing a style encoder, wherein the style encoder is used for extracting style characteristics of an image sample in input data;
and constructing a content decoder, wherein the content decoder is used for acquiring affine transformation parameters according to the style characteristics and generating pictures according to the content characteristics and the affine transformation parameters.
Optionally, the content feature extraction formula is:
Z_content = E_content(X_edge or sketch)

wherein Z_content is the content feature, E_content is the content encoder, and X_edge or sketch is an edge map or hand-drawn sketch from the content images.
Optionally, the method further includes a step of performing style migration using adaptive instance normalization, where the step specifically includes:
processing the style characteristics through three full connection layers to obtain affine transformation parameters;
inputting the style features, together with the affine transformation parameters, into the AdaIN Resblock modules of the content decoder, and performing style migration through the AdaIN Resblock modules using adaptive instance normalization;
wherein the calculation expression of the adaptive instance normalization is as follows:

AdaIN(z_content, z_reference) = σ(z_reference) · ((z_content - μ(z_content)) / σ(z_content)) + μ(z_reference)

wherein AdaIN(z_content, z_reference) represents the result of the adaptive instance normalization, z_content represents the content feature, z_reference represents the style feature, μ(·) denotes the mean, and σ(·) denotes the standard deviation.
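The adaptive instance normalization above can be sketched numerically in a few lines of Python; this operates on flat feature lists rather than convolutional feature maps, purely for illustration:

```python
from statistics import mean, pstdev

def adain(z_content, z_reference, eps=1e-5):
    """Adaptive instance normalization on flat feature lists: normalize the
    content feature by its own statistics, then rescale and shift it with
    the style feature's standard deviation and mean."""
    mu_c, sigma_c = mean(z_content), pstdev(z_content)
    mu_r, sigma_r = mean(z_reference), pstdev(z_reference)
    return [sigma_r * (x - mu_c) / (sigma_c + eps) + mu_r for x in z_content]

content = [1.0, 2.0, 3.0, 4.0]
style = [10.0, 20.0, 30.0, 40.0]
out = adain(content, style)
# The result keeps the content's relative structure but adopts the style
# statistics (mean 25 here).
```

In the model proper, the style statistics are not computed directly from the image sample but predicted as affine parameters by fully connected layers, as described above.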
Optionally, in the step of training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model,
the reconstruction loss function of the training process is:
L_rec(G) = ||G(X_edge, Y_image) - Y_image||_1

wherein L_rec(G) represents the reconstruction loss function; X_edge represents the edge map in a training data pair; Y_image represents the corresponding natural image; G(·) represents the generator model, which during training takes an edge map and a natural image as input and generates a target picture.
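A minimal numeric sketch of this reconstruction loss (operating on flattened pixel lists rather than the patent's image tensors):

```python
def l1_reconstruction_loss(generated, target):
    """L_rec(G) = ||G(X_edge, Y_image) - Y_image||_1: the sum of absolute
    differences between the generated picture and the target picture."""
    return sum(abs(g - t) for g, t in zip(generated, target))

fake = [0.1, 0.5, 0.9]
real = [0.0, 0.5, 1.0]
loss = l1_reconstruction_loss(fake, real)  # about 0.2
```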
the multi-tasking discriminant loss function is:
L_GAN(D, G) = E_X[-log D_c-1(Y_c-1)] + E_X,Y[log(1 - D_c-1(G(X_edge, Y_c-1)))]

wherein L_GAN(D, G) represents the multi-task discrimination loss, G represents the generator network, and D represents the discriminator network; E_X denotes expectation over the real data distribution, and D_c-1(Y_c-1) is the output of the discriminator when a real sample is input; E_X,Y denotes expectation over the generated sample distribution, and D_c-1(G(X_edge, Y_c-1)) is the output of the discriminator when a generated sample is input; G(X_edge, Y_c-1) represents a sample generated by the generator network; the subscript c-1 indicates the category.
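The discrimination loss can be illustrated with plain scalar discriminator scores; replacing the expectations with means over score lists is an assumption for illustration, since the patent gives only the expectation form:

```python
import math

def multitask_gan_loss(d_real, d_fake):
    """Value of E_X[-log D(real)] + E_{X,Y}[log(1 - D(fake))] for one
    category branch, with expectations replaced by means over lists of
    discriminator output probabilities."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Scores from a discriminator that is fairly confident on real samples
# and assigns low probability to generated ones.
loss = multitask_gan_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1])
```

When the discriminator is nearly perfect (real scores near 1, fake scores near 0), both terms approach zero.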
The knowledge distillation loss function is:

L_Distill(G_S) = Σ_{i=1}^{N} ||A_T^i - A_S^i||_1

wherein L_Distill(G_S) represents the knowledge distillation loss; N represents the number of selected intermediate layers of the generator network; A_T^i represents the activation output of the i-th intermediate layer of the teacher generator network; and A_S^i represents the activation output of the i-th intermediate layer of the student generator network.
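A minimal sketch of this layer-wise distillation loss; the L1 norm is an assumption, as the text only specifies that matched intermediate activations of the teacher and student generators are compared:

```python
def distillation_loss(teacher_acts, student_acts):
    """Layer-wise distillation: sum over the N selected intermediate layers
    of the absolute differences between teacher and student activation
    outputs (one list of floats per layer)."""
    total = 0.0
    for t_layer, s_layer in zip(teacher_acts, student_acts):
        total += sum(abs(t - s) for t, s in zip(t_layer, s_layer))
    return total

teacher = [[1.0, 2.0], [3.0, 4.0]]  # N = 2 layers, two activations each
student = [[1.0, 1.5], [2.0, 4.0]]
loss = distillation_loss(teacher, student)  # |2-1.5| + |3-2| = 1.5
```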
Another aspect of the embodiments of the present invention provides a natural image generation apparatus based on hand-drawn sketches and image sample constraints, including:
a first module, which is used for acquiring an original natural image and category information and constructing a training data set, wherein the training data set comprises content images and image samples;
a second module, which is used for constructing a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating a natural image in the training process, and the multi-task discriminator is used for judging whether the generated natural image is real in the training process and judging the category of the generated natural image;
a third module, which is used for training the natural image generation model through the training data set and adjusting parameters of the natural image generation model to obtain a trained target model;
and a fourth module, which is used for inputting the target image sample and the target hand-drawn sketch into the target model and generating a natural image based on the target hand-drawn sketch and the image sample constraint.
Another aspect of the embodiments of the present invention provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Another aspect of the embodiments of the present invention provides a computer-readable storage medium storing a program, the program being executed by a processor to implement the method as described above.
Another aspect of embodiments of the present invention provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The method comprises the steps of firstly, obtaining original natural images and category information, and constructing a training data set; then, a natural image generation model is constructed, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating natural images in the training process, and the multi-task discriminator is used for judging whether the generated natural images are real or not in the training process and judging the category of the generated natural images; then training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model; and finally, inputting the target image sample and the target hand-drawn sketch into the target model to generate a natural image based on the target hand-drawn sketch and the image sample constraint. The invention improves the convenience and the controllability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart illustrating the overall steps provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of edge map extraction according to natural images according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating two training data pairs according to an embodiment of the present invention;
FIG. 4 is a diagram of a natural image generation model according to an embodiment of the present invention;
fig. 5 is a diagram of a natural image generation result based on a sketching and an image sample constraint according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To address the problems in the prior art, the invention combines the advantages of using a hand-drawn sketch and an image sample as input: the hand-drawn sketch provides convenience for users and controls the generated image content, while the image sample controls styles such as the texture of the generated image. The invention therefore provides a fine-grained generation framework based on a hand-drawn sketch that uses an image sample as an additional constraint. In this framework, to address the difficulty of constructing a mapping between hand-drawn sketches and natural images, a content encoder is designed that extracts general semantic features from cross-domain images, i.e., correct semantic features are extracted whether an edge map or a hand-drawn sketch is the input of the content encoder. In addition, combining the idea of knowledge distillation, a knowledge distillation loss is introduced into the image generation process, improving image generation quality without changing the network architecture.
The following detailed description of the specific implementation principles of the present invention is made with reference to the accompanying drawings:
As shown in FIG. 1, the invention discloses a natural image generation method based on hand-drawn sketch and image sample constraint. First, natural images and hand-drawn sketch data are collected to make a data set; then a natural image generation model is constructed and trained using the loss functions; finally, the user supplies a hand-drawn sketch and an image sample as input, and the generator in the natural image generation model produces a natural image whose pose follows the hand-drawn sketch and whose texture follows the image sample, giving the final result. Specifically, the method mainly comprises the following steps:
step 1: collecting a natural image and a hand-drawn sketch making data set, wherein the edge map and the sketch in the data set are content images, the natural image is an image sample, and the method comprises the following steps 1-1 and 1-2:
Step 1-1: collect natural images and their corresponding category information. First, obtain edge maps of the natural images using an edge map extraction algorithm; the result of extracting an edge map from a natural image is shown in fig. 2. Each edge map is paired with its corresponding natural image, and also with a random natural image, to form two kinds of training data pairs, and the training data set is constructed from these two kinds of pairs, as shown in fig. 3.
Step 1-2: collect the hand-drawn sketches, randomly assign 10 natural images to each sketch as image samples, and construct a test data set.
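This assignment step can be sketched as follows (the string identifiers are hypothetical placeholders for actual sketch and image files):

```python
import random

def build_test_set(sketches, natural_images, per_sketch=10, seed=0):
    """Assign `per_sketch` distinct randomly chosen natural images to each
    hand-drawn sketch as its image samples, as in step 1-2."""
    rng = random.Random(seed)
    return {s: rng.sample(natural_images, per_sketch) for s in sketches}

# Hypothetical identifiers standing in for actual files.
images = [f"img_{i}" for i in range(50)]
test_set = build_test_set(["sketch_a", "sketch_b"], images)
```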
Step 2: construct a natural image generation model. The model comprises a generator and a multi-task discriminator; the generator consists of a content encoder responsible for extracting content features of the input, a style encoder responsible for extracting features of the input image sample, and a decoder, while the multi-task discriminator consists of a plurality of binary discriminators. This step comprises the following steps 2-1 to 2-5:
Step 2-1: construct a content encoder. The content encoder is responsible for extracting the content features of the input content image and is designed as a seven-layer convolution network comprising five convolution modules Conv-64, Conv-128, Conv-256, Conv-512 and two residual modules Resblock-512 and Resblock-512. The content features of the content image are extracted as:

Z_content = E_content(X_edge or sketch)

wherein Z_content is the content feature, E_content is the content encoder, and X_edge or sketch is a content-image edge map or sketch; in the network structure, Conv- denotes a convolution block, Resblock- denotes a residual block, and the number denotes the number of output feature channels.
Step 2-2: construct a style encoder. The style encoder is responsible for extracting the style features of the input image sample and is designed as a seven-layer network comprising Conv-64, Conv-128, Conv-256, Conv-512, Conv-1024, AvgPooling and Conv-8. The style features of the image sample are extracted as:

Z_reference = E_reference(Y_c-1)

wherein Z_reference is the style feature, E_reference is the style encoder, and Y_c-1 is the input image sample, with subscript c-1 indicating the category of the input image sample; in the network structure, Conv- denotes a convolution block, AvgPooling denotes an average pooling layer, and the number denotes the number of output feature channels.
Step 2-3: construct a content decoder. The content decoder takes the content features as input and performs style migration on them with adaptive instance normalization (AdaIN) using the style features to obtain the final generated picture. It is designed as two AdaIN Resblock-512 modules and five convolution modules Conv-512, Conv-256, Conv-128, Conv-64, Conv-3. The content decoder takes the content features and the style features as parameters to obtain the final generated picture:

Ŷ_c-1 = Decoder(Z_content, Z_reference)

wherein Z_content is the content feature, Z_reference is the style feature, Decoder is the content decoder, and Ŷ_c-1 is the final generated picture, with subscript c-1 indicating the category of the generated picture; in the network structure, Conv- denotes a convolution block, AdaIN Resblock denotes an AdaIN residual block, and the number denotes the number of output feature channels.
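The encoder-decoder composition can be sketched end to end with stub functions; the real encoders are the convolution networks described above, and these statistics-based stubs only mirror the AdaIN-style decoding for illustration:

```python
from statistics import mean, pstdev

def encode_content(image):
    """Stub for E_content: the real content encoder is the convolution
    network above; here the 'feature' is just the raw value list."""
    return [float(v) for v in image]

def encode_style(image):
    """Stub for E_reference: reduce the image sample to style statistics."""
    return mean(image), pstdev(image)

def decode(z_content, style_stats, eps=1e-5):
    """Stub decoder applying an AdaIN-style affine transform derived from
    the style statistics to the content feature."""
    mu_r, sigma_r = style_stats
    mu_c, sigma_c = mean(z_content), pstdev(z_content)
    return [sigma_r * (v - mu_c) / (sigma_c + eps) + mu_r for v in z_content]

def generator(content_image, style_image):
    return decode(encode_content(content_image), encode_style(style_image))

out = generator([0, 1, 2, 3], [5, 5, 9, 9])  # mean of out is the style mean, 7
```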
The specific steps of the content decoder for style migration using adaptive instance normalization (AdaIN) are as follows:
step 2-4: obtaining affine transformation parameters required by the ith AdaIN reblock module after the style characteristics pass through the first three full-connection layersThe specific calculation is as follows
Wherein, WTAnd b is the offset of the full connection layer, and the full connection layer converts the output into a vector form to realize feature transformation.
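A scalar sketch of predicting one AdaIN block's affine parameters from the style feature; using separate weight vectors for σ_i and μ_i is an assumption made here for illustration:

```python
def affine_params(z_reference, w_sigma, b_sigma, w_mu, b_mu):
    """Predict the i-th AdaIN block's affine parameters (sigma_i, mu_i)
    from the style feature via fully connected (dot product plus bias)
    layers."""
    sigma_i = sum(w * z for w, z in zip(w_sigma, z_reference)) + b_sigma
    mu_i = sum(w * z for w, z in zip(w_mu, z_reference)) + b_mu
    return sigma_i, mu_i

sigma_i, mu_i = affine_params([0.5, -0.5], [1.0, 1.0], 0.0, [2.0, 0.0], 1.0)
# sigma_i = 0.0, mu_i = 2.0
```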
Step 2-5: inject the style features into the AdaIN Resblock modules of the decoder model using the affine transformation parameters σ_i(z_reference) and μ_i(z_reference). The injection is specifically calculated as:

AdaIN(x, z_reference) = σ_i(z_reference) · ((x - μ(x)) / σ(x)) + μ_i(z_reference)

wherein σ_i(z_reference) and μ_i(z_reference) respectively represent the affine transformation parameters predicted from the style features; x is the content feature Z_content; z_reference is the style feature; μ denotes the mean, and σ denotes the standard deviation.
Step 3: train the natural image generation model with the training data, adjusting the parameters of the natural image generation model in each training round using the following loss functions:
reconstruction loss function:
L_rec(G) = ||G(X_edge, Y_image) - Y_image||_1

wherein X_edge is the content image, Y_image is the image sample, and G is the generator network.
Multitask discriminant loss function:
L_GAN(D, G) = E_X[-log D_c-1(Y_c-1)] + E_X,Y[log(1 - D_c-1(G(X_edge, Y_c-1)))]

wherein G is the generator network and D is the discriminator network; E_X denotes expectation over the real data distribution, and D_c-1(Y_c-1) is the output of the discriminator when a real sample is input; E_X,Y denotes expectation over the generated sample distribution, and D_c-1(G(X_edge, Y_c-1)) is the output of the discriminator when a generated sample is input; G(X_edge, Y_c-1) is a sample generated by the generator network; the subscript c-1 is the category.
Knowledge distillation loss function:
L_Distill(G_S) = Σ_{i=1}^{N} ||A_T^i - A_S^i||_1

wherein A_T^i and A_S^i are respectively the activation outputs of the i-th intermediate layers of the teacher generator network and the student generator network, both of which are consistent with the generator network structure; N represents the total number of selected intermediate layers, and N = 6.
Step 4: input the sketches in the test data set, together with arbitrary natural image data, into the trained natural image generation model to realize natural image generation based on the hand-drawn sketch and image example constraints; the generation results are shown in FIG. 5.
In summary, the invention combines the advantages of using both the hand-drawn sketch and the image example as input: the hand-drawn sketch provides convenience for users and controls the generated image content, while the image example controls the generation of styles such as image texture.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. The natural image generation method based on the hand-drawn sketch and the image sample constraint is characterized by comprising the following steps of:
acquiring an original natural image and category information, and constructing a training data set; wherein the training data set comprises content images and image samples;
constructing a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating natural images in the training process, and the multi-task discriminator is used for judging whether the generated natural images are real or not in the training process and judging the category of the generated natural images;
training the natural image generation model through the training data set, and adjusting parameters of the natural image generation model to obtain a trained target model;
and inputting the target image sample and the target hand-drawn sketch into the target model, and generating a natural image based on the target hand-drawn sketch and the image sample constraint.
2. The method for generating a natural image based on a hand-drawn sketch and image example constraint according to claim 1, wherein the obtaining of the original natural image and the category information and the construction of the training data set comprise:
acquiring an edge map of the original natural image through an edge map extraction algorithm;
pairing the edge map with its corresponding natural image to form a training data pair in a data set, and also pairing the edge map with a random natural image to form a training data pair in the data set;
and constructing the training data set from the two kinds of training data pairs, wherein the edge map serves as the content image and the natural image serves as the image sample.
3. The method for generating natural images based on sketching and image sample constraints as claimed in claim 1, wherein the constructing a natural image generation model comprises:
constructing a content encoder, wherein the content encoder comprises five convolution modules and two residual modules, and is used for extracting content characteristics in input data;
constructing a style encoder, wherein the style encoder is used for extracting style characteristics of an image sample in input data;
and constructing a content decoder, wherein the content decoder is used for acquiring affine transformation parameters according to the style characteristics and generating pictures according to the content characteristics and the affine transformation parameters.
4. The method of generating natural images based on sketching and image sample constraints as recited in claim 3,
the extraction formula of the content features is as follows:
Z_content = E_content(X_edge or sketch)

wherein Z_content is the content feature, E_content is the content encoder, and X_edge or sketch is the edge map or hand-drawn sketch serving as the content image.
5. The method for generating natural images based on sketching and image sample constraints as claimed in claim 3, wherein said method further comprises a step of performing style migration using adaptive instance normalization, said step comprising:
processing the style characteristics through three full connection layers to obtain affine transformation parameters;
inputting the style characteristics into an AdaIN Resblock module of the content decoder according to the affine transformation parameters, and performing style migration by using self-adaptive example normalization through the AdaIN Resblock module;
wherein the adaptive instance normalization is calculated as:

AdaIN(z_content, z_reference) = σ(z_reference)·(z_content - μ(z_content))/σ(z_content) + μ(z_reference)

wherein AdaIN(z_content, z_reference) represents the result of the adaptive instance normalization, z_content represents the content feature, z_reference represents the style feature, μ represents the mean, and σ represents the variance.
6. The method according to claim 1, wherein, in the step of training the natural image generation model through the training data set and adjusting the parameters of the natural image generation model to obtain the trained target model,
the reconstruction loss function of the training process is:
L_rec(G) = ||G(X_edge, Y_image) - Y_image||_1

wherein L_rec(G) represents the reconstruction loss function; X_edge represents the edge map in the training data pair; Y_image represents the natural image in the training data pair; G(·) represents the generator model, which during training takes an edge map and a natural image as input and generates a target picture;
the multi-tasking discriminant loss function is:
L_GAN(D, G) = E_X[-log D_{c-1}(Y_{c-1})] + E_{X,Y}[log(1 - D_{c-1}(G(X_edge, Y_{c-1})))]

wherein L_GAN(D, G) represents the multi-task discrimination loss, G represents the generator network, and D represents the discriminator network; E_X represents the expectation over the real data distribution, and D_{c-1}(Y_{c-1}) represents the output of the discriminator when a real sample is input; E_{X,Y} represents the expectation over the generated sample distribution, and D_{c-1}(G(X_edge, Y_{c-1})) represents the output of the discriminator when a generated sample is input; G(X_edge, Y_{c-1}) represents the sample generated by the generator network; the subscript c-1 indicates the category.
The knowledge distillation loss function is:

L_Distill(G_S) = Σ_{i=1}^{N} ||A_T^i - A_S^i||

wherein L_Distill(G_S) represents the knowledge distillation loss; N represents the number of selected intermediate layers of the generator network; A_T^i represents the activation value output by the i-th intermediate layer of the teacher generator network; and A_S^i represents the activation value output by the i-th intermediate layer of the student generator network.
7. A natural image generating apparatus based on hand-drawn sketches and image sample constraints, comprising:
a first module, configured to acquire an original natural image and category information and to construct a training data set, wherein the training data set comprises content images and image samples;
a second module, configured to construct a natural image generation model, wherein the natural image generation model comprises a generator and a multi-task discriminator, the generator is used for generating natural images in the training process, and the multi-task discriminator is used for judging whether the generated natural images are real in the training process and judging the category of the generated natural images;
a third module, configured to train the natural image generation model through the training data set and adjust the parameters of the natural image generation model to obtain a trained target model;
and a fourth module, configured to input a target image sample and a target hand-drawn sketch into the target model to generate a natural image based on the target hand-drawn sketch and image sample constraints.
8. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program realizes the method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1 to 6 when executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111617371.0A CN114399668A (en) | 2021-12-27 | 2021-12-27 | Natural image generation method and device based on hand-drawn sketch and image sample constraint |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114399668A true CN114399668A (en) | 2022-04-26 |
Family
ID=81228403
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111617371.0A Pending CN114399668A (en) | 2021-12-27 | 2021-12-27 | Natural image generation method and device based on hand-drawn sketch and image sample constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114399668A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115358917A (en) * | 2022-07-14 | 2022-11-18 | 北京汉仪创新科技股份有限公司 | Method, device, medium and system for transferring non-aligned faces in hand-drawing style |
CN115358917B (en) * | 2022-07-14 | 2024-05-07 | 北京汉仪创新科技股份有限公司 | Method, equipment, medium and system for migrating non-aligned faces of hand-painted styles |
CN115496824A (en) * | 2022-09-27 | 2022-12-20 | 北京航空航天大学 | Multi-class object-level natural image generation method based on hand drawing |
CN115496824B (en) * | 2022-09-27 | 2023-08-18 | 北京航空航天大学 | Multi-class object-level natural image generation method based on hand drawing |
CN116542321A (en) * | 2023-07-06 | 2023-08-04 | 中科南京人工智能创新研究院 | Image generation model compression and acceleration method and system based on diffusion model |
CN116542321B (en) * | 2023-07-06 | 2023-09-01 | 中科南京人工智能创新研究院 | Image generation model compression and acceleration method and system based on diffusion model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||