
CN114004772B - Image processing method, and method, system and equipment for determining image synthesis model - Google Patents


Info

Publication number
CN114004772B
CN114004772B (application CN202111162107.2A)
Authority
CN
China
Prior art keywords
graph
image
sample
image synthesis
synthesis model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111162107.2A
Other languages
Chinese (zh)
Other versions
CN114004772A (en)
Inventor
赵帅帅 (Zhao Shuaishuai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba China Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Network Technology Co Ltd
Priority to CN202111162107.2A
Publication of CN114004772A
Application granted
Publication of CN114004772B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10004: Still image; Photographic image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; Image merging
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30196: Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the present application provide an image processing method, and a method, system, and device for determining an image synthesis model. The image processing method comprises the following steps: acquiring a first image and a second image in response to a user operation; determining a first image synthesis model, wherein the first image synthesis model has learned the image synthesis results of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model; inputting the first image and the second image into the first image synthesis model, and outputting a synthesized image; and displaying the synthesized image. The technical solution provided by the embodiments of the present application yields good image synthesis quality with low latency and good real-time performance.

Description

Image processing method, and method, system and equipment for determining image synthesis model
Technical Field
The present application relates to the field of computer technologies, and in particular to an image processing method, and a method, system, and device for determining an image synthesis model.
Background
When a merchant promotes clothing, showing it on a real person is certainly more effective and intuitive than showing the garment alone. However, merchants carry many styles of clothing: hiring a model to photograph every garment is expensive, styles turn over quickly, every turnover requires a costly re-shoot, and not every merchant is in a position to arrange model photography at all.
Some existing techniques allow a merchant (or user) to skip hiring a real model and instead use automatic image generation to produce an image that simulates a model trying on the corresponding garment. However, images generated by the prior art either take a long time to generate or show an unrealistic dressing effect.
Disclosure of Invention
To solve, or at least partially solve, the above problems, the present application provides an image processing method, a method for determining a first image synthesis model, a system, and a device.
In one embodiment of the present application, an image processing method is provided. The method comprises the following steps:
acquiring a first image and a second image in response to a user operation;
determining a first image synthesis model, wherein the first image synthesis model has learned the image synthesis results of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
inputting the first image and the second image into the trained first image synthesis model, and outputting a synthesized image;
and displaying the synthesized image.
In another embodiment of the present application, a further image processing method is provided. The method comprises the following steps:
acquiring a first display object image and a model image in response to an image input operation of a user;
determining a first image synthesis model, wherein the first image synthesis model has learned the image synthesis results of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
inputting the first display object image and the model image into the first image synthesis model, and outputting a synthesized image of the model displaying the first display object;
and displaying the synthesized image.
In yet another embodiment of the present application, a method of determining an image synthesis model for image processing is also provided. The method comprises the following steps:
acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained, and training samples; the training samples comprise a first sample graph, at least one first feature graph associated with the first sample graph, a second sample graph, at least one second feature graph associated with the second sample graph, a third sample graph, and a fourth sample graph; synthesizing the first sample graph with the second sample graph yields a synthesized graph whose content is related to that of the third sample graph; synthesizing the fourth sample graph with the third sample graph yields a synthesized graph whose content is related to that of the first sample graph;
inputting the first sample graph, the at least one first feature graph, the second sample graph, and the at least one second feature graph into the second image synthesis model, so that the at least one first feature graph and the at least one second feature graph are referenced when synthesizing the first sample graph and the second sample graph, to obtain a first output graph;
inputting the first output graph, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation;
optimizing the first image synthesis model based on the second output graph and the first sample graph;
the optimized first image synthesis model is used to process two input images to obtain a synthesized image.
In yet another embodiment of the present application, an image processing system is also provided. The image processing system includes:
a data layer for storing data in, and retrieving data from, a database, wherein the database stores data that can be used as training samples;
a processing layer, provided with at least one second image synthesis model serving as a teacher model and at least one first image synthesis model serving as a student model, for generating training samples from the data set acquired by the data layer, and for training the at least one first image synthesis model with the training samples and the at least one second image synthesis model, respectively, to obtain at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model;
an application layer for receiving a first image and a second image input by a user;
the processing layer being further configured to synthesize the first image and the second image using at least part of the at least one first image synthesis model to obtain a synthesized image;
the application layer being further configured to send the synthesized image to the client device corresponding to the user.
In yet another embodiment of the present application, an image processing system is also provided. The image processing system includes:
a server for training at least one first image synthesis model serving as a student model with training samples and at least one second image synthesis model serving as a teacher model, respectively, to obtain at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model;
a client that locally deploys at least part of the at least one first image synthesis model, and is further configured to acquire a first image and a second image in response to a user operation, determine a first image synthesis model, input the first image and the second image into the first image synthesis model to output a synthesized image, and display the synthesized image.
In yet another embodiment of the present application, an electronic device is also provided. The electronic device includes a processor and a memory, wherein
the memory is used for storing one or more computer instructions;
the processor, coupled to the memory, is configured to execute the one or more computer instructions to implement the steps of the method embodiments described above.
In yet another embodiment of the present application, a computer program product is also provided. The computer program product comprises a computer program or instructions which, when executed by a processor, cause the processor to carry out the steps of the method embodiments described above.
According to the technical solutions provided by the embodiments of the present application, knowledge distillation is used to train the first image synthesis model with the second image synthesis model, so that the first image synthesis model attains accuracy and performance close to those of the second image synthesis model. In a specific implementation, a model with good accuracy and performance can therefore be selected as the second image synthesis model (in the field of knowledge distillation, the second image synthesis model may be called the teacher model), and the first image synthesis model is trained through a knowledge distillation process. Although the initial first image synthesis model has low performance and accuracy, through the knowledge distillation training process it reaches high accuracy and performance close to those of the second image synthesis model; consequently, performing image synthesis with the first image synthesis model requires only two input images yet achieves a good result. In addition, models with high accuracy and good performance usually suffer from latency problems, whereas the first image synthesis model in this embodiment combines good accuracy and performance with a small parameter count, low latency, and good real-time behavior; its high processing efficiency is all the more evident when images must be processed in batches.
Drawings
To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram illustrating, from an application interface perspective, a technical solution provided by an embodiment of the present application;
Fig. 3 is a schematic diagram showing the principle of the spatial transformation network module mentioned in an image processing method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a transformation of diagram A using a spatial transformation network module;
Fig. 5 is a schematic diagram illustrating the principle of the GAN module mentioned in an image processing method according to an embodiment of the present application;
Fig. 6 is a schematic diagram illustrating an image processing method according to an embodiment of the present application;
Fig. 7 is a schematic flow chart of an image processing method according to another embodiment of the present application;
Fig. 8 is a schematic diagram showing training of a first image synthesis model through a knowledge distillation process based on a second image synthesis model according to an embodiment of the present application;
Fig. 9 is a flow chart of a method for determining a first image synthesis model for image processing according to an embodiment of the present application;
Fig. 10 is a schematic diagram of an image processing system according to an embodiment of the present application;
Fig. 11 is a schematic diagram of an image processing system at the system software architecture level according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a determining apparatus for a first image synthesis model for image processing according to another embodiment of the present application;
Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
One prior-art solution generates such images using three-dimensional rendering techniques. It mainly comprises two steps: generating a virtual three-dimensional model, and then rendering the garment onto that three-dimensional model. This solution requires a large amount of three-dimensional human body information up front, which is costly to acquire, and generation in the later usage stage takes a long time and cannot be done in real time. The present application therefore provides the following embodiments to solve these problems and to quickly generate display effects that change with the clothing style.
To enable those skilled in the art to better understand the present application, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. Some of the flows described in the specification, the claims, and the above drawings contain operations that appear in a particular order, but those operations may be performed out of the order in which they appear, or concurrently. Sequence numbers such as 101 and 102 merely distinguish different operations and do not by themselves represent any execution order. The flows may also include more or fewer operations, which may be executed sequentially or in parallel. Note that the terms "first" and "second" herein distinguish different modules, models, devices, and the like; they do not denote a sequential order, nor are the items labeled "first" and "second" restricted to different types. Furthermore, the embodiments described below are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the present application.
Before describing the embodiments, a scenario suited to the technical solution provided by the embodiments of the present application is briefly introduced. One applicable scenario is a model re-dressing scenario, such as the one shown in fig. 2. For example, the technical solution can offer merchants a product-image synthesis service. A merchant user may upload, via a client (e.g., a smart phone, desktop computer, or tablet computer), a model image (e.g., a previously taken photograph of a model wearing garment A) and a dress image (which may be a tiled image of a dress). The merchant user then clicks the "composite" control on the interactive interface of the client device to obtain a synthesized image of the model wearing the dress. Because the first image synthesis model used for synthesis is trained with knowledge distillation, it has high performance and accuracy, and the synthesized image looks good. The merchant user may want not only to put the synthesized image on the e-commerce platform for promotion, but also to make an advertisement poster for offline promotion. Because the first image synthesis model is trained with high-resolution sample images, the synthesized image also has a high resolution, meets the resolution requirements of an advertisement poster, and works well when the poster is produced.
Besides clothing changing, the technical solution provided by the embodiments of the present application also applies to other types of goods, such as shoes, bags (handbags, school bags, suitcases, etc.), scarves, hats, gloves, belts, ornaments (e.g., bracelets, rings, necklaces, earrings, hair accessories), watches, and handheld electronic devices (e.g., mobile phones, notebook computers, tablet computers).
Training of the first image synthesis model used for image synthesis in the present application can be the responsibility of a server. For example, the server trains the first image synthesis model and then sends the trained model to the client, so that the client can locally synthesize two images input by its user. Alternatively, both training the first image synthesis model and synthesizing the two images with it are performed by the server: the client sends the first image and the second image input by the user to the server and then receives the synthesized image fed back by the server.
Fig. 1 is a flow chart illustrating an image processing method according to an embodiment of the application. The execution subject of the method provided in this embodiment may be a client, where the client may be a smart phone, a desktop computer, a notebook computer, a tablet computer, an intelligent wearable device, and the like, which is not limited in this embodiment. As shown in fig. 1, the method includes:
101. Acquiring a first image and a second image in response to a user operation;
102. Determining a first image synthesis model, wherein the first image synthesis model has learned the image synthesis results of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
103. Inputting the first image and the second image into the first image synthesis model, and outputting a synthesized image;
104. Displaying the synthesized image.
Referring to the example shown in fig. 2, a user may input a first image and a second image through the illustrated interactive interface. The first image and the second image may be selected by the user from a gallery. The gallery may be stored locally (e.g., in an album) on the device executing the method of this embodiment, or stored on other network-side devices; this embodiment is not limited in this respect.
In 102, the second image synthesis model has higher accuracy and performance than the first image synthesis model. It should be noted that, as of the filing date of the present application, the performance and accuracy of models of large magnitude are generally considered in the art to be superior to those of models of small magnitude. The magnitudes of two models can be compared by the number of parameters in the model, the number of layers the model contains, the number of parameters the model takes as input, and so on: a model of large magnitude has a large parameter count, more layers, more input parameters, and the like. Accordingly, the model magnitude of the second image synthesis model in this embodiment is larger than that of the first image synthesis model.
The first image synthesis model is obtained using knowledge distillation. Knowledge distillation refers to migrating parameters between two models in deep learning, or migrating the information learned by a large model into a small model. "Large" and "small" are relative concepts here: of the two models in knowledge distillation, the one with the larger number of parameters is understood as the large model and the one with the smaller number of parameters as the small model; equivalently, the one with the larger model magnitude (more layers, more model parameters, more input parameters) is the large model and the one with the smaller magnitude is the small model.
Knowledge distillation trains a lightweight small model under the supervision information of a larger, better-performing model, so that the small model achieves better performance and accuracy. The large model is called the teacher model, and the small model is called the student model.
In the embodiments of the present application, the first image synthesis model is trained through knowledge distillation based on the second image synthesis model. The knowledge distillation process is further described below.
The knowledge distillation process may follow several distillation modes: offline distillation, semi-supervised distillation, self-supervised distillation, and so on. In offline distillation, a teacher model is trained in advance; when the student model is trained, the pre-trained teacher model supervises the training to achieve distillation. The teacher's training accuracy is higher than the student model's, and the larger the gap, the more pronounced the distillation effect. Generally, the parameters of the teacher model remain unchanged during distillation training, so that only the student model is trained. The distillation loss measures the difference between the predicted outputs of the teacher model and the student model, and is added to the student loss to form the overall training loss used for gradient updates, finally yielding a student model with higher performance and accuracy. In semi-supervised distillation, the teacher model's predictions serve as labels for supervising the student model: before the student model is trained, partially unlabeled data is fed to the teacher network, whose outputs are used as supervision labels and then input to the student model to complete distillation, so that a data set with fewer annotations suffices to improve model accuracy. In self-supervised distillation, no teacher model is trained in advance; the student model completes the distillation process by itself. There are various ways to do this; one example is to first train the student model and, during the last few epochs of the whole training process (one epoch is a full pass over the training set), use the previously trained student as the supervising model and distill the model during the remaining epochs. The advantage is that no teacher model needs to be trained in advance; distillation happens alongside training, saving training time over the whole distillation process.
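To make the offline-distillation loss described above concrete, the following is a minimal sketch, assuming PyTorch; the `teacher`/`student` networks, the weighting factor `alpha`, and the choice of an L1 loss are illustrative assumptions rather than the patent's prescribed implementation.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def offline_distillation_loss(teacher, student, inputs, target, alpha=0.5):
    """Offline distillation: the frozen teacher supervises the student."""
    with torch.no_grad():                 # teacher parameters remain unchanged
        teacher_out = teacher(*inputs)
    student_out = student(*inputs)
    distill_loss = l1(student_out, teacher_out)  # teacher-student output gap
    student_loss = l1(student_out, target)       # ordinary supervised loss
    return alpha * distill_loss + (1 - alpha) * student_loss
```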
This embodiment adopts knowledge distillation: the high-accuracy second image synthesis model (the teacher model) is distilled to train the first image synthesis model (the student model) and improve its accuracy. The first image synthesis model trained in this way can perform the synthesis computation for the two images. Compared with the second image synthesis model, the first image synthesis model has a small model magnitude, so processing images with the trained model incurs little latency. In addition, knowledge distillation can compress the input parameters of the second image synthesis model; the trained first image synthesis model requires fewer input parameters than the second image synthesis model, which is more convenient for users.
More notably, with the technical solution provided by the embodiments of the present application, a user obtains a good synthesis result simply by providing two images. The requirements on the input images are low and the operation is simple. In particular, a merchant user only needs to provide pictures of new goods and pictures of existing models, without hiring models for photography, which is inexpensive, convenient, fast, and efficient for putting goods on the shelves.
Further, the first image synthesis model in this embodiment includes a spatial transformation network module and an image synthesis network module. The first image synthesis model may comprise a plurality of layers; the spatial transformation network module may be one part of those layers and the image synthesis network module another part. Besides the layers corresponding to these two modules, the first image synthesis model may also include other layers, such as an input layer, convolution layers, and an output layer; this embodiment is not limited in this respect. Accordingly, step 103, "inputting the first image and the second image into the first image synthesis model to output a synthesized image," may specifically include:
1031. Inputting the first image and the second image into the spatial transformation network module, and outputting an intermediate image obtained by performing a coordinate transformation on at least some pixels of the second image with reference to the first image;
1032. Inputting the intermediate image and the first image into the image synthesis network module, and outputting the synthesized image;
wherein the image synthesis network module shares at least some parameters with the second image synthesis model. Sharing parameters can be understood as follows: the second image synthesis model contains the same parameters as the image synthesis network module, or the second image synthesis model has network layers identical to those of the image synthesis network module. A minimal sketch of this two-stage forward pass follows.
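The sketch below assumes PyTorch; `stn` and `synthesizer` are hypothetical stand-ins for the spatial transformation network module and the image synthesis network module.

```python
import torch.nn as nn

class FirstImageSynthesisModel(nn.Module):
    def __init__(self, stn: nn.Module, synthesizer: nn.Module):
        super().__init__()
        self.stn = stn                  # spatial transformation network module
        self.synthesizer = synthesizer  # image synthesis network module

    def forward(self, first_image, second_image):
        # 1031: transform coordinates of (part of) the second image with
        # reference to the first image, producing the intermediate image
        intermediate = self.stn(first_image, second_image)
        # 1032: synthesize the intermediate image with the first image
        return self.synthesizer(intermediate, first_image)
```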
The spatial transformation network (Spatial Transformer Network, STN) module provides spatial transformation functionality. The STN can be embedded as a special network module into a layer of the first image synthesis model, giving the model support for spatial transformations (affine transformation, projective transformation) and properties such as rotation invariance and translation invariance. For example, the STN module may transform the pose of an object in an image input to the first image synthesis model. Referring to fig. 3, the principal structure of the STN module may include a regression network, a grid generator, and a sampler (a minimal sketch follows the component descriptions below). Wherein:
Regression network (Localisation Network): the input original image U undergoes several convolution operations and a fully connected regression to produce 6 values (assuming an affine transformation), i.e., a 2×3 matrix.
Grid generator: computes, by matrix operations, the coordinate position in the original image U corresponding to each position in the target image V, i.e., generates T(G).
Sampler: samples the original image U based on the coordinate information in T(G) and copies the sampled pixels of U into the target image V.
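A minimal sketch of these three components, assuming PyTorch; the layout of the localisation network is an illustrative assumption. `F.affine_grid` plays the role of the grid generator and `F.grid_sample` the role of the sampler.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    def __init__(self):
        super().__init__()
        # Localisation network: convolutions plus a fully connected layer
        # regressing the 6 values of a 2x3 affine matrix.
        self.loc = nn.Sequential(
            nn.Conv2d(3, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(10, 6),
        )
        # Start from the identity transform so early training is stable.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, u):                    # u: original image U
        theta = self.loc(u).view(-1, 2, 3)   # regressed 2x3 affine matrix
        grid = F.affine_grid(theta, u.size(), align_corners=False)  # T(G)
        return F.grid_sample(u, grid, align_corners=False)          # target V
```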
For example, for the two images shown in fig. 4, diagram A and diagram B are input to an STN module, which transforms the clothing in diagram A based on the pose of the model in diagram B to obtain diagram C. Besides the STN, a GMM (Geometric Matching Module) may be employed. The GMM is an end-to-end neural network trained with a pixel-wise L1 loss (a loss function that computes the per-pixel difference between the predicted image and the target image); it can likewise align an input garment C with a person representation P (e.g., a pose keypoint map, a body shape mask, preserved regions of the original image) and generate an image of garment C fitted to the person's pose and body shape. For more details on the GMM, see the relevant literature; it is not described further here.
In particular, the image synthesis network module in this embodiment may be implemented as a generative adversarial network (GAN). A GAN aims to generate data through an adversarial game between neural networks. Because of its strong image generation capability, the GAN is widely used for image synthesis, image inpainting, super-resolution, sketch restoration, and the like. As shown in fig. 5, a GAN consists of two parts: a generator and a discriminator. The input information passes through the generator to obtain a generated image ("fake"), which forms one part of the discriminator's input; the other part comes from a real image ("real"), and the discriminator judges authenticity from the two inputs. During training there is a game relationship between the discriminator and the generator, and their performance improves markedly over the course of training.
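A minimal sketch of one round of this game, assuming PyTorch; `G`, `D`, and the two optimizers are hypothetical networks and optimizers supplied by the caller.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def gan_step(G, D, opt_g, opt_d, real, noise):
    # Discriminator: label real images 1 and generated ("fake") images 0.
    fake = G(noise).detach()
    pred_real, pred_fake = D(real), D(fake)
    d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
             bce(pred_fake, torch.zeros_like(pred_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator call its output real.
    pred = D(G(noise))
    g_loss = bce(pred, torch.ones_like(pred))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```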
In one possible technical solution, the first image synthesis model in this embodiment may be obtained through the following steps.
Specifically, as shown in fig. 6, the process of determining the first image synthesis model may include:
1021. Acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained, and training samples.
The training samples comprise a first sample graph, at least one first feature graph associated with the first sample graph, a second sample graph, at least one second feature graph associated with the second sample graph, a third sample graph, and a fourth sample graph; synthesizing the first sample graph with the second sample graph yields a synthesized graph whose content is related to that of the third sample graph; synthesizing the fourth sample graph with the third sample graph yields a synthesized graph whose content is related to that of the first sample graph.
1022. Inputting the first sample graph, the at least one first feature graph, the second sample graph, and the at least one second feature graph into the second image synthesis model, so that the at least one first feature graph and the at least one second feature graph are referenced when synthesizing the first sample graph and the second sample graph, to obtain a first output graph.
1023. Inputting the first output graph, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation.
1024. Optimizing the first image synthesis model based on the second output graph and the first sample graph (a training-step sketch follows).
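A minimal sketch of one training step over steps 1021-1024, assuming PyTorch; the network signatures and the L1 loss are illustrative assumptions. The point it makes concrete is that the teacher's output becomes one of the student's inputs, while the first sample graph serves as the label.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def distillation_train_step(teacher, student, optimizer, sample):
    with torch.no_grad():  # 1022: teacher produces the first output graph
        first_output = teacher(sample["first"], sample["first_feats"],
                               sample["second"], sample["second_feats"])
    # 1023: student receives the teacher output plus the third/fourth graphs
    second_output = student(first_output, sample["third"], sample["fourth"])
    # 1024: optimize the student against the first sample graph
    loss = l1(second_output, sample["first"])
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```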
The technical solution provided by this embodiment can be applied to image synthesis in many scenarios: synthesizing the face in image a onto the face of the person in image b; combining the clothing in image c with the person in image d; synthesizing the watch in image e onto the wrist of the person in image f; and so on. This embodiment is not limited in this respect.
The graphs in the above training samples are described below with a specific example. To train a first image synthesis model suited to the model re-dressing scenario, the selected training samples comprise: a first sample graph corresponding to a model displaying a second display object, at least one first feature graph associated with the first sample graph, a second sample graph corresponding to a third display object, at least one second feature graph associated with the second sample graph, a third sample graph corresponding to the model displaying the third display object, and a fourth sample graph corresponding to the second display object.
From the above example it is easy to understand the statements "synthesizing the first sample graph with the second sample graph yields a synthesized graph whose content is related to that of the third sample graph" and "synthesizing the fourth sample graph with the third sample graph yields a synthesized graph whose content is related to that of the first sample graph". Here "content related" can be understood as: the content of the synthesized graph is the same as, similar to, or close to the content of the reference sample graph; this embodiment is not limited in this respect.
The at least one first feature graph mentioned above may include, but is not limited to, at least one of the following: at least one segmentation graph obtained by segmenting the first sample graph with at least one segmentation method, and a feature point distribution graph of the first sample graph. The at least one second feature graph includes, but is not limited to: at least one segmentation graph obtained by segmenting the second sample graph with at least one segmentation method.
For example, if the first sample graph is a model graph, the corresponding segmentation graphs can be produced by human body detection and segmentation using, but not limited to, at least one of the following methods: template-based methods, model-based methods, parallel-line-based methods, edge-contour-based methods, image-blocking-based methods, and the like; this embodiment is not limited in this respect. The feature point distribution graph of the first sample graph may be a human pose keypoint graph, obtained by applying a human keypoint detection method to the person (i.e., the model) in the first sample graph. There are various human keypoint detection methods; they and the segmentation methods above are not detailed here and may be found in other literature. In a practical implementation, DensePose (a human pose recognition system) may be chosen to recognize the human pose in the image and generate the human pose information (the human pose keypoint graph mentioned above) for the first sample graph. To obtain a more accurate pose estimate, DensePose proposes a dense human pose estimation method that maps every pixel to a dense pose point. Clothing region warping and pose alignment are performed using the estimated dense poses, which provides richer pose details for pose-guided synthesis. DensePose parsing contains body segmentation and mesh coordinates, which supply additional information usable for realistic pose-guided synthesis; together they provide dense pseudo-three-dimensional information that can represent pose details.
For another example, if the second sample graph is a clothing graph, the corresponding segmentation graphs of the clothing can be produced by at least one of, but not limited to, the following methods: edge-contour-based methods, image-blocking-based methods, and the like.
Further, as shown in fig. 8, the first image synthesis model to be trained in this embodiment includes a spatial transformation network module and an image synthesis network module. In particular, the spatial transformation network module may be the STN module or GMM module mentioned above, and the image synthesis network module may be a GAN module. Correspondingly, step 1023, "inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation," is specifically:
S1. Inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing a coordinate transformation on at least some pixels of the fourth sample graph based on the third sample graph;
S2. Inputting the first output graph and the third output graph into the image synthesis network module to obtain a second output graph for knowledge distillation.
A more concrete application scenario is described below to help understand the relationships among the above sample graphs and the knowledge distillation method for training the first image synthesis model provided in this embodiment.
When the solution provided by the embodiments of the present application was applied in practice, the resolution of images in existing data sets was found to be relatively low, e.g., only 256×192. With low-resolution images as training samples, the synthesized images output by the trained first image synthesis model also have low resolution. A low-resolution synthesized image performs poorly when used for advertising (e.g., making a showcase advertisement poster or other display). Likewise, if a merchant user shows the synthesized image as the product image on an e-commerce page, a buyer who zooms in to inspect clothing details will not see them clearly because of the low resolution. Therefore, in the method provided by the embodiments of the present application, the following steps can be used to improve resolution, both when training the second image synthesis model and when using the trained second image synthesis model to train the first image synthesis model through knowledge distillation. That is, the method provided in this embodiment may further include the following steps:
105. Acquiring a data set;
106. Determining, based on the data set, an atlas that can be used as training samples;
107. Processing the images in the atlas to increase their resolution, the processed images serving as sample graphs in the training samples;
108. Analyzing the processed images in the atlas to obtain at least one feature graph corresponding to each image.
The data set in 105 may be obtained from public data sets available on the network. A public data set is a data set that its holder has made publicly available for reading and use. This embodiment places no particular limit on the choice of public data set.
In a specific implementation of 106, the atlas suitable as training samples may be selected according to the actual image synthesis requirements. For example, if the first image synthesis model in this embodiment is used to synthesize a model image and a clothing image into an image of the model wearing the clothing, then the model images, clothing images, re-dressed images, and the like in the data set are suitable as training samples. If the first image synthesis model is used to synthesize two person images into a face-swapped image, then the person images in the data set are suitable. If the first image synthesis model is used to synthesize an ornament image and a person image into an image of the person wearing the ornament, then the ornament images, person images, and the like in the data set are suitable.
In 107, an image super-resolution reconstruction method can be used to improve the image quality in the data set: the detail information of the image is reconstructed from the low-resolution image by software processing, yielding a higher-quality super-resolution image.
The resolution of the images in the above step can be improved in the following ways:
1. Interpolation-based super-resolution reconstruction estimates the value of the current pixel from the pixel values at several known neighboring positions, thereby obtaining a higher-resolution image. Interpolation-based algorithms are simple, have low computational complexity, and are very widely applicable.
2. Reconstruction-based super-resolution rests on the assumption that more of the missing original detail features can be captured and estimated from the low-resolution image. Many super-resolution reconstruction models, represented by spatial-domain and frequency-domain methods, have been developed from this assumption.
3. Learning-based super-resolution reconstruction collects and builds a database of learning image material, accumulates prior knowledge through extensive model training and algorithmic image reconstruction, and, by tuning certain parameter settings, better captures and recovers the detail information of the original image to improve the reconstruction. A classic example is the Super-Resolution Generative Adversarial Network (SRGAN) algorithm. The main idea of SRGAN is: through training, the generator produces high-resolution images that approximate real images, making it as hard as possible for the discriminator to tell whether an input high-resolution image comes from an original real high-resolution image or was generated. When the discriminator cannot tell real from generated, the generator network is producing high-quality high-resolution images.
For example, an image with resolution 256×192 taken from an existing public data set can be reconstructed into an image with resolution 512×384 by any of the methods above (a minimal interpolation sketch follows). Images with resolution 512×384 are then used as the sample graphs in the training samples; training the model with such high-resolution sample graphs raises the resolution of the model's output images and improves the effect.
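The sketch below illustrates the interpolation-based route (method 1) for the 256×192 to 512×384 example above, assuming OpenCV; the file paths are hypothetical.

```python
import cv2

def upscale(path_in, path_out, size=(384, 512)):  # cv2 takes (width, height)
    img = cv2.imread(path_in)                      # e.g. a 256x192 sample
    hi_res = cv2.resize(img, size, interpolation=cv2.INTER_CUBIC)  # bicubic
    cv2.imwrite(path_out, hi_res)                  # 512x384 training sample
```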
Of course, in a specific implementation, if high-resolution images are already available, they may be selected directly as sample graphs, and the resolution need not be improved by the above methods.
Analyzing the processed images in the atlas in 108 may include the image segmentation and image keypoint detection mentioned above, so as to obtain at least one feature graph corresponding to each image, for example: at least one segmentation graph obtained by segmenting the image with at least one segmentation method, a keypoint graph obtained by applying a keypoint detection method to the image, and the like.
Further, the method provided in this embodiment may also include the following step:
109. In response to user feedback on the synthesized image indicating that the synthesis effect is unsatisfactory, re-determining a first image synthesis model and using the re-determined model to compute a synthesized image of the first image and the second image; the re-determined first image synthesis model is a first image synthesis model trained through a knowledge distillation process based on a third model.
In a specific implementation, the model magnitude of the third model may be greater than that of the second image synthesis model. Alternatively, the third model has the same model magnitude as the second image synthesis model but is a different model (e.g., with a different hierarchy or structure).
Alternatively, there may be more than one first image synthesis model for image processing in this embodiment. For example, different teacher models (such as the second image synthesis model and the third model mentioned above) are used in advance to train the student model through knowledge distillation, yielding different first image synthesis models. As another example, different teacher models are used in advance to train different student models through knowledge distillation, yielding different first image synthesis models: suppose models A and B can serve as teacher models, and models a and b as student models; training model a through knowledge distillation with model A yields first image synthesis model z, and training model b with model B yields first image synthesis model x. When each student model is trained through knowledge distillation, different training sample sets can be used to suit the corresponding scenario requirements. In this way, when the user is not satisfied with the synthesized image, the user may reselect one of the first image synthesis models to regenerate the desired synthesized image. That is, the step "determining a first image synthesis model" in the method provided by this embodiment may specifically be:
selecting one model from a plurality of candidate models as the first image synthesis model in response to a model selection operation by the user, wherein the plurality of candidate models includes at least one of: models obtained by training the first image synthesis model through a knowledge distillation process based on teacher models of different model magnitudes, and models obtained by training different student models through knowledge distillation processes based on teacher models of different model magnitudes. A minimal selection sketch follows.
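The sketch below shows one way such a selection could look; the registry keys and checkpoint paths are hypothetical.

```python
import torch

# Candidate first image synthesis models produced by different
# teacher/student knowledge distillation combinations.
CANDIDATES = {
    "model_z": "student_a_distilled_from_teacher_A.pt",
    "model_x": "student_b_distilled_from_teacher_B.pt",
}

def determine_first_model(user_choice: str):
    return torch.load(CANDIDATES[user_choice])  # load the chosen model
```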
The technical solution of the present application is described below in connection with a specific application scenario: a model re-dressing scenario. As shown in fig. 7, another embodiment of the present application provides a flowchart of an image processing method. The execution subject of the method may be a client, where the client may be a smart phone, a desktop computer, a notebook computer, a tablet computer, a smart wearable device, or the like; this embodiment is not limited in this respect. Specifically, the method comprises the following steps:
201. Acquiring a first display object image and a model image in response to an image input operation of a user;
202. Determining a first image synthesis model, wherein the first image synthesis model has learned the image synthesis results of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
203. Inputting the first display object image and the model image into the first image synthesis model, and outputting a synthesized image of the model displaying the first display object;
204. Displaying the synthesized image.
The display object in this embodiment may be apparel, accessories, electronic devices, bags (e.g., handbags, suitcases, backpacks), shoes, hats, scarves, gloves, and the like. The accessories may be necklaces, watches, rings, hair accessories, earrings, and the like; this embodiment is not limited in this respect.
As shown in the example of fig. 2, a user may input a first image (e.g., a model image) and a second image (e.g., a white coat image) via the interactive interface. The user clicks the "composite" control on the interface to see the synthesized image. The execution subject of this embodiment's method (such as a mobile phone or computer) determines a first image synthesis model after the user inputs the first image and the second image, and then processes the two images with the first image synthesis model to obtain a synthesized image (e.g., a synthesized image of the model wearing the white coat).
For the model re-dressing scenario of this embodiment, as shown in fig. 8, the method may determine the first image synthesis model through the following steps:
2021. Acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained, and training samples.
The training samples comprise a first sample graph corresponding to a model displaying a second display object, at least one first feature graph associated with the first sample graph, a second sample graph corresponding to a third display object, at least one second feature graph associated with the second sample graph, a third sample graph corresponding to the model displaying the third display object, and a fourth sample graph corresponding to the second display object.
2022. Inputting the first sample graph, the at least one first feature graph, the second sample graph, and the at least one second feature graph into the second image synthesis model to obtain a first output graph corresponding to the model displaying the third display object;
2023. Inputting the first output graph, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting a second output graph, used for knowledge distillation, corresponding to the model displaying the second display object;
2024. Optimizing the first image synthesis model based on the second output graph and the first sample graph.
The at least one first feature graph may include, but is not limited to, at least one of the following: at least one segmentation graph obtained by segmenting the model in the first sample graph with at least one segmentation method, and a feature point distribution graph of the model in the first sample graph. The at least one second feature graph may include, but is not limited to: at least one segmentation graph obtained by segmenting the third display object in the second sample graph with at least one segmentation method.
Fig. 8 shows the training process of a first image synthesis model suited to image processing in the model re-dressing scenario. The first image synthesis model includes a spatial transformation network module and an image synthesis network module. In the example of fig. 8, the at least one first feature graph corresponding to the first sample graph may include two segmentation graphs, obtained by segmenting the model's body in the first sample graph with two segmentation methods, and a keypoint graph representing the model's pose. The second image synthesis model at the top of fig. 8 has more layers and has already been trained, so it serves as the teacher model in the knowledge distillation process. The first sample graph, the two body segmentation graphs, and the model pose keypoint graph are input to the second image synthesis model to obtain a first output graph. The third sample graph and the fourth sample graph are input to the first image synthesis model, whose spatial transformation network module produces a third output graph (i.e., a deformed clothing graph fitted to the model's pose). The third output graph and the first output graph produced by the second image synthesis model serve as inputs to the first image synthesis model's image synthesis network module, which produces the second output graph. Finally, a knowledge distillation loss is computed from the second output graph and the first sample graph, and the first image synthesis model is optimized based on that loss; more specifically, the parameters of the image synthesis network module are optimized according to the knowledge distillation loss.
Since the second image synthesis model and the image synthesis network module share some parameters, whenever the shared parameters are optimized during training, the corresponding parameters in the second image synthesis model are updated as well. A minimal parameter-sharing sketch follows.
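The sketch assumes PyTorch; the module names and shapes are illustrative. Because the student's module holds the very same tensor objects as the teacher, an optimizer step on the shared parameters updates both models at once.

```python
import torch.nn as nn

# Hypothetical teacher sub-network whose blocks are shared with the student.
teacher_blocks = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

class StudentSynthesizer(nn.Module):
    def __init__(self, shared: nn.Module):
        super().__init__()
        self.shared = shared                        # same tensors as teacher
        self.head = nn.Conv2d(16, 3, 3, padding=1)  # student-only layers

    def forward(self, x):
        return self.head(self.shared(x))

student_synthesizer = StudentSynthesizer(teacher_blocks)
# Optimizing student_synthesizer.shared also changes the teacher's weights,
# since both refer to the same parameters.
```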
The reason for this design is as follows: it is difficult for the second image synthesis model to generate a high-quality composite image in a single step. Therefore, the first output graph produced by the second image synthesis model is used as an input to the first image synthesis model, and the first sample graph in the training sample is used as the label when calculating the loss that optimizes the first image synthesis model, so that the first image synthesis model can learn effectively.
It should be noted that this embodiment does not limit how the knowledge distillation loss is calculated, how the loss is used to optimize the model parameters, and so on. Likewise, this embodiment places no particular restriction on the selection or design (hierarchical structure) of the second image synthesis model or of the first image synthesis model (the spatial transformation network module and the image synthesis network module). The embodiments of the application focus on two innovations in knowledge distillation: the scheme of using the output of the second image synthesis model (namely the teacher model) as an input for training the first image synthesis model; and the application of knowledge distillation techniques to image processing (e.g., image synthesis) scenarios.
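Purely by way of illustration, one common choice in image synthesis work (not mandated by this embodiment) combines a pixel-level L1 term with a perceptual term computed on frozen, pretrained VGG features:

```python
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG-16 feature extractor for the perceptual term.
_vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def distillation_loss(second_output, first_sample, lam=0.1):
    """Pixel-level L1 plus a VGG feature-space L1 -- one possible choice."""
    pixel = F.l1_loss(second_output, first_sample)
    perceptual = F.l1_loss(_vgg(second_output), _vgg(first_sample))
    return pixel + lam * perceptual
```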
In outline, the above process is as follows: first, the pose of the model is analyzed; then the spatial transformation network module deforms the display object image (such as a tiled garment image) so that it fits the pose of the model in the model image; finally, the deformed display object image and the model image are synthesized into a picture of the model displaying the object in the display image. Because the technical solutions provided by the embodiments of the application use knowledge distillation, on the one hand the required input is greatly reduced: the trained first image synthesis model needs only a model image and a tiled garment image to perform image synthesis, which greatly simplifies the application flow; on the other hand, the quality of the generated images is further improved.
Fig. 9 is a flowchart of a method for determining a first image synthesis model for image processing according to an embodiment of the present application. As shown in fig. 9, the method includes:
401. Acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph;
402. Inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph;
403. Inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation;
404. Optimizing the first image synthesis model based on the second output graph and the first sample graph;
The optimized first image synthesis model is used for processing two input images to obtain a synthesized image.
For steps 401 to 402, reference may be made to the corresponding content above; details are not repeated here.
Further, the first image synthesis model includes: a spatial transformation network module and an image synthesis network module. Accordingly, the step 403 "inputting the first output graph, the third sample graph, and the fourth sample graph into the first image synthesis model, and outputting the second output graph for knowledge distillation" may include:
4031. Inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing coordinate transformation on at least part of pixel points in the fourth sample graph by referring to the third sample graph;
4032. Inputting the first output graph and the third output graph into the image synthesis network module to obtain a second output graph for knowledge distillation.
Similarly, for steps 4031 to 4032, reference may be made to the corresponding content above; details are not repeated here.
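As an illustration of the coordinate transformation in step 4031, the sketch below uses PyTorch's `affine_grid`/`grid_sample` to warp the fourth sample graph under a predicted affine transform. The layer sizes and the restriction to an affine warp are simplifying assumptions; a real implementation might predict a richer deformation such as a thin-plate spline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineSTN(nn.Module):
    """Warps the fourth sample graph (tiled garment) toward the pose shown in
    the third sample graph; returns the third output graph."""
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(                      # localization network
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 6),                          # 6 affine parameters
        )
        # Start from the identity transform so early training is stable.
        nn.init.zeros_(self.loc[-1].weight)
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, third_sample, fourth_sample):
        theta = self.loc(torch.cat([third_sample, fourth_sample], dim=1))
        grid = F.affine_grid(theta.view(-1, 2, 3), fourth_sample.size(),
                             align_corners=False)
        # Coordinate transformation of at least part of the garment pixels.
        return F.grid_sample(fourth_sample, grid, align_corners=False)
```

The returned third output graph would then be fed, together with the first output graph, into the image synthesis network module as in step 4032.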
Because the technical solutions provided by the embodiments of the application adopt knowledge distillation, even a lightweight first image synthesis model can achieve good performance. Therefore, in practical applications, the teacher model used in the knowledge distillation process can be trained on the server side, where it attains strong performance; the student model is then trained through the knowledge distillation process based on the trained teacher model. Because the trained student model is lightweight, it can be deployed on the client side, where it offers faster image processing and quicker response. That is, the server is responsible for training the first image synthesis model, and the trained student model is deployed to the client. The user can directly use the first image synthesis model stored locally on the client, so image processing can be completed simply, conveniently and quickly.
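As a sketch of that deployment path (assuming a trained `student` that takes two image tensors; file and variable names are illustrative), the model can be serialized on the server with TorchScript and executed locally on the client:

```python
import torch

# Server side: trace the trained lightweight student and save an artifact.
example_inputs = (torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
traced = torch.jit.trace(student.eval(), example_inputs)
traced.save("first_image_synthesis_model.pt")

# Client side: load the artifact and synthesize locally, with no server call.
local_model = torch.jit.load("first_image_synthesis_model.pt").eval()
with torch.no_grad():
    composite = local_model(model_image, garment_image)
```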
That is, an embodiment of the present application provides an image processing system. As shown in fig. 2, the image processing system includes: a server 302 and a client 301. The server 302 is configured to train at least one first image synthesis model serving as a student model by using training samples and at least one second image synthesis model serving as a teacher model, so as to obtain at least one first image synthesis model that learns the image synthesis results of the corresponding second image synthesis model. The client 301 is configured to locally deploy at least part of the at least one first image synthesis model; it is further configured to acquire a first image and a second image in response to a user operation, determine a first image synthesis model, input the first image and the second image into the first image synthesis model to output a synthesized image, and display the synthesized image.
As shown in fig. 6, the first image synthesis model trained by the server may be deployed at the client. The client may be, but is not limited to: a desktop computer, a notebook computer, a mobile phone, a tablet computer, and the like. The user may obtain and load the first image synthesis model from the server via a client request. Alternatively, the server transmits the first image synthesis model to the client, and after receiving it the client automatically loads it locally or uses it to update the locally existing first image synthesis model.
Alternatively, an image processing system as shown in fig. 10 may be adopted. In this system implementation provided by another embodiment of the present application, the server 302 is configured to train at least one first image synthesis model serving as a student model by using training samples and at least one second image synthesis model serving as a teacher model, so as to obtain at least one first image synthesis model that learns the image synthesis results of the corresponding second image synthesis model; the server is further configured to receive a first image and a second image sent by a client, determine a first image synthesis model, input the first image and the second image into the first image synthesis model, and output a synthesized image; the synthesized image is then fed back to the client 301. The client 301 is configured to send the first image and the second image input by a user to the server 302 in response to the user's operation, and to receive and display the synthesized image fed back by the server 302.
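For the system of fig. 10, the client-server exchange could be as simple as an HTTP upload of the two images. The endpoint, form-field names and file names below are hypothetical; the embodiment fixes no particular protocol:

```python
import requests

# Client side: send the two user-provided images to the server.
with open("model_photo.jpg", "rb") as first, open("garment.jpg", "rb") as second:
    response = requests.post(
        "https://server.example.com/synthesize",
        files={"first_image": first, "second_image": second},
        timeout=60,
    )
response.raise_for_status()

# The server feeds back the composite image, which the client then displays.
with open("composite.jpg", "wb") as out:
    out.write(response.content)
```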
The technical solution provided by the application can also be implemented with the system architecture shown in fig. 11. The system corresponding to the architecture of fig. 11 may be deployed on the server side, on a single server, a server cluster, or a virtual server or cloud service running on servers; it may also be deployed on a client-side computer. As shown in fig. 11, the image processing system includes: a data layer, a processing layer and an application layer. The data layer interacts with the database to store and acquire data; the database stores data that can be used as training samples. The processing layer is provided with at least one second image synthesis model serving as a teacher model and at least one first image synthesis model serving as a student model; it is used for generating training samples from the data set acquired by the data layer, and for training the at least one first image synthesis model with the training samples and the at least one second image synthesis model, so as to obtain at least one first image synthesis model that has learned the image synthesis results of the corresponding second image synthesis model. The models shown in fig. 11 (the second image synthesis model, the second model, the third model, ..., the Nth model) may each serve as the teacher model in one knowledge distillation training task or as the student model in the next knowledge distillation training task. The application layer is used for receiving the first image and the second image input by the user. The processing layer is further configured to synthesize the first image and the second image using at least part of the at least one first image synthesis model to obtain a synthesized image. For example, one first image synthesis model may be selected at a time to obtain one synthesized image, or several first image synthesis models may be selected at a time to obtain several synthesized images, making it convenient for the user to pick the most satisfactory one. How many first image synthesis models participate in image processing in an implementation may be determined by the user. The application layer is further configured to send the composite image to the client device corresponding to the user.
The technical solutions provided by the embodiments of the present application are described below with reference to a specific application scenario. The technical solution can provide merchants with a commodity image synthesis service. For example, a merchant user may upload a model image (e.g., a previously taken photograph of a model wearing dress A) and a dress image (which may be a tiled view of the dress) via the client, then click the "composite" control to obtain a composite image of the model wearing the dress. Because the first image synthesis model used to produce the composite image is trained with knowledge distillation, it offers high performance and precision, and the composite image is of good quality. The merchant user may want not only to put the composite image on the e-commerce platform for promotion, but also to make an advertising poster for offline promotion. Because the first image synthesis model is trained with high-resolution sample images, the composite image also has a high resolution, can meet the resolution requirement of the advertising poster, and works well for poster production.
In fact, based on the technical solutions provided by the embodiments of the application, a batch image synthesis service can be provided for merchants. For example, suppose 10 new garments arrive for the season: 3 suit the temperament and posture of model x, 4 suit model y, and 3 suit model z. In this case, the user can upload the model image of model x and the images of the 3 matching garments at one time and click the "batch processing" control; the back end of the client device or the server-side device then processes the model image of model x against each of the 3 garment images in turn, obtaining composite images of model x wearing each of the 3 garments. The other garments can be batch-processed in the same way, so the user does not need to operate repeatedly; the operation is simplified and efficiency is improved.
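A sketch of that batch behaviour, assuming a trained `student` model as above and using torchvision only for image loading (the garment file names are hypothetical):

```python
import torch
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype, resize

def load_image(path, size=(256, 256)):
    # Read a CHW uint8 tensor, resize it, convert to float, add a batch dim.
    img = convert_image_dtype(resize(read_image(path), list(size)), torch.float)
    return img.unsqueeze(0)

model_x = load_image("model_x.jpg")
garment_paths = ["dress_a.jpg", "dress_b.jpg", "dress_c.jpg"]  # the 3 pieces

# Process the model image against each garment image in turn.
composites = []
with torch.no_grad():
    for path in garment_paths:
        composites.append(student(model_x, load_image(path)))
```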
Therefore, the technical solutions provided by the embodiments of the application save the photography costs of garment display, adapt more quickly to changes in garment styles, and can update the display effect in real time as styles change.
The description above has been given only in connection with the model outfit-change scenario; the technical solutions provided by the embodiments of the present application can also be applied to other scenarios to serve users with different requirements, which are not exemplified here one by one.
Fig. 12 is a schematic diagram showing the structure of an image processing apparatus according to an embodiment of the present application. As shown in fig. 12, the apparatus includes: an acquisition module 11, a determination module 12, a calculation module 13 and a display module 14. The acquiring module 11 is configured to acquire a first image and a second image in response to a user operation. The determining module 12 is configured to determine a first image synthesis model, where the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model. The computing module 13 is configured to input the first image and the second image into the first image synthesis model, and output a synthesized image. The display module 14 is configured to display the composite image.
Further, the first image synthesis model includes: a spatial transformation network module and an image synthesis network module. Correspondingly, when inputting the first image and the second image into the first image synthesis model to output a synthesized image, the computing module 13 is specifically configured to:
Inputting the first image and the second image into the spatial transformation network module, and outputting an intermediate image obtained by performing coordinate transformation on at least part of pixel points in the second image by referring to the first image; inputting the intermediate image and the first image into the image synthesis network module, and outputting the synthesized image; wherein the image synthesis network module shares at least part of the parameters with the second image synthesis model.
Further, the determining module 12 is specifically configured to, when determining the first image synthesis model:
Acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph;
Inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph;
Inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation;
And optimizing the first image synthesis model based on the second output graph and the first sample graph.
Still further, the at least one first feature map mentioned above includes at least one of: at least one segmentation map obtained by segmenting the first sample graph by adopting at least one segmentation method, and a feature point distribution map of the first sample graph. The at least one second feature map includes at least one of: at least one segmentation map obtained by segmenting the second sample graph by adopting at least one segmentation method.
Further, the first image synthesis model includes, but is not limited to: a spatial transformation network module and an image synthesis network module. Correspondingly, when inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model and outputting a second output graph for knowledge distillation, the determining module 12 is specifically configured to:
Inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by carrying out coordinate transformation on at least part of pixel points in the fourth sample graph based on the third sample graph;
Inputting the first output graph and the third output graph into the image synthesis network module to obtain a second output graph for knowledge distillation.
Further, the apparatus provided in this embodiment further includes a data preparation module. The data preparation module is used for acquiring a data set; determining, based on the data set, an image set that can be used as training samples; processing the images in the image set to improve their resolution, the processed images being used as sample graphs in the training samples; and analyzing the processed images in the image set to obtain at least one feature map corresponding to each image.
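A minimal sketch of such a preparation pass, with plain bicubic upscaling standing in for whatever resolution-improvement step is actually used; the directory names and target size are assumptions:

```python
from pathlib import Path
from PIL import Image

SRC = Path("raw_image_set")      # images selected from the data set
DST = Path("training_samples")   # upscaled sample graphs
TARGET_SIZE = (512, 768)         # assumed training resolution (width, height)

DST.mkdir(exist_ok=True)
for path in SRC.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    # Bicubic upscaling as a placeholder for a real super-resolution step.
    img.resize(TARGET_SIZE, Image.BICUBIC).save(DST / path.name)
    # A real pipeline would also run its segmentation / keypoint estimators
    # here to produce the feature maps associated with each sample graph.
```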
It should be noted that the image processing apparatus provided in the foregoing embodiment can implement the technical solutions described in the foregoing method embodiments; for the specific implementation principles of the modules or units, reference may be made to the corresponding content in the method embodiments, which is not repeated here.
Another embodiment of the present application provides an image processing apparatus having a structure similar to that shown in fig. 12. The image processing apparatus includes: an acquisition module, a determination module, a calculation module and a display module. The acquisition module is used for acquiring a first display object image and a model image in response to an image input operation of a user. The determining module is used for determining a first image synthesis model, wherein the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model. The computing module is used for inputting the first display object image and the model image into the first image synthesis model and outputting a synthesized image of the model displaying the first display object. The display module is used for displaying the synthesized image.
Further, the determining module is specifically configured to, when determining the first image synthesis model:
Acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph corresponding to a second display object displayed by the model, at least one first feature graph associated with the first sample graph, a second sample graph corresponding to a third display object, at least one second feature graph associated with the second sample graph, a third sample graph corresponding to the third display object displayed by the model, and a fourth sample graph corresponding to the second display object;
Inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to obtain a first output graph corresponding to the third display object displayed by the model;
Inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph which is used for knowledge distillation and corresponds to the second display object displayed by the model;
And optimizing the first image synthesis model based on the second output graph and the first sample graph.
Further, the at least one first feature map includes at least one of: at least one segmentation map obtained by segmenting the model in the first sample graph by adopting at least one segmentation method, and a feature point distribution map of the model in the first sample graph. The at least one second feature map includes at least one of: at least one segmentation map obtained by segmenting the third display object in the second sample graph by adopting at least one segmentation method.
It should be noted that the image processing apparatus provided in the foregoing embodiment can implement the technical solutions described in the foregoing method embodiments; for the specific implementation principles of the modules or units, reference may be made to the corresponding content in the method embodiments, which is not repeated here.
Fig. 13 is a schematic diagram showing the structure of an apparatus for determining a first image synthesis model for image processing according to still another embodiment of the present application. As shown in fig. 13, the determining apparatus includes: an acquisition module 21, a first calculation module 22, a second calculation module 23 and an optimization module 24. The acquiring module 21 is configured to acquire a pre-trained second image synthesis model, a first image synthesis model to be trained, and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph. The first calculation module 22 is configured to input the first sample graph, the at least one first feature graph, the second sample graph, and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph, so as to obtain a first output graph. The second calculation module 23 is configured to input the first output graph, the third sample graph, and the fourth sample graph into the first image synthesis model, and output a second output graph for knowledge distillation. The optimizing module 24 is configured to optimize the first image synthesis model according to the second output graph and the first sample graph. The optimized first image synthesis model is used for processing two input images to obtain a synthesized image.
Further, the first image synthesis model includes: a spatial transformation network module and an image synthesis network module. Correspondingly, when inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model and outputting a second output graph for knowledge distillation, the second calculation module 23 is specifically configured to:
Inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing coordinate transformation on at least part of pixel points in the fourth sample graph by referring to the third sample graph;
Inputting the first output graph and the third output graph into the image synthesis network module to obtain a second output graph for knowledge distillation.
It should be noted that the apparatus for determining a first image synthesis model for image processing provided in the foregoing embodiment can implement the technical solutions described in the corresponding method embodiments above; for the specific implementation principles of the modules or units, reference may be made to the corresponding content in those method embodiments, which is not repeated here.
Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device comprises a processor 31 and a memory 33. The memory 33 is configured to store one or more computer instructions; the processor 31, coupled to the memory 33, is configured to execute the one or more computer instructions (e.g., computer instructions implementing data storage logic) to implement:
responding to the operation of a user, and acquiring a first image and a second image;
Determining a first image synthesis model, wherein the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
Inputting the first image and the second image into the first image synthesis model, and outputting a synthesized image;
And displaying the synthesized image.
It should be noted that, in addition to the above steps, the processor may implement the other method steps provided in the foregoing method embodiments; for details, reference may be made to the foregoing embodiments, which are not repeated here. The memory 33 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
Further, as shown in fig. 14, the electronic device further includes: communication component 35, power component 32, display 34, and other components. Only some of the components are schematically shown in fig. 14, which does not mean that the electronic device only comprises the components shown in fig. 14.
Another embodiment of the present application provides an electronic device, a schematic structural diagram of which is shown in fig. 14 above. Specifically, the electronic device includes a processor and a memory. The memory is configured to store one or more computer instructions; the processor, coupled with the memory, is configured to execute the one or more computer instructions to implement:
Responding to image input operation of a user, and acquiring a first display object image and a model image;
Determining a first image synthesis model, wherein the first image synthesis model learns an image synthesis result of a second image synthesis model through a training process, and the second image synthesis model is a pre-trained model;
Inputting the first display object image and the model image into the first image synthesis model, and outputting a synthesized image of the model displaying the first display object;
And displaying the synthesized image.
It should be noted that, in addition to the above steps, the processor may implement the other method steps provided in the foregoing method embodiments; for details, reference may be made to the foregoing embodiments, which are not repeated here.
Still another embodiment of the present application provides an electronic device, the schematic structural diagram of which is shown in fig. 14 above. Specifically, the electronic device includes a processor and a memory. The memory is configured to store one or more computer instructions; the processor, coupled with the memory, is configured to execute the one or more computer instructions to implement:
Acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph;
Inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph;
Inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation;
Performing knowledge distillation based on the second output graph and the first sample graph to optimize the first image synthesis model;
The optimized first image synthesis model serves as the final first image synthesis model and is used for processing two input images to obtain a synthesized image.
It should be noted that, in addition to the above steps, the processor may implement the other method steps provided in the foregoing method embodiments; for details, reference may be made to the foregoing embodiments, which are not repeated here.
Yet another embodiment of the application provides a computer program product (not shown in the drawings of the specification). The computer program product comprises a computer program or instructions which, when executed by a processor, cause the processor to carry out the steps of the method embodiments described above.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a computer, is capable of implementing the method steps or functions provided by the above embodiments.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or, of course, by means of hardware. Based on this understanding, the essence of the foregoing technical solutions, or the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. An image processing method, comprising:
responding to the operation of a user, and acquiring a first image and a second image;
Inputting the first image and the second image into a first image synthesis model which is trained, and outputting a synthesized image; the first image synthesis model is obtained by the following steps: acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph; inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph; inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation; optimizing the first image synthesis model based on the second output graph and the first sample graph;
And displaying the synthesized image.
2. The method of claim 1, wherein the first image synthesis model comprises a spatial transformation network module and an image synthesis network module, and
inputting the first image and the second image into the first image synthesis model to output a synthesized image comprises:
Inputting the first image and the second image into the spatial transformation network module, and outputting an intermediate image obtained by performing coordinate transformation on at least part of pixel points in the second image by referring to the first image;
inputting the intermediate image and the first image into the image synthesis network module, and outputting the synthesized image;
Wherein the image synthesis network module shares at least part of the parameters with the second image synthesis model.
3. The method of claim 1, wherein the at least one first feature map comprises at least one of: at least one segmentation graph obtained by segmenting the first sample graph by adopting at least one segmentation method, and a characteristic point distribution graph of the first sample graph;
the at least one second feature map includes at least one of: at least one segmentation map obtained by segmenting the second sample map by adopting at least one segmentation method.
4. The method of claim 1, wherein the first image synthesis model comprises a spatial transformation network module and an image synthesis network module, and
inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model and outputting a second output graph for knowledge distillation comprises the following steps:
Inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by carrying out coordinate transformation on at least part of pixel points in the fourth sample graph based on the third sample graph;
Inputting the first output graph and the third output graph into the image synthesis network module to obtain a second output graph for knowledge distillation.
5. The method as recited in claim 1, further comprising:
Acquiring a data set;
determining an image set that can be used as training samples based on the data set;
processing the images in the image set to improve the resolution of the images in the image set; the processed images are used as sample graphs in training samples;
and analyzing the processed images in the image set to obtain at least one feature map corresponding to the images.
6. An image processing method, comprising:
Responding to image input operation of a user, and acquiring a first display object image and a model image;
Inputting the first display object image and the model image into a first image synthesis model, and outputting a synthesized image of the model displaying the first display object; the first image synthesis model is obtained by the following steps: acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph; inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph; inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation; optimizing the first image synthesis model based on the second output graph and the first sample graph;
And displaying the synthesized image.
7. The method of claim 6, wherein the at least one first feature map comprises at least one of: at least one segmentation map obtained by segmenting the model in the first sample map by adopting at least one segmentation method, and a characteristic point distribution map of the model in the first sample map;
the at least one second feature map includes at least one of: at least one segmentation map obtained by segmenting the third display object in the second sample map by adopting at least one segmentation method.
8. A method of determining an image synthesis model for image processing, comprising:
Acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph;
Inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph;
Inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation;
Optimizing the first image synthesis model based on the second output graph and the first sample graph;
the optimized first image synthesis model is used for processing two input images to obtain a synthesized image.
9. The method of claim 8, wherein the first image synthesis model comprises a spatial transformation network module and an image synthesis network module, and
inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model and outputting a second output graph for knowledge distillation comprises the following steps:
Inputting the third sample graph and the fourth sample graph into the spatial transformation network module, and outputting a third output graph obtained by performing coordinate transformation on at least part of pixel points in the fourth sample graph by referring to the third sample graph;
Inputting the first output graph and the third output graph into the image synthesis network module to obtain a second output graph for knowledge distillation.
10. An image processing system, comprising:
The data layer is used for interactively storing and acquiring data with the database; wherein, the database stores data which can be used as training samples;
The processing layer is provided with at least one second image synthesis model serving as a teacher model and at least one first image synthesis model serving as a student model and is used for generating training samples according to the data set acquired by the data layer; respectively training the at least one first image synthesis model by using the training sample and the at least one second image synthesis model to obtain at least one first image synthesis model which learns the image synthesis result of the corresponding second image synthesis model; training the first image synthesis model using the training sample and the second image synthesis model, comprising: acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph; inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph; inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation; optimizing the first image synthesis model based on the second output graph and the first sample graph;
an application layer for receiving a first image and a second image input by a user;
the processing layer is further configured to synthesize the first image and the second image by using at least part of the at least one first image synthesis model to obtain a synthesized image;
The application layer is further configured to send the composite image to a client device corresponding to the user.
11. An image processing system, comprising:
The server is used for respectively training at least one first image synthesis model serving as a student model by utilizing the training sample and at least one second image synthesis model serving as a teacher model to obtain at least one first image synthesis model which learns the image synthesis result of the corresponding second image synthesis model; training the first image synthesis model using the training sample and the second image synthesis model, comprising: acquiring a pre-trained second image synthesis model, a first image synthesis model to be trained and a training sample; the training samples comprise a first sample graph, at least one first characteristic graph associated with the first sample graph, a second sample graph, at least one second characteristic graph associated with the second sample graph, a third sample graph and a fourth sample graph; the first sample graph and the second sample graph are synthesized to obtain a synthesized graph related to graph content of the third sample graph; the fourth sample graph and the third sample graph are synthesized to obtain a synthesized graph related to graph content of the first sample graph; inputting the first sample graph, the at least one first feature graph, the second sample graph and the at least one second feature graph into the second image synthesis model to refer to the at least one first feature graph and the at least one second feature graph for synthesizing the first sample graph and the second sample graph to obtain a first output graph; inputting the first output graph, the third sample graph and the fourth sample graph into the first image synthesis model, and outputting a second output graph for knowledge distillation; optimizing the first image synthesis model based on the second output graph and the first sample graph;
a client for locally deploying at least part of the at least one first image synthesis model; the client is further configured to acquire a first image and a second image in response to a user operation, determine a first image synthesis model, input the first image and the second image into the first image synthesis model to output a synthesized image, and display the synthesized image.
12. An electronic device comprising a processor and a memory, wherein,
The memory is used for storing one or more computer instructions;
The processor, coupled to the memory, is configured to execute the one or more computer instructions to implement the steps of the method of any one of claims 1 to 5, or the steps of the method of claim 6 or 7, or the steps of the method of claim 8 or 9.
CN202111162107.2A 2021-09-30 2021-09-30 Image processing method, and method, system and equipment for determining image synthesis model Active CN114004772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111162107.2A CN114004772B (en) 2021-09-30 2021-09-30 Image processing method, and method, system and equipment for determining image synthesis model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111162107.2A CN114004772B (en) 2021-09-30 2021-09-30 Image processing method, and method, system and equipment for determining image synthesis model

Publications (2)

Publication Number Publication Date
CN114004772A CN114004772A (en) 2022-02-01
CN114004772B true CN114004772B (en) 2024-10-18

Family

ID=79922200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111162107.2A Active CN114004772B (en) 2021-09-30 2021-09-30 Image processing method, and method, system and equipment for determining image synthesis model

Country Status (1)

Country Link
CN (1) CN114004772B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471799B (en) * 2022-09-21 2024-04-30 首都师范大学 Vehicle re-recognition method and system enhanced by using attitude estimation and data
CN117011665A (en) * 2022-11-09 2023-11-07 腾讯科技(深圳)有限公司 Image processing model training method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991494A (en) * 2021-01-28 2021-06-18 腾讯科技(深圳)有限公司 Image generation method and device, computer equipment and computer readable storage medium
CN113052868A (en) * 2021-03-11 2021-06-29 奥比中光科技集团股份有限公司 Cutout model training and image cutout method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178447B (en) * 2019-12-31 2024-03-08 北京市商汤科技开发有限公司 Model compression method, image processing method and related device
CN111768507B (en) * 2020-08-07 2023-11-28 腾讯科技(深圳)有限公司 Image fusion method, device, computer equipment and storage medium
CN112581370A (en) * 2020-12-28 2021-03-30 苏州科达科技股份有限公司 Training and reconstruction method of super-resolution reconstruction model of face image


Also Published As

Publication number Publication date
CN114004772A (en) 2022-02-01

Similar Documents

Publication Publication Date Title
JP7579546B2 (en) Method and system for automatically generating large training data sets from 3D models for training deep learning networks
US11030782B2 (en) Accurately generating virtual try-on images utilizing a unified neural network framework
US11055888B2 (en) Appearance-flow-based image generation
US10679046B1 (en) Machine learning systems and methods of estimating body shape from images
Daněřek et al. Deepgarment: 3d garment shape estimation from a single image
US12067659B2 (en) Generating animated digital videos utilizing a character animation neural network informed by pose and motion embeddings
CN103262126B (en) Image processing apparatus, illumination processing device and method thereof
JP2024522287A (en) 3D human body reconstruction method, apparatus, device and storage medium
CN114004772B (en) Image processing method, and method, system and equipment for determining image synthesis model
US11677897B2 (en) Generating stylized images in real time on mobile devices
EP3091510B1 (en) Method and system for producing output images
US11507781B2 (en) Methods and systems for automatic generation of massive training data sets from 3D models for training deep learning networks
US20240119681A1 (en) Systems and methods for using machine learning models to effect virtual try-on and styling on actual users
Navarro et al. Learning occlusion-aware view synthesis for light fields
Michael et al. Model-based generation of personalized full-body 3D avatars from uncalibrated multi-view photographs
US11188790B1 (en) Generation of synthetic datasets for machine learning models
Lu et al. Parametric shape estimation of human body under wide clothing
WO2024149294A1 (en) Virtual fitting method and apparatus, fitting model training method and apparatus, and device
Laishram et al. High-quality face caricature via style translation
US20240161423A1 (en) Systems and methods for using machine learning models to effect virtual try-on and styling on actual users
Huang et al. Efficient neural implicit representation for 3D human reconstruction
Vladimirov et al. Overview of Methods for 3D Reconstruction of Human Models with Applications in Fashion E-Commerce
CN114638744B (en) Human body posture migration method and device
CN116703507A (en) Image processing method, display method and computing device
CN114119861A (en) Training of human body reconstruction network, human body reconstruction and fitting method and related device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220926

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou City, Zhejiang Province, 310052

Applicant after: Alibaba (China) Network Technology Co.,Ltd.

Address before: 310052 room 508, 5th floor, building 4, No. 699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: Alibaba (China) Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant