WO2023160074A1 - Image generation method and apparatus, electronic device, and storage medium - Google Patents
- Publication number
- WO2023160074A1 (PCT/CN2022/134861)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- map
- target
- target object
- semantic segmentation
- pose
- Prior art date
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image; G06T3/04—Context-preserving transformations, e.g. by using an importance map
- G06T15/00—3D [Three Dimensional] image rendering; G06T15/005—General purpose rendering architectures
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Definitions
- The present disclosure relates to the technical field of image processing, and in particular to an image generation method and apparatus, an electronic device, and a storage medium.
- Embodiments of the present disclosure provide at least an image generation method and apparatus, an electronic device, and a storage medium.
- In a first aspect, an embodiment of the present disclosure provides an image generation method, including: acquiring an original image, where the original image includes a target object in an original pose; determining, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and determining a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object; and determining a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- In this way, the first body shape map can be determined based on the original image and the target pose parameters, so that more accurate body shape information of the target object is obtained; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, which solves the problem that an existing 3D body model cannot obtain texture information and geometric information located outside the human body model.
- Because the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis effect can be improved: even when the original pose and the target pose differ greatly, a composite image with an accurate pose can still be obtained, and better synthesis results can still be obtained for target objects wearing looser clothing.
- In some embodiments, determining the first body shape map of the target object in the target pose based on the original image and the target pose parameters includes: constructing a first body model of the target object based on the original image, and determining a texture map of the original image based on the first body model; and determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- In this way, the technical solution of the present disclosure first determines the first body shape map based on a texture map carrying the texture information of the body parts of the target object, so that this texture information does not have to be extracted by the image generation network; the feature information the image generation network must extract is thereby reduced, alleviating its processing pressure and further improving the image synthesis effect.
- In some embodiments, constructing the first body model of the target object based on the original image includes: determining model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image; adjusting the position information of each vertex in an initial model based on the model parameters; and generating the first body model based on the adjusted position information of each vertex.
- In this way, the body shape information in each pose can still be accurately determined, so that a more accurate first body model is obtained; performing image synthesis based on this first body model improves the synthesis effect.
- In some embodiments, determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters includes: determining a second body model of the target object in the target pose based on the first body model and the target pose parameters; and rendering the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
- By combining the three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the original pose and the target pose differ greatly, so that the synthesis effect of the composite image is improved.
- In some embodiments, determining the first semantic segmentation map of the target object in the target pose includes: determining a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generating the first semantic segmentation map based on the second semantic segmentation map.
- Determining the first semantic segmentation map through the second semantic segmentation map improves its accuracy, thereby further improving the image synthesis effect.
- In some embodiments, generating the first semantic segmentation map based on the second semantic segmentation map includes: acquiring a second body model of the target object in the target pose, where the second body model is determined based on a first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image; and generating the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- In some embodiments, the second semantic segmentation map includes an original semantic map and an original segmentation map, and the first semantic segmentation map includes a target segmentation map and a target semantic map. In this case, generating the first semantic segmentation map based on the second semantic segmentation map includes: generating the target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map includes the wearing features of all body parts of the target object in the target pose; and generating the target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map includes the wearing features of each body part of the target object in the target segmentation map.
- In some embodiments, the method further includes: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object. Determining the composite image based on the first body shape map and the first semantic segmentation map then includes: determining a composite image carrying the complete wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In this way, more complete wearing features and denser texture information of the target object can be obtained, so that a composite image with a better synthesis effect is produced.
- In some embodiments, determining the composite image carrying the complete wearing features of the target object includes: inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing, to obtain the composite image.
- Generating the composite image through the image generation network improves the synthesis effect, so that a more realistic composite image is obtained.
- In a second aspect, an embodiment of the present disclosure provides an image generation apparatus, including: an acquisition unit configured to acquire an original image, where the original image contains a target object in an original pose; a first determination unit configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map includes the wearing features of the body parts of the target object; and a second determination unit configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image contains the target object in the target pose and having the wearing features.
- In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
- In a fourth aspect, embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
- FIG. 1 shows a flow chart of an image generation method provided by an embodiment of the present disclosure;
- FIG. 2 shows a schematic diagram of the effect of a target pose provided by an embodiment of the present disclosure;
- FIG. 3 shows a schematic diagram of the effect of a first body shape map provided by an embodiment of the present disclosure;
- FIG. 4 shows a schematic diagram of the effect of a second body shape map provided by an embodiment of the present disclosure;
- FIG. 5 shows a schematic diagram of the effect of an original pose provided by an embodiment of the present disclosure;
- FIG. 6(a) shows a schematic diagram of the effect of a segmentation map of the foreground region of an original image provided by an embodiment of the present disclosure;
- FIG. 6(b) shows a schematic diagram of the effect of a semantic map of the foreground region of an original image provided by an embodiment of the present disclosure;
- FIG. 7(a) shows a schematic diagram of the effect of a segmentation map of a target object in a target pose provided by an embodiment of the present disclosure;
- FIG. 7(b) shows a schematic diagram of the effect of a semantic map of a target object in a target pose provided by an embodiment of the present disclosure;
- FIG. 8 shows a schematic diagram of the effect of a composite image provided by an embodiment of the present disclosure;
- FIG. 9 shows a flow chart of another image generation method provided by an embodiment of the present disclosure;
- FIG. 10 shows a schematic diagram of an image generation apparatus provided by an embodiment of the present disclosure;
- FIG. 11 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
- In the related art, a 3D human body model can be used to obtain a texture image of the source image in the target pose, and a generative adversarial network can then generate a new image of the target object in the target pose from the texture image and the target pose. This processing method relieves the processing pressure on the generative adversarial network, thereby improving the synthesis effect of the image.
- The 3D human body model commonly used in the related art is a parametric human body model, such as the Skinned Multi-Person Linear (SMPL) model.
- The SMPL model can construct a skeleton model of the human body, and the surface texture corresponding to the human body model, for example the texture of the clothes worn and the texture of the skin, can be extracted through the SMPL model.
- However, the SMPL model cannot obtain texture information and geometric information outside the human body model; it is therefore difficult for the SMPL model to achieve a good synthesis effect on pictures of a human body wearing loose clothing.
- Based on this, the present disclosure provides an image generation method and apparatus, an electronic device, and a storage medium.
- The image generation method provided by the embodiments of the present disclosure can be applied in fields such as motion transfer, appearance transfer, and virtual fitting.
- With this method, the image synthesis effect can be improved: even when the original pose and the target pose differ greatly, a composite image with an accurate pose can still be obtained, and better synthesis results can still be obtained for target objects wearing looser clothing.
- The execution subject of the image generation method provided in the embodiments of the present disclosure is generally an electronic device with a certain computing capability.
- Such a device includes, for example, a terminal device, a server, or another processing device; the terminal device may be a user device (a mobile device, user terminal, terminal, handheld device, computing device, wearable device, etc.).
- In some possible implementations, the image generation method can be implemented by a processor invoking computer-readable instructions stored in a memory.
- FIG. 1 is a flow chart of an image generation method provided by an embodiment of the present disclosure. The method includes steps S101 to S107, where:
- S101: Acquire an original image, where the original image includes a target object in an original pose.
- S103: Determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose.
- In embodiments of the present disclosure, the original image is a 2D image. The target pose parameter can be an image containing the target pose, or position information of limb key points used to characterize the target pose; when it is an image, that image is likewise a 2D image showing the body pose of each body part of the target object.
- In implementation, the first body model of the target object in the original pose can be reconstructed based on the original image, together with the texture map of the original image relative to the first body model. The texture map carries the texture information on the body parts of the target object in the original image; for example, the texture information may be skin texture information of the target object, and/or clothing texture information of clothing worn on the body parts of the target object.
- Then, the model parameters of the first body model can be adjusted based on the target pose parameters to obtain the body model of the target object in the target pose, that is, the second body model, and the first body shape map is determined based on the second body model and the texture map.
- The first body shape map determined in this way therefore also includes the texture information on the body parts of the target object in the original image.
- When determining the target object, the object located in the foreground region of the original image may be taken as the target object; alternatively, the object with the highest degree of limb completeness in the original image may be taken as the target object; alternatively, the target object may be determined according to the user's selection among multiple objects.
- S105: Determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object.
- Here, the first semantic segmentation map may be determined based on the semantic segmentation map of the original image and the above second body model; the specific determination process is described in detail in the following embodiments.
- The semantic segmentation map of the original image is used to indicate the wearing features of each body part of the target object in the original pose.
- The semantic segmentation map of the original image may include part labels of the various body parts of the target object, so that the body parts of the target object can be distinguished through the part labels.
- Here, the body of the target object may be divided into different body parts, for example: hair, face, upper torso, hands, legs, and feet.
- The part labels may be realized as pixel values, where different part labels correspond to different pixel values, so that each body part can be represented by its pixels; that is, the part region in which each body part lies in the semantic segmentation map is displayed with pixels of the corresponding pixel value.
- In this way, the wearing features of each body part of the target object in the original pose can be determined by combining the original image and the part labels.
- Correspondingly, the first semantic segmentation map can contain the part labels of the various body parts of the target object, and the part label of a given body part is the same in the semantic segmentation map of the original image and in the first semantic segmentation map, that is, it is represented by pixels of the same pixel value.
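As a concrete illustration of this label encoding, the following minimal Python sketch shows one possible way to store per-part labels as pixel values and recover per-part masks; the part names and label values here are assumptions for illustration, not values specified by the patent.

```python
# Illustrative part-label encoding (names and values are assumptions).
import numpy as np

PART_LABELS = {
    "background": 0,
    "hair": 1,
    "face": 2,
    "upper_torso": 3,
    "hand": 4,
    "leg": 5,
    "foot": 6,
}

def part_masks(semantic_map: np.ndarray) -> dict:
    """Recover a boolean mask per body part from a label-valued semantic map."""
    return {name: semantic_map == label for name, label in PART_LABELS.items()}
```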
- Although the first body shape map contains the texture information on the body parts of the target object in the original image, it does not contain the complete wearing features of the target object. Therefore, in order to obtain the complete wearing features, the first semantic segmentation map of the target object in the target pose can be determined based on the semantic segmentation map of the original image and the above second body model; the complete wearing features of the target object's body parts can then be determined based on the first semantic segmentation map.
- For example, suppose the target object in the original image is wearing looser clothes, such as a skirt.
- The first body shape map determined based on the original image and the target pose parameters may contain the texture information located on the body parts of the target object in the original image, but not the texture information located outside the body parts, for example the texture information of the skirt region outside the body parts.
- By determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, so that a composite image including those complete wearing features can be obtained.
- S107: Determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- In embodiments of the present disclosure, the first semantic segmentation map may contain part labels of each body part of the target object; the semantic layout information in the first semantic segmentation map can therefore be represented by the part labels. The semantic layout information can be understood as indicating the limb layout of each body part of the target object in the target pose.
- On this basis, the first body shape map and the first semantic segmentation map can be semantically synthesized through an image generation network, so that the semantic layout information in the first semantic segmentation map is preserved in the resulting composite image.
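To summarize the flow of steps S101 to S107 before the detailed embodiments, here is a minimal Python sketch; every callable passed in (reconstruct, extract_texture, repose, render, segment, warp, generator) is a hypothetical stand-in for the components described below, not an API defined by the patent.

```python
def generate_image(original_image, target_pose_params, *, reconstruct,
                   extract_texture, repose, render, segment, warp, generator):
    """Sketch of steps S101-S107; all callables are injected stand-ins."""
    # S101/S103: build the first body model and its texture map from the
    # original image, re-pose it, and render the first body shape map.
    first_model = reconstruct(original_image)
    texture_map = extract_texture(original_image, first_model)
    second_model = repose(first_model, target_pose_params)
    first_body_shape_map = render(second_model, texture_map)
    # S105: derive the first semantic segmentation map from the original
    # image's segmentation and the re-posed body model.
    second_semantic_map = segment(original_image)
    first_semantic_map = warp(second_semantic_map, second_model)
    # S107: semantic synthesis that preserves the segmentation layout.
    return generator(first_body_shape_map, first_semantic_map)
```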
- Steps S101 to S107 are described in detail below in conjunction with specific implementations.
- In some embodiments, step S103 of determining the first body shape map of the target object in the target pose based on the original image and the target pose parameters includes the following steps:
- Step S1031: Construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model;
- Step S1032: Determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- Here, the first body model can be obtained by performing three-dimensional reconstruction on the original image. The first body model can be a three-dimensional body model that includes the body structure and body pose of the target object; for example, it can be an SMPL model, or another type of three-dimensional body model.
- The body structure of the target object may be understood as its individual body parts, and the body pose of the target object may be understood as the pose of each of those body parts.
- After the first body model is constructed, texture reconstruction can be performed based on the original image and the first body model to obtain the texture map of the original image.
- The specific reconstruction principle is as follows: the first body model is projected onto the original image to establish a projection relationship, which indicates the correspondence between the vertices of the first body model and the pixels of the original image.
- Based on the projection relationship, the pixel values of the pixels corresponding to the vertices of the first body model are determined in the original image. The correspondence between the vertices of the first body model and the coordinates in the texture map is then obtained, and the pixels corresponding to those vertices are rendered into the texture map based on this correspondence, yielding the texture map of the original image.
- The texture map of the original image determined in this way contains the texture information on the body parts of the target object in the original image, for example the skin texture information of the target object, or the clothing texture information of clothing worn on the body parts of the target object.
- After the texture map is determined, the first body shape map can be determined in combination with the target pose parameters and the texture map.
- In implementation, the second body model of the target object in the target pose can be determined based on the target pose parameters, and the texture map is then rendered onto the second body model to obtain the first body shape map of the target object in the target pose.
- Since the texture map of the original image contains the texture information of the body parts of the target object, the first body shape map determined from it not only represents the body pose of each body part of the target object in the target pose, but also contains the texture information on those body parts.
- In the related art, a new image is generated by taking a 2D image and a target pose as input and using a generative adversarial network.
- This method relies on the generative network to extract feature information from the original image, for example the pose features and the clothing features of the target object.
- When the original pose and the target pose differ greatly, such related techniques often fail to obtain a composite image with an accurate pose.
- With the method of the present disclosure, in contrast, a first body shape map with an accurate pose can still be obtained, thereby improving the synthesis effect of the composite image.
- After the first body shape map and the first semantic segmentation map are determined, they may be semantically synthesized through an image generation network to obtain the composite image.
- As can be seen from the above description, the technical solution of the present disclosure first determines the first body shape map based on the texture map carrying the texture information of the body parts of the target object, so that this texture information is obtained in advance; the feature information the image generation network must extract is thereby reduced, alleviating its processing pressure and further improving the image synthesis effect.
- In some embodiments, step S1031 of constructing the first body model of the target object based on the original image specifically includes the following steps:
- Step S10311: Determine model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image;
- Step S10312: Adjust the position information of each vertex in an initial model based on the model parameters;
- Step S10313: Generate the first body model based on the adjusted position information of each vertex.
- In implementation, the original image may first be subjected to pixel-value normalization, so that the pixel value of each pixel in the original image is normalized into the range [-a, a], where a can be 1 or another normalization value; a can be set according to the actual needs of normalization, which is not specifically limited in the present disclosure.
- In addition, the picture size of the original image can be modified; for example, it can be set to 224×224 or 512×512 to meet the input requirements of the Human Mesh Recovery (HMR) model.
- After the above processing, the processed original image can be input into the HMR model, and the output of the HMR model is used as the model parameters of the first body model (that is, the model parameters of the SMPL model).
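A minimal sketch of this preprocessing step, assuming a = 1 and a 224×224 input; the exact normalization range and input size depend on the HMR variant actually used.

```python
import cv2
import numpy as np

def preprocess_for_hmr(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize the original image and normalize pixel values into [-1, 1]."""
    resized = cv2.resize(image, (size, size))
    return resized.astype(np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
```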
- The model parameters are used to characterize the pose information of the target object contained in the original image, and may include action (pose) parameters, shape parameters, and/or camera parameters.
- The camera parameters describe the pose information of the camera through values in K1 dimensions.
- The value of K1 may be 3; that is, the camera parameters may include intrinsic parameters, extrinsic parameters, and distortion parameters.
- The shape parameters describe the shape of the target object through values in K2 dimensions, and the value of each dimension can be interpreted as an indicator of the target object's build, such as tallness or shortness, fatness or thinness, and head-to-body ratio.
- The shape of the body model can therefore be changed by controlling the values of the K2 dimensions.
- The value of K2 may be 10.
- The action parameters describe the pose of the target object at a given moment through values in K3 dimensions, which represent the rotations of multiple limb joints of the target object relative to each axis. K3 may be, for example, 72 (24×3): 24 corresponds to 24 well-defined limb joints, and 3 corresponds to the axis-angle representation of each joint's rotation relative to its parent node.
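The parameter dimensions described above (K1 = 3 camera values, K2 = 10 shape values, K3 = 72 = 24×3 action values) can be pictured as follows; the container class itself is illustrative, not part of the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SMPLParams:
    camera: np.ndarray  # (3,)  camera parameters
    shape: np.ndarray   # (10,) shape coefficients (build indicators)
    pose: np.ndarray    # (72,) axis-angle rotations: 24 joints x 3 values

    def __post_init__(self):
        assert self.camera.shape == (3,)
        assert self.shape.shape == (10,)
        assert self.pose.shape == (72,)
```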
- After the model parameters are determined, the position information of each vertex in the initial model can be adjusted based on them.
- The initial model may be a pre-created body model in the SMPL framework that matches the object type of the target object.
- For example, when the target object is a human being, the preset initial model may be a pre-created human body model.
- The initial model contains multiple vertices and triangular patches determined from those vertices.
- The number of vertices in the initial model can be set by the user according to actual needs; alternatively, it is associated with the object type of the target object; alternatively, it is a default value.
- In embodiments of the present disclosure, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices (that is, the vertices mentioned above) and Q triangular patches are first created, and the body of the target object is then represented by these P vertices and Q triangular patches. The P 3D vertices can be understood as P skeleton points of the initial model, and each triangular patch is a triangle formed from three of the 3D vertices.
- After the model parameters are determined, the vertex coordinates of each vertex in the initial model after action deformation and body-shape deformation can be computed; the position information of each vertex is then adjusted based on these vertex coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
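A highly simplified sketch of the vertex adjustment, in the spirit of the SMPL formulation: each template vertex is displaced by a linear combination of shape blend shapes. Real SMPL additionally applies pose blend shapes and linear blend skinning, which are omitted here.

```python
import numpy as np

def adjust_vertices(template: np.ndarray,     # (P, 3) initial vertex positions
                    shape_dirs: np.ndarray,   # (P, 3, 10) shape blend shapes
                    betas: np.ndarray         # (10,) shape parameters
                    ) -> np.ndarray:
    # Displace each vertex by a linear combination of the shape directions;
    # the adjusted vertices then define the triangular faces of the model.
    return template + shape_dirs @ betas
```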
- After the first body model is constructed, texture extraction may be performed on the first body model (e.g., the SMPL model) to extract the texture map.
- In the first body model, each vertex has corresponding semantic content, which represents the correspondence between the vertex and the pixels in the original image.
- On this basis, the mapping relationship between the vertices and the pixels in the texture map can be constructed.
- After the mapping relationship is determined, texture reconstruction can be performed based on it to obtain the texture map of the original image. For example, based on the mapping relationship, the pixel value of the pixel matching each vertex is determined in the original image, and the pixel value of the corresponding coordinate point in the texture map is then set from that value, completing the texture reconstruction and yielding the texture map of the original image.
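A nearest-pixel sketch of this texture reconstruction: each vertex's projected image pixel is copied to the vertex's coordinate in the texture map. Bilinear sampling and visibility handling, which a real implementation would need, are omitted.

```python
import numpy as np

def build_texture_map(image: np.ndarray,        # (H, W, 3) original image
                      pixel_coords: np.ndarray, # (P, 2) projected (x, y) per vertex
                      uv_coords: np.ndarray,    # (P, 2) texture (u, v) in [0, 1]
                      tex_size: int = 256) -> np.ndarray:
    texture = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    for (x, y), (u, v) in zip(pixel_coords.astype(int), uv_coords):
        tu, tv = int(u * (tex_size - 1)), int(v * (tex_size - 1))
        texture[tv, tu] = image[y, x]  # copy the vertex's pixel into the map
    return texture
```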
- In some embodiments, step S1032 of determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters specifically includes the following steps:
- Step S10321: Determine a second body model of the target object in the target pose based on the first body model and the target pose parameters;
- Step S10322: Render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
- In implementation, the action parameters and shape parameters of the target object in the target pose may be determined based on the target pose parameters, and the second body model is then obtained by modifying the action parameters and shape parameters of the first body model.
- When the target pose parameter is image data containing the target pose, the image data can be processed by the HMR model to obtain the action parameters and shape parameters of the target object in the target pose; the vertex coordinates of each vertex in the first body model are then modified based on these action parameters and shape parameters, so that the second body model is obtained through the adjustment.
- When the target pose parameter itself contains the action parameters and shape parameters, the vertex coordinates of each vertex in the first body model can be adjusted directly based on them to obtain the second body model.
- When the target pose parameter is a pose parameter of another type, it can first be converted into the action parameters and shape parameters of the target object in the target pose.
- After the second body model is determined, the texture map can be rendered onto it based on the mapping relationship between each pixel in the texture map and each vertex, and the first body shape map is determined from the rendered result. For example, assuming the target pose is the pose shown in FIG. 2, a first body shape map as shown in FIG. 3 can be obtained. As can be seen from FIG. 3, the first body shape map only includes the body features (e.g., the pose) of the target object, and does not include the clothing features of the target object.
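Putting step S1032 together, a minimal sketch under the same assumptions as the earlier sketches; hmr, smpl, and render are hypothetical stand-ins for the pose estimator, the parametric body model, and the texture renderer.

```python
def compute_first_body_shape_map(texture_map, target_pose_image, *,
                                 hmr, smpl, render):
    """Sketch of step S1032 with injected stand-in components."""
    # Convert the target pose parameters (here: an image of the target pose)
    # into action and shape parameters, e.g. via an HMR-style model.
    pose, shape = hmr(target_pose_image)
    # Re-pose the body model to obtain the second body model.
    second_body_model = smpl(pose=pose, shape=shape)
    # Render the texture onto the second body model via the pixel-vertex mapping.
    return render(second_body_model, texture_map)
```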
- By combining the three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the original pose and the target pose differ greatly, so that the synthesis effect of the composite image is improved.
- In some embodiments, step S105 of determining the first semantic segmentation map of the target object in the target pose specifically includes the following steps:
- Step S1051: Determine the second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose;
- Step S1052: Generate the first semantic segmentation map based on the second semantic segmentation map.
- It should be noted that steps S10321 and S10322 may be performed before or after steps S1051 and S1052, which is not specifically limited in the present disclosure.
- In implementation, the original image may be processed to obtain the semantic segmentation map of the original image, that is, the second semantic segmentation map.
- For example, image segmentation may be performed on the original image through an image segmentation network to obtain the second semantic segmentation map.
- Assuming the original pose of the target object in the original image is the pose shown in FIG. 5, the second semantic segmentation map of the original image can be the semantic segmentation map shown in FIG. 6(a) and FIG. 6(b).
- FIG. 6(a) shows the segmentation map of the foreground region of the original image, and FIG. 6(b) shows the semantic map of the foreground region of the original image.
- The segmentation map of the foreground region is used to characterize the segmentation result of the target object in the original image, for example the body contour information of the target object.
- The semantic map of the foreground region is used to represent the semantic information of the target object in the original image; the semantic information, for example, identifies the various body parts of the target object.
- On this basis, the wearing features of each body part of the target object in the original pose can be determined based on the second semantic segmentation map and the original image.
- After the second semantic segmentation map is determined, the first semantic segmentation map of the target object in the target pose can be determined based on it.
- Here, the first semantic segmentation map includes a semantic map (that is, the target semantic map below) and a segmentation map (that is, the target segmentation map below) of the target object in the target pose.
- Determining the first semantic segmentation map through the second semantic segmentation map improves the accuracy of the first semantic segmentation map, thereby further improving the image synthesis effect.
- Moreover, the complete wearing features of the target object in the target pose can be obtained, which solves the problem that the SMPL model cannot obtain texture information and geometric information outside the human body model, further improves the image synthesis effect, and still yields good synthesis results for target objects wearing looser clothing.
- In some embodiments, step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
- Step S11: Obtain a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image;
- Step S12: Generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- Here, the second body model is a three-dimensional body model determined based on the first body model and the target pose parameters; the specific determination process is as follows:
- the action parameters and shape parameters of the target object in the target pose can be determined based on the target pose parameters, and the second body model is then obtained by modifying the action parameters and shape parameters of the first body model.
- The target pose parameter may be image data, data including the action parameters and shape parameters of the target object in the target pose, or another type of pose parameter.
- The process of determining the second body model from the various types of target pose parameters is as described for steps S10321 and S10322 above and is not repeated here.
- In implementation, the second body model and the second semantic segmentation map can be input into target sub-networks for processing to obtain the first semantic segmentation map.
- For example, the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b) can be obtained, that is, the first semantic segmentation map, which contains the semantic map and the segmentation map of the target object in the target pose.
- In some embodiments, the second semantic segmentation map includes the original semantic map (that is, the semantic map of the foreground region) and the original segmentation map (that is, the segmentation map of the foreground region).
- In this case, step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
- Step S21: Generate a target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map includes the wearing features of all body parts of the target object in the target pose;
- Step S22: Generate a target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map includes the wearing features of each body part of the target object in the target segmentation map.
- In implementation, the original segmentation map and the second body model can be input into target sub-network A for processing to obtain the target segmentation map of the target object in the target pose, for example the segmentation map shown in FIG. 7(a).
- Likewise, the original semantic map and the second body model can be input into target sub-network B for processing to obtain the target semantic map of the target object in the target pose, for example the semantic map shown in FIG. 7(b).
- Target sub-network A may be a generative network with the same structure as target sub-network B, or with a different structure; the output of target sub-network A is a segmentation map, and the output of target sub-network B is a semantic map.
- For example, each target sub-network can be any kind of convolutional neural network; target sub-network A and target sub-network B can have the same network structure but different network parameters, that is, they are neural networks obtained by training the same convolutional neural network on their respective training samples.
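An illustrative PyTorch sketch of "same structure, different parameters": the two sub-networks are built from the same constructor but hold independent weights. The tiny architecture and channel counts are assumptions for illustration only.

```python
import torch.nn as nn

def make_subnetwork(in_ch: int, out_ch: int) -> nn.Module:
    # Same constructor => same structure; each call => independent weights.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, out_ch, 3, padding=1),
    )

# Sub-network A: original segmentation map + body-model rendering -> target segmentation map.
subnet_a = make_subnetwork(in_ch=2, out_ch=1)
# Sub-network B: original semantic map + body-model rendering -> target semantic map.
subnet_b = make_subnetwork(in_ch=4, out_ch=3)
```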
- After the target segmentation map and the target semantic map are determined, they may be taken together as the first semantic segmentation map of the target object in the target pose.
- For example, when the target pose is the pose shown in FIG. 2, the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b) can be obtained.
- In addition, the wearing segmentation map of the target object can be determined from the original image based on the target segmentation map in the first semantic segmentation map, for example by extraction from the original image based on the segmentation map of FIG. 7(a); and the wearing semantic map of the target object, which carries the wearing features of each body part of the target object, can be determined from the original image based on the target semantic map in the first semantic segmentation map, for example based on the semantic map of FIG. 7(b).
- In some embodiments, the method further includes the following steps: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object.
- On this basis, step S107 of determining a composite image carrying the complete wearing features of the target object based on the first body shape map and the first semantic segmentation map includes:
- Step S1071: Determine a composite image carrying the wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In implementation, before the composite image is obtained, the second semantic segmentation map of the original image can be obtained, together with the second body shape map of the target object in the original pose.
- The second semantic segmentation map of the original image determined in the manner described above may be, for example, the segmentation map shown in FIG. 6(a) and the semantic map shown in FIG. 6(b).
- The second body shape map may be a body map of the target object in the original pose determined based on the original image.
- The second body shape map determined based on the original image carries the complete wearing features of the target object.
- For example, the wearing body map of the target object in the original pose can be extracted from the original image, so as to obtain the wearing body map shown in FIG. 4 (that is, the second body shape map).
- In some embodiments, determining the composite image carrying the wearing features of the target object specifically includes the following.
- In embodiments of the present disclosure, the image generation network may be a spatially-adaptive normalization network, which may be trained based on a generative adversarial network.
- For example, the spatially-adaptive normalization network can be a SPADE (spatially-adaptive normalization) network, which can convert a segmentation layout map into a realistic picture.
- In addition, the image generation network can also be another network, for example another network that can replace the spatially-adaptive normalization network, or that can realize its network function, which is not specifically limited in the present disclosure.
- After the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map are input into the image generator and the spatially-adaptive normalization network for processing, the composite image is obtained.
- For example, a composite image as shown in FIG. 8 can be obtained. As can be seen from FIG. 8, the composite image contains the complete wearing features of the target object, and a good image synthesis result is still obtained even though the target object wears loose clothing.
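For reference, a condensed sketch of a SPADE-style spatially-adaptive normalization layer, following the published SPADE formulation; the hidden width and normalization choice are illustrative, and the patent's actual generator may differ.

```python
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, feat_ch: int, seg_ch: int, hidden: int = 128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(seg_ch, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the segmentation layout to the feature resolution, then
        # predict a per-pixel scale and shift for the normalized features.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```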
- It should be understood that the writing order of the steps above does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
- FIG. 9 is a flow chart of another image generation method provided by an embodiment of the present disclosure. As shown in FIG. 9, the method includes steps S901 to S908, where:
- Step S901: Acquire an original image and target data.
- Here, the original image may be an image containing a target human body captured by a camera of an electronic device; the target human body is the target object of the foregoing embodiments.
- The target data includes the target pose parameters.
- For example, the target data may be an image containing an object in the target pose, or a video containing an object in the target pose.
- After the original image is acquired, it may be subjected to pixel-value normalization, so that the pixel value of each pixel in the original image is normalized into the range [-a, a], where a can be 1 or another normalization value set according to the actual needs of normalization, which is not specifically limited in the present disclosure.
- In addition, the picture size of the original image can be modified; for example, it can be set to 224×224 or 512×512 to meet the input requirements of the Human Mesh Recovery (HMR) human body pose reconstruction model.
- Step S902: Determine model parameters based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image.
- In implementation, the processed original image may be input into the HMR model for processing, and the output of the HMR model is used as the model parameters, for example the model parameters of the SMPL model.
- Step S903: Construct a first body model based on the model parameters.
- In implementation, the position information of each vertex in the initial model may be adjusted based on the model parameters, and the first body model is generated based on the adjusted position information of each vertex.
- The initial model may be a pre-created body model in the SMPL framework that matches the object type of the target object.
- For example, when the target object is a human being, the preset initial model may be a pre-created human body model.
- The initial model contains multiple vertices and triangular faces determined from those vertices.
- In embodiments of the present disclosure, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices (that is, the vertices mentioned above) and Q triangular patches are first created, and the body of the target object is then represented by these P vertices and Q triangular patches. The P 3D vertices can be understood as P skeleton points of the initial model, and each triangular patch is a triangle formed from three of the 3D vertices.
- After the model parameters are determined, the vertex coordinates of each vertex in the initial model after action deformation and body-shape deformation can be computed; the position information of each vertex is then adjusted based on these vertex coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
- Step S904: Determine the texture map of the original image based on the first body model.
- In implementation, the first body model may be projected onto the original image to establish a projection relationship between the first body model and the original image.
- The projection relationship is used to indicate the correspondence between the vertices in the first body model and the pixels in the original image.
- Based on the projection relationship, the pixel values of the pixels corresponding to the vertices of the first body model are determined in the original image.
- The correspondence between the vertices of the first body model and the coordinates in the texture map is then obtained, and the pixels corresponding to those vertices are rendered into the texture map based on this correspondence, yielding the texture map of the original image.
- Step S905: Determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- In implementation, the second body model of the target object in the target pose may be determined based on the first body model and the target pose parameters. The texture map can then be rendered onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices in the second body model, to obtain the first body shape map.
- Step S906: Determine the first semantic segmentation map of the target object in the target pose.
- In implementation, the second body model determined in step S905 may be obtained, and a second semantic segmentation map of the original image may be determined, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; the first semantic segmentation map is then generated based on the second body model and the second semantic segmentation map.
- Here, the second semantic segmentation map contains the original segmentation map and the original semantic map of the target object in the original pose.
- In implementation, the target segmentation map in the target pose can be generated from the original segmentation map and the second body model, and the target semantic map in the target pose can be generated from the original semantic map and the second body model; the first semantic segmentation map of the target object in the target pose includes the target segmentation map and the target semantic map. For example, the original segmentation map and the second body model are input into a target sub-network to obtain the target segmentation map, which includes the wearing features of all body parts of the target object in the target pose; and the original semantic map and the second body model are input into a target sub-network to obtain the target semantic map, which includes the wearing features of each body part of the target object in the target segmentation map.
- Step S907: Obtain a second body shape map of the target object in the original pose.
- Step S908: Determine a composite image carrying the complete wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In implementation, the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map can be input into an image generation network for processing to obtain the composite image.
- Here, the image generation network may be a spatially-adaptive normalization network, which may be trained based on a generative adversarial network.
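A minimal sketch of one way such adversarial training could look, using a hinge loss; the loss choice and optimizers are assumptions, and practical training of a SPADE-style generator typically adds perceptual and feature-matching terms.

```python
import torch

def train_step(generator, discriminator, g_opt, d_opt, inputs, real_image):
    # Discriminator step: push real scores up and generated scores down.
    fake = generator(*inputs).detach()
    d_loss = (torch.relu(1.0 - discriminator(real_image)).mean()
              + torch.relu(1.0 + discriminator(fake)).mean())  # hinge loss
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: produce images the discriminator scores as real.
    fake = generator(*inputs)
    g_loss = -discriminator(fake).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```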
- As can be seen from the above description, more accurate body shape information of the target object can be obtained by determining the first body shape map based on the original image and the target pose parameters; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, which solves the problem that existing parametric human body models cannot obtain texture information and geometric information outside the human body model.
- Because the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis effect can be improved: even when the original pose and the target pose differ greatly, a composite image with an accurate pose can still be obtained, and better synthesis results can still be obtained for target objects wearing looser clothing.
- Based on the same inventive concept, the embodiments of the present disclosure also provide an image generation apparatus corresponding to the image generation method. Since the problem-solving principle of the apparatus is similar to that of the image generation method described above, the implementation of the apparatus can refer to the implementation of the method, and repeated descriptions are omitted.
- As shown in FIG. 10, the apparatus includes an acquisition unit 10, a first determination unit 20, and a second determination unit 30, where:
- the acquisition unit 10 is configured to acquire an original image, where the original image includes a target object in an original pose;
- the first determination unit 20 is configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in the target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object;
- the second determination unit 30 is configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- In a possible implementation, the first determination unit is further configured to: construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model; and determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- In a possible implementation, the first determination unit is further configured to: determine model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image; adjust the position information of each vertex in the initial model based on the model parameters; and generate the first body model based on the adjusted position information of each vertex.
- In a possible implementation, the first determination unit is further configured to: determine a second body model of the target object in the target pose based on the first body model and the target pose parameters; and render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
- In a possible implementation, the first determination unit is further configured to: determine a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generate the first semantic segmentation map based on the second semantic segmentation map.
- In a possible implementation, the first determination unit is further configured to: acquire a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is the body model of the target object constructed based on the original image; and generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- In a possible implementation, the first determination unit is further configured to: generate a target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map includes the wearing features of all body parts of the target object in the target pose; and generate a target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map includes the wearing features of each body part of the target object in the target segmentation map.
- In a possible implementation, the apparatus is further configured to: obtain a second semantic segmentation map of the original image, and obtain a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object. The second determination unit is further configured to: determine a composite image carrying the wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In a possible implementation, the second determination unit is further configured to: input the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into the image generation network for processing to obtain the composite image.
- An embodiment of the present disclosure also provides an electronic device 1100. As shown in FIG. 11, which is a schematic structural diagram of the electronic device 1100 provided in the embodiment of the present disclosure, the device includes:
- a processor 111, a memory 112, and a bus 113. The memory 112 is used to store execution instructions and includes an internal memory 1121 and an external memory 1122; the internal memory 1121, also called main memory, temporarily stores operation data in the processor 111 and data exchanged with the external memory 1122 (such as a hard disk), and the processor 111 exchanges data with the external memory 1122 through the internal memory 1121. When the electronic device 1100 runs, the processor 111 communicates with the memory 112 through the bus 113, so that the processor 111 executes the following instructions:
- acquire an original image, where the original image contains a target object in an original pose;
- determine, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose, and determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map includes wearing features of the body parts of the target object;
- determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the image generation method described in the above method embodiments are performed.
- The storage medium may be a volatile or non-volatile computer-readable storage medium.
- An embodiment of the present disclosure also provides a computer program product, which carries program code; the instructions included in the program code can be used to perform the steps of the image generation method described in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.
- The above computer program product may be implemented by hardware, software, or a combination thereof.
- In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
- The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- The functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor.
- Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure.
- The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
Abstract
Provided in the present disclosure are an image generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring an original image, wherein the original image includes a target object in an original pose; determining, on the basis of the original image and a target pose parameter, a first body shape map of the target object in a target pose; determining a first semantic segmentation map of the target object in the target pose, wherein the first semantic segmentation map is used for indicating a wearing feature of a body part of the target object; and determining a composite image on the basis of the first body shape map and the first semantic segmentation map, wherein the composite image comprises the target object in the target pose and having the wearing feature.
Description
Cross-Reference to Related Applications
This disclosure claims priority to Chinese patent application No. 202210187560.7, filed on February 28, 2022, which is incorporated herein by reference.
The present disclosure relates to the technical field of image processing, and in particular to an image generation method and apparatus, an electronic device, and a storage medium.
The field of image generation has developed rapidly in recent years. Human body image generation, which covers functions such as human body image completion, motion transfer, and appearance transfer, has great potential application value in character animation, virtual try-on, film, and games. In the related art, a new image is generated by taking a 2D image and a target pose as input and applying a generative adversarial network. However, when the difference between the source pose and the target pose in the 2D image is large, the synthesis quality of images obtained in the related art, which relies on a neural network to extract the feature information of the 2D image, is poor.
Summary of the Invention
Embodiments of the present disclosure provide at least an image generation method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image generation method, including: acquiring an original image, where the original image includes a target object in an original pose; determining, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose; determining a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate wearing features of body parts of the target object; and determining a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
In the above implementation, first, more accurate body shape information of the target object can be obtained by determining the first body shape map based on the original image and the target pose parameter; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, which solves the problem that an existing three-dimensional body model cannot capture texture information and geometric information located outside the body model. When the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis quality is improved: even when the difference between the original pose and the target pose is large, a composite image with a more accurate pose can still be obtained, and good synthesis results can still be achieved for target objects wearing loose clothing.
In an optional implementation, determining the first body shape map of the target object in the target pose based on the original image and the target pose parameter includes: constructing a first body model of the target object based on the original image, and determining a texture map of the original image based on the first body model; and determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameter.
Compared with the related-art solution that relies on a generation network to extract all feature information from the original image, the technical solution of the present disclosure can first determine the first body shape map based on a texture map carrying the texture information of the body parts of the target object, so that the texture information of the body parts of the target object is extracted in advance. When the first body shape map and the first semantic segmentation map are semantically synthesized by an image generation network, the amount of feature information the image generation network must extract is reduced, which relieves the processing load of the network and improves the image synthesis quality.
In an optional implementation, constructing the first body model of the target object based on the original image includes: determining model parameters of the first body model based on the original image, where the model parameters are used to indicate pose information of the target object in the original image; adjusting position information of each vertex in an initial model based on the model parameters; and generating the first body model based on the adjusted position information of each vertex.
In the above implementation, even when the difference between the original pose and the target pose is large, the body shape information under each pose can still be determined accurately, yielding a more accurate first body model; performing image synthesis based on this first body model improves the synthesis quality.
In an optional implementation, determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameter includes: determining a second body model of the target object in the target pose based on the first body model and the target pose parameter; and rendering the texture map onto the second body model based on a mapping relationship between pixels of the texture map and the vertices, to obtain the first body shape map.
In the above implementation, by using a three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the difference between the original pose and the target pose is large, thereby improving the synthesis quality of the composite image.
In an optional implementation, determining the first semantic segmentation map of the target object in the target pose includes: determining a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate wearing features of the body parts of the target object in the original pose; and generating the first semantic segmentation map based on the second semantic segmentation map.
In the above implementation, determining the first semantic segmentation map from the second semantic segmentation map improves the accuracy of the first semantic segmentation map, further improving the image synthesis quality.
In an optional implementation, generating the first semantic segmentation map based on the second semantic segmentation map includes: acquiring a second body model of the target object in the target pose, where the second body model is determined based on a first body model and the target pose parameter, and the first body model is a body model of the target object constructed based on the original image; and generating the first semantic segmentation map based on the second body model and the second semantic segmentation map.
In an optional implementation, the second semantic segmentation map includes an original semantic map and an original segmentation map; the first semantic segmentation map includes a target segmentation map and a target semantic map; and generating the first semantic segmentation map based on the second semantic segmentation map includes: generating, based on the original segmentation map, the target segmentation map of the target object in the target pose, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and generating, based on the original semantic map, the target semantic map of the target object in the target pose, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
In an optional implementation, the method further includes: acquiring a second semantic segmentation map of the original image, and acquiring a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to represent position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object; and determining the composite image based on the first body shape map and the first semantic segmentation map includes: determining, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the complete wearing features of the target object.
In the above implementation, more complete wearing features of the target object and denser texture information of the target object can be obtained, yielding a composite image with better synthesis quality.
In an optional implementation, determining the composite image carrying the complete wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map includes: inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing, to obtain the composite image.
In the above implementation, generating the composite image through an image generation network improves the synthesis quality of the composite image, yielding a more realistic result.
In a second aspect, an embodiment of the present disclosure provides an image generation apparatus, including: an acquisition unit configured to acquire an original image, where the original image includes a target object in an original pose; a first determination unit configured to determine, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map contains wearing features of body parts of the target object; and a second determination unit configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when run by a processor, performs the steps of the first aspect, or of any possible implementation of the first aspect.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 shows a flowchart of an image generation method provided by an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a target pose provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a first body shape map provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a second body shape map provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a source pose provided by an embodiment of the present disclosure;
FIG. 6(a) shows a schematic diagram of a segmentation map of the foreground region of an original image provided by an embodiment of the present disclosure;
FIG. 6(b) shows a schematic diagram of a semantic map of the foreground region of an original image provided by an embodiment of the present disclosure;
FIG. 7(a) shows a schematic diagram of a segmentation map of a target object in a target pose provided by an embodiment of the present disclosure;
FIG. 7(b) shows a schematic diagram of a semantic map of a target object in a target pose provided by an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of a composite image provided by an embodiment of the present disclosure;
FIG. 9 shows a flowchart of another image generation method provided by an embodiment of the present disclosure;
FIG. 10 shows a schematic diagram of an image generation apparatus provided by an embodiment of the present disclosure;
FIG. 11 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Research has found that, in the related art, a new image is generated by taking a 2D image and a target pose as input and applying a generative adversarial network. However, when the difference between the source pose and the target pose in the 2D image is large, the synthesis quality of images obtained by relying on a neural network to extract the feature information of the 2D image is poor.
On this basis, in the field of human body image generation, it has been proposed to generate new human body images through a 3D human body model. The 3D human body model can obtain a texture image of the source image under the target pose, and a generative adversarial network can then generate a new image of the target object in the target pose from the texture image and the target pose. This relieves the processing load of the generative adversarial network and thereby improves the image synthesis quality. However, the three-dimensional human body model commonly used in the related art is a parametric human body model, such as the Skinned Multi-Person Linear (SMPL) model. The SMPL model can construct a skeleton model of the human body, and the surface texture corresponding to the body model, such as the texture of the worn clothing and the texture of the skin, can be extracted through the SMPL model. However, when the clothing worn by the human body is loose, the SMPL model cannot obtain texture information and geometric information located outside the body model; therefore, it is difficult for the SMPL model to achieve a good synthesis result for pictures of a human body wearing loose clothing.
The present disclosure provides an image generation method and apparatus, an electronic device, and a storage medium. The image generation method provided by the embodiments of the present disclosure can be applied in fields such as motion transfer, appearance transfer, and virtual fitting. First, more accurate body shape information of the target object can be obtained by determining the first body shape map based on the original image and the target pose parameter; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, so that texture information and geometric information located outside the body model are captured. When the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis quality is improved: even when the difference between the original pose and the target pose is large, a composite image with a more accurate pose can still be obtained, and good synthesis results can still be achieved for target objects wearing loose clothing.
To facilitate understanding of this embodiment, an image generation method disclosed in an embodiment of the present disclosure is first introduced in detail. The execution subject of the image generation method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capability, for example, a terminal device, a server, or another processing device; the terminal device may be user equipment (a mobile device, a user terminal, a handheld device, a computing device, a wearable device, or the like). In some possible implementations, the image generation method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 1, which is a flowchart of an image generation method provided by an embodiment of the present disclosure, the method includes steps S101 to S107, where:
S101: Acquire an original image, where the original image includes a target object in an original pose.
S103: Determine, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose.
Here, the original image is a 2D image; the target pose parameter may be an image containing the target pose, or position information of limb key points used to represent the target pose; the first body shape map contains the limb posture of each body part of the target object in the target pose.
In specific implementation, a first body model of the target object in the original pose can be reconstructed based on the original image, together with a texture map of the original image relative to the first body model. The texture map carries the texture information located on the body parts of the target object in the original image; for example, the texture information may be the skin texture of the target object and/or the texture of the clothing worn on the body parts of the target object. Afterwards, the model parameters of the first body model can be adjusted based on the target pose parameter to obtain the body model of the target object in the target pose, that is, a second body model, and the first body shape map is determined based on the second body model and the texture map.
It should be understood that the first body shape map determined in the above manner also contains the texture information located on the body parts of the target object in the original image.
In the embodiments of the present disclosure, when the original image contains multiple objects, the object located in the foreground region may be determined as the target object; alternatively, the object with the highest limb completeness in the original image may be determined as the target object; alternatively, the target object may be determined according to a user's selection operation on the multiple objects.
S105: Determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate wearing features of the body parts of the target object.
In the embodiments of the present disclosure, the first semantic segmentation map may be determined based on the semantic segmentation map of the original image and the above second body model; the specific determination process is described in detail in the following embodiments. The semantic segmentation map of the original image is used to indicate the wearing features of each body part of the target object in the original pose.
The semantic segmentation map of the original image may contain part labels of the body parts of the target object, so that the body parts of the target object can be distinguished by these part labels.
Here, the body of the target object may be divided into different body parts, for example: hair, face, upper torso, hands, legs, and feet.
In specific implementation, a part label may be a pixel with a corresponding pixel value, where different part labels correspond to different pixel values. In the semantic segmentation map of the original image, each body part can be represented by such pixels.
For example, pixels with the corresponding pixel value may be displayed in the region where each body part is located in the semantic segmentation map.
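A minimal sketch of this label-to-pixel-value encoding follows; the specific classes and label values are illustrative assumptions, not fixed by the disclosure.

```python
import numpy as np

# Each pixel of the segmentation map stores the integer label of the body
# part it belongs to; different parts use different values. The label set
# below (background plus six parts) is illustrative only.
PART_LABELS = {
    "background": 0, "hair": 1, "face": 2,
    "torso": 3, "hands": 4, "legs": 5, "feet": 6,
}

def region_mask(segmentation: np.ndarray, part: str) -> np.ndarray:
    """Return a boolean mask of the region occupied by one body part.

    `segmentation` is an (H, W) integer map as described in the text.
    """
    return segmentation == PART_LABELS[part]
```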
After the body parts of the target object are distinguished by the part labels, the wearing features of each body part of the target object in the original pose can be determined by combining the original image with the part labels.
Similarly, the first semantic segmentation map may contain the part labels of the body parts of the target object, and the part label of a given body part is the same in the semantic segmentation map of the original image and in the first semantic segmentation map; that is, it can be represented by pixels with the same pixel value.
Since the first body shape map contains the texture information located on the body parts of the target object in the original image, but not the complete wearing features of the target object, in order to obtain the complete wearing features, the first semantic segmentation map of the target object in the target pose can be determined based on the semantic segmentation map of the original image and the above second body model; the complete wearing features of the body parts of the target object are then determined based on the first semantic segmentation map.
For example, suppose the target object in the original image wears relatively loose clothing, such as a skirt. The first body shape map determined based on the original image and the target pose parameter may then contain the texture information located on the body parts of the target object, but not the texture information located outside the body parts, for example, the texture of the skirt region outside the body parts. Further, by determining the first body shape map together with a first semantic segmentation map containing the complete wearing features, the complete wearing features of the target object can be obtained, so that a composite image containing the complete wearing features is produced when synthesis is performed based on the first body shape map and the first semantic segmentation map.
S107: Determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
As can be seen from the above description, the first semantic segmentation map may contain the part labels of the body parts of the target object; therefore, the semantic layout information in the first semantic segmentation map can be represented by the part labels. It can be understood that the semantic layout information indicates the limb layout of each body part of the target object in the target pose.
On this basis, the first body shape map and the first semantic segmentation map can be combined through an image generation network for semantic image synthesis, so that the resulting composite image preserves the semantic layout information of the first semantic segmentation map.
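For illustration, one plausible way (an assumption, not fixed by the disclosure) to condition the image generation network on both maps is channel-wise concatenation, as in the following minimal PyTorch sketch, in which a single convolution stands in for the real generator:

```python
import torch
import torch.nn as nn

# A minimal sketch of the semantic image synthesis step: the body shape map
# and the semantic segmentation map are concatenated channel-wise and fed to
# the generator. The tensor shapes and the placeholder network are assumptions.
body_shape_map = torch.randn(1, 3, 256, 256)  # RGB first body shape map
semantic_map = torch.randn(1, 7, 256, 256)    # one-hot first semantic segmentation map

generator = nn.Conv2d(3 + 7, 3, kernel_size=3, padding=1)  # placeholder network
composite = generator(torch.cat([body_shape_map, semantic_map], dim=1))
print(composite.shape)  # torch.Size([1, 3, 256, 256])
```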
The specific implementations of the above steps S101 to S107 are described in detail below.
In an optional implementation, the above step S103 of determining the first body shape map of the target object in the target pose based on the original image and the target pose parameter includes the following steps:
Step S1031: Construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model.
Step S1032: Determine a first body shape map of the target object in the target pose based on the texture map and the target pose parameter.
In the embodiments of the present disclosure, the first body model can be obtained by performing three-dimensional reconstruction on the original image, where the first body model may be a three-dimensional body model containing the body structure and body posture of the target object; for example, the three-dimensional body model may be an SMPL model or another suitable type of three-dimensional body model.
Here, the body structure of the target object can be understood as the body parts of the target object, and the body posture of the target object can be understood as the limb posture of each body part of the target object.
After the first body model is determined, texture reconstruction can be performed based on the original image and the first body model to obtain the texture map of the original image. The reconstruction principle is as follows:
The first body model is projected onto the original image to establish a projection relationship between the first body model and the original image, where the projection relationship is used to indicate the correspondence between the vertices of the first body model and the pixels of the original image. Based on the projection relationship, the pixel values of the pixels in the original image corresponding to the vertices of the first body model are determined. The correspondence between the vertices of the first body model and the coordinates in the texture map is then obtained, and based on this correspondence, the pixels corresponding to the vertices of the first body model are rendered onto the texture map, yielding the texture map of the original image.
Since the first body model can represent the body structure and body shape of the target object, the texture map of the original image determined in the above manner contains the texture information on the body parts of the target object in the original image, for example, the skin texture of the target object, or the texture of the clothing worn on the body parts of the target object.
After the texture map is determined, the first body shape map can be determined by combining the target pose parameter with the texture map. In specific implementation, the second body model of the target object in the target pose can be determined based on the target pose parameter, and the texture map is then rendered onto the second body model to obtain the first body shape map of the target object in the target pose.
Since the texture map of the original image contains the texture information of the body parts of the target object, the first body shape map determined from the texture map not only represents the limb posture of each body part of the target object in the target pose, but also contains the texture information on those body parts.
In the related art, a new image is generated by taking a 2D image and a target pose as input and applying a generative adversarial network. This approach relies on the generation network to extract the feature information in the original image, for example, the pose features and the clothing features of the target object. When the difference between the original pose and the target pose is large, the related art often fails to obtain a composite image with an accurate pose.
In the technical solution of the present disclosure, through the above processing, a first body shape map with a more accurate pose can still be obtained even when the difference between the original pose and the target pose is large, thereby improving the synthesis quality of the composite image.
In the embodiments of the present disclosure, the first body shape map and the first semantic segmentation map can also be semantically synthesized by an image generation network to obtain the composite image. Compared with the prior-art solution that relies on a generation network to extract all feature information from the original image, the technical solution of the present disclosure first determines the first body shape map based on a texture map carrying the texture information of the body parts of the target object, so that this texture information is extracted in advance. When the first body shape map and the first semantic segmentation map are semantically synthesized by the image generation network, the amount of feature information the network must extract is reduced, which relieves the processing load of the network and improves the image synthesis quality.
In an optional implementation, the above step S1031 of constructing the first body model of the target object based on the original image specifically includes the following steps:
Step S10311: Determine model parameters of the first body model based on the original image, where the model parameters are used to indicate pose information of the target object in the original image.
Step S10312: Adjust position information of each vertex in an initial model based on the model parameters.
Step S10313: Generate the first body model based on the adjusted position information of each vertex.
In the embodiments of the present disclosure, after the original image is acquired, pixel-value normalization can be performed on the original image so that the pixel value of each pixel in the original image is normalized to the range [-a, a], where a may be 1 or another normalization value; the value of a can be set according to actual normalization requirements, and the present disclosure does not specifically limit it. Afterwards, the image size of the original image can be modified, for example, set to 224*224 or 512*512, so that it meets the input requirements of the Human Mesh Recovery (HMR) model.
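A minimal sketch of this normalization and resizing step follows, assuming a = 1 (pixel values mapped to [-1, 1]) and the 224*224 input size mentioned above:

```python
import numpy as np
from PIL import Image

# Preprocess an image for the HMR model: resize to the expected input size
# and normalize pixel values from [0, 255] to [-a, a].
def preprocess(path: str, size: int = 224, a: float = 1.0) -> np.ndarray:
    img = Image.open(path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    return (2.0 * x - 1.0) * a                     # rescale to [-a, a]
```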
After the original image is processed in the above manner, the processed image can be input into the HMR model, and the output of the HMR model is taken as the model parameters of the first body model (that is, the model parameters of the SMPL model).
Here, the model parameters are parameters used to represent the pose information of the target object contained in the original image, and may include pose parameters, shape parameters, and/or camera parameters.
Exemplarily, the camera parameters describe the camera's pose information through values in K1 dimensions, where K1 may be 3; that is, the camera parameters may include intrinsic parameters, extrinsic parameters, and distortion parameters.
The shape parameters describe the shape of the target object through values in K2 dimensions, and the value of each dimension can be interpreted as some indicator of the target object's shape, such as height, build, or head-to-body ratio. The shape variation of the body model can be controlled by adjusting the values of the K2 dimensions, where K2 may be 10.
The pose parameters describe the action posture of the target object at a given moment through values in K3 dimensions, where the K3-dimensional values represent the angles of multiple limb joints of the target object relative to each axis. For example, K3 may be 72 (24*3): 24 refers to the 24 defined limb joints of the target object, and the 3 in 72 (24*3) refers to the axis-angle representation of each joint's rotation relative to its parent joint.
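For illustration, this parameter layout can be exercised with the open-source smplx package (an assumed choice; the disclosure does not name a specific library). Note that smplx splits the 72 pose dimensions into a 3-dimensional global (root) orientation and 69 body-joint dimensions:

```python
import torch
import smplx  # open-source SMPL implementation; assumed available

# SMPL parameter layout: 10 shape dimensions (K2), and 72 pose dimensions
# (K3) = 24 joints x 3 axis-angle values, of which 3 are the global root
# orientation and 69 cover the remaining 23 body joints.
model = smplx.create("models/", model_type="smpl")  # model path is a placeholder

betas = torch.zeros(1, 10)          # K2 = 10 shape parameters
global_orient = torch.zeros(1, 3)   # root joint axis-angle rotation
body_pose = torch.zeros(1, 69)      # remaining 23 joints x 3

output = model(betas=betas, global_orient=global_orient, body_pose=body_pose)
vertices = output.vertices          # (1, 6890, 3) posed mesh vertices
```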
After the model parameters are obtained in the manner described above, the position information of each vertex in the initial model can be adjusted based on the model parameters, where the initial model may be a body model pre-created in the SMPL framework that matches the object type of the target object. For example, if the target object is a person, the preset initial model may be a pre-created human body model.
The initial model contains multiple vertices and triangular faces determined from those vertices, where the number of vertices in the initial model may be set by the user according to actual needs, may be associated with the object type of the target object, or may be a default value.
In specific implementation, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices and Q triangular faces are first created, and the body of the target object is then represented by the P 3D vertices and Q triangular faces, where the P 3D vertices can be understood as P skeleton points of the initial model, and a triangular face can be understood as a triangle formed from 3D vertices, each triangle corresponding to three 3D vertices.
Afterwards, the vertex coordinates of each vertex in the initial model after pose deformation and body-shape deformation can be determined based on the model parameters; the position information of each vertex in the initial model is then adjusted based on these vertex coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
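A minimal sketch of assembling the mesh from the adjusted vertices and the fixed triangle topology, reusing the `vertices` and `model` names from the smplx sketch above; the trimesh library is an assumed choice, and any triangle-mesh representation would do:

```python
import numpy as np
import trimesh  # assumed available

# Build the first body model from the deformed vertices and the fixed faces.
# For SMPL, P = 6890 vertices and Q = 13776 triangular faces.
adjusted_vertices = vertices.detach().cpu().numpy()[0]  # (6890, 3)
faces = model.faces.astype(np.int64)                    # (13776, 3)

first_body_model = trimesh.Trimesh(
    vertices=adjusted_vertices, faces=faces, process=False
)
```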
After the first body model is determined, texture extraction can be performed on the first body model (for example, the SMPL model) to obtain the texture map.
In specific implementation, because of the topological structure of the SMPL human body model, each of its vertices has corresponding semantic content, where the semantic content represents the correspondence between the vertex and the pixels in the original image. On this basis, a mapping relationship between the vertices and the pixels of the texture map can be constructed, and texture reconstruction can then be performed based on this mapping relationship to obtain the texture map of the original image. For example, based on the mapping relationship, the pixel value of the pixel in the original image matching a given vertex is determined, and the pixel value of the corresponding coordinate point in the texture map is then set from that value, thereby completing the texture reconstruction and obtaining the texture map of the original image.
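A minimal sketch of this per-vertex texture reconstruction follows; `project` (the camera projection of 3D vertices onto the original image) and `uv_coords` (each vertex's fixed location in the texture map, as given by the SMPL topology) are assumed inputs:

```python
import numpy as np

# For every vertex, look up the matching pixel in the original image and
# write its value to the vertex's fixed UV location in the texture map.
def reconstruct_texture(image, vertices, uv_coords, project, tex_size=256):
    texture = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    px = project(vertices)                        # (P, 2) image coordinates
    for (u, v), (x, y) in zip(uv_coords, px):
        tu, tv = int(u * (tex_size - 1)), int(v * (tex_size - 1))
        texture[tv, tu] = image[int(y), int(x)]   # copy the matched pixel value
    return texture
```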
In an optional implementation, the above step S1032 of determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters specifically includes the following steps:
Step S10321: determine a second body model of the target object in the target pose based on the first body model and the target pose parameters;
Step S10322: render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
In the embodiment of the present disclosure, the action parameters and shape parameters of the target object in the target pose may be determined based on the target pose parameters, and the second body model is then determined by modifying the action parameters and shape parameters of the first body model.
If the target pose parameter is picture-type data containing the target pose, the picture-type data can be processed by an HMR model to obtain the action parameters and shape parameters of the target object in the target pose; the vertex coordinates of each vertex in the first body model can then be modified based on these parameters to obtain the second body model.
If the target pose parameters already contain the action parameters and shape parameters of the target object in the target pose, the vertex coordinates of each vertex in the first body model can be adjusted directly based on those parameters to obtain the second body model.
If the target pose parameter is a pose parameter of another type, it can first be converted into the action parameters and shape parameters of the target object in the target pose.
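These three cases can be pictured with the hedged sketch below; hmr_model is a hypothetical stand-in for an HMR-style regressor, and the dictionary layout for explicit parameters is an assumption of the example, not a format prescribed by the disclosure.

```python
# A minimal sketch of normalizing the three kinds of target pose parameters
# into (action, shape) parameters; hmr_model is a hypothetical regressor.
import numpy as np

def to_action_and_shape(target, hmr_model=None):
    """Normalize a target pose parameter into (action, shape) parameters."""
    if isinstance(target, np.ndarray) and target.ndim == 3:   # picture-type data
        return hmr_model(target)              # hypothetical HMR-style regressor
    if isinstance(target, dict):              # explicit action/shape parameters
        return target["action"], target["shape"]
    # Conversion of other pose-parameter formats is omitted in this sketch.
    raise ValueError("unsupported target pose parameter format")
```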
After the second body model is determined in the manner described above, the texture map can be rendered onto the second body model based on the mapping relationship between each pixel of the texture map and the vertices, and the first body shape map can be determined from the rendering result. For example, assuming the target pose is the pose shown in FIG. 2, a first body shape map as shown in FIG. 3 can be obtained. As can be seen from FIG. 3, the first body shape map only contains the body features (for example, the posture) of the target object, and does not contain the clothing features of the target object.
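This rendering can be pictured with the per-vertex sketch below; a real renderer would rasterize whole triangular faces, so this is an illustrative simplification with assumed input shapes (all coordinates normalized to [0, 1]).

```python
# A hedged sketch of step S10322: each vertex of the posed second body model
# has a texture coordinate (u, v) and a projected 2D position (x, y); copying
# texture pixels to those positions illustrates the pixel/vertex mapping.
import numpy as np

def render_first_body_shape_map(texture, uv_coords, posed_2d, out_size=512):
    """texture: (T, T, 3); uv_coords, posed_2d: (P, 2), values in [0, 1]."""
    out = np.zeros((out_size, out_size, 3), dtype=texture.dtype)
    tex_size = texture.shape[0]
    for (u, v), (x, y) in zip(uv_coords, posed_2d):
        color = texture[int(v * (tex_size - 1)), int(u * (tex_size - 1))]
        out[int(y * (out_size - 1)), int(x * (out_size - 1))] = color
    return out
```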
In the above implementation, by using a three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the difference between the original pose and the target pose is large, thereby improving the synthesis effect of the composite image.
In an optional implementation, the above step S105 of determining the first semantic segmentation map of the target object in the target pose specifically includes the following steps:
Step S1051: determine a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose;
Step S1052: generate the first semantic segmentation map based on the second semantic segmentation map.
In the embodiment of the present disclosure, the above steps S10321 and S10322 may be performed either before or after steps S1051 and S1052; the present disclosure does not specifically limit this.
In specific implementation, the original image may be processed to obtain its semantic segmentation map, that is, the second semantic segmentation map. For example, image segmentation processing may be performed on the original image by a segmentation network to obtain the second semantic segmentation map.
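The disclosure does not fix a particular segmentation network; as one hedged example, torchvision's DeepLabV3 can stand in for the segmentation step:

```python
# A sketch of producing the second semantic segmentation map; DeepLabV3 is a
# stand-in, not a network prescribed by the disclosure.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()

def second_segmentation_map(image_tensor):
    """image_tensor: (1, 3, H, W), normalized; returns (1, H, W) class ids."""
    with torch.no_grad():
        logits = model(image_tensor)["out"]   # (1, num_classes, H, W)
    return logits.argmax(dim=1)               # per-pixel semantic labels
```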
For example, if the original pose of the target object in the original image is the pose shown in FIG. 5, the second semantic segmentation map of the original image may consist of the maps shown in FIG. 6(a) and FIG. 6(b), where FIG. 6(a) shows the segmentation map of the foreground region of the original image, and FIG. 6(b) shows the semantic map of the foreground region of the original image.
As can be seen from FIG. 6(a), the segmentation map of the foreground region characterizes the segmentation result of the target object in the original image, for example, the body contour information of the target object. As shown in FIG. 6(b), the semantic map of the foreground region represents the semantic information of the target object in the original image; the semantic information indicates, for example, the various body parts of the target object.
After the second semantic segmentation map of the original image is determined, the wearing features of each body part of the target object in the original pose can be determined based on the second semantic segmentation map and the original image.
After the semantic segmentation maps shown in FIG. 6(a) and FIG. 6(b) are determined, the first semantic segmentation map of the target object in the target pose can be determined based on them. Here, the first semantic segmentation map contains a semantic map of the target object in the target pose (the target semantic map described below) and a segmentation map (the target segmentation map described below).
In the above implementation, determining the first semantic segmentation map from the second semantic segmentation map can improve the accuracy of the first semantic segmentation map, thereby further improving the image synthesis effect. At the same time, the complete wearing features of the target object in the target pose can be obtained, which solves the problem that the SMPL model cannot capture texture and geometry information lying outside the body model; this further improves the image synthesis effect, and good synthesis results can still be obtained for target objects wearing loose clothing.
In an optional implementation, the above step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
Step S11: obtain a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image;
Step S12: generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
Here, the second body model is a three-dimensional body model determined based on the first body model and the target pose parameters; the specific determination process is described as follows:
The action parameters and shape parameters of the target object in the target pose can be determined based on the target pose parameters, and the second body model can then be determined by modifying the action parameters and shape parameters of the first body model.
Here, the target pose parameter may be picture-type data, data containing the action parameters and shape parameters of the target object in the target pose, or another type of pose parameter. The process of determining the second body model from each type of target pose parameter is as described in the embodiment corresponding to steps S10321 and S10322 above, and is not repeated here.
After the second body model is determined, the second body model and the second semantic segmentation map can be input into a target sub-network for processing to obtain the first semantic segmentation map.
Assuming the second semantic segmentation map consists of the maps shown in FIG. 6(a) and FIG. 6(b), the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b), that is, the first semantic segmentation map, can be obtained. As shown in FIG. 7(a) and FIG. 7(b), the first semantic segmentation map contains the semantic map and the segmentation map of the target object in the target pose.
As can be seen from the above description, the second semantic segmentation map includes an original semantic map (that is, the semantic map of the foreground region described above) and an original segmentation map (that is, the segmentation map of the foreground region).
In this case, step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
Step S21: generate a target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose;
Step S22: generate a target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
In specific implementation, the original segmentation map and the second body model can be input into a target sub-network A for processing to obtain the target segmentation map of the target object in the target pose, for example, the segmentation map shown in FIG. 7(a). The original semantic map and the second body model can then be input into a target sub-network B for processing to obtain the target semantic map of the target object in the target pose, for example, the semantic map shown in FIG. 7(b). Target sub-network A may be a generation network with the same or a different structure from target sub-network B; the output of target sub-network A is a segmentation map, and the output of target sub-network B is a semantic map.
Here, the target sub-network may be any convolutional neural network. In one case, target sub-network A and target sub-network B have the same network structure but different network parameters; that is, they are neural networks obtained by training the same convolutional neural network on their respective training samples.
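A hedged sketch of this arrangement follows: the same tiny convolutional generator is instantiated twice with independent parameters, one producing a segmentation map and one a semantic map. The layer sizes and channel counts are assumptions for illustration, not the disclosed design.

```python
# Two generators with identical structure but separate (independently trained)
# parameters, mirroring target sub-networks A and B described above.
import torch.nn as nn

def make_generator(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, out_ch, kernel_size=3, padding=1),
    )

target_subnet_a = make_generator(in_ch=4, out_ch=1)    # outputs a segmentation map
target_subnet_b = make_generator(in_ch=4, out_ch=20)   # outputs a semantic map
```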
After the target segmentation map and the target semantic map are obtained, they may be determined as the first semantic segmentation map of the target object in the target pose. For example, when the target pose is the pose shown in FIG. 2, the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b) can be obtained.
In the embodiment of the present disclosure, after the first semantic segmentation map is determined, the clothing segmentation map of the target object in the original image can be determined based on the target segmentation map in the first semantic segmentation map; for example, the clothing segmentation map of the target object is cropped from the original image based on the segmentation map of FIG. 7(a). Likewise, the clothing semantic map of the target object in the original image can be determined based on the target semantic map in the first semantic segmentation map; for example, a clothing semantic map carrying the wearing features of each body part of the target object is determined from the original image based on the semantic map of FIG. 7(b).
In an optional implementation, the embodiment of the present disclosure further includes the following steps: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object.
On this basis, the above step S107 of determining a composite image carrying the complete wearing features of the target object based on the first body shape map and the first semantic segmentation map includes:
Step S1071: determine, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the wearing features of the target object.
In the embodiment of the present disclosure, before the composite image is obtained, the second semantic segmentation map of the original image can be obtained, together with the second body shape map of the target object in the source pose. The second semantic segmentation map is determined in the manner described above; for example, it may consist of the segmentation map shown in FIG. 6(a) and the semantic map shown in FIG. 6(b). The second body shape map may be a clothed-body map of the target object in the original pose determined based on the original image, and it carries the complete wearing features of the target object. For example, the clothed-body map of the target object in the source pose can be extracted from the original image based on the segmentation map shown in FIG. 6(a), yielding the clothed-body map shown in FIG. 4 (that is, the second body shape map).
After the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map are obtained, the composite image carrying the wearing features of the target object can be determined based on these four maps, which specifically includes:
inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing to obtain the composite image. The image generation network may be a spatially-adaptive normalization network, which may be obtained by training based on a generative adversarial network.
Here, the spatially-adaptive normalization network may be a SPADE (Spatially-Adaptive Normalization) network, which can convert a segmentation layout map into a realistic picture. In the embodiment of the present disclosure, besides the spatially-adaptive normalization network described above, the image generation network may also be another network, for example, a network that can replace the spatially-adaptive normalization network or realize its network functions; the present disclosure does not specifically limit this.
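For reference, a hedged sketch of one spatially-adaptive normalization block in the SPADE style follows; the hidden width and the use of batch normalization are assumptions of this example, not the claimed network.

```python
# One SPADE-style block: the segmentation input is convolved into per-pixel
# scale (gamma) and shift (beta) that modulate the normalized activations.
import torch.nn as nn
import torch.nn.functional as F

class SPADEBlock(nn.Module):
    def __init__(self, feat_ch, seg_ch, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(seg_ch, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, feat_ch, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_ch, kernel_size=3, padding=1)

    def forward(self, x, seg):
        # Resize the segmentation input to the feature resolution, then apply
        # the spatially varying scale and shift to the normalized features.
        seg = F.interpolate(seg, size=x.shape[2:], mode="nearest")
        h = self.shared(seg)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```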
After the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map are input into the image generator and the spatially-adaptive normalization network for processing, the composite image can be obtained. For example, a composite image as shown in FIG. 8 can be obtained; as can be seen from FIG. 8, the composite image contains the complete wearing features of the target object, and good image synthesis results can still be obtained for target objects wearing loose clothing.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
FIG. 9 is a flowchart of another image generation method provided by an embodiment of the present disclosure. As shown in FIG. 9, the method includes steps S901 to S908, wherein:
Step S901: acquire an original image and target data.
Here, the original image may be an image containing a target human body captured by a camera of an electronic device, the target human body being the target object in the above embodiments.
Here, the target data contains the target pose parameters. The target data may be an image containing an object in the target pose, or a video containing an object in the target pose.
In the embodiment of the present disclosure, after the original image is acquired, pixel-value normalization may be performed on it, so that the pixel value of each pixel in the original image is normalized into the range [-a, a], where a may take the value 1 or another normalized value; the value of a can be set according to the actual normalization needs, and the present disclosure does not specifically limit this. Afterwards, the picture size of the original image can be modified; for example, the picture size of the original image can be set to 224×224 or 512×512 so that it meets the input requirements of the HMR (Human Mesh Recovery) human-pose reconstruction model.
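The normalization and resizing described here can be sketched as follows, assuming an 8-bit input image, a = 1, and OpenCV for resizing:

```python
# A minimal sketch of the preprocessing: map pixel values from [0, 255] into
# [-a, a] and resize to the HMR input resolution.
import cv2
import numpy as np

def preprocess(image, a=1.0, size=224):
    resized = cv2.resize(image, (size, size)).astype(np.float32)
    return resized / 255.0 * (2 * a) - a    # values now lie in [-a, a]
```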
Step S902: determine model parameters based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image.
In specific implementation, the processed original image may be input into the HMR model for processing, and the output of the HMR model is taken as the model parameters, for example, the model parameters of the SMPL model.
Step S903: construct a first body model based on the model parameters.
In the embodiment of the present disclosure, the position information of each vertex in the initial model may be adjusted based on the model parameters, and the first body model is generated based on the adjusted position information of the vertices.
Here, the initial model may be a pre-created body model in the SMPL model that matches the object type of the target object; for example, the preset initial model may be a pre-created human body model. The initial model contains multiple vertices and triangular faces determined based on those vertices.
In specific implementation, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices and Q triangular faces are first created, and the body of the target object is represented by these P vertices and Q faces; the P 3D vertices can be understood as P skeleton points of the initial model, and each triangular face is a triangle formed from three of the 3D vertices. Afterwards, the vertex coordinates of each vertex after action deformation and body-shape deformation can be determined based on the model parameters, the position information of each vertex in the initial model is adjusted based on those coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
Step S904: determine a texture map of the original image based on the first body model.
In specific implementation, the first body model may be projected onto the original image to establish a projection relationship between the first body model and the original image, where the projection relationship indicates the correspondence between the vertices of the first body model and the pixels of the original image. Based on this projection relationship, the pixel values of the pixels in the original image corresponding to the vertices of the first body model are determined. The correspondence between the vertices of the first body model and the coordinates of the texture map is then obtained, and the pixels corresponding to the vertices are rendered onto the texture map based on this correspondence, thereby obtaining the texture map of the original image.
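The projection that establishes this vertex-to-pixel correspondence can be illustrated with the sketch below; a weak-perspective camera (scale s, translation t), common in HMR-style pipelines, is assumed here for simplicity and is not mandated by the disclosure.

```python
# A minimal sketch of projecting the first body model's vertices onto the
# original image plane under an assumed weak-perspective camera.
import numpy as np

def project_vertices(vertices, s=1.0, t=(0.0, 0.0)):
    """vertices: (P, 3) -> (P, 2) image-plane coordinates."""
    return s * vertices[:, :2] + np.asarray(t)
```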
Step S905: determine a first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
In the embodiment of the present disclosure, a second body model of the target object in the target pose may first be determined based on the first body model and the target pose parameters. The texture map can then be rendered onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices of the second body model, yielding the first body shape map.
Step S906: determine a first semantic segmentation map of the target object in the target pose.
In the embodiment of the present disclosure, the second body model determined in step S905 may be obtained, and a second semantic segmentation map of the original image may be determined, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; the first semantic segmentation map is then generated based on the second body model and the second semantic segmentation map.
Here, the second semantic segmentation map contains the original segmentation map and the original semantic map of the target object in the original pose. In the embodiment of the present disclosure, the target segmentation map in the target pose can be generated from the original segmentation map and the second body model, and the target semantic map in the target pose can be generated from the original semantic map and the second body model; the first semantic segmentation map of the target object in the target pose contains the target segmentation map and the target semantic map. The specific process is described as follows:
the original segmentation map and the second body model are input into a target sub-network to obtain the target segmentation map of the target object in the target pose, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and the original semantic map and the second body model are input into a target sub-network to obtain the target semantic map of the target object in the target pose, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
Step S907: obtain a second body shape map of the target object in the original pose.
Step S908: determine, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the complete wearing features of the target object.
In specific implementation, the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map can be input into an image generation network for processing to obtain the composite image.
Here, the image generation network may be a spatially-adaptive normalization network, which may be obtained by training based on a generative adversarial network.
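One common way to feed four such maps to a generation network is channel-wise concatenation, sketched below; the disclosure only states that the four maps are input to the network, so the concatenation and the assumption that all maps share one spatial grid belong to this example only.

```python
# A hedged sketch of assembling the generator input from the four maps.
import torch

def build_generator_input(body1, seg1, body2, seg2):
    """Each argument: (1, C_i, H, W) tensor on the same spatial grid."""
    return torch.cat([body1, seg1, body2, seg2], dim=1)
```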
In the embodiment of the present disclosure, by determining the first body shape map based on the original image and the target pose parameters, more accurate body shape information of the target object can be obtained; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, solving the problem that existing parametric human body models cannot capture texture and geometry information lying outside the body model. When the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis effect can be improved: even when the difference between the original pose and the target pose is large, a composite image with a more accurate pose can still be obtained, and good image synthesis results can still be obtained for target objects wearing loose clothing.
Based on the same inventive concept, an embodiment of the present disclosure further provides an image generation apparatus corresponding to the image generation method. Since the problem-solving principle of the apparatus in the embodiment of the present disclosure is similar to that of the above image generation method, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to FIG. 10, which is a schematic diagram of an image generation apparatus provided by an embodiment of the present disclosure, the apparatus includes an acquisition unit 10, a first determination unit 20, and a second determination unit 30, wherein:
the acquisition unit 10 is configured to acquire an original image, where the original image contains a target object in an original pose;
the first determination unit 20 is configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object;
the second determination unit 30 is configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image contains the target object in the target pose and having the wearing features.
In a possible implementation, the first determination unit is further configured to: construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model; and determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
In a possible implementation, the first determination unit is further configured to: determine model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image; adjust the position information of each vertex in an initial model based on the model parameters; and generate the first body model based on the adjusted position information of the vertices.
In a possible implementation, the first determination unit is further configured to: determine a second body model of the target object in the target pose based on the first body model and the target pose parameters; and render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
In a possible implementation, the first determination unit is further configured to: determine a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generate the first semantic segmentation map based on the second semantic segmentation map.
In a possible implementation, the first determination unit is further configured to: obtain a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image; and generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
In a possible implementation, in a case where the second semantic segmentation map includes an original semantic map and an original segmentation map, and the first semantic segmentation map includes a target segmentation map and a target semantic map, the first determination unit is further configured to: generate the target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and generate the target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
In a possible implementation, the apparatus is further configured to: obtain a second semantic segmentation map of the original image, and obtain a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object; the second determination unit is further configured to: determine, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the wearing features of the target object.
In a possible implementation, the second determination unit is further configured to: input the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing to obtain the composite image.
For descriptions of the processing flow of each module in the apparatus and of the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.
Corresponding to the image generation method in FIG. 1, an embodiment of the present disclosure further provides an electronic device 1100. As shown in FIG. 11, which is a schematic structural diagram of the electronic device 1100 provided by the embodiment of the present disclosure, the device includes:
a processor 111, a memory 112, and a bus 113. The memory 112 is configured to store execution instructions and includes an internal memory 1121 and an external memory 1122; the internal memory 1121 is configured to temporarily store operation data of the processor 111 and data exchanged with the external memory 1122, such as a hard disk, and the processor 111 exchanges data with the external memory 1122 through the internal memory 1121. When the electronic device 1100 runs, the processor 111 communicates with the memory 112 through the bus 113, so that the processor 111 executes the following instructions:
acquiring an original image, where the original image contains a target object in an original pose;
determining, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and determining a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map contains the wearing features of the body parts of the target object;
determining a composite image based on the first body shape map and the first semantic segmentation map, where the composite image contains the target object in the target pose and having the wearing features.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the image generation method described in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the image generation method described in the above method embodiments. For details, reference may be made to the above method embodiments, which are not repeated here.
The above computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features within the technical scope disclosed in the present disclosure; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (12)
- An image generation method, comprising: acquiring an original image, wherein the original image contains a target object in an original pose; determining, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and determining a first semantic segmentation map of the target object in the target pose, wherein the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object; and determining a composite image based on the first body shape map and the first semantic segmentation map, wherein the composite image contains the target object in the target pose and having the wearing features.
- The method according to claim 1, wherein the determining, based on the original image and the target pose parameters, the first body shape map of the target object in the target pose comprises: constructing a first body model of the target object based on the original image, and determining a texture map of the original image based on the first body model; and determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- The method according to claim 2, wherein the constructing the first body model of the target object based on the original image comprises: determining model parameters of the first body model based on the original image, wherein the model parameters are used to indicate pose information of the target object in the original image; adjusting position information of each vertex in an initial model based on the model parameters; and generating the first body model based on the adjusted position information of the vertices.
- The method according to claim 2 or 3, wherein the determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters comprises: determining a second body model of the target object in the target pose based on the first body model and the target pose parameters; and rendering the texture map onto the second body model based on a mapping relationship between pixels of the texture map and the vertices, to obtain the first body shape map.
- The method according to any one of claims 1 to 4, wherein the determining the first semantic segmentation map of the target object in the target pose comprises: determining a second semantic segmentation map of the original image, wherein the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generating the first semantic segmentation map based on the second semantic segmentation map.
- The method according to claim 5, wherein the generating the first semantic segmentation map based on the second semantic segmentation map comprises: obtaining a second body model of the target object in the target pose, wherein the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image; and generating the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- The method according to claim 5 or 6, wherein the second semantic segmentation map comprises an original semantic map and an original segmentation map, and the first semantic segmentation map comprises a target segmentation map and a target semantic map; and the generating the first semantic segmentation map based on the second semantic segmentation map comprises: generating the target segmentation map of the target object in the target pose based on the original segmentation map, wherein the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and generating the target semantic map of the target object in the target pose based on the original semantic map, wherein the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
- The method according to any one of claims 1 to 7, further comprising: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, wherein the second semantic segmentation map is used to characterize position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object; and wherein the determining the composite image based on the first body shape map and the first semantic segmentation map comprises: determining, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the complete wearing features of the target object.
- The method according to claim 8, wherein the determining, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, the composite image carrying the complete wearing features of the target object comprises: inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing to obtain the composite image.
- An image generation apparatus, comprising: an acquisition unit, configured to acquire an original image, wherein the original image contains a target object in an original pose; a first determination unit, configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, wherein the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object; and a second determination unit, configured to determine a composite image based on the first body shape map and the first semantic segmentation map, wherein the composite image contains the target object in the target pose and having the wearing features.
- An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus; and when the machine-readable instructions are executed by the processor, the image generation method according to any one of claims 1 to 9 is executed.
- A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is run by a processor, the image generation method according to any one of claims 1 to 9 is executed.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210187560.7 | 2022-02-28 | ||
CN202210187560.7A CN114581288A (en) | 2022-02-28 | 2022-02-28 | Image generation method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023160074A1 true WO2023160074A1 (en) | 2023-08-31 |
Family ID: 81777709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/134861 WO2023160074A1 (en) | 2022-02-28 | 2022-11-29 | Image generation method and apparatus, electronic device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114581288A (en) |
WO (1) | WO2023160074A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114581288A (en) * | 2022-02-28 | 2022-06-03 | 北京大甜绵白糖科技有限公司 | Image generation method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977847A (en) * | 2019-03-22 | 2019-07-05 | 北京市商汤科技开发有限公司 | Image generating method and device, electronic equipment and storage medium |
CN111275518A (en) * | 2020-01-15 | 2020-06-12 | 中山大学 | Video virtual fitting method and device based on mixed optical flow |
US20210065418A1 (en) * | 2019-08-27 | 2021-03-04 | Shenzhen Malong Technologies Co., Ltd. | Appearance-flow-based image generation |
CN113658309A (en) * | 2021-08-25 | 2021-11-16 | 北京百度网讯科技有限公司 | Three-dimensional reconstruction method, device, equipment and storage medium |
CN113838166A (en) * | 2021-09-22 | 2021-12-24 | 网易(杭州)网络有限公司 | Image feature migration method and device, storage medium and terminal equipment |
CN114529940A (en) * | 2022-01-19 | 2022-05-24 | 华南理工大学 | Human body image generation method based on posture guidance |
CN114581288A (en) * | 2022-02-28 | 2022-06-03 | 北京大甜绵白糖科技有限公司 | Image generation method and device, electronic equipment and storage medium |
2022
- 2022-02-28 CN CN202210187560.7A patent/CN114581288A/en not_active Withdrawn
- 2022-11-29 WO PCT/CN2022/134861 patent/WO2023160074A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN114581288A (en) | 2022-06-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10540817B2 (en) | System and method for creating a full head 3D morphable model | |
Zurdo et al. | Animating wrinkles by example on non-skinned cloth | |
WO2022205762A1 (en) | Three-dimensional human body reconstruction method and apparatus, device, and storage medium | |
JP2024522287A (en) | 3D human body reconstruction method, apparatus, device and storage medium | |
US20130127827A1 (en) | Multiview Face Content Creation | |
CN104732585A (en) | Human body type reconstructing method and device | |
US20180315254A1 (en) | Virtual Reality Presentation of Layers of Clothing on Avatars | |
WO2022237249A1 (en) | Three-dimensional reconstruction method, apparatus and system, medium, and computer device | |
US12079947B2 (en) | Virtual reality presentation of clothing fitted on avatars | |
CN112102480B (en) | Image data processing method, apparatus, device and medium | |
TWI750710B (en) | Image processing method and apparatus, image processing device and storage medium | |
CN114821675B (en) | Object processing method and system and processor | |
JP2022512262A (en) | Image processing methods and equipment, image processing equipment and storage media | |
WO2023160074A1 (en) | Image generation method and apparatus, electronic device, and storage medium | |
CN114219001A (en) | Model fusion method and related device | |
CN109859306A (en) | A method of extracting manikin in the slave photo based on machine learning | |
Liu et al. | Three-dimensional cartoon facial animation based on art rules | |
KR20060131145A (en) | Randering method of three dimension object using two dimension picture | |
CN115908651A (en) | Synchronous updating method for three-dimensional human body model and skeleton and electronic equipment | |
CN117114965A (en) | Virtual fitting and dressing method, virtual fitting and dressing equipment and system | |
Dibra | Recovery of the 3D Virtual Human: Monocular Estimation of 3D Shape and Pose with Data Driven Priors | |
KR102580427B1 (en) | Method for providing virtual fitting service and system for same | |
CN111696183B (en) | Projection interaction method and system and electronic equipment | |
CN117557699B (en) | Animation data generation method, device, computer equipment and storage medium | |
Yang et al. | Application of augmented reality technology in smart cartoon character design and visual modeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22928334; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |