WO2023160074A1 - Image generation method and apparatus, electronic device, and storage medium - Google Patents
- Publication number
- WO2023160074A1 (PCT/CN2022/134861)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- map
- target
- target object
- semantic segmentation
- pose
- Prior art date
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image; G06T3/04—Context-preserving transformations, e.g. by using an importance map
- G06T15/00—3D [Three Dimensional] image rendering; G06T15/005—General purpose rendering architectures
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Definitions
- The present disclosure relates to the technical field of image processing, and in particular to an image generation method and apparatus, an electronic device, and a storage medium.
- Embodiments of the present disclosure provide at least an image generation method and apparatus, an electronic device, and a storage medium.
- In a first aspect, an embodiment of the present disclosure provides an image generation method, including: acquiring an original image, where the original image includes a target object in an original pose; determining, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and determining a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object; and determining a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- In this way, the first body shape map can be determined based on the original image and the target pose parameters, so that more accurate body shape information of the target object is obtained; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, which solves the problem that an existing 3D body model cannot obtain texture information and geometric information located outside the human body model.
- Because the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis effect can be improved: even when the original pose and the target pose differ greatly, a composite image with an accurate pose can still be obtained, and better synthesis results can still be obtained for target objects wearing looser clothing.
- In some embodiments, determining the first body shape map of the target object in the target pose based on the original image and the target pose parameters includes: constructing a first body model of the target object based on the original image, and determining a texture map of the original image based on the first body model; and determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- In this way, the technical solution of the present disclosure first determines the first body shape map based on a texture map carrying the texture information of the body parts of the target object, so that this texture information does not have to be extracted by the image generation network; the feature information the image generation network must extract is thereby reduced, alleviating its processing pressure and further improving the image synthesis effect.
- In some embodiments, constructing the first body model of the target object based on the original image includes: determining model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image; adjusting the position information of each vertex in an initial model based on the model parameters; and generating the first body model based on the adjusted position information of each vertex.
- In this way, the body shape information in each pose can still be accurately determined, so that a more accurate first body model is obtained; performing image synthesis based on this first body model improves the synthesis effect.
- In some embodiments, determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters includes: determining a second body model of the target object in the target pose based on the first body model and the target pose parameters; and rendering the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
- By combining the three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the original pose and the target pose differ greatly, so that the synthesis effect of the composite image is improved.
- In some embodiments, determining the first semantic segmentation map of the target object in the target pose includes: determining a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generating the first semantic segmentation map based on the second semantic segmentation map.
- Determining the first semantic segmentation map through the second semantic segmentation map improves its accuracy, thereby further improving the image synthesis effect.
- In some embodiments, generating the first semantic segmentation map based on the second semantic segmentation map includes: acquiring a second body model of the target object in the target pose, where the second body model is determined based on a first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image; and generating the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- In some embodiments, the second semantic segmentation map includes an original semantic map and an original segmentation map, and the first semantic segmentation map includes a target segmentation map and a target semantic map. In this case, generating the first semantic segmentation map based on the second semantic segmentation map includes: generating the target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map includes the wearing features of all body parts of the target object in the target pose; and generating the target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map includes the wearing features of each body part of the target object in the target segmentation map.
- In some embodiments, the method further includes: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object. Determining the composite image based on the first body shape map and the first semantic segmentation map then includes: determining a composite image carrying the complete wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In this way, more complete wearing features and denser texture information of the target object can be obtained, so that a composite image with a better synthesis effect is produced.
- In some embodiments, determining the composite image carrying the complete wearing features of the target object includes: inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing, to obtain the composite image.
- Generating the composite image through the image generation network improves the synthesis effect, so that a more realistic composite image is obtained.
- In a second aspect, an embodiment of the present disclosure provides an image generation apparatus, including: an acquisition unit configured to acquire an original image, where the original image contains a target object in an original pose; a first determination unit configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map includes the wearing features of the body parts of the target object; and a second determination unit configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image contains the target object in the target pose and having the wearing features.
- In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
- In a fourth aspect, embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps in the above first aspect, or in any possible implementation of the first aspect, are executed.
- FIG. 1 shows a flow chart of an image generation method provided by an embodiment of the present disclosure;
- FIG. 2 shows a schematic diagram of the effect of a target pose provided by an embodiment of the present disclosure;
- FIG. 3 shows a schematic diagram of the effect of a first body shape map provided by an embodiment of the present disclosure;
- FIG. 4 shows a schematic diagram of the effect of a second body shape map provided by an embodiment of the present disclosure;
- FIG. 5 shows a schematic diagram of the effect of an original pose provided by an embodiment of the present disclosure;
- FIG. 6(a) shows a schematic diagram of the effect of a segmentation map of the foreground region of an original image provided by an embodiment of the present disclosure;
- FIG. 6(b) shows a schematic diagram of the effect of a semantic map of the foreground region of an original image provided by an embodiment of the present disclosure;
- FIG. 7(a) shows a schematic diagram of the effect of a segmentation map of a target object in a target pose provided by an embodiment of the present disclosure;
- FIG. 7(b) shows a schematic diagram of the effect of a semantic map of a target object in a target pose provided by an embodiment of the present disclosure;
- FIG. 8 shows a schematic diagram of the effect of a composite image provided by an embodiment of the present disclosure;
- FIG. 9 shows a flow chart of another image generation method provided by an embodiment of the present disclosure;
- FIG. 10 shows a schematic diagram of an image generation apparatus provided by an embodiment of the present disclosure;
- FIG. 11 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
- In the related art, a 3D human body model can be used to obtain a texture image of the source image in the target pose, and a generative adversarial network can then generate a new image of the target object in the target pose from the texture image and the target pose. This processing method relieves the processing pressure on the generative adversarial network, thereby improving the synthesis effect of the image.
- The 3D human body model commonly used in the related art is a parametric human body model, such as the Skinned Multi-Person Linear (SMPL) model.
- The SMPL model can construct a skeleton model of the human body, and the surface texture corresponding to the human body model, for example the texture of the clothes worn and the texture of the skin, can be extracted through the SMPL model.
- However, the SMPL model cannot obtain texture information and geometric information outside the human body model; it is therefore difficult for the SMPL model to achieve a good synthesis effect on pictures of a human body wearing loose clothing.
- Based on this, the present disclosure provides an image generation method and apparatus, an electronic device, and a storage medium.
- The image generation method provided by the embodiments of the present disclosure can be applied in fields such as motion transfer, appearance transfer, and virtual fitting.
- With this method, the image synthesis effect can be improved: even when the original pose and the target pose differ greatly, a composite image with an accurate pose can still be obtained, and better synthesis results can still be obtained for target objects wearing looser clothing.
- The execution subject of the image generation method provided in the embodiments of the present disclosure is generally an electronic device with a certain computing capability.
- Such a device includes, for example, a terminal device, a server, or another processing device; the terminal device may be a user device (a mobile device, user terminal, terminal, handheld device, computing device, wearable device, etc.).
- In some possible implementations, the image generation method can be implemented by a processor invoking computer-readable instructions stored in a memory.
- FIG. 1 is a flow chart of an image generation method provided by an embodiment of the present disclosure. The method includes steps S101 to S107, where:
- S101: Acquire an original image, where the original image includes a target object in an original pose.
- S103: Determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose.
- In embodiments of the present disclosure, the original image is a 2D image. The target pose parameter can be an image containing the target pose, or position information of limb key points used to characterize the target pose; when it is an image, that image is likewise a 2D image showing the body pose of each body part of the target object.
- In implementation, the first body model of the target object in the original pose can be reconstructed based on the original image, together with the texture map of the original image relative to the first body model. The texture map carries the texture information on the body parts of the target object in the original image; for example, the texture information may be skin texture information of the target object, and/or clothing texture information of clothing worn on the body parts of the target object.
- Then, the model parameters of the first body model can be adjusted based on the target pose parameters to obtain the body model of the target object in the target pose, that is, the second body model, and the first body shape map is determined based on the second body model and the texture map.
- The first body shape map determined in this way therefore also includes the texture information on the body parts of the target object in the original image.
- When determining the target object, the object located in the foreground region of the original image may be taken as the target object; alternatively, the object with the highest degree of limb completeness in the original image may be taken as the target object; alternatively, the target object may be determined according to the user's selection among multiple objects.
- S105: Determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object.
- Here, the first semantic segmentation map may be determined based on the semantic segmentation map of the original image and the above second body model; the specific determination process is described in detail in the following embodiments.
- The semantic segmentation map of the original image is used to indicate the wearing features of each body part of the target object in the original pose.
- The semantic segmentation map of the original image may include part labels of the various body parts of the target object, so that the body parts of the target object can be distinguished through the part labels.
- Here, the body of the target object may be divided into different body parts, for example: hair, face, upper torso, hands, legs, and feet.
- The part labels may be realized as pixel values, where different part labels correspond to different pixel values, so that each body part can be represented by its pixels; that is, the part region in which each body part lies in the semantic segmentation map is displayed with pixels of the corresponding pixel value.
- In this way, the wearing features of each body part of the target object in the original pose can be determined by combining the original image and the part labels.
- Correspondingly, the first semantic segmentation map can contain the part labels of the various body parts of the target object, and the part label of a given body part is the same in the semantic segmentation map of the original image and in the first semantic segmentation map, that is, it is represented by pixels of the same pixel value.
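As a concrete illustration of this label encoding, the following minimal Python sketch shows one possible way to store per-part labels as pixel values and recover per-part masks; the part names and label values here are assumptions for illustration, not values specified by the patent.

```python
# Illustrative part-label encoding (names and values are assumptions).
import numpy as np

PART_LABELS = {
    "background": 0,
    "hair": 1,
    "face": 2,
    "upper_torso": 3,
    "hand": 4,
    "leg": 5,
    "foot": 6,
}

def part_masks(semantic_map: np.ndarray) -> dict:
    """Recover a boolean mask per body part from a label-valued semantic map."""
    return {name: semantic_map == label for name, label in PART_LABELS.items()}
```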
- Although the first body shape map contains the texture information on the body parts of the target object in the original image, it does not contain the complete wearing features of the target object. Therefore, in order to obtain the complete wearing features, the first semantic segmentation map of the target object in the target pose can be determined based on the semantic segmentation map of the original image and the above second body model; the complete wearing features of the target object's body parts can then be determined based on the first semantic segmentation map.
- For example, suppose the target object in the original image is wearing looser clothes, such as a skirt.
- The first body shape map determined based on the original image and the target pose parameters may contain the texture information located on the body parts of the target object in the original image, but not the texture information located outside the body parts, for example the texture information of the skirt region outside the body parts.
- By determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, so that a composite image including those complete wearing features can be obtained.
- S107: Determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- In embodiments of the present disclosure, the first semantic segmentation map may contain part labels of each body part of the target object; the semantic layout information in the first semantic segmentation map can therefore be represented by the part labels. The semantic layout information can be understood as indicating the limb layout of each body part of the target object in the target pose.
- On this basis, the first body shape map and the first semantic segmentation map can be semantically synthesized through an image generation network, so that the semantic layout information in the first semantic segmentation map is preserved in the resulting composite image.
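To summarize the flow of steps S101 to S107 before the detailed embodiments, here is a minimal Python sketch; every callable passed in (reconstruct, extract_texture, repose, render, segment, warp, generator) is a hypothetical stand-in for the components described below, not an API defined by the patent.

```python
def generate_image(original_image, target_pose_params, *, reconstruct,
                   extract_texture, repose, render, segment, warp, generator):
    """Sketch of steps S101-S107; all callables are injected stand-ins."""
    # S101/S103: build the first body model and its texture map from the
    # original image, re-pose it, and render the first body shape map.
    first_model = reconstruct(original_image)
    texture_map = extract_texture(original_image, first_model)
    second_model = repose(first_model, target_pose_params)
    first_body_shape_map = render(second_model, texture_map)
    # S105: derive the first semantic segmentation map from the original
    # image's segmentation and the re-posed body model.
    second_semantic_map = segment(original_image)
    first_semantic_map = warp(second_semantic_map, second_model)
    # S107: semantic synthesis that preserves the segmentation layout.
    return generator(first_body_shape_map, first_semantic_map)
```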
- Steps S101 to S107 are described in detail below in conjunction with specific implementations.
- In some embodiments, step S103 of determining the first body shape map of the target object in the target pose based on the original image and the target pose parameters includes the following steps:
- Step S1031: Construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model;
- Step S1032: Determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- Here, the first body model can be obtained by performing three-dimensional reconstruction on the original image. The first body model can be a three-dimensional body model that includes the body structure and body pose of the target object; for example, it can be an SMPL model, or another type of three-dimensional body model.
- The body structure of the target object may be understood as its individual body parts, and the body pose of the target object may be understood as the pose of each of those body parts.
- After the first body model is constructed, texture reconstruction can be performed based on the original image and the first body model to obtain the texture map of the original image.
- The specific reconstruction principle is as follows: the first body model is projected onto the original image to establish a projection relationship, which indicates the correspondence between the vertices of the first body model and the pixels of the original image.
- Based on the projection relationship, the pixel values of the pixels corresponding to the vertices of the first body model are determined in the original image. The correspondence between the vertices of the first body model and the coordinates in the texture map is then obtained, and the pixels corresponding to those vertices are rendered into the texture map based on this correspondence, yielding the texture map of the original image.
- The texture map of the original image determined in this way contains the texture information on the body parts of the target object in the original image, for example the skin texture information of the target object, or the clothing texture information of clothing worn on the body parts of the target object.
- After the texture map is determined, the first body shape map can be determined in combination with the target pose parameters and the texture map.
- In implementation, the second body model of the target object in the target pose can be determined based on the target pose parameters, and the texture map is then rendered onto the second body model to obtain the first body shape map of the target object in the target pose.
- Since the texture map of the original image contains the texture information of the body parts of the target object, the first body shape map determined from it not only represents the body pose of each body part of the target object in the target pose, but also contains the texture information on those body parts.
- In the related art, a new image is generated by taking a 2D image and a target pose as input and using a generative adversarial network.
- This method relies on the generative network to extract feature information from the original image, for example the pose features and the clothing features of the target object.
- When the original pose and the target pose differ greatly, such related techniques often fail to obtain a composite image with an accurate pose.
- With the method of the present disclosure, in contrast, a first body shape map with an accurate pose can still be obtained, thereby improving the synthesis effect of the composite image.
- After the first body shape map and the first semantic segmentation map are determined, they may be semantically synthesized through an image generation network to obtain the composite image.
- As can be seen from the above description, the technical solution of the present disclosure first determines the first body shape map based on the texture map carrying the texture information of the body parts of the target object, so that this texture information is obtained in advance; the feature information the image generation network must extract is thereby reduced, alleviating its processing pressure and further improving the image synthesis effect.
- In some embodiments, step S1031 of constructing the first body model of the target object based on the original image specifically includes the following steps:
- Step S10311: Determine model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image;
- Step S10312: Adjust the position information of each vertex in an initial model based on the model parameters;
- Step S10313: Generate the first body model based on the adjusted position information of each vertex.
- In implementation, the original image may first be subjected to pixel-value normalization, so that the pixel value of each pixel in the original image is normalized into the range [-a, a], where a can be 1 or another normalization value; a can be set according to the actual needs of normalization, which is not specifically limited in the present disclosure.
- In addition, the picture size of the original image can be modified; for example, it can be set to 224×224 or 512×512 to meet the input requirements of the Human Mesh Recovery (HMR) model.
- After the above processing, the processed original image can be input into the HMR model, and the output of the HMR model is used as the model parameters of the first body model (that is, the model parameters of the SMPL model).
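A minimal sketch of this preprocessing step, assuming a = 1 and a 224×224 input; the exact normalization range and input size depend on the HMR variant actually used.

```python
import cv2
import numpy as np

def preprocess_for_hmr(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize the original image and normalize pixel values into [-1, 1]."""
    resized = cv2.resize(image, (size, size))
    return resized.astype(np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
```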
- The model parameters are used to characterize the pose information of the target object contained in the original image, and may include action (pose) parameters, shape parameters, and/or camera parameters.
- The camera parameters describe the pose information of the camera through values in K1 dimensions.
- The value of K1 may be 3; that is, the camera parameters may include intrinsic parameters, extrinsic parameters, and distortion parameters.
- The shape parameters describe the shape of the target object through values in K2 dimensions, and the value of each dimension can be interpreted as an indicator of the target object's build, such as tallness or shortness, fatness or thinness, and head-to-body ratio.
- The shape of the body model can therefore be changed by controlling the values of the K2 dimensions.
- The value of K2 may be 10.
- The action parameters describe the pose of the target object at a given moment through values in K3 dimensions, which represent the rotations of multiple limb joints of the target object relative to each axis. K3 may be, for example, 72 (24×3): 24 corresponds to 24 well-defined limb joints, and 3 corresponds to the axis-angle representation of each joint's rotation relative to its parent node.
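The parameter dimensions described above (K1 = 3 camera values, K2 = 10 shape values, K3 = 72 = 24×3 action values) can be pictured as follows; the container class itself is illustrative, not part of the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SMPLParams:
    camera: np.ndarray  # (3,)  camera parameters
    shape: np.ndarray   # (10,) shape coefficients (build indicators)
    pose: np.ndarray    # (72,) axis-angle rotations: 24 joints x 3 values

    def __post_init__(self):
        assert self.camera.shape == (3,)
        assert self.shape.shape == (10,)
        assert self.pose.shape == (72,)
```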
- After the model parameters are determined, the position information of each vertex in the initial model can be adjusted based on them.
- The initial model may be a pre-created body model in the SMPL framework that matches the object type of the target object.
- For example, when the target object is a human being, the preset initial model may be a pre-created human body model.
- The initial model contains multiple vertices and triangular patches determined from those vertices.
- The number of vertices in the initial model can be set by the user according to actual needs; alternatively, it is associated with the object type of the target object; alternatively, it is a default value.
- In embodiments of the present disclosure, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices (that is, the vertices mentioned above) and Q triangular patches are first created, and the body of the target object is then represented by these P vertices and Q triangular patches. The P 3D vertices can be understood as P skeleton points of the initial model, and each triangular patch is a triangle formed from three of the 3D vertices.
- After the model parameters are determined, the vertex coordinates of each vertex in the initial model after action deformation and body-shape deformation can be computed; the position information of each vertex is then adjusted based on these vertex coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
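A highly simplified sketch of the vertex adjustment, in the spirit of the SMPL formulation: each template vertex is displaced by a linear combination of shape blend shapes. Real SMPL additionally applies pose blend shapes and linear blend skinning, which are omitted here.

```python
import numpy as np

def adjust_vertices(template: np.ndarray,     # (P, 3) initial vertex positions
                    shape_dirs: np.ndarray,   # (P, 3, 10) shape blend shapes
                    betas: np.ndarray         # (10,) shape parameters
                    ) -> np.ndarray:
    # Displace each vertex by a linear combination of the shape directions;
    # the adjusted vertices then define the triangular faces of the model.
    return template + shape_dirs @ betas
```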
- After the first body model is constructed, texture extraction may be performed on the first body model (e.g., the SMPL model) to extract the texture map.
- In the first body model, each vertex has corresponding semantic content, which represents the correspondence between the vertex and the pixels in the original image.
- On this basis, the mapping relationship between the vertices and the pixels in the texture map can be constructed.
- After the mapping relationship is determined, texture reconstruction can be performed based on it to obtain the texture map of the original image. For example, based on the mapping relationship, the pixel value of the pixel matching each vertex is determined in the original image, and the pixel value of the corresponding coordinate point in the texture map is then set from that value, completing the texture reconstruction and yielding the texture map of the original image.
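A nearest-pixel sketch of this texture reconstruction: each vertex's projected image pixel is copied to the vertex's coordinate in the texture map. Bilinear sampling and visibility handling, which a real implementation would need, are omitted.

```python
import numpy as np

def build_texture_map(image: np.ndarray,        # (H, W, 3) original image
                      pixel_coords: np.ndarray, # (P, 2) projected (x, y) per vertex
                      uv_coords: np.ndarray,    # (P, 2) texture (u, v) in [0, 1]
                      tex_size: int = 256) -> np.ndarray:
    texture = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    for (x, y), (u, v) in zip(pixel_coords.astype(int), uv_coords):
        tu, tv = int(u * (tex_size - 1)), int(v * (tex_size - 1))
        texture[tv, tu] = image[y, x]  # copy the vertex's pixel into the map
    return texture
```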
- In some embodiments, step S1032 of determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters specifically includes the following steps:
- Step S10321: Determine a second body model of the target object in the target pose based on the first body model and the target pose parameters;
- Step S10322: Render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
- In implementation, the action parameters and shape parameters of the target object in the target pose may be determined based on the target pose parameters, and the second body model is then obtained by modifying the action parameters and shape parameters of the first body model.
- When the target pose parameter is image data containing the target pose, the image data can be processed by the HMR model to obtain the action parameters and shape parameters of the target object in the target pose; the vertex coordinates of each vertex in the first body model are then modified based on these action parameters and shape parameters, so that the second body model is obtained through the adjustment.
- When the target pose parameter itself contains the action parameters and shape parameters, the vertex coordinates of each vertex in the first body model can be adjusted directly based on them to obtain the second body model.
- When the target pose parameter is a pose parameter of another type, it can first be converted into the action parameters and shape parameters of the target object in the target pose.
- After the second body model is determined, the texture map can be rendered onto it based on the mapping relationship between each pixel in the texture map and each vertex, and the first body shape map is determined from the rendered result. For example, assuming the target pose is the pose shown in FIG. 2, a first body shape map as shown in FIG. 3 can be obtained. As can be seen from FIG. 3, the first body shape map only includes the body features (e.g., the pose) of the target object, and does not include the clothing features of the target object.
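Putting step S1032 together, a minimal sketch under the same assumptions as the earlier sketches; hmr, smpl, and render are hypothetical stand-ins for the pose estimator, the parametric body model, and the texture renderer.

```python
def compute_first_body_shape_map(texture_map, target_pose_image, *,
                                 hmr, smpl, render):
    """Sketch of step S1032 with injected stand-in components."""
    # Convert the target pose parameters (here: an image of the target pose)
    # into action and shape parameters, e.g. via an HMR-style model.
    pose, shape = hmr(target_pose_image)
    # Re-pose the body model to obtain the second body model.
    second_body_model = smpl(pose=pose, shape=shape)
    # Render the texture onto the second body model via the pixel-vertex mapping.
    return render(second_body_model, texture_map)
```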
- By combining the three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the original pose and the target pose differ greatly, so that the synthesis effect of the composite image is improved.
- In some embodiments, step S105 of determining the first semantic segmentation map of the target object in the target pose specifically includes the following steps:
- Step S1051: Determine the second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose;
- Step S1052: Generate the first semantic segmentation map based on the second semantic segmentation map.
- It should be noted that steps S10321 and S10322 may be performed before or after steps S1051 and S1052, which is not specifically limited in the present disclosure.
- In implementation, the original image may be processed to obtain the semantic segmentation map of the original image, that is, the second semantic segmentation map.
- For example, image segmentation may be performed on the original image through an image segmentation network to obtain the second semantic segmentation map.
- Assuming the original pose of the target object in the original image is the pose shown in FIG. 5, the second semantic segmentation map of the original image can be the semantic segmentation map shown in FIG. 6(a) and FIG. 6(b).
- FIG. 6(a) shows the segmentation map of the foreground region of the original image, and FIG. 6(b) shows the semantic map of the foreground region of the original image.
- The segmentation map of the foreground region is used to characterize the segmentation result of the target object in the original image, for example the body contour information of the target object.
- The semantic map of the foreground region is used to represent the semantic information of the target object in the original image; the semantic information, for example, identifies the various body parts of the target object.
- On this basis, the wearing features of each body part of the target object in the original pose can be determined based on the second semantic segmentation map and the original image.
- After the second semantic segmentation map is determined, the first semantic segmentation map of the target object in the target pose can be determined based on it.
- Here, the first semantic segmentation map includes a semantic map (that is, the target semantic map below) and a segmentation map (that is, the target segmentation map below) of the target object in the target pose.
- Determining the first semantic segmentation map through the second semantic segmentation map improves the accuracy of the first semantic segmentation map, thereby further improving the image synthesis effect.
- Moreover, the complete wearing features of the target object in the target pose can be obtained, which solves the problem that the SMPL model cannot obtain texture information and geometric information outside the human body model, further improves the image synthesis effect, and still yields good synthesis results for target objects wearing looser clothing.
- In some embodiments, step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
- Step S11: Obtain a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image;
- Step S12: Generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- Here, the second body model is a three-dimensional body model determined based on the first body model and the target pose parameters; the specific determination process is as follows:
- the action parameters and shape parameters of the target object in the target pose can be determined based on the target pose parameters, and the second body model is then obtained by modifying the action parameters and shape parameters of the first body model.
- The target pose parameter may be image data, data including the action parameters and shape parameters of the target object in the target pose, or another type of pose parameter.
- The process of determining the second body model from the various types of target pose parameters is as described for steps S10321 and S10322 above and is not repeated here.
- In implementation, the second body model and the second semantic segmentation map can be input into target sub-networks for processing to obtain the first semantic segmentation map.
- For example, the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b) can be obtained, that is, the first semantic segmentation map, which contains the semantic map and the segmentation map of the target object in the target pose.
- In some embodiments, the second semantic segmentation map includes the original semantic map (that is, the semantic map of the foreground region) and the original segmentation map (that is, the segmentation map of the foreground region).
- In this case, step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
- Step S21: Generate a target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map includes the wearing features of all body parts of the target object in the target pose;
- Step S22: Generate a target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map includes the wearing features of each body part of the target object in the target segmentation map.
- In implementation, the original segmentation map and the second body model can be input into target sub-network A for processing to obtain the target segmentation map of the target object in the target pose, for example the segmentation map shown in FIG. 7(a).
- Likewise, the original semantic map and the second body model can be input into target sub-network B for processing to obtain the target semantic map of the target object in the target pose, for example the semantic map shown in FIG. 7(b).
- Target sub-network A may be a generative network with the same structure as target sub-network B, or with a different structure; the output of target sub-network A is a segmentation map, and the output of target sub-network B is a semantic map.
- For example, each target sub-network can be any kind of convolutional neural network; target sub-network A and target sub-network B can have the same network structure but different network parameters, that is, they are neural networks obtained by training the same convolutional neural network on their respective training samples.
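An illustrative PyTorch sketch of "same structure, different parameters": the two sub-networks are built from the same constructor but hold independent weights. The tiny architecture and channel counts are assumptions for illustration only.

```python
import torch.nn as nn

def make_subnetwork(in_ch: int, out_ch: int) -> nn.Module:
    # Same constructor => same structure; each call => independent weights.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(64, out_ch, 3, padding=1),
    )

# Sub-network A: original segmentation map + body-model rendering -> target segmentation map.
subnet_a = make_subnetwork(in_ch=2, out_ch=1)
# Sub-network B: original semantic map + body-model rendering -> target semantic map.
subnet_b = make_subnetwork(in_ch=4, out_ch=3)
```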
- After the target segmentation map and the target semantic map are determined, they may be taken together as the first semantic segmentation map of the target object in the target pose.
- For example, when the target pose is the pose shown in FIG. 2, the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b) can be obtained.
- In addition, the wearing segmentation map of the target object can be determined from the original image based on the target segmentation map in the first semantic segmentation map, for example by extraction from the original image based on the segmentation map of FIG. 7(a); and the wearing semantic map of the target object, which carries the wearing features of each body part of the target object, can be determined from the original image based on the target semantic map in the first semantic segmentation map, for example based on the semantic map of FIG. 7(b).
- In some embodiments, the method further includes the following steps: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object.
- On this basis, step S107 of determining a composite image carrying the complete wearing features of the target object based on the first body shape map and the first semantic segmentation map includes:
- Step S1071: Determine a composite image carrying the wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In implementation, before the composite image is obtained, the second semantic segmentation map of the original image can be obtained, together with the second body shape map of the target object in the original pose.
- The second semantic segmentation map of the original image determined in the manner described above may be, for example, the segmentation map shown in FIG. 6(a) and the semantic map shown in FIG. 6(b).
- The second body shape map may be a body map of the target object in the original pose determined based on the original image.
- The second body shape map determined based on the original image carries the complete wearing features of the target object.
- For example, the wearing body map of the target object in the original pose can be extracted from the original image, so as to obtain the wearing body map shown in FIG. 4 (that is, the second body shape map).
- In some embodiments, determining the composite image carrying the wearing features of the target object specifically includes the following.
- In embodiments of the present disclosure, the image generation network may be a spatially-adaptive normalization network, which may be trained based on a generative adversarial network.
- For example, the spatially-adaptive normalization network can be a SPADE (spatially-adaptive normalization) network, which can convert a segmentation layout map into a realistic picture.
- In addition, the image generation network can also be another network, for example another network that can replace the spatially-adaptive normalization network, or that can realize its network function, which is not specifically limited in the present disclosure.
- After the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map are input into the image generator and the spatially-adaptive normalization network for processing, the composite image is obtained.
- For example, a composite image as shown in FIG. 8 can be obtained. As can be seen from FIG. 8, the composite image contains the complete wearing features of the target object, and a good image synthesis result is still obtained even though the target object wears loose clothing.
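For reference, a condensed sketch of a SPADE-style spatially-adaptive normalization layer, following the published SPADE formulation; the hidden width and normalization choice are illustrative, and the patent's actual generator may differ.

```python
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    def __init__(self, feat_ch: int, seg_ch: int, hidden: int = 128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(seg_ch, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the segmentation layout to the feature resolution, then
        # predict a per-pixel scale and shift for the normalized features.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```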
- It should be understood that the writing order of the steps above does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
- FIG. 9 is a flow chart of another image generation method provided by an embodiment of the present disclosure. As shown in FIG. 9, the method includes steps S901 to S908, where:
- Step S901: Acquire an original image and target data.
- Here, the original image may be an image containing a target human body captured by a camera of an electronic device; the target human body is the target object of the foregoing embodiments.
- The target data includes the target pose parameters.
- For example, the target data may be an image containing an object in the target pose, or a video containing an object in the target pose.
- After the original image is acquired, it may be subjected to pixel-value normalization, so that the pixel value of each pixel in the original image is normalized into the range [-a, a], where a can be 1 or another normalization value set according to the actual needs of normalization, which is not specifically limited in the present disclosure.
- In addition, the picture size of the original image can be modified; for example, it can be set to 224×224 or 512×512 to meet the input requirements of the Human Mesh Recovery (HMR) human body pose reconstruction model.
- Step S902: Determine model parameters based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image.
- In implementation, the processed original image may be input into the HMR model for processing, and the output of the HMR model is used as the model parameters, for example the model parameters of the SMPL model.
- Step S903: Construct a first body model based on the model parameters.
- In implementation, the position information of each vertex in the initial model may be adjusted based on the model parameters, and the first body model is generated based on the adjusted position information of each vertex.
- The initial model may be a pre-created body model in the SMPL framework that matches the object type of the target object.
- For example, when the target object is a human being, the preset initial model may be a pre-created human body model.
- The initial model contains multiple vertices and triangular faces determined from those vertices.
- In embodiments of the present disclosure, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices (that is, the vertices mentioned above) and Q triangular patches are first created, and the body of the target object is then represented by these P vertices and Q triangular patches. The P 3D vertices can be understood as P skeleton points of the initial model, and each triangular patch is a triangle formed from three of the 3D vertices.
- After the model parameters are determined, the vertex coordinates of each vertex in the initial model after action deformation and body-shape deformation can be computed; the position information of each vertex is then adjusted based on these vertex coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
- Step S904: Determine the texture map of the original image based on the first body model.
- In implementation, the first body model may be projected onto the original image to establish a projection relationship between the first body model and the original image.
- The projection relationship is used to indicate the correspondence between the vertices in the first body model and the pixels in the original image.
- Based on the projection relationship, the pixel values of the pixels corresponding to the vertices of the first body model are determined in the original image.
- The correspondence between the vertices of the first body model and the coordinates in the texture map is then obtained, and the pixels corresponding to those vertices are rendered into the texture map based on this correspondence, yielding the texture map of the original image.
- Step S905: Determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- In implementation, the second body model of the target object in the target pose may be determined based on the first body model and the target pose parameters. The texture map can then be rendered onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices in the second body model, to obtain the first body shape map.
- Step S906: Determine the first semantic segmentation map of the target object in the target pose.
- In implementation, the second body model determined in step S905 may be obtained, and a second semantic segmentation map of the original image may be determined, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; the first semantic segmentation map is then generated based on the second body model and the second semantic segmentation map.
- Here, the second semantic segmentation map contains the original segmentation map and the original semantic map of the target object in the original pose.
- In implementation, the target segmentation map in the target pose can be generated from the original segmentation map and the second body model, and the target semantic map in the target pose can be generated from the original semantic map and the second body model; the first semantic segmentation map of the target object in the target pose includes the target segmentation map and the target semantic map. For example, the original segmentation map and the second body model are input into a target sub-network to obtain the target segmentation map, which includes the wearing features of all body parts of the target object in the target pose; and the original semantic map and the second body model are input into a target sub-network to obtain the target semantic map, which includes the wearing features of each body part of the target object in the target segmentation map.
- Step S907: Obtain a second body shape map of the target object in the original pose.
- Step S908: Determine a composite image carrying the complete wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In implementation, the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map can be input into an image generation network for processing to obtain the composite image.
- Here, the image generation network may be a spatially-adaptive normalization network, which may be trained based on a generative adversarial network.
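A minimal sketch of one way such adversarial training could look, using a hinge loss; the loss choice and optimizers are assumptions, and practical training of a SPADE-style generator typically adds perceptual and feature-matching terms.

```python
import torch

def train_step(generator, discriminator, g_opt, d_opt, inputs, real_image):
    # Discriminator step: push real scores up and generated scores down.
    fake = generator(*inputs).detach()
    d_loss = (torch.relu(1.0 - discriminator(real_image)).mean()
              + torch.relu(1.0 + discriminator(fake)).mean())  # hinge loss
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: produce images the discriminator scores as real.
    fake = generator(*inputs)
    g_loss = -discriminator(fake).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```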
- As can be seen from the above description, more accurate body shape information of the target object can be obtained by determining the first body shape map based on the original image and the target pose parameters; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, which solves the problem that existing parametric human body models cannot obtain texture information and geometric information outside the human body model.
- Because the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis effect can be improved: even when the original pose and the target pose differ greatly, a composite image with an accurate pose can still be obtained, and better synthesis results can still be obtained for target objects wearing looser clothing.
- Based on the same inventive concept, the embodiments of the present disclosure also provide an image generation apparatus corresponding to the image generation method. Since the problem-solving principle of the apparatus is similar to that of the image generation method described above, the implementation of the apparatus can refer to the implementation of the method, and repeated descriptions are omitted.
- As shown in FIG. 10, the apparatus includes an acquisition unit 10, a first determination unit 20, and a second determination unit 30, where:
- the acquisition unit 10 is configured to acquire an original image, where the original image includes a target object in an original pose;
- the first determination unit 20 is configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in the target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object;
- the second determination unit 30 is configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- In a possible implementation, the first determination unit is further configured to: construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model; and determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- In a possible implementation, the first determination unit is further configured to: determine model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image; adjust the position information of each vertex in the initial model based on the model parameters; and generate the first body model based on the adjusted position information of each vertex.
- In a possible implementation, the first determination unit is further configured to: determine a second body model of the target object in the target pose based on the first body model and the target pose parameters; and render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
- In a possible implementation, the first determination unit is further configured to: determine a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generate the first semantic segmentation map based on the second semantic segmentation map.
- In a possible implementation, the first determination unit is further configured to: acquire a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is the body model of the target object constructed based on the original image; and generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- In a possible implementation, the first determination unit is further configured to: generate a target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map includes the wearing features of all body parts of the target object in the target pose; and generate a target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map includes the wearing features of each body part of the target object in the target segmentation map.
- In a possible implementation, the apparatus is further configured to: obtain a second semantic segmentation map of the original image, and obtain a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object. The second determination unit is further configured to: determine a composite image carrying the wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map.
- In a possible implementation, the second determination unit is further configured to: input the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into the image generation network for processing to obtain the composite image.
- An embodiment of the present disclosure also provides an electronic device 1100. As shown in FIG. 11, which is a schematic structural diagram of the electronic device 1100 provided in the embodiment of the present disclosure, the device includes:
- a processor 111, a memory 112, and a bus 113. The memory 112 is used to store execution instructions and includes an internal memory 1121 and an external memory 1122; the internal memory 1121, also called main memory, temporarily stores operation data in the processor 111 and data exchanged with the external memory 1122 (such as a hard disk), and the processor 111 exchanges data with the external memory 1122 through the internal memory 1121. When the electronic device 1100 runs, the processor 111 communicates with the memory 112 through the bus 113, so that the processor 111 executes the following instructions:
- acquire an original image, where the original image contains a target object in an original pose;
- determine, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose, and determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map includes wearing features of the body parts of the target object;
- determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
- Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the image generation method described in the above method embodiments are performed.
- The storage medium may be a volatile or non-volatile computer-readable storage medium.
- An embodiment of the present disclosure also provides a computer program product, which carries program code; the instructions included in the program code can be used to perform the steps of the image generation method described in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.
- The above computer program product may be implemented by hardware, software, or a combination thereof.
- In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
- The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- The functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile computer-readable storage medium executable by a processor.
- Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure.
- The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
Abstract
Provided in the present disclosure are an image generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring an original image, wherein the original image includes a target object in an original pose; determining, on the basis of the original image and a target pose parameter, a first body shape map of the target object in a target pose; determining a first semantic segmentation map of the target object in the target pose, wherein the first semantic segmentation map is used for indicating a wearing feature of a body part of the target object; and determining a composite image on the basis of the first body shape map and the first semantic segmentation map, wherein the composite image comprises the target object in the target pose and having the wearing feature.
Description
Cross-Reference to Related Applications
This disclosure claims priority to Chinese patent application No. 202210187560.7, filed on February 28, 2022, which is incorporated herein by reference.
The present disclosure relates to the technical field of image processing, and in particular to an image generation method and apparatus, an electronic device, and a storage medium.
The field of image generation has developed rapidly in recent years. Human body image generation, which covers functions such as human body image completion, motion transfer, and appearance transfer, has great potential application value in character animation, virtual try-on, film, and games. In the related art, a new image is generated by taking a 2D image and a target pose as input and applying a generative adversarial network. However, when the difference between the source pose and the target pose in the 2D image is large, the synthesis quality of images obtained in the related art, which relies on a neural network to extract the feature information of the 2D image, is poor.
Summary of the Invention
Embodiments of the present disclosure provide at least an image generation method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image generation method, including: acquiring an original image, where the original image includes a target object in an original pose; determining, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose; determining a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate wearing features of body parts of the target object; and determining a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
In the above implementation, first, more accurate body shape information of the target object can be obtained by determining the first body shape map based on the original image and the target pose parameter; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, which solves the problem that an existing three-dimensional body model cannot capture texture information and geometric information located outside the body model. When the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis quality is improved: even when the difference between the original pose and the target pose is large, a composite image with a more accurate pose can still be obtained, and good synthesis results can still be achieved for target objects wearing loose clothing.
In an optional implementation, determining the first body shape map of the target object in the target pose based on the original image and the target pose parameter includes: constructing a first body model of the target object based on the original image, and determining a texture map of the original image based on the first body model; and determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameter.
Compared with the related-art solution that relies on a generation network to extract all feature information from the original image, the technical solution of the present disclosure can first determine the first body shape map based on a texture map carrying the texture information of the body parts of the target object, so that the texture information of the body parts of the target object is extracted in advance. When the first body shape map and the first semantic segmentation map are semantically synthesized by an image generation network, the amount of feature information the image generation network must extract is reduced, which relieves the processing load of the network and improves the image synthesis quality.
In an optional implementation, constructing the first body model of the target object based on the original image includes: determining model parameters of the first body model based on the original image, where the model parameters are used to indicate pose information of the target object in the original image; adjusting position information of each vertex in an initial model based on the model parameters; and generating the first body model based on the adjusted position information of each vertex.
In the above implementation, even when the difference between the original pose and the target pose is large, the body shape information under each pose can still be determined accurately, yielding a more accurate first body model; performing image synthesis based on this first body model improves the synthesis quality.
In an optional implementation, determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameter includes: determining a second body model of the target object in the target pose based on the first body model and the target pose parameter; and rendering the texture map onto the second body model based on a mapping relationship between pixels of the texture map and the vertices, to obtain the first body shape map.
In the above implementation, by using a three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the difference between the original pose and the target pose is large, thereby improving the synthesis quality of the composite image.
In an optional implementation, determining the first semantic segmentation map of the target object in the target pose includes: determining a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate wearing features of the body parts of the target object in the original pose; and generating the first semantic segmentation map based on the second semantic segmentation map.
In the above implementation, determining the first semantic segmentation map from the second semantic segmentation map improves the accuracy of the first semantic segmentation map, further improving the image synthesis quality.
In an optional implementation, generating the first semantic segmentation map based on the second semantic segmentation map includes: acquiring a second body model of the target object in the target pose, where the second body model is determined based on a first body model and the target pose parameter, and the first body model is a body model of the target object constructed based on the original image; and generating the first semantic segmentation map based on the second body model and the second semantic segmentation map.
In an optional implementation, the second semantic segmentation map includes an original semantic map and an original segmentation map; the first semantic segmentation map includes a target segmentation map and a target semantic map; and generating the first semantic segmentation map based on the second semantic segmentation map includes: generating, based on the original segmentation map, the target segmentation map of the target object in the target pose, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and generating, based on the original semantic map, the target semantic map of the target object in the target pose, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
In an optional implementation, the method further includes: acquiring a second semantic segmentation map of the original image, and acquiring a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to represent position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object; and determining the composite image based on the first body shape map and the first semantic segmentation map includes: determining, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the complete wearing features of the target object.
In the above implementation, more complete wearing features of the target object and denser texture information of the target object can be obtained, yielding a composite image with better synthesis quality.
In an optional implementation, determining the composite image carrying the complete wearing features of the target object based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map includes: inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing, to obtain the composite image.
In the above implementation, generating the composite image through an image generation network improves the synthesis quality of the composite image, yielding a more realistic result.
In a second aspect, an embodiment of the present disclosure provides an image generation apparatus, including: an acquisition unit configured to acquire an original image, where the original image includes a target object in an original pose; a first determination unit configured to determine, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map contains wearing features of body parts of the target object; and a second determination unit configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the steps of the first aspect, or of any possible implementation of the first aspect, are performed.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when run by a processor, performs the steps of the first aspect, or of any possible implementation of the first aspect.
To make the above objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 shows a flowchart of an image generation method provided by an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a target pose provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a first body shape map provided by an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a second body shape map provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a source pose provided by an embodiment of the present disclosure;
FIG. 6(a) shows a schematic diagram of a segmentation map of the foreground region of an original image provided by an embodiment of the present disclosure;
FIG. 6(b) shows a schematic diagram of a semantic map of the foreground region of an original image provided by an embodiment of the present disclosure;
FIG. 7(a) shows a schematic diagram of a segmentation map of a target object in a target pose provided by an embodiment of the present disclosure;
FIG. 7(b) shows a schematic diagram of a semantic map of a target object in a target pose provided by an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of a composite image provided by an embodiment of the present disclosure;
FIG. 9 shows a flowchart of another image generation method provided by an embodiment of the present disclosure;
FIG. 10 shows a schematic diagram of an image generation apparatus provided by an embodiment of the present disclosure;
FIG. 11 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present disclosure provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
The term "and/or" herein merely describes an association relationship and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
Research has found that, in the related art, a new image is generated by taking a 2D image and a target pose as input and applying a generative adversarial network. However, when the difference between the source pose and the target pose in the 2D image is large, the synthesis quality of images obtained by relying on a neural network to extract the feature information of the 2D image is poor.
On this basis, in the field of human body image generation, it has been proposed to generate new human body images through a 3D human body model. The 3D human body model can obtain a texture image of the source image under the target pose, and a generative adversarial network can then generate a new image of the target object in the target pose from the texture image and the target pose. This relieves the processing load of the generative adversarial network and thereby improves the image synthesis quality. However, the three-dimensional human body model commonly used in the related art is a parametric human body model, such as the Skinned Multi-Person Linear (SMPL) model. The SMPL model can construct a skeleton model of the human body, and the surface texture corresponding to the body model, such as the texture of the worn clothing and the texture of the skin, can be extracted through the SMPL model. However, when the clothing worn by the human body is loose, the SMPL model cannot obtain texture information and geometric information located outside the body model; therefore, it is difficult for the SMPL model to achieve a good synthesis result for pictures of a human body wearing loose clothing.
The present disclosure provides an image generation method and apparatus, an electronic device, and a storage medium. The image generation method provided by the embodiments of the present disclosure can be applied in fields such as motion transfer, appearance transfer, and virtual fitting. First, more accurate body shape information of the target object can be obtained by determining the first body shape map based on the original image and the target pose parameter; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, so that texture information and geometric information located outside the body model are captured. When the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis quality is improved: even when the difference between the original pose and the target pose is large, a composite image with a more accurate pose can still be obtained, and good synthesis results can still be achieved for target objects wearing loose clothing.
To facilitate understanding of this embodiment, an image generation method disclosed in an embodiment of the present disclosure is first introduced in detail. The execution subject of the image generation method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capability, for example, a terminal device, a server, or another processing device; the terminal device may be user equipment (a mobile device, a user terminal, a handheld device, a computing device, a wearable device, or the like). In some possible implementations, the image generation method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to FIG. 1, which is a flowchart of an image generation method provided by an embodiment of the present disclosure, the method includes steps S101 to S107, where:
S101: Acquire an original image, where the original image includes a target object in an original pose.
S103: Determine, based on the original image and a target pose parameter, a first body shape map of the target object in a target pose.
Here, the original image is a 2D image; the target pose parameter may be an image containing the target pose, or position information of limb key points used to represent the target pose; the first body shape map contains the limb posture of each body part of the target object in the target pose.
In specific implementation, a first body model of the target object in the original pose can be reconstructed based on the original image, together with a texture map of the original image relative to the first body model. The texture map carries the texture information located on the body parts of the target object in the original image; for example, the texture information may be the skin texture of the target object and/or the texture of the clothing worn on the body parts of the target object. Afterwards, the model parameters of the first body model can be adjusted based on the target pose parameter to obtain the body model of the target object in the target pose, that is, a second body model, and the first body shape map is determined based on the second body model and the texture map.
It should be understood that the first body shape map determined in the above manner also contains the texture information located on the body parts of the target object in the original image.
In the embodiments of the present disclosure, when the original image contains multiple objects, the object located in the foreground region may be determined as the target object; alternatively, the object with the highest limb completeness in the original image may be determined as the target object; alternatively, the target object may be determined according to a user's selection operation on the multiple objects.
S105: Determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate wearing features of the body parts of the target object.
In the embodiments of the present disclosure, the first semantic segmentation map may be determined based on the semantic segmentation map of the original image and the above second body model; the specific determination process is described in detail in the following embodiments. The semantic segmentation map of the original image is used to indicate the wearing features of each body part of the target object in the original pose.
The semantic segmentation map of the original image may contain part labels of the body parts of the target object, so that the body parts of the target object can be distinguished by these part labels.
Here, the body of the target object may be divided into different body parts, for example: hair, face, upper torso, hands, legs, and feet.
In specific implementation, a part label may be a pixel with a corresponding pixel value, where different part labels correspond to different pixel values. In the semantic segmentation map of the original image, each body part can be represented by such pixels.
For example, pixels with the corresponding pixel value may be displayed in the region where each body part is located in the semantic segmentation map.
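A minimal sketch of this label-to-pixel-value encoding follows; the specific classes and label values are illustrative assumptions, not fixed by the disclosure.

```python
import numpy as np

# Each pixel of the segmentation map stores the integer label of the body
# part it belongs to; different parts use different values. The label set
# below (background plus six parts) is illustrative only.
PART_LABELS = {
    "background": 0, "hair": 1, "face": 2,
    "torso": 3, "hands": 4, "legs": 5, "feet": 6,
}

def region_mask(segmentation: np.ndarray, part: str) -> np.ndarray:
    """Return a boolean mask of the region occupied by one body part.

    `segmentation` is an (H, W) integer map as described in the text.
    """
    return segmentation == PART_LABELS[part]
```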
After the body parts of the target object are distinguished by the part labels, the wearing features of each body part of the target object in the original pose can be determined by combining the original image with the part labels.
Similarly, the first semantic segmentation map may contain the part labels of the body parts of the target object, and the part label of a given body part is the same in the semantic segmentation map of the original image and in the first semantic segmentation map; that is, it can be represented by pixels with the same pixel value.
Since the first body shape map contains the texture information located on the body parts of the target object in the original image, but not the complete wearing features of the target object, in order to obtain the complete wearing features, the first semantic segmentation map of the target object in the target pose can be determined based on the semantic segmentation map of the original image and the above second body model; the complete wearing features of the body parts of the target object are then determined based on the first semantic segmentation map.
For example, suppose the target object in the original image wears relatively loose clothing, such as a skirt. The first body shape map determined based on the original image and the target pose parameter may then contain the texture information located on the body parts of the target object, but not the texture information located outside the body parts, for example, the texture of the skirt region outside the body parts. Further, by determining the first body shape map together with a first semantic segmentation map containing the complete wearing features, the complete wearing features of the target object can be obtained, so that a composite image containing the complete wearing features is produced when synthesis is performed based on the first body shape map and the first semantic segmentation map.
S107: Determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image includes the target object in the target pose and having the wearing features.
As can be seen from the above description, the first semantic segmentation map may contain the part labels of the body parts of the target object; therefore, the semantic layout information in the first semantic segmentation map can be represented by the part labels. It can be understood that the semantic layout information indicates the limb layout of each body part of the target object in the target pose.
On this basis, the first body shape map and the first semantic segmentation map can be combined through an image generation network for semantic image synthesis, so that the resulting composite image preserves the semantic layout information of the first semantic segmentation map.
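For illustration, one plausible way (an assumption, not fixed by the disclosure) to condition the image generation network on both maps is channel-wise concatenation, as in the following minimal PyTorch sketch, in which a single convolution stands in for the real generator:

```python
import torch
import torch.nn as nn

# A minimal sketch of the semantic image synthesis step: the body shape map
# and the semantic segmentation map are concatenated channel-wise and fed to
# the generator. The tensor shapes and the placeholder network are assumptions.
body_shape_map = torch.randn(1, 3, 256, 256)  # RGB first body shape map
semantic_map = torch.randn(1, 7, 256, 256)    # one-hot first semantic segmentation map

generator = nn.Conv2d(3 + 7, 3, kernel_size=3, padding=1)  # placeholder network
composite = generator(torch.cat([body_shape_map, semantic_map], dim=1))
print(composite.shape)  # torch.Size([1, 3, 256, 256])
```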
The specific implementations of the above steps S101 to S107 are described in detail below.
In an optional implementation, the above step S103 of determining the first body shape map of the target object in the target pose based on the original image and the target pose parameter includes the following steps:
Step S1031: Construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model.
Step S1032: Determine a first body shape map of the target object in the target pose based on the texture map and the target pose parameter.
In the embodiments of the present disclosure, the first body model can be obtained by performing three-dimensional reconstruction on the original image, where the first body model may be a three-dimensional body model containing the body structure and body posture of the target object; for example, the three-dimensional body model may be an SMPL model or another suitable type of three-dimensional body model.
Here, the body structure of the target object can be understood as the body parts of the target object, and the body posture of the target object can be understood as the limb posture of each body part of the target object.
After the first body model is determined, texture reconstruction can be performed based on the original image and the first body model to obtain the texture map of the original image. The reconstruction principle is as follows:
The first body model is projected onto the original image to establish a projection relationship between the first body model and the original image, where the projection relationship is used to indicate the correspondence between the vertices of the first body model and the pixels of the original image. Based on the projection relationship, the pixel values of the pixels in the original image corresponding to the vertices of the first body model are determined. The correspondence between the vertices of the first body model and the coordinates in the texture map is then obtained, and based on this correspondence, the pixels corresponding to the vertices of the first body model are rendered onto the texture map, yielding the texture map of the original image.
Since the first body model can represent the body structure and body shape of the target object, the texture map of the original image determined in the above manner contains the texture information on the body parts of the target object in the original image, for example, the skin texture of the target object, or the texture of the clothing worn on the body parts of the target object.
After the texture map is determined, the first body shape map can be determined by combining the target pose parameter with the texture map. In specific implementation, the second body model of the target object in the target pose can be determined based on the target pose parameter, and the texture map is then rendered onto the second body model to obtain the first body shape map of the target object in the target pose.
Since the texture map of the original image contains the texture information of the body parts of the target object, the first body shape map determined from the texture map not only represents the limb posture of each body part of the target object in the target pose, but also contains the texture information on those body parts.
In the related art, a new image is generated by taking a 2D image and a target pose as input and applying a generative adversarial network. This approach relies on the generation network to extract the feature information in the original image, for example, the pose features and the clothing features of the target object. When the difference between the original pose and the target pose is large, the related art often fails to obtain a composite image with an accurate pose.
In the technical solution of the present disclosure, through the above processing, a first body shape map with a more accurate pose can still be obtained even when the difference between the original pose and the target pose is large, thereby improving the synthesis quality of the composite image.
In the embodiments of the present disclosure, the first body shape map and the first semantic segmentation map can also be semantically synthesized by an image generation network to obtain the composite image. Compared with the prior-art solution that relies on a generation network to extract all feature information from the original image, the technical solution of the present disclosure first determines the first body shape map based on a texture map carrying the texture information of the body parts of the target object, so that this texture information is extracted in advance. When the first body shape map and the first semantic segmentation map are semantically synthesized by the image generation network, the amount of feature information the network must extract is reduced, which relieves the processing load of the network and improves the image synthesis quality.
In an optional implementation, the above step S1031 of constructing the first body model of the target object based on the original image specifically includes the following steps:
Step S10311: Determine model parameters of the first body model based on the original image, where the model parameters are used to indicate pose information of the target object in the original image.
Step S10312: Adjust position information of each vertex in an initial model based on the model parameters.
Step S10313: Generate the first body model based on the adjusted position information of each vertex.
In the embodiments of the present disclosure, after the original image is acquired, pixel-value normalization can be performed on the original image so that the pixel value of each pixel in the original image is normalized to the range [-a, a], where a may be 1 or another normalization value; the value of a can be set according to actual normalization requirements, and the present disclosure does not specifically limit it. Afterwards, the image size of the original image can be modified, for example, set to 224*224 or 512*512, so that it meets the input requirements of the Human Mesh Recovery (HMR) model.
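A minimal sketch of this normalization and resizing step follows, assuming a = 1 (pixel values mapped to [-1, 1]) and the 224*224 input size mentioned above:

```python
import numpy as np
from PIL import Image

# Preprocess an image for the HMR model: resize to the expected input size
# and normalize pixel values from [0, 255] to [-a, a].
def preprocess(path: str, size: int = 224, a: float = 1.0) -> np.ndarray:
    img = Image.open(path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0  # scale to [0, 1]
    return (2.0 * x - 1.0) * a                     # rescale to [-a, a]
```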
After the original image is processed in the above manner, the processed image can be input into the HMR model, and the output of the HMR model is taken as the model parameters of the first body model (that is, the model parameters of the SMPL model).
Here, the model parameters are parameters used to represent the pose information of the target object contained in the original image, and may include pose parameters, shape parameters, and/or camera parameters.
Exemplarily, the camera parameters describe the camera's pose information through values in K1 dimensions, where K1 may be 3; that is, the camera parameters may include intrinsic parameters, extrinsic parameters, and distortion parameters.
The shape parameters describe the shape of the target object through values in K2 dimensions, and the value of each dimension can be interpreted as some indicator of the target object's shape, such as height, build, or head-to-body ratio. The shape variation of the body model can be controlled by adjusting the values of the K2 dimensions, where K2 may be 10.
The pose parameters describe the action posture of the target object at a given moment through values in K3 dimensions, where the K3-dimensional values represent the angles of multiple limb joints of the target object relative to each axis. For example, K3 may be 72 (24*3): 24 refers to the 24 defined limb joints of the target object, and the 3 in 72 (24*3) refers to the axis-angle representation of each joint's rotation relative to its parent joint.
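For illustration, this parameter layout can be exercised with the open-source smplx package (an assumed choice; the disclosure does not name a specific library). Note that smplx splits the 72 pose dimensions into a 3-dimensional global (root) orientation and 69 body-joint dimensions:

```python
import torch
import smplx  # open-source SMPL implementation; assumed available

# SMPL parameter layout: 10 shape dimensions (K2), and 72 pose dimensions
# (K3) = 24 joints x 3 axis-angle values, of which 3 are the global root
# orientation and 69 cover the remaining 23 body joints.
model = smplx.create("models/", model_type="smpl")  # model path is a placeholder

betas = torch.zeros(1, 10)          # K2 = 10 shape parameters
global_orient = torch.zeros(1, 3)   # root joint axis-angle rotation
body_pose = torch.zeros(1, 69)      # remaining 23 joints x 3

output = model(betas=betas, global_orient=global_orient, body_pose=body_pose)
vertices = output.vertices          # (1, 6890, 3) posed mesh vertices
```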
After the model parameters are obtained in the manner described above, the position information of each vertex in the initial model can be adjusted based on the model parameters, where the initial model may be a body model pre-created in the SMPL framework that matches the object type of the target object. For example, if the target object is a person, the preset initial model may be a pre-created human body model.
The initial model contains multiple vertices and triangular faces determined from those vertices, where the number of vertices in the initial model may be set by the user according to actual needs, may be associated with the object type of the target object, or may be a default value.
In specific implementation, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices and Q triangular faces are first created, and the body of the target object is then represented by the P 3D vertices and Q triangular faces, where the P 3D vertices can be understood as P skeleton points of the initial model, and a triangular face can be understood as a triangle formed from 3D vertices, each triangle corresponding to three 3D vertices.
Afterwards, the vertex coordinates of each vertex in the initial model after pose deformation and body-shape deformation can be determined based on the model parameters; the position information of each vertex in the initial model is then adjusted based on these vertex coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
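A minimal sketch of assembling the mesh from the adjusted vertices and the fixed triangle topology, reusing the `vertices` and `model` names from the smplx sketch above; the trimesh library is an assumed choice, and any triangle-mesh representation would do:

```python
import numpy as np
import trimesh  # assumed available

# Build the first body model from the deformed vertices and the fixed faces.
# For SMPL, P = 6890 vertices and Q = 13776 triangular faces.
adjusted_vertices = vertices.detach().cpu().numpy()[0]  # (6890, 3)
faces = model.faces.astype(np.int64)                    # (13776, 3)

first_body_model = trimesh.Trimesh(
    vertices=adjusted_vertices, faces=faces, process=False
)
```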
After the first body model is determined, texture extraction can be performed on the first body model (for example, the SMPL model) to obtain the texture map.
In specific implementation, because of the topological structure of the SMPL human body model, each of its vertices has corresponding semantic content, where the semantic content represents the correspondence between the vertex and the pixels in the original image. On this basis, a mapping relationship between the vertices and the pixels of the texture map can be constructed, and texture reconstruction can then be performed based on this mapping relationship to obtain the texture map of the original image. For example, based on the mapping relationship, the pixel value of the pixel in the original image matching a given vertex is determined, and the pixel value of the corresponding coordinate point in the texture map is then set from that value, thereby completing the texture reconstruction and obtaining the texture map of the original image.
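A minimal sketch of this per-vertex texture reconstruction follows; `project` (the camera projection of 3D vertices onto the original image) and `uv_coords` (each vertex's fixed location in the texture map, as given by the SMPL topology) are assumed inputs:

```python
import numpy as np

# For every vertex, look up the matching pixel in the original image and
# write its value to the vertex's fixed UV location in the texture map.
def reconstruct_texture(image, vertices, uv_coords, project, tex_size=256):
    texture = np.zeros((tex_size, tex_size, 3), dtype=image.dtype)
    px = project(vertices)                        # (P, 2) image coordinates
    for (u, v), (x, y) in zip(uv_coords, px):
        tu, tv = int(u * (tex_size - 1)), int(v * (tex_size - 1))
        texture[tv, tu] = image[int(y), int(x)]   # copy the matched pixel value
    return texture
```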
In an optional implementation, the above step S1032 of determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters specifically includes the following steps:
Step S10321: determine a second body model of the target object in the target pose based on the first body model and the target pose parameters;
Step S10322: render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
In the embodiment of the present disclosure, the action parameters and shape parameters of the target object in the target pose may be determined based on the target pose parameters, and the second body model is then determined by modifying the action parameters and shape parameters of the first body model.
If the target pose parameter is picture-type data containing the target pose, the picture-type data can be processed by an HMR model to obtain the action parameters and shape parameters of the target object in the target pose; the vertex coordinates of each vertex in the first body model can then be modified based on these parameters to obtain the second body model.
If the target pose parameters already contain the action parameters and shape parameters of the target object in the target pose, the vertex coordinates of each vertex in the first body model can be adjusted directly based on those parameters to obtain the second body model.
If the target pose parameter is a pose parameter of another type, it can first be converted into the action parameters and shape parameters of the target object in the target pose.
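These three cases can be pictured with the hedged sketch below; hmr_model is a hypothetical stand-in for an HMR-style regressor, and the dictionary layout for explicit parameters is an assumption of the example, not a format prescribed by the disclosure.

```python
# A minimal sketch of normalizing the three kinds of target pose parameters
# into (action, shape) parameters; hmr_model is a hypothetical regressor.
import numpy as np

def to_action_and_shape(target, hmr_model=None):
    """Normalize a target pose parameter into (action, shape) parameters."""
    if isinstance(target, np.ndarray) and target.ndim == 3:   # picture-type data
        return hmr_model(target)              # hypothetical HMR-style regressor
    if isinstance(target, dict):              # explicit action/shape parameters
        return target["action"], target["shape"]
    # Conversion of other pose-parameter formats is omitted in this sketch.
    raise ValueError("unsupported target pose parameter format")
```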
After the second body model is determined in the manner described above, the texture map can be rendered onto the second body model based on the mapping relationship between each pixel of the texture map and the vertices, and the first body shape map can be determined from the rendering result. For example, assuming the target pose is the pose shown in FIG. 2, a first body shape map as shown in FIG. 3 can be obtained. As can be seen from FIG. 3, the first body shape map only contains the body features (for example, the posture) of the target object, and does not contain the clothing features of the target object.
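This rendering can be pictured with the per-vertex sketch below; a real renderer would rasterize whole triangular faces, so this is an illustrative simplification with assumed input shapes (all coordinates normalized to [0, 1]).

```python
# A hedged sketch of step S10322: each vertex of the posed second body model
# has a texture coordinate (u, v) and a projected 2D position (x, y); copying
# texture pixels to those positions illustrates the pixel/vertex mapping.
import numpy as np

def render_first_body_shape_map(texture, uv_coords, posed_2d, out_size=512):
    """texture: (T, T, 3); uv_coords, posed_2d: (P, 2), values in [0, 1]."""
    out = np.zeros((out_size, out_size, 3), dtype=texture.dtype)
    tex_size = texture.shape[0]
    for (u, v), (x, y) in zip(uv_coords, posed_2d):
        color = texture[int(v * (tex_size - 1)), int(u * (tex_size - 1))]
        out[int(y * (out_size - 1)), int(x * (out_size - 1))] = color
    return out
```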
In the above implementation, by using a three-dimensional body model to determine the first body shape map of the target object in the target pose, a first body shape map with a more accurate pose can be obtained even when the difference between the original pose and the target pose is large, thereby improving the synthesis effect of the composite image.
In an optional implementation, the above step S105 of determining the first semantic segmentation map of the target object in the target pose specifically includes the following steps:
Step S1051: determine a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose;
Step S1052: generate the first semantic segmentation map based on the second semantic segmentation map.
In the embodiment of the present disclosure, the above steps S10321 and S10322 may be performed either before or after steps S1051 and S1052; the present disclosure does not specifically limit this.
In specific implementation, the original image may be processed to obtain its semantic segmentation map, that is, the second semantic segmentation map. For example, image segmentation processing may be performed on the original image by a segmentation network to obtain the second semantic segmentation map.
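The disclosure does not fix a particular segmentation network; as one hedged example, torchvision's DeepLabV3 can stand in for the segmentation step:

```python
# A sketch of producing the second semantic segmentation map; DeepLabV3 is a
# stand-in, not a network prescribed by the disclosure.
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()

def second_segmentation_map(image_tensor):
    """image_tensor: (1, 3, H, W), normalized; returns (1, H, W) class ids."""
    with torch.no_grad():
        logits = model(image_tensor)["out"]   # (1, num_classes, H, W)
    return logits.argmax(dim=1)               # per-pixel semantic labels
```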
For example, if the original pose of the target object in the original image is the pose shown in FIG. 5, the second semantic segmentation map of the original image may consist of the maps shown in FIG. 6(a) and FIG. 6(b), where FIG. 6(a) shows the segmentation map of the foreground region of the original image, and FIG. 6(b) shows the semantic map of the foreground region of the original image.
As can be seen from FIG. 6(a), the segmentation map of the foreground region characterizes the segmentation result of the target object in the original image, for example, the body contour information of the target object. As shown in FIG. 6(b), the semantic map of the foreground region represents the semantic information of the target object in the original image; the semantic information indicates, for example, the various body parts of the target object.
After the second semantic segmentation map of the original image is determined, the wearing features of each body part of the target object in the original pose can be determined based on the second semantic segmentation map and the original image.
After the semantic segmentation maps shown in FIG. 6(a) and FIG. 6(b) are determined, the first semantic segmentation map of the target object in the target pose can be determined based on them. Here, the first semantic segmentation map contains a semantic map of the target object in the target pose (the target semantic map described below) and a segmentation map (the target segmentation map described below).
In the above implementation, determining the first semantic segmentation map from the second semantic segmentation map can improve the accuracy of the first semantic segmentation map, thereby further improving the image synthesis effect. At the same time, the complete wearing features of the target object in the target pose can be obtained, which solves the problem that the SMPL model cannot capture texture and geometry information lying outside the body model; this further improves the image synthesis effect, and good synthesis results can still be obtained for target objects wearing loose clothing.
In an optional implementation, the above step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
Step S11: obtain a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image;
Step S12: generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
Here, the second body model is a three-dimensional body model determined based on the first body model and the target pose parameters; the specific determination process is described as follows:
The action parameters and shape parameters of the target object in the target pose can be determined based on the target pose parameters, and the second body model can then be determined by modifying the action parameters and shape parameters of the first body model.
Here, the target pose parameter may be picture-type data, data containing the action parameters and shape parameters of the target object in the target pose, or another type of pose parameter. The process of determining the second body model from each type of target pose parameter is as described in the embodiment corresponding to steps S10321 and S10322 above, and is not repeated here.
After the second body model is determined, the second body model and the second semantic segmentation map can be input into a target sub-network for processing to obtain the first semantic segmentation map.
Assuming the second semantic segmentation map consists of the maps shown in FIG. 6(a) and FIG. 6(b), the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b), that is, the first semantic segmentation map, can be obtained. As shown in FIG. 7(a) and FIG. 7(b), the first semantic segmentation map contains the semantic map and the segmentation map of the target object in the target pose.
As can be seen from the above description, the second semantic segmentation map includes an original semantic map (that is, the semantic map of the foreground region described above) and an original segmentation map (that is, the segmentation map of the foreground region).
In this case, step S1052 of generating the first semantic segmentation map based on the second semantic segmentation map specifically includes the following steps:
Step S21: generate a target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose;
Step S22: generate a target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
In specific implementation, the original segmentation map and the second body model can be input into a target sub-network A for processing to obtain the target segmentation map of the target object in the target pose, for example, the segmentation map shown in FIG. 7(a). The original semantic map and the second body model can then be input into a target sub-network B for processing to obtain the target semantic map of the target object in the target pose, for example, the semantic map shown in FIG. 7(b). Target sub-network A may be a generation network with the same or a different structure from target sub-network B; the output of target sub-network A is a segmentation map, and the output of target sub-network B is a semantic map.
Here, the target sub-network may be any convolutional neural network. In one case, target sub-network A and target sub-network B have the same network structure but different network parameters; that is, they are neural networks obtained by training the same convolutional neural network on their respective training samples.
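A hedged sketch of this arrangement follows: the same tiny convolutional generator is instantiated twice with independent parameters, one producing a segmentation map and one a semantic map. The layer sizes and channel counts are assumptions for illustration, not the disclosed design.

```python
# Two generators with identical structure but separate (independently trained)
# parameters, mirroring target sub-networks A and B described above.
import torch.nn as nn

def make_generator(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, out_ch, kernel_size=3, padding=1),
    )

target_subnet_a = make_generator(in_ch=4, out_ch=1)    # outputs a segmentation map
target_subnet_b = make_generator(in_ch=4, out_ch=20)   # outputs a semantic map
```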
After the target segmentation map and the target semantic map are obtained, they may be determined as the first semantic segmentation map of the target object in the target pose. For example, when the target pose is the pose shown in FIG. 2, the semantic segmentation maps shown in FIG. 7(a) and FIG. 7(b) can be obtained.
In the embodiment of the present disclosure, after the first semantic segmentation map is determined, the clothing segmentation map of the target object in the original image can be determined based on the target segmentation map in the first semantic segmentation map; for example, the clothing segmentation map of the target object is cropped from the original image based on the segmentation map of FIG. 7(a). Likewise, the clothing semantic map of the target object in the original image can be determined based on the target semantic map in the first semantic segmentation map; for example, a clothing semantic map carrying the wearing features of each body part of the target object is determined from the original image based on the semantic map of FIG. 7(b).
In an optional implementation, the embodiment of the present disclosure further includes the following steps: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object.
On this basis, the above step S107 of determining a composite image carrying the complete wearing features of the target object based on the first body shape map and the first semantic segmentation map includes:
Step S1071: determine, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the wearing features of the target object.
In the embodiment of the present disclosure, before the composite image is obtained, the second semantic segmentation map of the original image can be obtained, together with the second body shape map of the target object in the source pose. The second semantic segmentation map is determined in the manner described above; for example, it may consist of the segmentation map shown in FIG. 6(a) and the semantic map shown in FIG. 6(b). The second body shape map may be a clothed-body map of the target object in the original pose determined based on the original image, and it carries the complete wearing features of the target object. For example, the clothed-body map of the target object in the source pose can be extracted from the original image based on the segmentation map shown in FIG. 6(a), yielding the clothed-body map shown in FIG. 4 (that is, the second body shape map).
After the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map are obtained, the composite image carrying the wearing features of the target object can be determined based on these four maps, which specifically includes:
inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing to obtain the composite image. The image generation network may be a spatially-adaptive normalization network, which may be obtained by training based on a generative adversarial network.
Here, the spatially-adaptive normalization network may be a SPADE (Spatially-Adaptive Normalization) network, which can convert a segmentation layout map into a realistic picture. In the embodiment of the present disclosure, besides the spatially-adaptive normalization network described above, the image generation network may also be another network, for example, a network that can replace the spatially-adaptive normalization network or realize its network functions; the present disclosure does not specifically limit this.
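For reference, a hedged sketch of one spatially-adaptive normalization block in the SPADE style follows; the hidden width and the use of batch normalization are assumptions of this example, not the claimed network.

```python
# One SPADE-style block: the segmentation input is convolved into per-pixel
# scale (gamma) and shift (beta) that modulate the normalized activations.
import torch.nn as nn
import torch.nn.functional as F

class SPADEBlock(nn.Module):
    def __init__(self, feat_ch, seg_ch, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(feat_ch, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(seg_ch, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(hidden, feat_ch, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_ch, kernel_size=3, padding=1)

    def forward(self, x, seg):
        # Resize the segmentation input to the feature resolution, then apply
        # the spatially varying scale and shift to the normalized features.
        seg = F.interpolate(seg, size=x.shape[2:], mode="nearest")
        h = self.shared(seg)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```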
After the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map are input into the image generator and the spatially-adaptive normalization network for processing, the composite image can be obtained. For example, a composite image as shown in FIG. 8 can be obtained; as can be seen from FIG. 8, the composite image contains the complete wearing features of the target object, and good image synthesis results can still be obtained for target objects wearing loose clothing.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of the steps should be determined by their functions and possible internal logic.
FIG. 9 is a flowchart of another image generation method provided by an embodiment of the present disclosure. As shown in FIG. 9, the method includes steps S901 to S908, wherein:
Step S901: acquire an original image and target data.
Here, the original image may be an image containing a target human body captured by a camera of an electronic device, the target human body being the target object in the above embodiments.
Here, the target data contains the target pose parameters. The target data may be an image containing an object in the target pose, or a video containing an object in the target pose.
In the embodiment of the present disclosure, after the original image is acquired, pixel-value normalization may be performed on it, so that the pixel value of each pixel in the original image is normalized into the range [-a, a], where a may take the value 1 or another normalized value; the value of a can be set according to the actual normalization needs, and the present disclosure does not specifically limit this. Afterwards, the picture size of the original image can be modified; for example, the picture size of the original image can be set to 224×224 or 512×512 so that it meets the input requirements of the HMR (Human Mesh Recovery) human-pose reconstruction model.
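The normalization and resizing described here can be sketched as follows, assuming an 8-bit input image, a = 1, and OpenCV for resizing:

```python
# A minimal sketch of the preprocessing: map pixel values from [0, 255] into
# [-a, a] and resize to the HMR input resolution.
import cv2
import numpy as np

def preprocess(image, a=1.0, size=224):
    resized = cv2.resize(image, (size, size)).astype(np.float32)
    return resized / 255.0 * (2 * a) - a    # values now lie in [-a, a]
```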
Step S902: determine model parameters based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image.
In specific implementation, the processed original image may be input into the HMR model for processing, and the output of the HMR model is taken as the model parameters, for example, the model parameters of the SMPL model.
Step S903: construct a first body model based on the model parameters.
In the embodiment of the present disclosure, the position information of each vertex in the initial model may be adjusted based on the model parameters, and the first body model is generated based on the adjusted position information of the vertices.
Here, the initial model may be a pre-created body model in the SMPL model that matches the object type of the target object; for example, the preset initial model may be a pre-created human body model. The initial model contains multiple vertices and triangular faces determined based on those vertices.
In specific implementation, a matching initial model can be built in advance according to the object type of the target object. For example, P 3D vertices and Q triangular faces are first created, and the body of the target object is represented by these P vertices and Q faces; the P 3D vertices can be understood as P skeleton points of the initial model, and each triangular face is a triangle formed from three of the 3D vertices. Afterwards, the vertex coordinates of each vertex after action deformation and body-shape deformation can be determined based on the model parameters, the position information of each vertex in the initial model is adjusted based on those coordinates, the adjusted vertices are used to determine the triangular faces, and the first body model is composed of these triangular faces.
Step S904: determine a texture map of the original image based on the first body model.
In specific implementation, the first body model may be projected onto the original image to establish a projection relationship between the first body model and the original image, where the projection relationship indicates the correspondence between the vertices of the first body model and the pixels of the original image. Based on this projection relationship, the pixel values of the pixels in the original image corresponding to the vertices of the first body model are determined. The correspondence between the vertices of the first body model and the coordinates of the texture map is then obtained, and the pixels corresponding to the vertices are rendered onto the texture map based on this correspondence, thereby obtaining the texture map of the original image.
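The projection that establishes this vertex-to-pixel correspondence can be illustrated with the sketch below; a weak-perspective camera (scale s, translation t), common in HMR-style pipelines, is assumed here for simplicity and is not mandated by the disclosure.

```python
# A minimal sketch of projecting the first body model's vertices onto the
# original image plane under an assumed weak-perspective camera.
import numpy as np

def project_vertices(vertices, s=1.0, t=(0.0, 0.0)):
    """vertices: (P, 3) -> (P, 2) image-plane coordinates."""
    return s * vertices[:, :2] + np.asarray(t)
```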
Step S905: determine a first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
In the embodiment of the present disclosure, a second body model of the target object in the target pose may first be determined based on the first body model and the target pose parameters. The texture map can then be rendered onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices of the second body model, yielding the first body shape map.
Step S906: determine a first semantic segmentation map of the target object in the target pose.
In the embodiment of the present disclosure, the second body model determined in step S905 may be obtained, and a second semantic segmentation map of the original image may be determined, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; the first semantic segmentation map is then generated based on the second body model and the second semantic segmentation map.
Here, the second semantic segmentation map contains the original segmentation map and the original semantic map of the target object in the original pose. In the embodiment of the present disclosure, the target segmentation map in the target pose can be generated from the original segmentation map and the second body model, and the target semantic map in the target pose can be generated from the original semantic map and the second body model; the first semantic segmentation map of the target object in the target pose contains the target segmentation map and the target semantic map. The specific process is described as follows:
the original segmentation map and the second body model are input into a target sub-network to obtain the target segmentation map of the target object in the target pose, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and the original semantic map and the second body model are input into a target sub-network to obtain the target semantic map of the target object in the target pose, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
Step S907: obtain a second body shape map of the target object in the original pose.
Step S908: determine, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the complete wearing features of the target object.
In specific implementation, the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map can be input into an image generation network for processing to obtain the composite image.
Here, the image generation network may be a spatially-adaptive normalization network, which may be obtained by training based on a generative adversarial network.
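One common way to feed four such maps to a generation network is channel-wise concatenation, sketched below; the disclosure only states that the four maps are input to the network, so the concatenation and the assumption that all maps share one spatial grid belong to this example only.

```python
# A hedged sketch of assembling the generator input from the four maps.
import torch

def build_generator_input(body1, seg1, body2, seg2):
    """Each argument: (1, C_i, H, W) tensor on the same spatial grid."""
    return torch.cat([body1, seg1, body2, seg2], dim=1)
```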
In the embodiment of the present disclosure, by determining the first body shape map based on the original image and the target pose parameters, more accurate body shape information of the target object can be obtained; then, by determining the first semantic segmentation map, the complete wearing features of the target object can be obtained, solving the problem that existing parametric human body models cannot capture texture and geometry information lying outside the body model. When the composite image is determined based on the first body shape map and the first semantic segmentation map, the image synthesis effect can be improved: even when the difference between the original pose and the target pose is large, a composite image with a more accurate pose can still be obtained, and good image synthesis results can still be obtained for target objects wearing loose clothing.
Based on the same inventive concept, an embodiment of the present disclosure further provides an image generation apparatus corresponding to the image generation method. Since the problem-solving principle of the apparatus in the embodiment of the present disclosure is similar to that of the above image generation method, the implementation of the apparatus may refer to the implementation of the method, and repeated descriptions are omitted.
Referring to FIG. 10, which is a schematic diagram of an image generation apparatus provided by an embodiment of the present disclosure, the apparatus includes an acquisition unit 10, a first determination unit 20, and a second determination unit 30, wherein:
the acquisition unit 10 is configured to acquire an original image, where the original image contains a target object in an original pose;
the first determination unit 20 is configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object;
the second determination unit 30 is configured to determine a composite image based on the first body shape map and the first semantic segmentation map, where the composite image contains the target object in the target pose and having the wearing features.
In a possible implementation, the first determination unit is further configured to: construct a first body model of the target object based on the original image, and determine a texture map of the original image based on the first body model; and determine the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
In a possible implementation, the first determination unit is further configured to: determine model parameters of the first body model based on the original image, where the model parameters are used to indicate the pose information of the target object in the original image; adjust the position information of each vertex in an initial model based on the model parameters; and generate the first body model based on the adjusted position information of the vertices.
In a possible implementation, the first determination unit is further configured to: determine a second body model of the target object in the target pose based on the first body model and the target pose parameters; and render the texture map onto the second body model based on the mapping relationship between the pixels of the texture map and the vertices, to obtain the first body shape map.
In a possible implementation, the first determination unit is further configured to: determine a second semantic segmentation map of the original image, where the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generate the first semantic segmentation map based on the second semantic segmentation map.
In a possible implementation, the first determination unit is further configured to: obtain a second body model of the target object in the target pose, where the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image; and generate the first semantic segmentation map based on the second body model and the second semantic segmentation map.
In a possible implementation, in a case where the second semantic segmentation map includes an original semantic map and an original segmentation map, and the first semantic segmentation map includes a target segmentation map and a target semantic map, the first determination unit is further configured to: generate the target segmentation map of the target object in the target pose based on the original segmentation map, where the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and generate the target semantic map of the target object in the target pose based on the original semantic map, where the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
In a possible implementation, the apparatus is further configured to: obtain a second semantic segmentation map of the original image, and obtain a second body shape map of the target object in the original pose, where the second semantic segmentation map is used to characterize the position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object; the second determination unit is further configured to: determine, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the wearing features of the target object.
In a possible implementation, the second determination unit is further configured to: input the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing to obtain the composite image.
For descriptions of the processing flow of each module in the apparatus and of the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.
Corresponding to the image generation method in FIG. 1, an embodiment of the present disclosure further provides an electronic device 1100. As shown in FIG. 11, which is a schematic structural diagram of the electronic device 1100 provided by the embodiment of the present disclosure, the device includes:
a processor 111, a memory 112, and a bus 113. The memory 112 is configured to store execution instructions and includes an internal memory 1121 and an external memory 1122; the internal memory 1121 is configured to temporarily store operation data of the processor 111 and data exchanged with the external memory 1122, such as a hard disk, and the processor 111 exchanges data with the external memory 1122 through the internal memory 1121. When the electronic device 1100 runs, the processor 111 communicates with the memory 112 through the bus 113, so that the processor 111 executes the following instructions:
acquiring an original image, where the original image contains a target object in an original pose;
determining, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and determining a first semantic segmentation map of the target object in the target pose, where the first semantic segmentation map contains the wearing features of the body parts of the target object;
determining a composite image based on the first body shape map and the first semantic segmentation map, where the composite image contains the target object in the target pose and having the wearing features.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the image generation method described in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code; the instructions included in the program code can be used to execute the steps of the image generation method described in the above method embodiments. For details, reference may be made to the above method embodiments, which are not repeated here.
The above computer program product may be implemented by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with the technical field may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features within the technical scope disclosed in the present disclosure; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (12)
- An image generation method, comprising: acquiring an original image, wherein the original image contains a target object in an original pose; determining, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and determining a first semantic segmentation map of the target object in the target pose, wherein the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object; and determining a composite image based on the first body shape map and the first semantic segmentation map, wherein the composite image contains the target object in the target pose and having the wearing features.
- The method according to claim 1, wherein the determining, based on the original image and the target pose parameters, the first body shape map of the target object in the target pose comprises: constructing a first body model of the target object based on the original image, and determining a texture map of the original image based on the first body model; and determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters.
- The method according to claim 2, wherein the constructing the first body model of the target object based on the original image comprises: determining model parameters of the first body model based on the original image, wherein the model parameters are used to indicate pose information of the target object in the original image; adjusting position information of each vertex in an initial model based on the model parameters; and generating the first body model based on the adjusted position information of the vertices.
- The method according to claim 2 or 3, wherein the determining the first body shape map of the target object in the target pose based on the texture map and the target pose parameters comprises: determining a second body model of the target object in the target pose based on the first body model and the target pose parameters; and rendering the texture map onto the second body model based on a mapping relationship between pixels of the texture map and the vertices, to obtain the first body shape map.
- The method according to any one of claims 1 to 4, wherein the determining the first semantic segmentation map of the target object in the target pose comprises: determining a second semantic segmentation map of the original image, wherein the second semantic segmentation map is used to indicate the wearing features of the body parts of the target object in the original pose; and generating the first semantic segmentation map based on the second semantic segmentation map.
- The method according to claim 5, wherein the generating the first semantic segmentation map based on the second semantic segmentation map comprises: obtaining a second body model of the target object in the target pose, wherein the second body model is determined based on the first body model and the target pose parameters, and the first body model is a body model of the target object constructed based on the original image; and generating the first semantic segmentation map based on the second body model and the second semantic segmentation map.
- The method according to claim 5 or 6, wherein the second semantic segmentation map comprises an original semantic map and an original segmentation map, and the first semantic segmentation map comprises a target segmentation map and a target semantic map; and the generating the first semantic segmentation map based on the second semantic segmentation map comprises: generating the target segmentation map of the target object in the target pose based on the original segmentation map, wherein the target segmentation map contains the wearing features of all body parts of the target object in the target pose; and generating the target semantic map of the target object in the target pose based on the original semantic map, wherein the target semantic map contains the wearing features of each body part of the target object in the target segmentation map.
- The method according to any one of claims 1 to 7, further comprising: obtaining a second semantic segmentation map of the original image, and obtaining a second body shape map of the target object in the original pose, wherein the second semantic segmentation map is used to characterize position information of each body part of the target object in the original pose, and the second body shape map carries the complete wearing features of the target object; and wherein the determining the composite image based on the first body shape map and the first semantic segmentation map comprises: determining, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, a composite image carrying the complete wearing features of the target object.
- The method according to claim 8, wherein the determining, based on the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map, the composite image carrying the complete wearing features of the target object comprises: inputting the first body shape map, the first semantic segmentation map, the second body shape map, and the second semantic segmentation map into an image generation network for processing to obtain the composite image.
- An image generation apparatus, comprising: an acquisition unit, configured to acquire an original image, wherein the original image contains a target object in an original pose; a first determination unit, configured to determine, based on the original image and target pose parameters, a first body shape map of the target object in a target pose, and to determine a first semantic segmentation map of the target object in the target pose, wherein the first semantic segmentation map is used to indicate the wearing features of the body parts of the target object; and a second determination unit, configured to determine a composite image based on the first body shape map and the first semantic segmentation map, wherein the composite image contains the target object in the target pose and having the wearing features.
- An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus; and when the machine-readable instructions are executed by the processor, the image generation method according to any one of claims 1 to 9 is executed.
- A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is run by a processor, the image generation method according to any one of claims 1 to 9 is executed.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210187560.7 | 2022-02-28 | ||
CN202210187560.7A CN114581288A (en) | 2022-02-28 | 2022-02-28 | Image generation method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023160074A1 true WO2023160074A1 (en) | 2023-08-31 |
Family ID: 81777709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/134861 WO2023160074A1 (en) | 2022-02-28 | 2022-11-29 | Image generation method and apparatus, electronic device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114581288A (en) |
WO (1) | WO2023160074A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114581288A (en) * | 2022-02-28 | 2022-06-03 | 北京大甜绵白糖科技有限公司 | Image generation method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977847A (en) * | 2019-03-22 | 2019-07-05 | 北京市商汤科技开发有限公司 | Image generating method and device, electronic equipment and storage medium |
CN111275518A (en) * | 2020-01-15 | 2020-06-12 | 中山大学 | Video virtual fitting method and device based on mixed optical flow |
US20210065418A1 (en) * | 2019-08-27 | 2021-03-04 | Shenzhen Malong Technologies Co., Ltd. | Appearance-flow-based image generation |
CN113658309A (en) * | 2021-08-25 | 2021-11-16 | 北京百度网讯科技有限公司 | Three-dimensional reconstruction method, device, equipment and storage medium |
CN113838166A (en) * | 2021-09-22 | 2021-12-24 | 网易(杭州)网络有限公司 | Image feature migration method and device, storage medium and terminal equipment |
CN114529940A (en) * | 2022-01-19 | 2022-05-24 | 华南理工大学 | Human body image generation method based on posture guidance |
CN114581288A (en) * | 2022-02-28 | 2022-06-03 | 北京大甜绵白糖科技有限公司 | Image generation method and device, electronic equipment and storage medium |
2022
- 2022-02-28 CN CN202210187560.7A patent/CN114581288A/en not_active Withdrawn
- 2022-11-29 WO PCT/CN2022/134861 patent/WO2023160074A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
CN114581288A (en) | 2022-06-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10540817B2 (en) | System and method for creating a full head 3D morphable model | |
Zurdo et al. | Animating wrinkles by example on non-skinned cloth | |
WO2022205762A1 (en) | Three-dimensional human body reconstruction method and apparatus, device, and storage medium | |
JP2024522287A (en) | 3D human body reconstruction method, apparatus, device and storage medium | |
US20130127827A1 (en) | Multiview Face Content Creation | |
CN104732585A (en) | Human body type reconstructing method and device | |
US20180315254A1 (en) | Virtual Reality Presentation of Layers of Clothing on Avatars | |
WO2022237249A1 (en) | Three-dimensional reconstruction method, apparatus and system, medium, and computer device | |
US12079947B2 (en) | Virtual reality presentation of clothing fitted on avatars | |
CN112102480B (en) | Image data processing method, apparatus, device and medium | |
TWI750710B (en) | Image processing method and apparatus, image processing device and storage medium | |
CN114821675B (en) | Object processing method and system and processor | |
JP2022512262A (en) | Image processing methods and equipment, image processing equipment and storage media | |
WO2023160074A1 (en) | Image generation method and apparatus, electronic device, and storage medium | |
CN114219001A (en) | Model fusion method and related device | |
CN109859306A (en) | A method of extracting manikin in the slave photo based on machine learning | |
Liu et al. | Three-dimensional cartoon facial animation based on art rules | |
KR20060131145A (en) | Randering method of three dimension object using two dimension picture | |
CN115908651A (en) | Synchronous updating method for three-dimensional human body model and skeleton and electronic equipment | |
CN117114965A (en) | Virtual fitting and dressing method, virtual fitting and dressing equipment and system | |
Dibra | Recovery of the 3D Virtual Human: Monocular Estimation of 3D Shape and Pose with Data Driven Priors | |
KR102580427B1 (en) | Method for providing virtual fitting service and system for same | |
CN111696183B (en) | Projection interaction method and system and electronic equipment | |
CN117557699B (en) | Animation data generation method, device, computer equipment and storage medium | |
Yang et al. | Application of augmented reality technology in smart cartoon character design and visual modeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22928334; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |