CN114783039B - Motion migration method driven by 3D human body model - Google Patents
Motion migration method driven by 3D human body model
- Publication number
- CN114783039B (application CN202210708260.9A)
- Authority
- CN
- China
- Prior art keywords
- human body
- motion
- posture
- image
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a motion migration method driven by a 3D human body model. Training data are converted into UV space, and a 3D human body model is constructed and optimized using complementary information between adjacent video frames. The optimized 3D human body model is then projected onto a 2D plane so that the 3D information of the original motion is retained, and the optimized model is driven in the target posture. The 2D projection and the posture of the training data are used as the input for pre-training a motion image generation model, and the trained model is saved. The posture of the target person is then normalized. Finally, the 2D projection of the optimized 3D human body model driven by the target person's posture, together with the normalized target person posture, is fed to the trained motion image generation model for the final motion migration. In this way the problems of blurring, shape distortion and the like in 2D plane image generation are overcome, and the generated motion image is ensured to have reliable depth information, an accurate shape and a clear human face.
Description
Technical Field
The invention belongs to the technical field of motion migration, and particularly relates to a motion migration method driven by a 3D human body model.
Background
Human motion migration aims to synthesize a human motion image that combines the human texture of a training image with a target pose. It is currently used in film production, game design and medical rehabilitation. Based on human motion migration, the character in a training image can be animated freely to perform user-defined actions. Traditional motion migration methods based on computer graphics require complicated rendering operations to generate appearance texture and are computationally expensive and time-consuming, so an ordinary user or a small organization cannot afford the extremely high computation and time cost.
Human motion is a complex natural phenomenon: all real motion occurs in 3D space, and real motion images look natural because they are 2D projections of the original motion in 3D space and therefore naturally inherit the 3D information. Existing motion migration studies are mostly based on 2D motion data such as images and video, which are themselves 2D projections of the true motion. The moving images generated by such studies generally suffer from problems such as blurring and shape distortion.
Disclosure of Invention
In order to solve the defects in the prior art, the invention provides a motion migration method driven by a 3D human body model, which not only overcomes the problems of blurring, shape distortion and the like in the generation of a 2D plane image, but also ensures that the generated motion image has reliable depth information, accurate shape and clear human face.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows: a motion migration method driven by a 3D human body model, comprising: constructing a training data set by taking video frames shot in advance as training data, and extracting the posture of the training data; converting the training data into UV space to generate UV maps, and constructing and optimizing a 3D human body model by using complementary information between adjacent video frames; projecting the optimized 3D human body model onto a 2D plane to obtain a 2D projection that retains the 3D information of the original motion, and driving the optimized 3D human body model with the posture of the target person; using the 2D projection retaining the 3D information of the original motion and the posture of the training data as the input of a motion image generation model, and saving the trained motion image generation model; normalizing the posture of the target person; and finally performing the final motion migration by taking the 2D projection of the optimized 3D human body model driven by the posture of the target person and the normalized posture of the target person as the input of the trained motion image generation model.
Further, the posture of the training data is extracted by using the OpenPose posture estimation algorithm.
Further, pixels of the images in the training data are converted into UV space by using DensePose to generate corresponding UV maps, and the 3D human body model is constructed and optimized by using complementary information between adjacent video frames, including: taking from the training data a set of images $\{I_1, I_2, \dots, I_n\}$ of different poses spaced several frames apart, together with the corresponding UV maps generated by DensePose; generating a set of local texture maps $\{T_1, T_2, \dots, T_k\}$ by UV conversion; inputting the generated local texture maps into a texture filling network to generate a texture map $T$ with multi-pose texture information; and performing a loss calculation, through a loss function, between the set of "original images" $\{\hat{I}_1, \dots, \hat{I}_n\}$ restored from the texture map $T$ and the set of real images $\{I_1, \dots, I_n\}$, so as to realize the optimization of the 3D human body model.
Further, the loss function is expressed as:

$$L_{tex} = \sum_{i=1}^{n} \left\| \hat{I}_i - I_i \right\|_1$$

wherein $\hat{I}_i$ is the $i$-th "original image" restored from the texture map $T$, $I_i$ is the corresponding real image, and $n$ represents the number of restored "original images"; the texture map $T$ is obtained from the following equation:

$$T = \sum_{i=1}^{k} P_i \odot T_i$$

wherein $k$ represents the total number of local texture maps $T_i$, and $P_i$ represents a probability map generated by the texture filling network, which predicts the probability that a pixel on $T$ comes from the pixel at the corresponding position of $T_i$; $P_i$ is obtained from the following equation:

$$[P_i]_{jk} = \frac{\exp\!\left(s\,[O_i]_{jk}\right)}{\sum_{m=1}^{c} \exp\!\left(s\,[O_m]_{jk}\right)}$$

wherein $[P_i]_{jk}$ represents the element in row $j$ and column $k$ of $P_i$, $[O_i]_{jk}$ represents the element value in row $j$ and column $k$ of the $i$-th channel of $O$, $O$ represents the output of the decoder, $c$ represents the number of channels output by the decoder, and $s$ represents the amplification factor of the amplification module; in particular, the number $n$ of restored "original images", the total number $k$ of local texture maps and the number $c$ of channels output by the decoder are all equal.
Further, projecting the optimized 3D human body model onto a 2D plane to obtain a 2D projection retaining the 3D information of the original motion, and driving the optimized 3D human body model with the pose of the target person, includes: predicting the posture through HMR and transmitting the predicted posture to the 3D human body model, thereby driving the 3D human body model.
Further, the motion image generation model is defined as a Face-Attention GAN model; the Face-Attention GAN model is based on the GAN model, matches an elliptical face region by using a Gaussian distribution, configures a face enhancement loss function, and introduces an attention mechanism, wherein:

matching the elliptical face region by using a Gaussian distribution is realized by designing the mean value and covariance matrix of the Gaussian distribution, including: the position of the face region in the image is determined by the pose estimation algorithm OpenPose, which gives the locations of the nose, eyes and ears; the center of the ellipse is set as the position $\mu$ of the nose; the two axes of the ellipse are the eigenvectors of the covariance matrix, and the axis lengths are the eigenvalues of the covariance matrix; let $a$ and $b$ be the two axes of the ellipse, with $a$ and $b$ both unit vectors satisfying:

$$\|a\| = \|b\| = 1, \qquad a^{\top} b = a_1 b_1 + a_2 b_2 = 0$$

wherein $b_1$ and $b_2$ are the two elements of $b = (b_1, b_2)^{\top}$; the relationship between the eigenvectors $a$, $b$ and the covariance matrix $\Sigma$ is as follows:

$$\Sigma = \lambda_a\, a a^{\top} + \lambda_b\, b b^{\top}$$

wherein $\lambda_a$ is the eigenvalue corresponding to $a$, $\lambda_b$ is the eigenvalue corresponding to $b$, both determined from the axial lengths of the ellipse and the scaling factor $\sigma$; since $a$ and $b$ are orthogonal, $\Sigma$ is necessarily invertible; in the Gaussian distribution with $\mu$ as the mean and $\Sigma$ as the covariance, face-enhanced Gaussian weights $W$ are obtained by uniformly sampling at a distance interval of 1 within the rectangular region constructed by the four points (1, 1), (1, 512), (512, 1) and (512, 512), and the generated Gaussian weights $W$ are used to define the face enhancement loss function; the face enhancement loss function is as follows:

$$L_{face}(G) = \left\| W \odot \left( G(x, p) - y \right) \right\|_1$$

wherein $p$ represents the posture, $x$ represents the 2D projection of the 3D human body model, $y$ represents the real image, $G(x, p)$ represents the image generated by the generator $G$ from $x$ and $p$, and $W$ represents the Gaussian weights generated by the Gaussian distribution matching the elliptical face; the introduced attention mechanism includes channel attention and spatial attention; the final objective function is:

$$\min_G \max_D \; L_{GAN}(G, D) + \lambda_{face} L_{face}(G) + \lambda_{FM} L_{FM}(G, D) + \lambda_{VGG} L_{VGG}(G)$$

wherein $G$ denotes the generator, $D$ denotes the discriminator, and $L_{GAN}(G, D)$ represents the loss function of the GAN model; $\min_G \max_D$ expresses the mutual game process in which the discriminator learns to accurately judge the authenticity of samples while the generator learns to produce samples that the discriminator cannot distinguish from real ones; $L_{face}(G)$ represents the face enhancement loss function, used to enhance the face region of the image; $L_{FM}(G, D)$ represents the feature matching loss, used to ensure the global consistency of the image content; $L_{VGG}(G)$ represents the perceptual reconstruction loss, used to ensure the global consistency of the image content; the parameters $\lambda_{face}$, $\lambda_{FM}$ and $\lambda_{VGG}$ are used to balance these losses.
Further, with the introduced attention mechanism, a feature matching loss based on the discriminator D is employed, the feature matching loss being as follows:

$$L_{FM}(G, D) = \sum_{i=1}^{T} \frac{1}{N_i} \left\| D^{(i)}(x, p, y) - D^{(i)}\!\left(x, p, G(x, p)\right) \right\|_1$$

wherein $D^{(i)}$ is the $i$-th layer feature extractor of the discriminator $D$, $N_i$ represents the number of elements of the $i$-th layer, and $T$ is the total number of layers of the discriminator $D$; the generated image and the real image are then input into a pre-trained VGG network and the features of different layers are compared, giving the perceptual reconstruction loss:

$$L_{VGG}(G) = \sum_{i=1}^{N} \frac{1}{M_i} \left\| V^{(i)}(y) - V^{(i)}\!\left(G(x, p)\right) \right\|_1$$

wherein $V^{(i)}$ represents the $i$-th layer feature extractor of the VGG network, $M_i$ represents the number of elements in the $i$-th layer, and $N$ is the total number of layers of the VGG network.
Further, normalizing the posture of the target person specifically comprises: approximating the real length of each bone segment by the maximum bone segment length in the training set, and approximating the real bone segment length of the new pose in the same way; then adjusting the length of the bone segments displayed in the image according to the proportion between the standard skeleton and the new skeleton; let $J_i$ denote the $i$-th joint coordinate of the new pose and $J_{f(i)}$ denote the coordinate of its parent joint; $J_i$ is adjusted by

$$J_i \leftarrow J_{f(i)} + \frac{l_i^{train}}{l_i^{tgt}}\left(J_i - J_{f(i)}\right)$$

wherein $l_i^{tgt}$ and $l_i^{train}$ respectively represent the maximum bone segment length between the $i$-th joint and its parent joint in the target person images and in the training images.
Compared with the prior art, the invention has the following beneficial effects:
(1) the method converts the training data into UV space to generate UV maps, and constructs and optimizes a 3D human body model by using complementary information between adjacent video frames; the optimized 3D human body model is then projected onto a 2D plane to obtain a 2D projection retaining the 3D information of the original motion, and the optimized 3D human body model is driven with the posture of the target person; the 2D projection retaining the 3D information of the original motion and the posture of the training data are used as the input of a motion image generation model, and the trained motion image generation model is saved; the posture of the target person is normalized; finally, the 2D projection of the optimized 3D human body model driven by the posture of the target person and the normalized posture of the target person are used as the input of the trained motion image generation model for the final motion migration, so that the problems of blurring, shape distortion and the like in 2D plane image generation are overcome, and the generated motion image is ensured to have reliable depth information, an accurate shape and a clear human face;
(2) the method has the advantages of small calculation burden and short time consumption, and can be mainly applied to three fields: 1) in the field of film and television industry, the method can be used for simulating real characters to make actions with ornamental value and high difficulty; 2) in the field of game design, the method can be used for action design of virtual characters; 3) in the field of medical rehabilitation, the method can be used for synthesizing the normal movement posture of a patient with dyskinesia.
Drawings
FIG. 1 is a model framework for optimizing a 3D human body model in an embodiment of the invention;
FIG. 2 is a diagram of a texture filling network in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of the pose drive of a 3D human body model according to an embodiment of the invention;
FIG. 4 is a Face-Attention GAN model framework constructed in an embodiment of the present invention;
FIG. 5 is a diagram illustrating matching elliptical faces using Gaussian distributions in an embodiment of the present invention;
FIG. 6 is a schematic illustration of a CBAM attention mechanism in an embodiment of the present invention;
fig. 7 is a schematic diagram of a motion transfer process in an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
A motion migration method driven by a 3D human body model comprises: constructing a training data set by taking video frames shot in advance as training data, and extracting the posture of the training data; converting the training data into UV space to generate UV maps, and constructing and optimizing a 3D human body model by using complementary information between adjacent video frames; projecting the optimized 3D human body model onto a 2D plane to obtain a 2D projection that retains the 3D information of the original motion, and driving the optimized 3D human body model with the posture of the target person; using the 2D projection retaining the 3D information of the original motion and the posture of the training data as the input of a motion image generation model, and saving the trained motion image generation model; normalizing the posture of the target person; and finally performing the final motion migration by taking the 2D projection of the optimized 3D human body model driven by the posture of the target person and the normalized posture of the target person as the input of the trained motion image generation model.
Step 1, constructing a training data set by taking a video frame shot in advance as training data, and extracting the posture of the training data.
A motion video with an average length of 3 minutes is shot for each person at a rate of 30 frames per second, and the training data are the video frames of each person, each with a resolution of 512 × 512. These videos are shot with mobile phones from fixed positions at a distance of about 5 meters. After the training data set is prepared, the posture of the training data is extracted with the state-of-the-art posture estimation algorithm OpenPose.
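For illustration, a minimal sketch of this data-preparation step is given below, assuming OpenCV is used to split each recorded video into square 512 × 512 frames; the function and file names are illustrative and not part of the invention.

```python
# Hypothetical sketch of the training-data preparation described above.
import cv2
import os

def extract_frames(video_path: str, out_dir: str, size: int = 512) -> int:
    """Split a motion video into square frames of `size` x `size` pixels."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h, w = frame.shape[:2]
        # center-crop to a square before resizing, so the person is not distorted
        side = min(h, w)
        y0, x0 = (h - side) // 2, (w - side) // 2
        frame = frame[y0:y0 + side, x0:x0 + side]
        frame = cv2.resize(frame, (size, size))
        cv2.imwrite(os.path.join(out_dir, f"{count:06d}.png"), frame)
        count += 1
    cap.release()
    return count

# Usage: roughly 3 minutes at 30 fps yields about 5400 frames per person.
# n = extract_frames("person01.mp4", "data/person01/frames")
```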
Step 2, converting pixels of the images in the training data into UV space by using DensePose to generate corresponding UV maps, and constructing and optimizing the 3D human body model with complementary information between adjacent video frames.
The present embodiment uses a human body model optimization method based on sequential images, the framework of which is shown in fig. 1. A set of images $\{I_1, I_2, \dots, I_n\}$ of different poses spaced several frames apart is taken from the training data, together with the corresponding UV maps generated by DensePose; a group of local texture maps $\{T_1, T_2, \dots, T_k\}$ is then generated through UV conversion, and the generated local texture maps are input into the texture filling network.
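As an illustration of the UV conversion step, the sketch below scatters the pixels of one frame into a partial texture atlas using a DensePose IUV map; it assumes the common convention of a part-index channel in {0, …, 24} with U, V values in [0, 255] and a 24-tile atlas layout, which are assumptions of this sketch rather than requirements of the invention.

```python
# Minimal sketch: build a partial (local) texture map from an image and its IUV map.
import numpy as np

def iuv_to_partial_texture(image: np.ndarray, iuv: np.ndarray,
                           tile: int = 200) -> np.ndarray:
    """Scatter image pixels into a per-part UV texture atlas (24 tiles, 6 x 4 grid)."""
    atlas = np.zeros((6 * tile, 4 * tile, 3), dtype=image.dtype)
    part, u, v = iuv[..., 0], iuv[..., 1], iuv[..., 2]
    for p in range(1, 25):                        # 24 body parts, 0 = background
        ys, xs = np.nonzero(part == p)
        if ys.size == 0:
            continue
        tu = (u[ys, xs].astype(np.float32) / 255.0 * (tile - 1)).astype(int)
        tv = (v[ys, xs].astype(np.float32) / 255.0 * (tile - 1)).astype(int)
        row, col = (p - 1) // 4, (p - 1) % 4      # tile position in the atlas
        atlas[row * tile + tv, col * tile + tu] = image[ys, xs]
    return atlas
```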
The texture filling network is shown in FIG. 2 and finally generates a complete texture map $T$ with multi-pose texture information. An L1 loss is computed between the set of "original images" $\{\hat{I}_1, \dots, \hat{I}_n\}$ restored from $T$ and the set of real images $\{I_1, \dots, I_n\}$, which drives the network to generate a more detailed texture map; this texture map is ultimately used to generate the 3D human body model, enabling its optimization. The corresponding loss function is expressed as:

$$L_{tex} = \sum_{i=1}^{n} \left\| \hat{I}_i - I_i \right\|_1$$

wherein $\hat{I}_i$ is the $i$-th "original image" restored from the texture map $T$, $I_i$ is the corresponding real image, and $n$ represents the number of restored "original images"; the texture map $T$ is obtained from the following equation:

$$T = \sum_{i=1}^{k} P_i \odot T_i$$

wherein $k$ represents the total number of local texture maps $T_i$, and $P_i$ represents a probability map generated by the texture filling network, which predicts the probability that a pixel on $T$ comes from the pixel at the corresponding position of $T_i$; $P_i$ is obtained from the following equation:

$$[P_i]_{jk} = \frac{\exp\!\left(s\,[O_i]_{jk}\right)}{\sum_{m=1}^{c} \exp\!\left(s\,[O_m]_{jk}\right)}$$

wherein $[P_i]_{jk}$ represents the element in row $j$ and column $k$ of $P_i$, $[O_i]_{jk}$ represents the element value in row $j$ and column $k$ of the $i$-th channel of $O$, $O$ represents the output of the decoder, $c$ represents the number of channels output by the decoder, and $s$ represents the amplification factor of the amplification module; in particular, the number $n$ of restored "original images", the total number $k$ of local texture maps and the number $c$ of channels output by the decoder are all equal.
The optimization of the 3D human body model is realized according to the method.
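A minimal PyTorch sketch of the texture-fusion and loss step is given below; it assumes that the decoder outputs one channel per local texture map and that the probability maps are obtained by a channel-wise softmax sharpened by the amplification factor s, which is one plausible reading of the description above. Tensor names are illustrative.

```python
# Minimal sketch of blending local texture maps into one texture map T and
# measuring the L1 loss against the real frames.
import torch
import torch.nn.functional as F

def fuse_textures(decoder_out: torch.Tensor,      # (k, H, W) raw decoder output O
                  local_textures: torch.Tensor,   # (k, 3, H, W) partial textures T_i
                  s: float = 10.0) -> torch.Tensor:
    """Blend k partial texture maps into one complete texture map T."""
    probs = F.softmax(s * decoder_out, dim=0)                  # P_i, one map per channel
    return (probs.unsqueeze(1) * local_textures).sum(dim=0)    # (3, H, W)

def texture_loss(restored: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
    """L1 loss between images re-rendered from T and the real video frames."""
    return F.l1_loss(restored, real)
```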
Step 3, projecting the optimized 3D human body model onto a 2D plane so as to retain the 3D information of the original motion, and designing a posture driving method for the 3D human body model. The method predicts the posture through HMR and transmits the predicted posture to the 3D human body model, thereby driving the 3D human body model, as shown in fig. 3. The posture of the 3D human body model is represented by a visual skeleton map, which is intuitive and easy to interpret.
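As a sketch of the projection step, the snippet below projects the posed model vertices onto the image plane with a weak-perspective camera (scale and translation), the camera model commonly estimated by HMR; the parameter names are illustrative.

```python
# Hedged sketch: weak-perspective projection of posed 3D vertices to 2D pixels.
import numpy as np

def weak_perspective_project(vertices: np.ndarray,   # (N, 3) posed mesh vertices
                             scale: float,
                             trans: np.ndarray,       # (2,) image-plane translation
                             img_size: int = 512) -> np.ndarray:
    """Project 3D vertices to pixel coordinates in an img_size x img_size frame."""
    xy = vertices[:, :2] * scale + trans              # drop depth, scale and shift
    # map from the normalized [-1, 1] camera plane to pixel coordinates
    return (xy + 1.0) * 0.5 * img_size
```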
Step 4, taking the 2D projection and the posture of the training data as the input of the motion image generation model, and saving the trained model.
This embodiment provides a motion image generation model for the final motion migration, defined as a Face-Attention GAN model. The Face-Attention GAN model is based on the GAN model; it matches an elliptical face region by using a Gaussian distribution, configures a face enhancement loss function, and introduces an attention mechanism. The model takes the 2D projection obtained in step 3 and the posture extracted in step 1 as its input, and the model framework is shown in fig. 4, wherein the adversarial loss of the GAN is as follows:

$$L_{GAN}(G, D) = \mathbb{E}\!\left[\log D(x, p, y)\right] + \mathbb{E}\!\left[\log\!\left(1 - D\!\left(x, p, G(x, p)\right)\right)\right]$$

wherein $G$ denotes the generator, $D$ denotes the discriminator, $p$ represents the posture, $x$ represents the 2D projection of the 3D human body model, $y$ represents the real image, and $G(x, p)$ represents the image generated by the generator $G$ from $x$ and $p$. The first term ensures the basic judgment capability of the discriminator: the larger it is, the more accurately the discriminator identifies real samples as real. The second term ensures that the discriminator can distinguish fake samples: the larger it is, the smaller $D(x, p, G(x, p))$ becomes, i.e., the more correctly the discriminator identifies fake samples.
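For illustration, a minimal PyTorch sketch of this adversarial term follows, in binary cross-entropy form; the assumption that the discriminator takes the pose, the 2D projection and an image and returns a logit is an interface choice of the sketch, not something fixed by the patent.

```python
# Hedged sketch of the conditional adversarial loss above.
import torch
import torch.nn.functional as F

def discriminator_loss(D, x, p, y, fake):
    """D maximises log D(x, p, y) + log(1 - D(x, p, G(x, p)))."""
    real_logit = D(x, p, y)
    fake_logit = D(x, p, fake.detach())            # do not backpropagate into G here
    return (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) +
            F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))

def generator_adv_loss(D, x, p, fake):
    """G tries to make D label its output as real."""
    fake_logit = D(x, p, fake)
    return F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))
```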
Matching the elliptical face region with a Gaussian distribution is achieved by designing the mean value and covariance matrix of the Gaussian distribution, including: the position of the face region in the image is determined by the pose estimation algorithm OpenPose, which gives the locations of the nose, eyes and ears; the center of the ellipse is set as the position $\mu$ of the nose; the two axes of the ellipse are the eigenvectors of the covariance matrix, and the axis lengths are the eigenvalues of the covariance matrix. As shown in fig. 5, let $a$ and $b$ be the two axes of the ellipse, with $a$ and $b$ both unit vectors satisfying:

$$\|a\| = \|b\| = 1, \qquad a^{\top} b = a_1 b_1 + a_2 b_2 = 0$$

wherein $b_1$ and $b_2$ are the two elements of $b = (b_1, b_2)^{\top}$; the relationship between the eigenvectors $a$, $b$ and the covariance matrix $\Sigma$ is as follows:

$$\Sigma = \lambda_a\, a a^{\top} + \lambda_b\, b b^{\top}$$

wherein $\lambda_a$ is the eigenvalue corresponding to $a$, $\lambda_b$ is the eigenvalue corresponding to $b$, both determined from the axial lengths of the ellipse and the scaling factor $\sigma$; since $a$ and $b$ are orthogonal, $\Sigma$ is necessarily invertible. In the Gaussian distribution with $\mu$ as the mean and $\Sigma$ as the covariance, face-enhanced Gaussian weights $W$ are obtained by uniformly sampling at a distance interval of 1 within the rectangular region constructed by the four points (1, 1), (1, 512), (512, 1) and (512, 512), and the generated Gaussian weights $W$ are used to define the face enhancement loss function.

The designed face enhancement loss function is as follows:

$$L_{face}(G) = \left\| W \odot \left( G(x, p) - y \right) \right\|_1$$

wherein $p$ represents the posture, $x$ represents the 2D projection of the 3D human body model, $y$ represents the real image, $G(x, p)$ represents the image generated by the generator $G$ from $x$ and $p$, and $W$ represents the Gaussian weights generated by the Gaussian distribution matching the elliptical face.
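The sketch below samples such face-enhancement weights on the 512 × 512 pixel grid from the nose position and a covariance assembled from the two ellipse axes, and applies them as a weighted L1 loss; the eigenvalues passed in, the normalisation of the weights and the helper names are assumptions of this sketch.

```python
# Minimal sketch: Gaussian face weights and the weighted (face-enhancement) L1 loss.
import numpy as np
import torch

def face_gaussian_weights(nose: np.ndarray,                 # (2,) nose position from OpenPose
                          a: np.ndarray, b: np.ndarray,     # unit ellipse axes
                          lam_a: float, lam_b: float,       # eigenvalues for each axis
                          size: int = 512) -> torch.Tensor:
    """Sample a face-centred Gaussian weight map on the size x size pixel grid."""
    cov = lam_a * np.outer(a, a) + lam_b * np.outer(b, b)   # covariance matrix
    inv = np.linalg.inv(cov)
    xs, ys = np.meshgrid(np.arange(1, size + 1), np.arange(1, size + 1))
    d = np.stack([xs - nose[0], ys - nose[1]], axis=-1)      # offsets from the mean
    w = np.exp(-0.5 * np.einsum("hwi,ij,hwj->hw", d, inv, d))
    return torch.from_numpy(w / w.max()).float()             # normalised weights W

def face_loss(fake: torch.Tensor, real: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Gaussian-weighted L1 loss that emphasises the facial region; images are (3, H, W)."""
    return (w.unsqueeze(0) * (fake - real).abs()).mean()
```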
An attention mechanism is also introduced into the model; its structure, shown in fig. 6, combines channel attention and spatial attention.
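For illustration, a compact PyTorch sketch of a CBAM-style block (channel attention followed by spatial attention) is given below as one common realisation of such a mechanism; the reduction ratio and kernel size are illustrative choices.

```python
# Sketch of a CBAM-style attention block: channel attention, then spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))              # channel attention from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))               # ... and from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # spatial attention from channel-wise average / max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```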
To further refine the details, a feature matching loss based on the discriminator D is employed, as follows:

$$L_{FM}(G, D) = \sum_{i=1}^{T} \frac{1}{N_i} \left\| D^{(i)}(x, p, y) - D^{(i)}\!\left(x, p, G(x, p)\right) \right\|_1$$

wherein $D^{(i)}$ is the $i$-th layer feature extractor of the discriminator $D$, $N_i$ represents the number of elements of the $i$-th layer, and $T$ is the total number of layers of the discriminator $D$.

The generated image and the real image are then input into a pre-trained VGG network, and the features of different layers are compared. The perceptual reconstruction loss is as follows:

$$L_{VGG}(G) = \sum_{i=1}^{N} \frac{1}{M_i} \left\| V^{(i)}(y) - V^{(i)}\!\left(G(x, p)\right) \right\|_1$$

wherein $V^{(i)}$ represents the $i$-th layer feature extractor of the VGG network, $M_i$ represents the number of elements in the $i$-th layer, and $N$ is the total number of layers of the VGG network.
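Both losses reduce to an averaged per-layer L1 distance over lists of intermediate features, as in the minimal sketch below; how those feature lists are collected from the discriminator or from a pre-trained VGG network is left outside this sketch.

```python
# Minimal sketch shared by the feature matching loss and the perceptual reconstruction loss.
import torch
import torch.nn.functional as F

def multi_layer_l1(feats_fake: list, feats_real: list) -> torch.Tensor:
    """Average per-layer, per-element L1 distance between matched feature maps."""
    losses = [F.l1_loss(f, r.detach()) for f, r in zip(feats_fake, feats_real)]
    return torch.stack(losses).mean()

# fm_loss  = multi_layer_l1(d_feats_fake, d_feats_real)      # feature matching loss
# vgg_loss = multi_layer_l1(vgg_feats_fake, vgg_feats_real)  # perceptual reconstruction loss
```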
The final objective function is:

$$\min_G \max_D \; L_{GAN}(G, D) + \lambda_{face} L_{face}(G) + \lambda_{FM} L_{FM}(G, D) + \lambda_{VGG} L_{VGG}(G)$$

wherein the parameters $\lambda_{face}$, $\lambda_{FM}$ and $\lambda_{VGG}$ are used to balance these losses, $G$ denotes the generator, $D$ denotes the discriminator, and $L_{GAN}(G, D)$ represents the GAN loss; $\min_G \max_D$ expresses the mutual game in which the discriminator learns to accurately judge the authenticity of samples while the generator learns to produce samples the discriminator cannot distinguish from real ones. $L_{face}(G)$ represents the face enhancement loss function, used to enhance the face region of the image. $L_{FM}(G, D)$ represents the feature matching loss, used to ensure the global consistency of the image content. $L_{VGG}(G)$ represents the perceptual reconstruction loss, used to ensure the global consistency of the image content.
Step 5, in this embodiment, the pose of the target person is normalized. The real length of each bone segment is approximated by the maximum bone segment length in the training set, and the real bone segment length of the new pose is approximated in the same way; the length of the bone segments displayed in the image is then adjusted according to the proportion between the standard skeleton and the new skeleton. Let $J_i$ denote the $i$-th joint coordinate of the new pose and $J_{f(i)}$ denote the coordinate of its parent joint; $J_i$ is adjusted by

$$J_i \leftarrow J_{f(i)} + \frac{l_i^{train}}{l_i^{tgt}}\left(J_i - J_{f(i)}\right)$$

wherein $l_i^{tgt}$ and $l_i^{train}$ respectively represent the maximum bone segment length between the $i$-th joint and its parent joint in the target person images and in the training images.
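A minimal sketch of this bone-length normalisation follows, under the same reading of the adjustment as above: each joint is moved along its bone so that the segment length is rescaled by the ratio of the maximum training-set length to the maximum target-person length. The keypoint/parent layout (parents listed before children) is an assumption of the sketch.

```python
# Sketch of pose normalisation by rescaling bone segments toward the training skeleton.
import numpy as np

def normalize_pose(joints: np.ndarray,        # (J, 2) target-person keypoints
                   parents: np.ndarray,       # (J,) parent index per joint, -1 for the root
                   len_target: np.ndarray,    # (J,) max bone length in the target images
                   len_train: np.ndarray      # (J,) max bone length in the training images
                   ) -> np.ndarray:
    out = joints.copy()
    for i in range(len(parents)):              # assumes parents precede children in index order
        p = parents[i]
        if p < 0 or len_target[i] == 0:
            continue
        ratio = len_train[i] / len_target[i]
        out[i] = out[p] + ratio * (joints[i] - joints[p])   # rescale the bone from the parent
    return out
```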
Step 6, inputting the 2D projection of the optimized 3D human body model driven by the target person's posture and the normalized target person posture into the trained motion image generation model to perform the final motion migration; the motion migration process includes posture normalization of the new skeleton and generation of the target person image, as shown in FIG. 7.
The training data are converted into UV space to generate UV maps, and a 3D human body model is constructed and optimized by using complementary information between adjacent video frames; the optimized 3D human body model is then projected onto a 2D plane so that the 3D information of the original motion is retained, and the optimized 3D human body model is driven in the target posture; the 2D projection and the posture of the training data are taken as the input for pre-training the model, and the trained model is saved; the posture of the target person is then normalized; finally, the 2D projection of the optimized 3D human body model driven by the target person's posture and the normalized target person posture are used as the input of the trained motion image generation model for the final motion migration, so that the problems of blurring, shape distortion and the like in 2D plane image generation are overcome and the generated motion image is ensured to have reliable depth information, an accurate shape and a clear human face. The method has a small calculation burden and short running time, and can be applied mainly in three fields: (1) in the film and television industry, it can be used to make real characters appear to perform difficult actions with high ornamental value; (2) in game design, it can be used for the action design of virtual characters; (3) in medical rehabilitation, it can be used to synthesize the normal movement posture of patients with movement disorders.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.
Claims (7)
1. A motion migration method driven by a 3D human body model, comprising:
constructing a training data set by taking a video frame shot in advance as training data, and extracting the posture of the training data;
converting the training data into a UV space, generating a UV map, and constructing and optimizing a 3D human body model by using complementary information between adjacent video frames;
then projecting the optimized 3D human body model to a 2D plane to obtain a 2D projection retaining 3D information of the original motion, and driving the optimized 3D human body model in the posture of the target person;
using the 2D projection retaining the 3D information of the original motion and the posture of the training data as the input of a motion image generation model, and saving the trained motion image generation model;
normalizing the posture of the target person;
finally, the 2D projection of the optimized 3D human body model driven by the posture of the target person and the normalized posture of the target person are used as the input of the trained motion image generation model for final motion migration;
the motion image generation model is defined as a Face-Attention GAN model; the Face-Attention GAN model is based on the GAN model, uses Gaussian distribution to match an elliptical human Face region, configures a human Face enhancement loss function, and introduces an Attention mechanism, wherein:
matching the elliptical face region by using a Gaussian distribution is realized by designing the mean value and covariance matrix of the Gaussian distribution, including: the position of the face region in the image is determined by the pose estimation algorithm OpenPose, which gives the locations of the nose, eyes and ears; the center of the ellipse is set as the position $\mu$ of the nose; the two axes of the ellipse are the eigenvectors of the covariance matrix, and the axis lengths are the eigenvalues of the covariance matrix; let $a$ and $b$ be the two axes of the ellipse, with $a$ and $b$ both unit vectors satisfying:

$$\|a\| = \|b\| = 1, \qquad a^{\top} b = a_1 b_1 + a_2 b_2 = 0$$

wherein $b_1$ and $b_2$ are the two elements of $b = (b_1, b_2)^{\top}$; the relationship between the eigenvectors $a$, $b$ and the covariance matrix $\Sigma$ is as follows:

$$\Sigma = \lambda_a\, a a^{\top} + \lambda_b\, b b^{\top}$$

wherein $\lambda_a$ is the eigenvalue corresponding to $a$, $\lambda_b$ is the eigenvalue corresponding to $b$, both determined from the axial lengths of the ellipse and the scaling factor $\sigma$; since $a$ and $b$ are orthogonal, $\Sigma$ is necessarily invertible; in the Gaussian distribution with $\mu$ as the mean and $\Sigma$ as the covariance, face-enhanced Gaussian weights $W$ are obtained by uniformly sampling at a distance interval of 1 within the rectangular region constructed by the four points (1, 1), (1, 512), (512, 1) and (512, 512), and the generated Gaussian weights $W$ are used to define the face enhancement loss function;

the face enhancement loss function is as follows:

$$L_{face}(G) = \left\| W \odot \left( G(x, p) - y \right) \right\|_1$$

wherein $p$ represents the posture, $x$ represents the 2D projection of the 3D human body model, $y$ represents the real image, $G(x, p)$ represents the image generated by the generator $G$ from $x$ and $p$, and $W$ represents the Gaussian weights generated by the Gaussian distribution matching the elliptical face;

the introduced attention mechanism includes channel attention and spatial attention; the final objective function is:

$$\min_G \max_D \; L_{GAN}(G, D) + \lambda_{face} L_{face}(G) + \lambda_{FM} L_{FM}(G, D) + \lambda_{VGG} L_{VGG}(G)$$

wherein $G$ denotes the generator, $D$ denotes the discriminator, and $L_{GAN}(G, D)$ represents the loss function of the GAN model; $\min_G \max_D$ expresses the mutual game process in which the discriminator learns to accurately judge the authenticity of samples while the generator learns to produce samples that the discriminator cannot distinguish from real ones; $L_{face}(G)$ represents the face enhancement loss function, used to enhance the face region of the image; $L_{FM}(G, D)$ represents the feature matching loss, used to ensure the global consistency of the image content; $L_{VGG}(G)$ represents the perceptual reconstruction loss, used to ensure the global consistency of the image content; the parameters $\lambda_{face}$, $\lambda_{FM}$ and $\lambda_{VGG}$ are used to balance these losses.
2. The motion migration method driven by a 3D human body model according to claim 1, wherein the OpenPose pose estimation algorithm is used to extract the pose of the training data.
3. The motion migration method driven by a 3D human body model according to claim 1, wherein converting pixels of the images in the training data to UV space using DensePose, generating corresponding UV maps, and constructing and optimizing the 3D human body model with complementary information between adjacent video frames comprises:
taking from the training data a set of images $\{I_1, I_2, \dots, I_n\}$ of different poses spaced several frames apart, together with the corresponding UV maps generated by DensePose; generating a set of local texture maps $\{T_1, T_2, \dots, T_k\}$ by UV conversion; inputting the generated local texture maps into a texture filling network to generate a texture map $T$ with multi-pose texture information; and performing a loss calculation, through a loss function, between the set of "original images" $\{\hat{I}_1, \dots, \hat{I}_n\}$ restored from the texture map $T$ and the set of real images $\{I_1, \dots, I_n\}$, so as to realize the optimization of the 3D human body model.
4. The motion migration method driven by a 3D human body model according to claim 3, wherein the loss function is expressed as:

$$L_{tex} = \sum_{i=1}^{n} \left\| \hat{I}_i - I_i \right\|_1$$

wherein $\hat{I}_i$ is the $i$-th "original image" restored from the texture map $T$, $I_i$ is the corresponding real image, and $n$ represents the number of restored "original images"; the texture map $T$ is obtained from the following equation:

$$T = \sum_{i=1}^{k} P_i \odot T_i$$

wherein $k$ represents the total number of local texture maps $T_i$, and $P_i$ represents a probability map generated by the texture filling network, which predicts the probability that a pixel on $T$ comes from the pixel at the corresponding position of $T_i$; $P_i$ is obtained from the following equation:

$$[P_i]_{jk} = \frac{\exp\!\left(s\,[O_i]_{jk}\right)}{\sum_{m=1}^{c} \exp\!\left(s\,[O_m]_{jk}\right)}$$

wherein $[P_i]_{jk}$ represents the element in row $j$ and column $k$ of $P_i$, $[O_i]_{jk}$ represents the element value in row $j$ and column $k$ of the $i$-th channel of $O$, $O$ represents the output of the decoder, $c$ represents the number of channels output by the decoder, and $s$ represents the amplification factor of the amplification module; the number $n$ of restored "original images", the total number $k$ of local texture maps and the number $c$ of channels output by the decoder are all equal.
5. The motion migration method driven by a 3D human body model according to claim 1, wherein projecting the optimized 3D human body model onto a 2D plane to obtain a 2D projection retaining the 3D information of the original motion, and driving the optimized 3D human body model in the pose of the target person, comprises: predicting the posture through HMR and transmitting the predicted posture to the 3D human body model, thereby driving the 3D human body model.
6. The motion migration method driven by a 3D human body model according to claim 1, wherein, with the introduced attention mechanism, a feature matching loss based on the discriminator D is employed as follows:

$$L_{FM}(G, D) = \sum_{i=1}^{T} \frac{1}{N_i} \left\| D^{(i)}(x, p, y) - D^{(i)}\!\left(x, p, G(x, p)\right) \right\|_1$$

wherein $D^{(i)}$ is the $i$-th layer feature extractor of the discriminator $D$, $N_i$ represents the number of elements of the $i$-th layer, and $T$ is the total number of layers of the discriminator $D$;

the generated image and the real image are then input into a pre-trained VGG network and the features of different layers are compared, the perceptual reconstruction loss being as follows:

$$L_{VGG}(G) = \sum_{i=1}^{N} \frac{1}{M_i} \left\| V^{(i)}(y) - V^{(i)}\!\left(G(x, p)\right) \right\|_1$$
7. The motion migration method driven by a 3D human body model according to claim 1, wherein the pose of the target person is normalized, specifically: the real length of each bone segment is approximated by the maximum bone segment length in the training set, and the real bone segment length of the new pose is approximated in the same way; the length of the bone segments displayed in the image is then adjusted according to the proportion between the standard skeleton and the new skeleton; let $J_i$ denote the $i$-th joint coordinate of the new pose and $J_{f(i)}$ denote the coordinate of its parent joint; $J_i$ is adjusted by

$$J_i \leftarrow J_{f(i)} + \frac{l_i^{train}}{l_i^{tgt}}\left(J_i - J_{f(i)}\right)$$

wherein $l_i^{tgt}$ and $l_i^{train}$ respectively represent the maximum bone segment length between the $i$-th joint and its parent joint in the target person images and in the training images.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210708260.9A CN114783039B (en) | 2022-06-22 | 2022-06-22 | Motion migration method driven by 3D human body model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210708260.9A CN114783039B (en) | 2022-06-22 | 2022-06-22 | Motion migration method driven by 3D human body model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114783039A CN114783039A (en) | 2022-07-22 |
CN114783039B true CN114783039B (en) | 2022-09-16 |
Family
ID=82422416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210708260.9A Active CN114783039B (en) | 2022-06-22 | 2022-06-22 | Motion migration method driven by 3D human body model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114783039B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116071831B (en) * | 2023-03-20 | 2023-06-20 | 南京信息工程大学 | Human body image generation method based on UV space transformation |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640172A (en) * | 2020-05-08 | 2020-09-08 | 大连理工大学 | Attitude migration method based on generation of countermeasure network |
CN111724414A (en) * | 2020-06-23 | 2020-09-29 | 宁夏大学 | Basketball movement analysis method based on 3D attitude estimation |
CN111797753A (en) * | 2020-06-29 | 2020-10-20 | 北京灵汐科技有限公司 | Training method, device, equipment and medium of image driving model, and image generation method, device and medium |
CN112215116A (en) * | 2020-09-30 | 2021-01-12 | 江苏大学 | Mobile 2D image-oriented 3D river crab real-time detection method |
CN112651316A (en) * | 2020-12-18 | 2021-04-13 | 上海交通大学 | Two-dimensional and three-dimensional multi-person attitude estimation system and method |
CN114612614A (en) * | 2022-03-09 | 2022-06-10 | 北京大甜绵白糖科技有限公司 | Human body model reconstruction method and device, computer equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111161200A (en) * | 2019-12-22 | 2020-05-15 | 天津大学 | Human body posture migration method based on attention mechanism |
CN114049652A (en) * | 2021-11-05 | 2022-02-15 | 成都艾特能电气科技有限责任公司 | Human body posture migration method and system based on action driving |
- 2022-06-22: CN application CN202210708260.9A / patent CN114783039B (en), status: Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640172A (en) * | 2020-05-08 | 2020-09-08 | 大连理工大学 | Attitude migration method based on generation of countermeasure network |
CN111724414A (en) * | 2020-06-23 | 2020-09-29 | 宁夏大学 | Basketball movement analysis method based on 3D attitude estimation |
CN111797753A (en) * | 2020-06-29 | 2020-10-20 | 北京灵汐科技有限公司 | Training method, device, equipment and medium of image driving model, and image generation method, device and medium |
CN112215116A (en) * | 2020-09-30 | 2021-01-12 | 江苏大学 | Mobile 2D image-oriented 3D river crab real-time detection method |
CN112651316A (en) * | 2020-12-18 | 2021-04-13 | 上海交通大学 | Two-dimensional and three-dimensional multi-person attitude estimation system and method |
CN114612614A (en) * | 2022-03-09 | 2022-06-10 | 北京大甜绵白糖科技有限公司 | Human body model reconstruction method and device, computer equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
Real-time facial expression transfer method combining 3DMM and GAN; Gao Xiang et al.; 《计算机应用与软件》 (Computer Applications and Software); 2020-04-12 (No. 04); full text *
VIBE: Video Inference for Human Body Pose and Shape Estimation; Muhammed Kocabas et al.; 《arXiv:1912.05656 [cs.CV]》; 2020-06-15; full text *
Also Published As
Publication number | Publication date |
---|---|
CN114783039A (en) | 2022-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112887698B (en) | High-quality face voice driving method based on nerve radiation field | |
CN110827193B (en) | Panoramic video significance detection method based on multichannel characteristics | |
CN109376582A (en) | A kind of interactive human face cartoon method based on generation confrontation network | |
CN108537743A (en) | A kind of face-image Enhancement Method based on generation confrontation network | |
CN108596024A (en) | A kind of illustration generation method based on human face structure information | |
CN110796593A (en) | Image processing method, device, medium and electronic equipment based on artificial intelligence | |
CN115914505B (en) | Video generation method and system based on voice-driven digital human model | |
WO2020177214A1 (en) | Double-stream video generation method based on different feature spaces of text | |
CN110853119B (en) | Reference picture-based makeup transfer method with robustness | |
CN117496072B (en) | Three-dimensional digital person generation and interaction method and system | |
CN110363770A (en) | A kind of training method and device of the infrared semantic segmentation model of margin guide formula | |
US20240119671A1 (en) | Systems and methods for face asset creation and models from one or more images | |
CN114783039B (en) | Motion migration method driven by 3D human body model | |
CN115984485A (en) | High-fidelity three-dimensional face model generation method based on natural text description | |
CN113076918B (en) | Video-based facial expression cloning method | |
Bi et al. | NERF-AD: Neural Radiance Field With Attention-Based Disentanglement For Talking Face Synthesis | |
CN113947520A (en) | Method for realizing face makeup conversion based on generation of confrontation network | |
CN114399829A (en) | Posture migration method based on generative countermeasure network, electronic device and medium | |
CN116704084B (en) | Training method of facial animation generation network, facial animation generation method and device | |
US20240078773A1 (en) | Electronic device generating 3d model of human and its operation method | |
CN117333604A (en) | Character face replay method based on semantic perception nerve radiation field | |
Kang et al. | Image-to-image translation method for game-character face generation | |
Cao et al. | Guided cascaded super-resolution network for face image | |
CN116825127A (en) | Voice-driven digital person generation method based on nerve field | |
CN116863069A (en) | Three-dimensional light field face content generation method, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |