CN107369174B - Face image processing method and computing device - Google Patents
Face image processing method and computing device
- Publication number
- CN107369174B CN107369174B CN201710616812.2A CN201710616812A CN107369174B CN 107369174 B CN107369174 B CN 107369174B CN 201710616812 A CN201710616812 A CN 201710616812A CN 107369174 B CN107369174 B CN 107369174B
- Authority
- CN
- China
- Prior art keywords
- image frame
- dimensional
- face
- model
- projection matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Architecture (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a face image processing method and a computing device for executing the method. The method comprises the following steps: when a face in a video is detected, recording the image frame where the face is located as an initial image frame, and calculating a projection matrix of the face in the initial image frame according to a first calculation mode; calculating a three-dimensional face shape model, a three-dimensional expression model and a three-dimensional face model corresponding to the initial image frame; for each subsequent image frame, calculating a projection matrix of the current image frame according to the first calculation mode; calculating a three-dimensional expression model of the current image frame; comparing the projection matrix and the three-dimensional expression model of the current image frame with those of the previous image frame; if the comparison result meets a preset condition, calculating a three-dimensional face shape model of the current image frame, and if the comparison result does not meet the preset condition, keeping the three-dimensional face shape model unchanged; calculating a three-dimensional face model of the current image frame; calculating texture coordinates of the current image frame; and rendering a preset mask onto the current image frame according to the texture coordinates.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a computing device for processing a face image.
Background
With the development of mobile communication and internet technologies, users are no longer satisfied with simply taking and sharing pictures through mobile devices (such as mobile phones, cameras and tablets); in daily social interaction, users often want to apply fun effects to certain pictures to make photo-taking more entertaining. For example, an image containing a human face (a "face image" for short) may be processed by adding decorations such as glasses, a moustache or rabbit ears to the face. Generally, such decorations are pasted at the corresponding positions by locating facial feature points; this is simple to implement, but factors such as the face shapes and expressions of different faces are not taken into account.
Another approach is to attach a mask to the face, for example generating a suitable mask in real time for a face in a video or a photograph. The mask can be regarded as a new face, and the masks corresponding to different faces differ: for example, the mask for a square face is relatively square, the mask for a round face is relatively round, the mask for an oval ("melon-seed") face is relatively pointed, and so on. Even for the same person in a video, the corresponding mask should change as the expression changes: when the eyes close, the eyes of the mask close; when the mouth opens, the mouth of the mask opens; and so on. Given these requirements, a solution based only on locating facial feature points produces masks that cannot adapt to different face shapes, and it also has the defect of being unable to track facial expressions.
Therefore, there is a need for a facial mask generation method that can take into account the face shapes, postures and expressions of different faces, and can process the facial images in the video in real time.
Disclosure of Invention
To this end, the present invention provides a method of processing a face image and a computing device in an attempt to solve or at least alleviate at least one of the problems identified above.
According to one aspect of the present invention, there is provided a method for processing a face image, the method being used for generating a mask for the face image in a video in real time, and the method comprising the steps of: when a face in a video is detected, recording the image frame where the face is located as an initial image frame, and calculating a projection matrix of the face in the initial image frame according to a first calculation mode; calculating a three-dimensional face shape model, a three-dimensional expression model and a three-dimensional face model corresponding to the initial image frame according to the projection matrix of the initial image frame; for each subsequent image frame in the video, calculating a projection matrix of the current image frame according to the first calculation mode; calculating a three-dimensional expression model of the current image frame according to the projection matrix of the current image frame; comparing the projection matrix and the three-dimensional expression model of the current image frame with the projection matrix and the three-dimensional expression model of the previous image frame, respectively; if the comparison result meets a preset condition, calculating a three-dimensional face shape model of the current image frame, and if the comparison result does not meet the preset condition, taking the three-dimensional face shape model of the previous image frame as the three-dimensional face shape model of the current image frame; calculating a three-dimensional face model of the current image frame according to the three-dimensional face shape model and the three-dimensional expression model of the current image frame; calculating texture coordinates of the current image frame according to the projection matrix of the current image frame and the three-dimensional face model; and rendering the preset mask onto the current image frame according to the texture coordinates.
Optionally, in the face image processing method according to the present invention, the method further includes: establishing a face space basis according to pre-collected three-dimensional face data, wherein the face space basis comprises a three-dimensional average face, face shape bases forming the three-dimensional face shape model, and expression bases forming the three-dimensional expression model.
Optionally, in the face image processing method according to the present invention, the method further includes a step of generating the preset mask: generating the preset mask according to the UV expansion map of the three-dimensional average face.
Optionally, in the face image processing method according to the present invention, the step of calculating the projection matrix of the face in an image frame according to the first calculation mode includes: extracting face feature points in the image frame; and fitting the extracted face feature points to obtain the projection matrix of the face in the image frame.
Alternatively, in the face image processing method according to the present invention, the step of calculating a three-dimensional face shape model or a three-dimensional expression model or a three-dimensional face model of an image frame from the projection matrix of the image frame includes: calculating the three-dimensional face shape model or the three-dimensional expression model or the three-dimensional face model of the image frame according to the projection matrix and the face feature points of the image frame by using the least squares method.
Alternatively, in the face image processing method according to the present invention, the predetermined condition includes: the rotation parameter in the projection matrix of the current image frame is closer to the rotation parameter of the three-dimensional average face than the rotation parameter in the projection matrix of the previous image frame; and the expression basis coefficient in the three-dimensional expression model of the current image frame is closer to the expression basis coefficient of the three-dimensional average face (i.e., closer to zero) than the expression basis coefficient in the three-dimensional expression model of the previous image frame.
Alternatively, in the face image processing method according to the present invention, the step of calculating the three-dimensional face shape model of the current image frame when the comparison result satisfies the predetermined condition includes: when the comparison result satisfies the predetermined condition, calculating the three-dimensional face shape model of the current image frame according to the projection matrix of the current image frame.
Alternatively, in the face image processing method according to the present invention, the step of calculating the three-dimensional face model of the current image frame from the three-dimensional face shape model and the three-dimensional expression model of the current image frame includes: adding the three-dimensional face shape model and the three-dimensional expression model of the current image frame and subtracting the three-dimensional average face to obtain the three-dimensional face model of the current image frame.
Optionally, in the face image processing method according to the present invention, the three-dimensional face model M_t of the t-th image frame is:

M_t = E_t + F_t − meanEF

wherein E_t represents the three-dimensional expression model of the t-th image frame, F_t represents the three-dimensional face shape model of the t-th image frame, and meanEF represents the three-dimensional average face.
Optionally, in the face image processing method according to the present invention, the step of calculating the texture coordinates of the current image frame according to the projection matrix and the three-dimensional face model of the current image frame includes: multiplying the projection matrix of the current image frame by the three-dimensional face model of the current image frame to obtain the texture coordinates of the current image frame.
Optionally, in the face image processing method according to the present invention, before the step of calculating texture coordinates of the current image frame according to the projection matrix of the current image frame and the three-dimensional face model, the method further includes the steps of: and respectively smoothing the projection matrix of the current image frame and the three-dimensional face model by using the projection matrix and the three-dimensional face model of the previous image frame, and taking the projection matrix and the three-dimensional face model after smoothing as the projection matrix and the three-dimensional face model of the current image frame.
Optionally, in the face image processing method according to the present invention, the smoothed projection matrix MVP_t′ and the smoothed three-dimensional face model M_t′ of the t-th image frame are obtained by weighted averaging of MVP_t with MVP_{t−1} and of M_t with M_{t−1}, respectively, wherein MVP_t represents the projection matrix of the t-th image frame before smoothing, MVP_{t−1} represents the projection matrix of the (t−1)-th image frame, M_t represents the three-dimensional face model of the t-th image frame before smoothing, and M_{t−1} represents the three-dimensional face model of the (t−1)-th image frame.
According to another aspect of the present invention, there is provided a computing device comprising: one or more processors; and a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.
According to a further aspect of the invention there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods described above.
This scheme achieves real-time mask rendering on video based on three-dimensional face reconstruction technology, and overcomes the defects of positioning the mask only by face feature points, namely that the mask cannot adapt to different face shapes and poses and cannot track expressions.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a configuration of a computing device 100 according to one embodiment of the invention;
FIG. 2 shows a flow diagram of a method 200 of processing a face image according to one embodiment of the invention; and
fig. 3 shows a schematic view of a UV expansion map.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (μP), a microcontroller (μC), a Digital Signal Processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. The program data 124 comprises instructions, and in the computing device 100 according to the invention the program data 124 comprises instructions for performing the face image processing method.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules, and may include any information delivery media, such as carrier waves or other transport mechanisms, in a modulated data signal. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer-readable media as used herein may include both storage media and communication media, such as computer-readable storage media that store one or more programs.
Computing device 100 may be implemented as part of a small-form factor portable (or mobile) electronic device such as a cellular telephone, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations. In some embodiments, the computing device 100 is configured to perform the face image processing method 200.
As described above, consider a daily-life scenario in which a user records a short selfie video with the computing device 100, various fun masks are generated in real time according to the user's face in the video, and the user wants to share the resulting video with friends. A flowchart of a face image processing method 200 for achieving such an effect according to an embodiment of the present invention is described in detail below with reference to fig. 2.
Fig. 2 shows a flow diagram of a method 200 of processing a face image according to an embodiment of the invention.
As shown in fig. 2, the method 200 starts in step S210, when a face in a video is detected, an image frame where the face is located is recorded as an initial image frame, and a projection matrix of the face in the initial image frame is calculated in a first calculation manner. According to one implementation manner, the user may open a corresponding application program or trigger a specific key in the camera application to start detecting the face in the video, which is not limited in this respect.
According to an embodiment of the present invention, the step of calculating the projection matrix of the face in the image frame according to the first calculation mode includes:
① Extract face feature points in the image frame. According to an embodiment of the present invention, a face image is divided into two regions, namely a face region containing the face and a background region outside the face region.
② Fit the extracted face feature points to obtain the projection matrix of the face in the image frame. According to an implementation of the invention, a three-dimensional face model and the corresponding projection matrix are obtained through a three-dimensional morphable model (3DMM). The 3DMM is the method proposed in the paper "A Morphable Model For The Synthesis Of 3D Faces" published by Blanz and Vetter in 1999; its basic idea is to regard the face space as a linear space and to approximate the face in a two-dimensional picture by the projection of a linear combination of pre-established three-dimensional face data.
Considering that the face shape and the facial expression need to be tracked in real time, in the embodiment of the invention, when the face space basis is constructed from the pre-collected three-dimensional face data, the three-dimensional face model is regarded as being composed of a three-dimensional average face, a three-dimensional face shape model and a three-dimensional expression model, and the basic formula of the 3DMM is expressed as formula (1):

M = meanEF + Σ_{i=1}^{n} a_i·F_i + Σ_{j=1}^{m} b_j·E_j    (1)

wherein M represents the finally fitted three-dimensional face model, meanEF represents the three-dimensional average face, F_i represents the face shape bases constituting the three-dimensional face shape model, E_j represents the expression bases constituting the three-dimensional expression model, n and m respectively represent the numbers of face shape bases and expression bases, and a_i and b_j respectively represent the coefficients corresponding to the face shape bases and the expression bases.
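For illustration only, the following minimal NumPy sketch expresses formula (1); the function and array names and the flattened (3V,) vertex layout are assumptions made here for illustration, not details specified by the patent:

```python
import numpy as np

def fit_face_model(mean_ef, shape_bases, expr_bases, a, b):
    """Formula (1): M = meanEF + sum_i a_i*F_i + sum_j b_j*E_j.

    mean_ef:     (3V,) three-dimensional average face (V vertices, flattened)
    shape_bases: (n, 3V) face shape bases F_i
    expr_bases:  (m, 3V) expression bases E_j
    a, b:        (n,) and (m,) basis coefficients
    """
    return mean_ef + a @ shape_bases + b @ expr_bases
```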
The initial parameters of the projection matrix are estimated from the feature points of the face space basis, and include the camera position, the rotation angle of the image plane, the components of direct light and ambient light, the image contrast, and so on. For a given specific face (i.e., the extracted face feature points), the three-dimensional face model of the face image is fitted by repeated iteration over the face space bases and the initial parameters of the projection matrix. In other words, using pre-established three-dimensional face model data with the same number of vertices and the same topology, the combination parameters are solved by minimizing the distance between the projected feature points of the linear combination of three-dimensional models and the two-dimensional feature points, and the fitted three-dimensional face model and projection matrix are then obtained from these parameters, as described in formula (2):

err = MVP·M − P    (2)

wherein MVP represents the projection matrix and P represents the face feature points; combining formula (1), the projection matrix of the initial image frame can be obtained by minimizing err. Optionally, in view of the real-time nature of video processing, err is minimized using the least squares method.
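The least-squares minimization of err can be sketched as follows; the (2, 4) affine parameterization of MVP is an assumption made for illustration, since the exact projection model is not spelled out here:

```python
import numpy as np

def fit_projection_matrix(model_pts, image_pts):
    """Minimize err = MVP*M - P over MVP in the least-squares sense.

    model_pts: (K, 3) 3D positions of the K face feature points on the model
    image_pts: (K, 2) detected 2D face feature points P
    Returns an assumed (2, 4) affine projection matrix.
    """
    K = model_pts.shape[0]
    # Homogeneous model coordinates: each row is (X, Y, Z, 1).
    M_h = np.hstack([model_pts, np.ones((K, 1))])
    # Solve M_h @ MVP^T ≈ P row-wise via least squares.
    mvp_t, *_ = np.linalg.lstsq(M_h, image_pts, rcond=None)
    return mvp_t.T
```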
Subsequently, in step S220, a three-dimensional face shape model, a three-dimensional expression model and a three-dimensional face model corresponding to the initial image frame are calculated according to the projection matrix of the initial image frame. According to the embodiment of the invention, the three-dimensional expression model E and the three-dimensional face shape model F are expressed as formulas (3) and (4):

E = meanEF + Σ_{j=1}^{m} b_j·E_j    (3)

F = meanEF + Σ_{i=1}^{n} a_i·F_i    (4)

Therefore, calculating the three-dimensional expression model E and the three-dimensional face shape model F amounts to calculating the corresponding coefficients b_j and a_i. As described above, when the projection matrix is calculated by the least squares method, the corresponding coefficients a_i and b_j are obtained, which gives the three-dimensional face model M of the initial image frame; similarly, the three-dimensional expression model E and the three-dimensional face shape model F of the initial image frame are calculated according to formulas (3) and (4).
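Continuing the sketch above with the same assumed array shapes, formulas (3) and (4) reconstruct E and F from a single family of bases each:

```python
def expression_model(mean_ef, expr_bases, b):
    """Formula (3): E = meanEF + sum_j b_j*E_j."""
    return mean_ef + b @ expr_bases

def shape_model(mean_ef, shape_bases, a):
    """Formula (4): F = meanEF + sum_i a_i*F_i."""
    return mean_ef + a @ shape_bases
```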
In the subsequent steps, each subsequent image frame of the video containing the face is processed to generate the corresponding mask. In step S230, the projection matrix of the current image frame is calculated according to the first calculation mode. According to the implementation of the present invention, the same calculation steps as for the initial image frame (as described in step S210 above) are performed on each subsequent image frame to calculate its projection matrix.
Subsequently, in step S240, a three-dimensional expression model of the current image frame is calculated according to the projection matrix of the current image frame. Also, the method for calculating the three-dimensional expression model of the image frame according to the projection matrix of the image frame has been specifically described above, and will not be described herein again.
According to the embodiment of the invention, in practical applications the face appearing in a video usually belongs to the same person, so the face shape is essentially fixed, whereas the expression changes continuously from frame to frame over time. Therefore, the three-dimensional expression model E needs to be calculated for every image frame in order to track the changes of the facial expression in the video, while the three-dimensional face shape model F is calculated only selectively, as needed. In this way, the amount of computation can be effectively reduced while the calculation accuracy is maintained.
Subsequently, in step S250, the projection matrix and the three-dimensional expression model of the current image frame are compared with the projection matrix and the three-dimensional expression model of the previous image frame, respectively. If the current image frame is the t-th frame, the projection matrix MVP_t of the current image frame is compared with the projection matrix MVP_{t−1} of the previous image frame, and the three-dimensional expression model E_t of the current image frame is compared with the three-dimensional expression model E_{t−1} of the previous image frame in terms of expression bases and expression basis coefficients.
Subsequently, in step S260, if the comparison result satisfies a predetermined condition, the three-dimensional face shape model of the current image frame is calculated; if not, the three-dimensional face shape model of the previous image frame is used as the three-dimensional face shape model of the current image frame.
According to an embodiment of the present invention, the predetermined condition includes: ① the rotation parameters in the projection matrix MVP_t of the current image frame are closer to the rotation parameters of the three-dimensional average face than those in the projection matrix MVP_{t−1} of the previous image frame; in other words, the rotation parameters of MVP_t are closer to a standard frontal face than those of MVP_{t−1}, i.e., the face pose of the current image frame is more "corrected" (closer to frontal) than that of the previous image frame; and ② the expression basis coefficients in the three-dimensional expression model E_t of the current image frame are closer to the expression basis coefficients of the three-dimensional average face than those in the three-dimensional expression model E_{t−1} of the previous image frame, i.e., the Σ b_j·E_j part of E_t is closer to 0; the larger the coefficients b_j, the more exaggerated the expression, so when this part is close to 0 the expression is judged to be "more neutral".
In summary, when the face pose of the current image frame is more frontal ("corrected") and the facial expression is more neutral than those of the previous image frame, the predetermined condition is judged to be satisfied, and the three-dimensional face shape model F_t of the current image frame is recalculated. Optionally, the three-dimensional face shape model F_t of the current image frame is calculated according to the projection matrix of the current image frame (obtained in step S230); the calculation method has been described above and is not repeated here. Otherwise, the face shape model of the current image frame is not recalculated, and the three-dimensional face shape model of the previous image frame is kept, i.e., F_t = F_{t−1}.
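The per-frame decision can be sketched as follows; the dictionary layout, the norm-based comparisons (which assume the average face has zero rotation and zero expression coefficients), and the refit_shape helper are all illustrative assumptions:

```python
import numpy as np

def choose_shape_model(curr, prev, refit_shape):
    """Recompute F_t only when the predetermined condition holds.

    curr/prev: dicts with 'rotation' (rotation parameters of MVP),
    'expr_coeffs' (expression basis coefficients b_j) and 'F'.
    refit_shape: callable that fits F_t from the current frame's
    projection matrix (hypothetical stand-in for step S230's fit).
    """
    # Condition ①: pose closer to the frontal three-dimensional average face.
    more_frontal = (np.linalg.norm(curr["rotation"]) <
                    np.linalg.norm(prev["rotation"]))
    # Condition ②: expression coefficients closer to 0, i.e. more neutral.
    more_neutral = (np.linalg.norm(curr["expr_coeffs"]) <
                    np.linalg.norm(prev["expr_coeffs"]))
    if more_frontal and more_neutral:
        curr["F"] = refit_shape(curr)   # recalculate F_t
    else:
        curr["F"] = prev["F"]           # keep F_t = F_{t-1}
    return curr
```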
Through this frame-by-frame comparison, the three-dimensional face shape model of the face in the video is continuously optimized so that the model stays as consistent as possible with the user's face.
Subsequently, in step S270, the three-dimensional face model M_t of the current image frame is calculated according to the three-dimensional face shape model F_t and the three-dimensional expression model E_t of the current image frame. According to one embodiment of the invention, the three-dimensional average face meanEF can be directly subtracted from the sum of the three-dimensional face shape model F_t and the three-dimensional expression model E_t of the current image frame to obtain the three-dimensional face model M_t of the current image frame, as shown in formula (5):

M_t = E_t + F_t − meanEF    (5)
Subsequently, in step S280, the texture coordinates of the current image frame are calculated according to the projection matrix and the three-dimensional face model of the current image frame. According to the embodiment of the invention, the projection matrix of the current image frame is multiplied by the three-dimensional face model of the current image frame to obtain the texture coordinates of the current image frame.
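A sketch of this step, reusing the (2, 4) affine MVP assumed earlier; the final division that normalizes pixel positions into [0, 1] texture space is an assumption for illustration:

```python
import numpy as np

def texture_coords(mvp, face_model, frame_w, frame_h):
    """Texture coordinates = projection matrix × 3D face model.

    mvp:        (2, 4) affine projection matrix of the current frame
    face_model: (V, 3) vertices of the current 3D face model M_t
    """
    V = face_model.shape[0]
    verts_h = np.hstack([face_model, np.ones((V, 1))])  # homogeneous coords
    pixels = verts_h @ mvp.T                            # (V, 2) pixel coords
    # Normalize pixel coordinates into [0, 1] texture coordinates.
    return pixels / np.array([frame_w, frame_h], dtype=float)
```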
According to another embodiment of the present invention, after step S270 and before step S280, a step of smoothing the projection matrix and the three-dimensional face model is further included. Specifically, the projection matrix and the three-dimensional face model of the current image frame are respectively smoothed by the projection matrix and the three-dimensional face model of the previous image frame, the smoothed projection matrix and the smoothed three-dimensional face model are used as the projection matrix and the three-dimensional face model of the current image frame, and then texture coordinates of the current image frame are calculated.
The smoothing process can be expressed by formulas (6) and (7): the smoothed projection matrix MVP_t′ and the smoothed three-dimensional face model M_t′ of the t-th image frame are weighted averages of the corresponding quantities of the t-th and (t−1)-th image frames, wherein MVP_t represents the projection matrix of the t-th image frame before smoothing, MVP_{t−1} represents the projection matrix of the (t−1)-th image frame, M_t represents the three-dimensional face model of the t-th image frame before smoothing, and M_{t−1} represents the three-dimensional face model of the (t−1)-th image frame.
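Such temporal smoothing can be sketched as follows; the equal 0.5/0.5 weighting is an assumption for illustration, the actual weighting being defined by formulas (6) and (7):

```python
def smooth(curr, prev, alpha=0.5):
    """Blend the current frame's MVP_t or M_t with the previous frame's.

    alpha = 0.5 is an assumed blending weight for illustration.
    Works on NumPy arrays (element-wise weighted average).
    """
    return alpha * curr + (1.0 - alpha) * prev

# Usage: smooth both quantities before computing texture coordinates, e.g.
# mvp_s = smooth(mvp_t, mvp_prev); model_s = smooth(model_t, model_prev)
```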
Subsequently, in step S290, the preset mask is rendered onto the current image frame according to the texture coordinates. After steps S230 to S290 have been performed for the consecutive image frames of the video, the preset mask is 3D-rendered onto the video.
According to an embodiment of the invention, the preset mask is generated from the UV expansion map of the three-dimensional average face. As a common technique in 3D rendering, the UV expansion map is not described in detail in this embodiment; fig. 3 shows an example of a UV expansion map.
It will be appreciated by those skilled in the art that, based on the present solution, further processing of the preset mask may be implemented, such as adding sunglasses to the preset mask or drawing red cheeks on it, so as to make the mask more entertaining; the embodiments of the present invention are not limited in this respect.
According to still other embodiments of the present invention, if a shot cut occurs in a video, whether a face appears in the image content after the shot cut may be continuously detected. When a face is detected again, the person may have changed, so the image frame in which the face is detected again is taken as a new initial image frame, and the method 200 is repeated to perform real-time mask rendering. The invention is not limited in this respect.
This scheme achieves real-time mask rendering on video based on three-dimensional face reconstruction technology, and overcomes the defects of positioning the mask only by face feature points, namely that the mask cannot adapt to different face shapes and poses and cannot track expressions. Meanwhile, by solving the equations with the least squares method and selectively calculating the three-dimensional face shape model of each image frame, the scheme effectively reduces the amount of computation, achieving real-time processing, avoiding a laggy experience for the user and improving the user experience.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to perform the method of the invention according to instructions in said program code stored in the memory.
By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer readable media.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The invention also discloses:
a9, the method as A8, wherein the three-dimensional face model M of the t image frametComprises the following steps:
Mt=Et+Ft-meanEF
wherein E istThree-dimensional expression model representing the t-th image frame, FtA three-dimensional face model representing the t-th image frame, and meanEF representing a three-dimensional averageA face.
A10. The method according to any one of A1-A9, wherein the step of calculating the texture coordinates of the current image frame based on the projection matrix of the current image frame and the three-dimensional face model comprises: multiplying the projection matrix of the current image frame by the three-dimensional face model of the current image frame to obtain the texture coordinates of the current image frame.

A11. The method according to any one of A1-A10, wherein before the step of calculating the texture coordinates of the current image frame based on the projection matrix of the current image frame and the three-dimensional face model, the method further comprises the steps of: respectively smoothing the projection matrix and the three-dimensional face model of the current image frame with the projection matrix and the three-dimensional face model of the previous image frame, and taking the smoothed projection matrix and three-dimensional face model as the projection matrix and the three-dimensional face model of the current image frame.
A12. The method as in A11, wherein the smoothed projection matrix MVP_t′ of the t-th image frame is a weighted average of MVP_t and MVP_{t−1}, wherein MVP_t represents the projection matrix of the t-th image frame before smoothing and MVP_{t−1} represents the projection matrix of the (t−1)-th image frame.

A13. The method as in A11, wherein the smoothed three-dimensional face model M_t′ of the t-th image frame is a weighted average of M_t and M_{t−1}, wherein M_t represents the three-dimensional face model of the t-th image frame before smoothing and M_{t−1} represents the three-dimensional face model of the (t−1)-th image frame.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.
Claims (15)
1. A method for processing a face image, the method being used for generating a mask for a face image in a video in real time, the method comprising the steps of:
when a face in a video is detected, recording an image frame where the face is located as an initial image frame, and calculating a projection matrix of the face in the initial image frame according to a first calculation mode, wherein the projection matrix comprises: extracting face characteristic points in the initial image frame; fitting the extracted human face characteristic points to obtain a projection matrix of the human face in the image frame;
calculating a three-dimensional face shape model, a three-dimensional expression model and a three-dimensional face model corresponding to the initial image frame according to the projection matrix and the face characteristic points of the initial image frame by using a least square method;
for each subsequent image frame in the video,
calculating a projection matrix of a current image frame according to a first calculation mode;
calculating a three-dimensional expression model of the current image frame according to the projection matrix of the current image frame;
comparing the projection matrix and the three-dimensional expression model of the current image frame with the projection matrix and the three-dimensional expression model of the previous image frame respectively;
if the comparison result meets a preset condition, calculating a three-dimensional face shape model of the current image frame, and if the comparison result does not meet the preset condition, taking the three-dimensional face shape model of the previous image frame as the three-dimensional face shape model of the current image frame;
calculating a three-dimensional face model of the current image frame according to the three-dimensional face shape model and the three-dimensional expression model of the current image frame;
calculating texture coordinates of the current image frame according to the projection matrix of the current image frame and the three-dimensional face model; and
rendering a preset mask onto the current image frame according to the texture coordinates.
2. The method of claim 1, further comprising the steps of:
establishing a face space basis according to pre-collected three-dimensional face data, wherein the face space basis comprises a three-dimensional average face, face shape bases forming a three-dimensional face shape model and expression bases forming a three-dimensional expression model.
3. The method of claim 2, further comprising the step of generating a preset mask:
and generating a preset mask according to the UV expansion diagram of the three-dimensional average face.
4. The method of claim 3, wherein the step of calculating the projection matrix of the face in the image frame in the first calculation mode comprises:
extracting face characteristic points in the image frame; and
and fitting the extracted human face characteristic points to obtain a projection matrix of the human face in the image frame.
5. The method of claim 4, wherein the step of calculating a three-dimensional face shape model or a three-dimensional expression model or a three-dimensional face model of the image frame from the projection matrix of the image frame comprises:
calculating the three-dimensional face shape model or the three-dimensional expression model or the three-dimensional face model of the image frame according to the projection matrix and the face characteristic points of the image frame by using a least square method.
6. The method of claim 5, wherein the predetermined condition comprises:
the rotation parameter in the projection matrix of the current image frame is closer to the rotation parameter of the three-dimensional average face than the rotation parameter in the projection matrix of the previous image frame; and
the expression base coefficient in the three-dimensional expression model of the current image frame is closer to the expression base coefficient of the three-dimensional average face than the expression base coefficient in the three-dimensional expression model of the previous image frame.
7. The method as claimed in claim 6, wherein the step of calculating the three-dimensional face shape model of the current image frame when the comparison result satisfies a predetermined condition comprises:
when the comparison result satisfies the predetermined condition, calculating the three-dimensional face shape model of the current image frame according to the projection matrix of the current image frame.
8. The method as claimed in claim 7, wherein the step of calculating the three-dimensional face model of the current image frame based on the three-dimensional face shape model and the three-dimensional expression model of the current image frame comprises:
adding the three-dimensional face shape model and the three-dimensional expression model of the current image frame and subtracting the three-dimensional average face to obtain the three-dimensional face model of the current image frame.
9. The method as claimed in claim 8, wherein the three-dimensional face model M_t of the t-th image frame is:

M_t = E_t + F_t − meanEF

wherein E_t represents the three-dimensional expression model of the t-th image frame, F_t represents the three-dimensional face shape model of the t-th image frame, and meanEF represents the three-dimensional average face.
10. The method of claim 9, wherein the calculating texture coordinates of the current image frame from the projection matrix of the current image frame and the three-dimensional face model comprises:
multiplying the projection matrix of the current image frame by the three-dimensional face model of the current image frame to obtain the texture coordinates of the current image frame.
11. The method according to any one of claims 1-10, wherein before the step of calculating texture coordinates of the current image frame from the projection matrix of the current image frame and the three-dimensional face model, further comprising the steps of:
and respectively smoothing the projection matrix of the current image frame and the three-dimensional face model by using the projection matrix and the three-dimensional face model of the previous image frame, and taking the projection matrix and the three-dimensional face model after smoothing as the projection matrix and the three-dimensional face model of the current image frame.
13. The method as claimed in claim 11, wherein the smoothed three-dimensional face model M_t′ of the t-th image frame is a weighted average of M_t and M_{t−1}, wherein M_t represents the three-dimensional face model of the t-th image frame before smoothing, and M_{t−1} represents the three-dimensional face model of the (t−1)-th image frame.
14. A computing device, comprising:
one or more processors; and
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-13.
15. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710616812.2A CN107369174B (en) | 2017-07-26 | 2017-07-26 | Face image processing method and computing device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710616812.2A CN107369174B (en) | 2017-07-26 | 2017-07-26 | Face image processing method and computing device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107369174A CN107369174A (en) | 2017-11-21 |
CN107369174B true CN107369174B (en) | 2020-01-17 |
Family
ID=60307047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710616812.2A Active CN107369174B (en) | 2017-07-26 | 2017-07-26 | Face image processing method and computing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107369174B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108564659A (en) * | 2018-02-12 | 2018-09-21 | 北京奇虎科技有限公司 | The expression control method and device of face-image, computing device |
CN110246224B (en) * | 2018-03-08 | 2024-05-24 | 北京京东尚科信息技术有限公司 | Surface denoising method and system of grid model |
CN108898068B (en) * | 2018-06-06 | 2020-04-28 | 腾讯科技(深圳)有限公司 | Method and device for processing face image and computer readable storage medium |
CN110580733B (en) * | 2018-06-08 | 2024-05-17 | 北京搜狗科技发展有限公司 | Data processing method and device for data processing |
CN109308725B (en) * | 2018-08-29 | 2020-09-22 | 华南理工大学 | System for generating mobile terminal table sentiment picture |
CN111368593B (en) * | 2018-12-25 | 2023-11-28 | 北京右划网络科技有限公司 | Mosaic processing method and device, electronic equipment and storage medium |
CN109886244A (en) * | 2019-03-01 | 2019-06-14 | 北京视甄智能科技有限公司 | A kind of recognition of face biopsy method and device |
CN110032959B (en) * | 2019-03-29 | 2021-04-06 | 北京迈格威科技有限公司 | Face shape judging method and device |
CN111161395B (en) * | 2019-11-19 | 2023-12-08 | 深圳市三维人工智能科技有限公司 | Facial expression tracking method and device and electronic equipment |
CN111768477B (en) * | 2020-07-06 | 2024-05-28 | 网易(杭州)网络有限公司 | Three-dimensional facial expression base establishment method and device, storage medium and electronic equipment |
CN112347870B (en) * | 2020-10-23 | 2023-03-24 | 歌尔科技有限公司 | Image processing method, device and equipment of head-mounted equipment and storage medium |
CN113887408B (en) * | 2021-09-30 | 2024-04-23 | 平安银行股份有限公司 | Method, device, equipment and storage medium for detecting activated face video |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011162352A1 (en) * | 2010-06-23 | 2011-12-29 | 株式会社 山武 | Three-dimensional data generating apparatus, three-dimensional data generating method, and three-dimensional data generating program |
CN103035022A (en) * | 2012-12-07 | 2013-04-10 | 大连大学 | Facial expression synthetic method based on feature points |
CN106874825A (en) * | 2015-12-10 | 2017-06-20 | 展讯通信(天津)有限公司 | The training method of Face datection, detection method and device |
CN106204430A (en) * | 2016-07-25 | 2016-12-07 | 浙江工业大学 | Characteristic point interpolation based on face naive model and image distortion method |
CN106127196A (en) * | 2016-09-14 | 2016-11-16 | 河北工业大学 | The classification of human face expression based on dynamic texture feature and recognition methods |
CN106910247A (en) * | 2017-03-20 | 2017-06-30 | 厦门幻世网络科技有限公司 | Method and apparatus for generating three-dimensional head portrait model |
Non-Patent Citations (2)
Title |
---|
A Morphable Model For The Synthesis Of 3D Faces; Blanz V et al.; SIGGRAPH; 1999-07-01; Vol. 99, No. 1999; full text *
FaceWarehouse: A 3D Facial Expression Database for Visual Computing; Chen Cao et al.; IEEE Transactions on Visualization and Computer Graphics; 2014-03-31; Vol. 20, No. 3; full text *
Also Published As
Publication number | Publication date |
---|---|
CN107369174A (en) | 2017-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107369174B (en) | Face image processing method and computing device | |
CN107392984B (en) | Method for generating animation based on face image and computing equipment | |
CN107146199B (en) | Fusion method and device of face images and computing equipment | |
US11393152B2 (en) | Photorealistic real-time portrait animation | |
CN109978063B (en) | Method for generating alignment model of target object | |
US11727617B2 (en) | Single image-based real-time body animation | |
CN107808147B (en) | Face confidence discrimination method based on real-time face point tracking | |
CN108038823B (en) | Training method of image morphing network model, image morphing method and computing device | |
CN110660037A (en) | Method, apparatus, system and computer program product for face exchange between images | |
CN109584179A (en) | A kind of convolutional neural networks model generating method and image quality optimization method | |
WO2020150689A1 (en) | Systems and methods for realistic head turns and face animation synthesis on mobile device | |
CN110580733B (en) | Data processing method and device for data processing | |
CN110493512B (en) | Photographic composition method, photographic composition device, photographic equipment, electronic device and storage medium | |
CN107341841B (en) | Generation method of gradual animation and computing device | |
CN110580677A (en) | Data processing method and device and data processing device | |
WO2024055957A1 (en) | Photographing parameter adjustment method and apparatus, electronic device and readable storage medium | |
WO2024077791A1 (en) | Video generation method and apparatus, device, and computer readable storage medium | |
CN109117736B (en) | Method and computing device for judging visibility of face points | |
US11954779B2 (en) | Animation generation method for tracking facial expression and neural network training method thereof | |
CN116152447B (en) | Face modeling method and device, electronic equipment and storage medium | |
CN117173734A (en) | Palm contour extraction and control instruction generation method and device and computer equipment | |
CN116777792A (en) | Intelligent image-modifying abdominal muscle repairing processing method and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20211208 Address after: 361100 568, No. 942, tonglong Second Road, torch high tech Zone (Xiang'an) Industrial Zone, Xiang'an District, Xiamen City, Fujian Province Patentee after: Xiamen Meitu Yifu Technology Co.,Ltd. Address before: Room 11, Chuangye building, Chuangye Park, torch hi tech Zone, Huli District, Xiamen City, Fujian Province Patentee before: XIAMEN HOME MEITU TECHNOLOGY Co.,Ltd. |