
CN112733667B - Face alignment method and device based on face recognition

Info

Publication number
CN112733667B
Authority
CN
China
Prior art keywords
face
array
target
transformation
initial
Prior art date
Legal status
Active
Application number
CN202011626746.5A
Other languages
Chinese (zh)
Other versions
CN112733667A
Inventor
魏舒 (Wei Shu)
刘玉宇 (Liu Yuyu)
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011626746.5A
Publication of CN112733667A
Application granted
Publication of CN112733667B
Legal status: Active

Classifications

    • G06V40/161 Human faces: detection; localisation; normalisation
    • G06N3/045 Neural networks: combinations of networks
    • G06V10/25 Image preprocessing: determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/30 Image preprocessing: noise filtering
    • G06V10/34 Image preprocessing: smoothing or thinning of the pattern; morphological operations; skeletonisation
    • G06V40/168 Human faces: feature extraction; face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a face alignment method and device based on face recognition, relating to the technical field of artificial intelligence, which can improve face alignment accuracy. The method comprises the following steps: according to a first face ROI (region of interest) area in an acquired video frame and preset face standard key points, determining an initial transformation array corresponding to the first face ROI area for realizing face key point alignment; performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array; and performing face key point alignment processing on a re-extracted second face ROI area in the video frame by using the target transformation array to obtain a target face ROI area with aligned face key points. The application is suitable for various application systems based on face alignment. In addition, the application also relates to blockchain technology: the video frame and the target face ROI area can be stored in a blockchain to ensure data privacy and security.

Description

Face alignment method and device based on face recognition
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a face alignment method and device based on face recognition.
Background
Face alignment is an active research problem in computer vision: its quality directly affects the accuracy of face feature extraction and recognition, and it is a key factor in the stability of face processing over a video stream.
Existing solutions for face alignment in video scenes use neural networks for key point detection and alignment (e.g. MTCNN), which require a large amount of labeling, hardware such as a graphics processing unit (GPU), and costly training. Tracking methods are also commonly used: on top of key point detection, they stabilize detection between consecutive video frames in the video stream to ensure face alignment accuracy. The existing solutions therefore have the following defects: the neural-network-based MTCNN face key point detection method has high labeling cost, GPU and other hardware requirements, and high training cost, while the method that aligns face key points by tracking to stabilize detection between consecutive video frames is slow and harms the real-time performance of the system.
Disclosure of Invention
In view of this, the application provides a face alignment method and device based on face recognition, whose main aim is to solve the technical problems of the existing approaches: the neural-network-based MTCNN face key point detection method has high labeling cost, high hardware requirements such as a graphics processing unit (GPU), and high training cost, while tracking-based stabilization between consecutive video frames is slow and gives poor real-time performance.
According to an aspect of the present application, there is provided a face alignment method based on face recognition, the method comprising:
According to a first face ROI (region of interest) in an acquired video frame and preset face standard key points, determining an initial transformation array which corresponds to the first face ROI and is used for realizing face key point alignment, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data;
Performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array;
Performing face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
The second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
According to another aspect of the present application, there is provided a face alignment apparatus, comprising:
The initial transformation array module is used for determining an initial transformation array which corresponds to the first face ROI area and is used for realizing the alignment of the face key points according to the acquired first face ROI area in the video frame and the preset face standard key points, and the initial transformation array comprises initial scale transformation data and initial translation transformation data;
The target transformation array module is used for carrying out jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array;
the alignment module is used for carrying out face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
The second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
According to still another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the face alignment method based on face recognition as described above.
According to still another aspect of the present application, there is provided a computer device including a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, the processor implementing the face alignment method based on face recognition as described above when executing the program.
By means of the above technical scheme, compared with the MTCNN face key point detection method based on a neural network model and the method that aligns face key points by tracking to stabilize detection between consecutive video frames, the face alignment method and device based on face recognition provided by the application determine, according to the acquired first face ROI area in the video frame and preset face standard key points, an initial transformation array corresponding to the first face ROI area for realizing face key point alignment, the initial transformation array comprising initial scale transformation data and initial translation transformation data; perform jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array; and perform face key point alignment processing on the extracted second face ROI area in the video frame by utilizing the target transformation array to obtain a target face ROI area with aligned face key points, the second face ROI area differing from the first face ROI area in further including a background area. Thus, by adaptively removing jitter from and smoothing the initial transformation array determined from the first face ROI area in each video frame, face alignment of the face ROI area is realized, solving the technical problem of low face alignment accuracy when jitter or jumps exist between consecutive frames of a face, while the implementation of face alignment is simplified on the basis of guaranteeing the face alignment of the video frames. The method avoids the large amount of time-consuming preparation work, such as manual labeling and model training, that makes the existing approaches expensive up front, and realizes the jitter removal and smoothing processing adaptively, which greatly improves the generalization capability of the face alignment operation; compared with the existing method of aligning face key points by tracking to stabilize detection between consecutive video frames, it also greatly improves the processing speed of video data and effectively reduces time consumption, meeting the real-time requirements of various application systems based on face alignment.
The foregoing is only an overview of the technical solution of the present application. It may be implemented in accordance with the contents of the specification so that the technical means of the present application can be understood more clearly, and so that the above and other objects, features and advantages of the present application become more readily apparent.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 shows a flow chart of a face alignment method based on face recognition according to an embodiment of the present application;
Fig. 2 shows a schematic flow chart of another face alignment method based on face recognition according to an embodiment of the present application;
Fig. 3 shows a schematic structural diagram of a face alignment device according to an embodiment of the present application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
This embodiment addresses the technical problems that the existing neural-network-based face key point detection method has high labeling cost, GPU and other hardware requirements, and high training cost, and that the method which aligns face key points by tracking to stabilize detection between consecutive video frames is slow and degrades the real-time performance of the system. It provides a face alignment method based on face recognition that can effectively improve face alignment accuracy when jitter or jumps exist between consecutive frames of a face, while simplifying the implementation of face alignment. As shown in fig. 1, the method comprises the following steps:
101. according to a first face ROI region in an acquired video frame and a preset face standard key point, determining an initial transformation array which corresponds to the first face ROI region and is used for realizing face key point alignment, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data.
In this embodiment, a remote recording controller acquires a video, the acquired video is divided into consecutive frames, and the face key points in the first face ROI area of each frame are subjected to alignment preprocessing to obtain an initial transformation array for realizing face key point alignment. Here, alignment preprocessing means comparing the face key points in the first face ROI area with the preset face standard key points and calculating the transformation array that would align them; only the initial transformation array is determined at this stage, and no alignment operation is yet performed, so that the array can first be subjected to jitter removal and smoothing processing in video-frame order to improve the accuracy of face alignment.
Depending on the requirements of the actual application scene, the video can be acquired by a camera device, with a tripod ensuring stable capture and aperture fill light improving the recognizability of the face ROI area; the video stream captured by the camera is fed into the remote recording controller, which performs the de-jittering processing on the acquired stream, improving the accuracy of face alignment. The manner of video acquisition is not specifically limited here.
102. And performing jitter removal and smoothing treatment on the initial transformation array to obtain a target transformation array.
In this embodiment, the initial transformation array is first subjected to jitter removal. Specifically, a first jitter removal process is applied to the initial transformation array to obtain a first jitter removal array, eliminating outliers in the face key point detection results of the face ROI area; a second jitter removal process is then applied to the first jitter removal array to obtain a second jitter removal array, reducing large-amplitude jitter in the face ROI area to small-amplitude jitter. The jitter-removed array is then smoothed: the remaining small-amplitude jitter in the second jitter removal array is smoothed out, so that the resulting target transformation array better approximates the true transformation, achieving anti-jitter processing in the face alignment process.
103. Performing face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned; the second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
In this embodiment, the face ROI area is extracted again from the acquired video frame, and the target transformation array is used to align the face key points of the re-extracted second face ROI area, giving the target face ROI area. Unlike the first face ROI detected in step 101, the re-extracted second face ROI includes a background region beyond the face and may further include hair, hair accessories, caps, and the like. Performing the face key point alignment on this re-extracted second face ROI can effectively improve alignment accuracy compared with aligning only the first face ROI, which contains the face alone.
Depending on the requirements of the actual application scene, the target face ROI area, or the video frame after face key point alignment, can be put to further use. For example, the target face ROI area can be fed into a trained face recognition network model, and a face recognition result for the video frame obtained from features extracted in the target face ROI area; feature extraction in the target face ROI area can likewise drive virtual anchor generation. The application of the aligned target face ROI area is not specifically limited here.
In the above scheme, an initial transformation array corresponding to the first face ROI area for realizing face key point alignment is determined according to the acquired first face ROI area in the video frame and the preset face standard key points, the initial transformation array comprising initial scale transformation data and initial translation transformation data; the initial transformation array is subjected to jitter removal and smoothing processing to obtain a target transformation array; and the target transformation array is used to perform face key point alignment on the extracted second face ROI area in the video frame, yielding a target face ROI area with aligned face key points. Compared with the existing MTCNN neural-network face key point detection method and with aligning face key points by tracking to stabilize detection between consecutive video frames, face alignment of the face ROI area is achieved by adaptively de-jittering and smoothing the initial transformation array determined from the first face ROI area, which solves the technical problem of low face alignment accuracy when jitter or jumps exist between consecutive frames of a face and simplifies the implementation of face alignment while guaranteeing the face alignment of the video frames.
Further, as a refinement and extension of the foregoing embodiment, in order to fully describe a specific implementation procedure of the embodiment, another face alignment method based on face recognition is provided, as shown in fig. 2, where the method includes:
201. The acquired video is divided into frames to obtain consecutive video frames.
202. Each video frame is processed as follows: performing face detection on a current image corresponding to the video frame by using a first preset depth model to obtain a first face ROI (region of interest); performing key point detection on the first face ROI by using a second preset depth model to obtain target key points; and determining initial scale transformation data and initial translation transformation data for realizing the alignment of the face key points by utilizing the target key points and preset face standard key points.
In this embodiment, a remote recording controller acquires a video and the video is divided into consecutive frames; each video frame is then processed as follows. First, a first preset depth model detects the face region in the video frame, giving a first processing result: the first face ROI area whose face key points are to be aligned. Second, a second preset depth model detects the face key points in the face ROI area of the first processing result, giving the coordinates of 68 key points. Finally, from the 68 key point coordinates and a 3D standard face template, the scale transformation data S and translation transformation data T that align the face key points with the standard key points of the 3D standard face template are obtained, and the S and T of a number of consecutive video frames are stored into the initial transformation arrays set_S and set_T.
Depending on the requirements of the actual application scene, a first depth residual network (ResNet) from the computer vision library Dlib can be used for face region detection, recording the one or more detected face regions to obtain the first face ROI area; a second depth residual network (ResNet) from Dlib can be used to detect the face key points in the face ROI area, recording the coordinates (x, y) of the 68 key points; and from the 68 key point coordinates and the 3D standard face template, 1 scale transformation value S and 2 translation transformation values T that align the face key points with the standard key points of the 3D standard face template are calculated and stored into the transformation arrays set_S and set_T.
The first depth residual network ResNet and the second depth residual network ResNet are different: the first is trained on face ROI area training samples, while the second is trained on face key point training samples. The 3D standard face template can be obtained by averaging each dimensional feature over a number of 3D faces; it serves as the reference against which face key points are compared in this application, and the dimensional features of the 3D faces are not limited here and follow the requirements of the actual application scene.
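As an illustration of step 202, the following is a minimal per-frame processing sketch using Dlib as named above. Dlib's stock frontal detector and 68-point shape predictor stand in for the "first" and "second" preset depth models (the text describes ResNet-based models; Dlib's CNN detector would instead be loaded via dlib.cnn_face_detection_model_v1); the model file path and video path are placeholders, and estimate_similarity and STANDARD_KEYPOINTS refer to the least-squares sketch given after step 2022 below:

```python
import cv2
import dlib
import numpy as np

# Stand-ins for the "preset depth models"; the .dat path is a placeholder.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

set_S, set_T = [], []                      # initial transformation arrays

cap = cv2.VideoCapture("input.mp4")        # framing the acquired video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)                 # first face ROI region(s)
    if not faces:
        continue
    shape = predictor(gray, faces[0])      # 68 target key points
    pts = np.array([[p.x, p.y] for p in shape.parts()], dtype=np.float64)
    # Scale S and translation T aligning the key points with the standard
    # key points (STANDARD_KEYPOINTS: an assumed 68x2 projection of the
    # 3D standard face template); see the least-squares sketch below.
    s, t = estimate_similarity(pts, STANDARD_KEYPOINTS)
    set_S.append(s)
    set_T.append(t)
cap.release()
```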
To illustrate the specific implementation manner of step 202, as a preferred embodiment, the determining, by using the target keypoints and preset face standard keypoints, initial scale transformation data and initial translation transformation data for implementing face keypoint alignment specifically includes:
step 2021, calculating the corresponding relation between the target key point and the preset face standard key point by using a least square method.
Step 2022, obtaining initial scale transformation data and initial translation transformation data from the target key point to the preset face standard key point according to the corresponding relation.
In this embodiment, the initial transformation array is a two-dimensional sequence: one dimension indexes the video frames, and the other holds each frame's scale transformation data and translation transformation data for aligning the face key points. The initial scale transformation data and initial translation transformation data from the target key points to the preset face standard key points are therefore subjected to jitter removal and smoothing processing in video-frame order, yielding a target transformation array in which the scale transformation data tends to be stable.
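A closed-form least-squares solution consistent with steps 2021 and 2022 can be sketched as follows. It fits exactly the 1 scale value S and 2 translation values T named in step 202 (rotation is left out to match that parameterization; the patent does not spell out the exact solver):

```python
import numpy as np

def estimate_similarity(src, dst):
    """Least-squares scale s and translation t mapping the detected key
    points src onto the standard key points dst, i.e. the minimizer of
    sum_i || s * src[i] + t - dst[i] ||^2.

    Closed form: t aligns the centroids, and s is the ratio of the
    cross-variance to the source variance. src, dst: (68, 2) arrays.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    s = (src_c * dst_c).sum() / (src_c ** 2).sum()  # 1 scale value S
    t = dst_mean - s * src_mean                     # 2 translation values T
    return s, t
```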
203. And performing a first jitter removal process on the initial transformation array to obtain a first jitter removal array.
204. And performing a second jitter removal process on the first jitter removal array to obtain a second jitter removal array.
205. And performing smoothing filtering processing on the second debounce array to obtain a target transformation array.
To illustrate the specific implementation of steps 203 and 204, as a preferred embodiment, the first jitter removal process is a clipping filter process, and the second jitter removal process is a recursive average filter process.
In practical application scenes, the prior art focuses on improving the accuracy of face key point detection itself in order to solve jitter or jumps between consecutive frames of a face; this embodiment instead achieves anti-jitter face alignment in video through de-jittering and smooth correction of the initial transformation array used for face alignment. Moreover, even when a prior-art face key point detection method is accurate with small per-point errors, the final face alignment result can still show large error, because aligning many face key points across the transformation data of many video frames superimposes many small-error results. Therefore, on top of accurate face key point detection, the application performs jitter removal and smoothing on the initial transformation array used for face alignment, which greatly reduces face alignment jitter in the video and brings the aligned result close to the ideal state.
Because the face in a video is continuous, its key point detections and transformation data should in theory also be continuously and smoothly distributed; in practice, however, interference from factors such as the background of the face and illumination easily makes the distribution unsmooth. On this basis, the present embodiment applies clipping filtering to the initial transformation arrays set_S and set_T, and then applies recursive average filtering to the clipped arrays, giving the de-jittered results for set_S and set_T. Two successive filtering passes over the initial transformation array realize the jitter removal: the amplitude-limiting (clipping) filtering eliminates outliers in the key point detection results, and the recursive average filtering reduces large-amplitude jitter in the consecutive video frames (face ROI areas) to small-amplitude jitter, so that subsequent smoothing of the small-amplitude jitter approximates the true transformation and achieves the anti-jitter purpose of the face alignment process.
The clipping filtering is applied to the initial transformation arrays set_S and set_T separately. Specifically, a maximum deviation allowed between two successive samples (denoted A) is preset, and each detected current sample value is judged: if the difference between the current value and the previous value is less than or equal to A, the current value is deemed valid; if the difference is greater than A, the current value is deemed invalid, discarded, and replaced by the previous value. This overcomes pulse interference caused by accidental factors and eliminates outliers in the key point detection results.
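A minimal sketch of this amplitude-limiting (clipping) filter, applied to one transformation array; the maximum deviation A is a preset tuning value and the 0.05 in the usage line is an assumed example:

```python
def clip_filter(values, max_dev):
    """Amplitude-limiting filter: a sample deviating from the previously
    accepted value by more than max_dev (the preset maximum deviation A)
    is treated as an outlier and replaced by the previous value."""
    out = [values[0]]
    for v in values[1:]:
        if abs(v - out[-1]) <= max_dev:
            out.append(v)           # current value is valid
        else:
            out.append(out[-1])     # discard the outlier, keep previous
    return out

set_S = clip_filter(set_S, max_dev=0.05)  # A = 0.05 chosen for illustration
```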
Further, recursive average filtering is applied to the clipped arrays set_S and set_T. Specifically, N consecutively obtained sample values are treated as a fixed-length queue of length N; each newly obtained group of transformation data (the scale and translation transformation data of the next video frame) is placed at the tail of the queue on a first-in first-out basis, and the arithmetic mean of the N samples in the queue gives the new filtering result. Applying recursive average filtering to the clipped arrays set_S and set_T suppresses periodic interference well and yields high smoothness, compensating for the clipping filter's weakness against periodic interference; the two filters complement each other to best remove the jitter of the video frames.
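The recursive average filter described here can be sketched with a fixed-length FIFO queue; the queue length N is a preset tuning value the text leaves open:

```python
from collections import deque

def recursive_average_filter(values, n):
    """Recursive (moving) average filter: keep a FIFO queue of fixed
    length n, push each new sample to the tail (the oldest sample at the
    head is discarded automatically), and output the arithmetic mean of
    the samples currently in the queue."""
    queue = deque(maxlen=n)
    out = []
    for v in values:
        queue.append(v)                      # first-in first-out
        out.append(sum(queue) / len(queue))  # new filtering result
    return out
```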
To illustrate a specific implementation of step 205, as a preferred embodiment, step 205 may specifically include: performing smoothing filtering on the second debounce array using an adaptively set window size to obtain a target transformation array, wherein the adaptively set window size is determined according to the length of the video.
In this embodiment, the jitter-removed result is smoothed to obtain the smoothed transformation arrays set_S and set_T. Specifically, the jitter-removed result (the second debounce array) is padded to form a one-dimensional array; within a window of adaptive size, the value at the centre of the window is updated, and the padded region is then removed, giving the target transformation array. The window size is determined from information such as the acquired video length, making the window size adaptive.
Depending on the requirements of the actual application scene, an adaptive data value is set and the current window size is computed from the acquired video length, for example win_size = len(input video) / 3, where 3 is the adaptive data value. The de-jittered initial transformation arrays set_S and set_T are padded according to this adaptively set window size, giving the one-dimensional sequences set_S_padding and set_T_padding; each sequence is traversed from the beginning with the window, and the mean of all values in each window is computed to update the value at the window's centre position in the sequence; the padded parts are then removed, giving the de-jittered and smoothed target transformation arrays set_S and set_T, i.e. a target transformation array in which the scale transformation data S tends to be stable.
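A sketch of this window-size-adaptive smoothing follows; edge-replication padding and forcing an odd window (so the centre is well defined) are assumptions the text does not fix:

```python
import numpy as np

def smooth_adaptive(values, divisor=3):
    """Mean smoothing with an adaptively set window:
    win_size = len(values) // divisor (3 being the example adaptive data
    value given above). The sequence is padded, every window's mean
    replaces the value at the window centre, and the padding is removed."""
    win = max(1, len(values) // divisor)
    if win % 2 == 0:
        win += 1                              # odd window: centre is defined (assumption)
    half = win // 2
    padded = np.pad(np.asarray(values, dtype=np.float64),
                    half, mode="edge")        # the set_S_padding / set_T_padding step
    out = np.empty(len(values))
    for i in range(len(values)):
        out[i] = padded[i:i + win].mean()     # window mean updates the centre value
    return out                                # padding already excluded
```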
206. And performing face key point scaling processing on the extracted second face ROI region in the video frame by utilizing target scale transformation data in the target transformation array to obtain a scaled third face ROI region.
207. And carrying out face key point translation processing on the third face ROI area by utilizing the target translation transformation data in the target transformation array to obtain a translated target face ROI area.
In this embodiment, the second face ROI area is re-extracted from the video frame, and scale transformation followed by translation transformation is applied to it using the de-jittered and smoothed target transformation arrays set_S and set_T, giving the face-aligned target face ROI area. Depending on the actual application scene, the scale transformation may additionally include rotation, which is not specifically limited here. The second face ROI area can be re-extracted, for example, by taking an area centred on the centre point of the face key points, or by expanding the first face ROI area detected in step 202 by a certain margin; the extraction manner of the second face ROI area is likewise not specifically limited.
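Steps 206 and 207 amount to applying the smoothed scale and then the smoothed translation to the re-extracted second face ROI. A sketch using OpenCV's affine warp is shown below; warping the ROI image itself (rather than only the key point coordinates) is an assumption about where the transform is applied:

```python
import cv2
import numpy as np

def align_face_roi(roi_img, s, t):
    """Apply the target scale transformation s (step 206) and then the
    target translation transformation t = (tx, ty) (step 207) to the
    second face ROI in one affine warp: M = [[s, 0, tx], [0, s, ty]]."""
    tx, ty = t
    M = np.float32([[s, 0, tx], [0, s, ty]])
    h, w = roi_img.shape[:2]
    return cv2.warpAffine(roi_img, M, (w, h))   # target face ROI area
```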
It should be noted that, in this embodiment, the video frame and the target face ROI area may be stored in a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. Essentially a decentralized database, a blockchain is a chain of data blocks generated in association using cryptographic methods, each block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
In this way, the face alignment method based on face recognition in artificial-intelligence video scenes of this embodiment calculates, from the face key point detection result of each frame, the initial transformation data from the detected face key points to the standard face key points, and performs jitter removal and smoothing on that data before performing the face alignment operation on each frame, ensuring continuity between video frames on the basis of per-frame face alignment and avoiding face jitter or jumps across consecutive video frames.
By applying the technical scheme of this embodiment, an initial transformation array corresponding to the first face ROI area for realizing face key point alignment is determined according to the acquired first face ROI area in the video frame and the preset face standard key points, the initial transformation array comprising initial scale transformation data and initial translation transformation data; the initial transformation array is subjected to jitter removal and smoothing processing to obtain a target transformation array; and the target transformation array is used to align the face key points of the extracted second face ROI area in the video frame, yielding a target face ROI area with aligned face key points. Compared with the existing MTCNN neural-network face key point detection method and with aligning face key points by tracking to stabilize detection between consecutive video frames, adaptively de-jittering and smoothing the initial transformation array determined from the first face ROI area greatly improves the generalization capability of the face alignment operation and simplifies its implementation while guaranteeing face alignment accuracy in the video, reducing consumption of hardware resources such as memory and GPU memory. The scheme avoids the time-consuming preparation work, such as manual labeling and model training, that makes the existing approaches expensive up front and demanding of hardware such as GPUs; and, compared with existing tracking-based stabilization between consecutive video frames, it greatly improves the processing speed of video data and effectively reduces time consumption, meeting the real-time requirements of various application systems based on face alignment.
Further, as a specific implementation of the method of fig. 1, an embodiment of the present application provides a face alignment device, as shown in fig. 3, where the device includes: an initial transform array module 31, a target transform array module 32, an alignment module 33.
The initial transformation array module 31 may be configured to determine an initial transformation array for implementing alignment of face key points corresponding to a first face ROI area according to the obtained first face ROI area in the video frame and a preset face standard key point, where the initial transformation array includes initial scale transformation data and initial translation transformation data.
The target transform array module 32 may be configured to perform a debounce and smoothing process on the initial transform array to obtain a target transform array.
The alignment module 33 may be configured to perform a face key point alignment process on the extracted second face ROI area in the video frame by using the target transformation array, so as to obtain a target face ROI area after the face key point alignment; the second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
In a specific application scenario, the initial transform array module 31 includes: framing unit 311, and conversion unit 312.
The framing unit 311 may be configured to perform framing processing on the acquired video to obtain continuous video frames.
The transforming unit 312 may be configured to perform the following processing for each video frame: performing face detection on a current image corresponding to the video frame by using a first preset depth model to obtain a first face ROI (region of interest); performing key point detection on the first face ROI by using a second preset depth model to obtain target key points; and determining initial scale transformation data and initial translation transformation data for realizing the alignment of the face key points by utilizing the target key points and preset face standard key points.
In a specific application scenario, the transformation unit 312 is specifically configured to: calculate the correspondence between the target key points and the preset face standard key points using a least squares method; and obtain, from that correspondence, the initial scale transformation data and initial translation transformation data from the target key points to the preset face standard key points.
In a specific application scenario, the target transformation array module 32 includes: a first debounce unit 321, a second debounce unit 322, and a smoothing filter unit 323.
The first debounce unit 321 may be configured to perform a first debounce process on the initial transform array to obtain a first debounce array.
The second debounce unit 322 may be configured to perform a second debounce process on the first debounce array to obtain a second debounce array.
The smoothing filter unit 323 may be configured to perform smoothing filtering on the second debounce array to obtain a target transform array.
In a specific application scenario, the first jitter removal process is clipping filtering process, and the second jitter removal process is recursive average filtering process.
In a specific application scenario, the smoothing filter unit 323 is specifically configured to perform smoothing filtering on the second debounce array using an adaptively set window size to obtain the target transformation array, wherein the adaptively set window size is determined according to the length of the video.
In a specific application scenario, the alignment module 33 includes: a scale conversion unit 331, a translation conversion unit 332.
The scale transformation unit 331 may be configured to perform face key point scaling processing on the extracted second face ROI area in the video frame by using the target scale transformation data in the target transformation array, so as to obtain a scaled third face ROI area.
The translation transformation unit 332 may be configured to perform a face key point translation process on the third face ROI area by using the target translation transformation data in the target transformation array, so as to obtain a translated target face ROI area.
It should be noted that, for other corresponding descriptions of each functional unit related to the face alignment device provided by the embodiment of the present application, reference may be made to corresponding descriptions in fig. 1 and fig. 2, and details are not repeated here.
Based on the above-mentioned methods shown in fig. 1 and 2, correspondingly, the embodiment of the present application further provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the face alignment method based on face recognition shown in fig. 1 and 2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective implementation scenario of the present application.
Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3, in order to achieve the above objects, the embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, etc., where the entity device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the face alignment method based on face recognition as shown in fig. 1 and 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, radio Frequency (RF) circuitry, sensors, audio circuitry, WI-FI modules, and the like. The user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., bluetooth interface, WI-FI interface), etc.
It will be appreciated by those skilled in the art that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or arrange the components differently.
The storage medium may also include an operating system, a network communication module. An operating system is a program that manages the hardware and software resources of a computer device, supporting the execution of information handling programs, as well as other software and/or programs. The network communication module is used for realizing communication among all components in the storage medium and communication with other hardware and software in the entity equipment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Applying the technical scheme of the application, compared with the existing MTCNN face key point detection method based on a neural network model and the method that aligns face key points by tracking to stabilize detection between consecutive video frames, the initial transformation array determined from the first face ROI area in the video frames is adaptively de-jittered and smoothed to realize face alignment of the face ROI area. This simplifies the implementation of face alignment while guaranteeing face alignment accuracy in the video, reducing consumption of hardware resources such as memory and GPU memory; it avoids the time-consuming preparation work, such as manual labeling and model training, that makes existing approaches expensive up front; the adaptive jitter removal and smoothing processing greatly improves the generalization capability of the face alignment operation; and the processing speed of video data is greatly improved and time consumption effectively reduced compared with aligning face key points by tracking to stabilize detection between consecutive video frames, meeting the real-time requirements of various application systems based on face alignment.
Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario and that the modules or flows in the drawing are not necessarily required to practice the application. Those skilled in the art will appreciate that modules in an apparatus in an implementation scenario may be distributed in an apparatus in an implementation scenario according to an implementation scenario description, or that corresponding changes may be located in one or more apparatuses different from the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The serial numbers of the foregoing embodiments are merely for description and do not represent the relative merits of the implementation scenarios. The foregoing disclosure is merely illustrative of some embodiments of the application, and the application is not limited thereto; modifications may be made by those skilled in the art without departing from the scope of the application.

Claims (9)

1. A face alignment method based on face recognition, characterized by comprising the following steps:
Determining an initial transformation array which corresponds to a first face ROI region and is used for realizing face key point alignment according to the acquired first face ROI region in a video frame and a preset face standard key point, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data;
Performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array;
Performing face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
the second face ROI area is different from the first face ROI area, and the second face ROI area further includes at least a background area compared to the first face ROI area;
The step of performing face key point alignment processing on the extracted second face ROI area in the video frame by using the target transformation array to obtain a target face ROI area with aligned face key points includes:
performing face key point scaling processing on the extracted second face ROI region in the video frame by utilizing target scale transformation data in the target transformation array to obtain a scaled third face ROI region;
And carrying out face key point translation processing on the third face ROI area by utilizing the target translation transformation data in the target transformation array to obtain a translated target face ROI area.
2. The method according to claim 1, wherein determining, according to the obtained first face ROI area in a video frame and preset face standard key points, an initial transformation array corresponding to the first face ROI area for achieving face key point alignment, the initial transformation array including initial scale transformation data and initial translation transformation data, comprises:
carrying out framing treatment on the acquired video to obtain continuous video frames;
each video frame is processed as follows:
Performing face detection on a current image corresponding to the video frame by using a first preset depth model to obtain a first face ROI (region of interest);
performing key point detection on the first face ROI by using a second preset depth model to obtain target key points;
and determining initial scale transformation data and initial translation transformation data for realizing the alignment of the face key points by utilizing the target key points and preset face standard key points.
3. The method according to claim 2, wherein determining initial scale transformation data and initial translation transformation data for achieving face key point alignment using the target key point and a preset face standard key point, comprises:
calculating the corresponding relation between the target key point and the preset face standard key point by using a least square method;
and obtaining initial scale transformation data and initial translation transformation data from the target key point to the preset face standard key point according to the corresponding relation.
4. The method of claim 1, wherein the performing the de-jittering and smoothing on the initial transform array to obtain a target transform array comprises:
Performing first jitter removal processing on the initial transformation array to obtain a first jitter removal array;
performing a second jitter removal process on the first jitter removal array to obtain a second jitter removal array;
And performing smoothing filtering processing on the second debounce array to obtain a target transformation array.
5. The method of claim 4, wherein the first de-dithering process is a clipping filtering process and the second de-dithering process is a recursive average filtering process.
6. The method of claim 4, wherein smoothing the second debounce array to obtain a target transform array, comprising:
And carrying out smooth filtering treatment on the second debounce array by utilizing the self-adaptive set window size to obtain a target transformation array, wherein the self-adaptive set window size is determined according to the length of the video.
7. A face alignment device, comprising:
The initial transformation array module is used for determining an initial transformation array which corresponds to the first face ROI area and is used for realizing the alignment of the face key points according to the acquired first face ROI area in the video frame and the preset face standard key points, and the initial transformation array comprises initial scale transformation data and initial translation transformation data;
The target transformation array module is used for carrying out jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array;
the alignment module is used for carrying out face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
the second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area;
wherein, the alignment module includes:
The scale transformation unit is used for performing face key point scaling processing on the extracted second face ROI area in the video frame by utilizing the target scale transformation data in the target transformation array to obtain a scaled third face ROI area;
And the translation transformation unit is used for carrying out face key point translation processing on the third face ROI by utilizing the target translation transformation data in the target transformation array to obtain a target face ROI after the translation processing.
8. A storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the face alignment method based on face recognition of any one of claims 1 to 6.
9. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the face alignment method based on face recognition according to any of claims 1 to 6 when executing the program.
CN202011626746.5A 2020-12-30 2020-12-30 Face alignment method and device based on face recognition Active CN112733667B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011626746.5A 2020-12-30 2020-12-30 Face alignment method and device based on face recognition

Publications (2)

Publication Number Publication Date
CN112733667A CN112733667A (en) 2021-04-30
CN112733667B true CN112733667B (en) 2024-06-28

Family

ID=75607964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011626746.5A Active CN112733667B 2020-12-30 2020-12-30 Face alignment method and device based on face recognition

Country Status (1)

Country Link
CN (1) CN112733667B

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115601484B (en) * 2022-11-07 2023-03-28 广州趣丸网络科技有限公司 Virtual character face driving method and device, terminal equipment and readable storage medium
CN116524418A (en) * 2023-07-03 2023-08-01 平安银行股份有限公司 Face and mouth recognition method, device and system and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111488774A (en) * 2019-01-29 2020-08-04 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN111523402A (en) * 2020-04-01 2020-08-11 车智互联(北京)科技有限公司 Video processing method, mobile terminal and readable storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN109978754A (en) * 2017-12-28 2019-07-05 广东欧珀移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN112036256A (en) * 2020-08-07 2020-12-04 深圳数联天下智能科技有限公司 Human face key point training device
CN112017212B (en) * 2020-08-26 2022-10-04 北京紫光展锐通信技术有限公司 Training and tracking method and system of face key point tracking model


Also Published As

Publication number Publication date
CN112733667A 2021-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant