CN112733667B - Face alignment method and device based on face recognition - Google Patents
- Publication number: CN112733667B (application CN202011626746.5A)
- Authority: CN (China)
- Prior art keywords: face, array, target, transformation, initial
- Prior art date: 2020-12-30
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/161: Human faces; detection, localisation, normalisation
- G06N3/045: Neural networks; combinations of networks
- G06V10/25: Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/30: Image preprocessing; noise filtering
- G06V10/34: Image preprocessing; smoothing or thinning of the pattern, morphological operations, skeletonisation
- G06V40/168: Human faces; feature extraction, face representation
Abstract
The application discloses a face alignment method and device based on face recognition, which relate to the technical field of artificial intelligence and can improve face alignment accuracy. The method comprises the following steps: determining, according to a first face ROI (region of interest) in an acquired video frame and preset face standard key points, an initial transformation array that corresponds to the first face ROI region and is used for aligning face key points; performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array; and performing face key point alignment on a re-extracted second face ROI region in the video frame using the target transformation array, to obtain a target face ROI region with the face key points aligned. The application is suitable for various application systems based on face alignment. In addition, the application also relates to blockchain technology: the video frame and the target face ROI region may be stored in a blockchain to ensure data privacy and security.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a face alignment method and device based on face recognition.
Background
Face alignment is an active research problem in computer vision: its quality directly affects the accuracy of face feature extraction and recognition, and it is a key factor in the stability of video-stream processing.
Existing solutions for face alignment in video scenes use neural-network key point detection and alignment (e.g. MTCNN), which requires a large amount of labeling effort, hardware such as a graphics processing unit (GPU), and costly training. Tracking-based methods are also common: on top of key point detection, they improve detection stability between consecutive frames of a video stream to preserve face alignment accuracy. Existing solutions therefore have clear drawbacks: neural-network face key point detection such as MTCNN incurs high labeling cost, GPU and other hardware requirements, and high training cost, while aligning face key points by tracking detection stability between consecutive video frames is slow and hurts the real-time performance of the system.
Disclosure of Invention
In view of this, the application provides a face alignment method and device based on face recognition, mainly aiming to solve the technical problems of existing approaches: neural-network-based MTCNN face key point detection suffers from high labeling cost, high graphics processing unit (GPU) hardware requirements and high training cost, while tracking detection stability between consecutive video frames suffers from low processing speed and poor real-time performance.
According to an aspect of the present application, there is provided a face alignment method based on face recognition, the method comprising:
According to a first face ROI (region of interest) in an acquired video frame and preset face standard key points, determining an initial transformation array which corresponds to the first face ROI and is used for realizing face key point alignment, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data;
Performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array;
Performing face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
The second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
According to another aspect of the present application, there is provided a face alignment apparatus, comprising:
The initial transformation array module is used for determining an initial transformation array which corresponds to the first face ROI area and is used for realizing the alignment of the face key points according to the acquired first face ROI area in the video frame and the preset face standard key points, and the initial transformation array comprises initial scale transformation data and initial translation transformation data;
The target transformation array module is used for performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array;
the alignment module is used for carrying out face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
The second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
According to still another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the face alignment method based on face recognition as described above.
According to still another aspect of the present application, there is provided a computer device including a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, the processor implementing the face alignment method based on face recognition as described above when executing the program.
By means of the above technical solution, compared with the MTCNN neural-network face key point detection method and with methods that align face key points by tracking detection stability between consecutive video frames, the face alignment method and device based on face recognition provided by the application determine, according to the acquired first face ROI region in the video frames and preset face standard key points, an initial transformation array that corresponds to the first face ROI region and is used for aligning face key points, the initial transformation array comprising initial scale transformation data and initial translation transformation data; perform jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array; and perform face key point alignment on the extracted second face ROI region in the video frame using the target transformation array, to obtain a target face ROI region with the face key points aligned, the second face ROI region differing from the first face ROI region in that it further includes a background area. In this way, the initial transformation array determined from the first face ROI region in the video frame is adaptively de-jittered and smoothed before face alignment is applied, which solves the problem of low face alignment accuracy when jitter or jumps exist between consecutive frames, while simplifying the implementation of face alignment without sacrificing alignment quality. The method avoids the large amount of time-consuming preparation work of existing approaches, such as manual labeling and model training, which entails high up-front cost, and instead realizes de-jittering and smoothing adaptively, greatly improving the generalization ability of the face alignment operation; compared with existing methods that align face key points by tracking detection stability between consecutive video frames, it greatly increases the processing speed of video data and effectively reduces time consumption, meeting the real-time requirements of various application systems based on face alignment.
The foregoing is only an overview of the technical solution of the present application. In order that the technical means of the application may be more clearly understood and implemented in accordance with the contents of the specification, and in order that the above and other objects, features and advantages of the application may be more readily apparent, specific embodiments of the application are set forth below.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 shows a flow chart of a face alignment method based on face recognition according to an embodiment of the present application;
fig. 2 is a schematic flow chart of another face alignment method based on face recognition according to an embodiment of the present application;
Fig. 3 shows a schematic structural diagram of a face alignment device according to an embodiment of the present application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
The existing neural-network-based face key point detection method has high labeling cost, GPU and other hardware requirements, and high training cost, while aligning face key points by tracking detection stability between consecutive video frames is slow and hurts the real-time performance of the system. To address these technical problems, this embodiment provides a face alignment method based on face recognition that can effectively improve face alignment accuracy when jitter or jumps exist between consecutive frames of a face, while simplifying the implementation of face alignment. As shown in fig. 1, the method comprises the following steps:
101. According to a first face ROI region in an acquired video frame and preset face standard key points, determine an initial transformation array that corresponds to the first face ROI region and is used for aligning face key points, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data.
In this embodiment, a remote recording controller acquires a video, the acquired video is split into consecutive frames, and face key point alignment preprocessing is applied to the face key points in the first face ROI region of each frame to obtain an initial transformation array for aligning face key points. Here, face key point alignment preprocessing means comparing the face key points in the first face ROI region with the preset face standard key points and computing the transformation array that would align them; only the initial transformation array is determined at this stage, and no alignment is actually performed, so that the array can first be de-jittered and smoothed in video-frame order to improve face alignment accuracy.
Depending on the requirements of the actual application scene, the video may be acquired by a camera device, with a tripod ensuring capture stability and aperture supplementary lighting improving the recognizability of the face ROI region; the video stream captured by the camera device is fed into the remote recording controller, which performs the de-jittering processing described here, improving face alignment accuracy. The manner of video acquisition is not specifically limited.
102. And performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array.
In this embodiment, the initial transformation array is first de-jittered: a first jitter removal process is applied to the initial transformation array to obtain a first jitter removal array, where the first process eliminates outliers in the face key point detection results of the face ROI region; a second jitter removal process is then applied to the first jitter removal array to obtain a second jitter removal array, where the second process reduces large-amplitude jitter in the face ROI region to small-amplitude jitter. The de-jittered array is then smoothed; specifically, the small-amplitude jitter remaining in the second jitter removal array is smoothed out, so that the resulting target transformation array more closely approximates the true transformation, achieving anti-jitter processing in the face alignment process.
103. Performing face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned; the second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
In this embodiment, a face ROI region is extracted again from the acquired video frame, and face key point alignment is performed on this re-extracted second face ROI region using the target transformation array to obtain the target face ROI region. Unlike the first face ROI region detected in step 101, the re-extracted second face ROI region includes a background region beyond the face itself, and may further include hair, hair accessories, hats, and the like. Performing face key point alignment on this re-extracted second face ROI region effectively improves alignment accuracy compared with aligning the first face ROI region, which contains only the face.
Depending on the requirements of the actual application scene, the target face ROI region, or the video frame after face key point alignment, can be further processed and applied. For example, the target face ROI region can be fed into a trained face recognition network model, where feature extraction over the region yields a face recognition result for the video frame, or feature extraction over the region can drive virtual anchor generation. The application of the aligned target face ROI region is not specifically limited.
Through the above scheme, an initial transformation array corresponding to the first face ROI region and used for aligning face key points is determined from the acquired first face ROI region in the video frame and the preset face standard key points, the initial transformation array comprising initial scale transformation data and initial translation transformation data; jitter removal and smoothing processing is performed on the initial transformation array to obtain a target transformation array; and face key point alignment is performed on the extracted second face ROI region in the video frame using the target transformation array, to obtain a target face ROI region with the face key points aligned. Compared with the existing MTCNN neural-network face key point detection method and with methods that align face key points by tracking detection stability between consecutive video frames, face alignment of the face ROI region is achieved after adaptive de-jittering and smoothing of the initial transformation array determined from the first face ROI region, which solves the problem of low face alignment accuracy when jitter or jumps exist between consecutive frames of the face, while simplifying the implementation of face alignment without sacrificing the face alignment of the video frames.
Further, as a refinement and extension of the foregoing embodiment, in order to fully describe a specific implementation procedure of the embodiment, another face alignment method based on face recognition is provided, as shown in fig. 2, where the method includes:
201. And performing framing processing on the acquired video to obtain consecutive video frames.
202. Each video frame is processed as follows: performing face detection on a current image corresponding to the video frame by using a first preset depth model to obtain a first face ROI (region of interest); performing key point detection on the first face ROI by using a second preset depth model to obtain target key points; and determining initial scale transformation data and initial translation transformation data for realizing the alignment of the face key points by utilizing the target key points and preset face standard key points.
In this embodiment, a remote recording controller acquires a video, and the acquired video is split into consecutive frames. For each video frame, the following processing is performed: a first preset depth model detects the face region in the video frame, yielding the first processing result, i.e. the first face ROI region on which face key points are to be aligned; a second preset depth model detects the face key points within the face ROI region of the first processing result, yielding the coordinates of 68 key points; and from the 68 key point coordinates of the second processing result and a 3D standard face template, the scale conversion data S and translation conversion data T that align the detected key points onto the standard key points of the template are computed. The S and T values of the consecutive video frames are stored in the initial conversion arrays set_S and set_T.
Depending on the requirements of the actual application scene, a first deep residual network (ResNet) in the computer vision library Dlib can be used for face region detection, recording the one or more detected face regions to obtain the first face ROI region; a second deep residual network (ResNet) in Dlib can then detect the face key points within the face ROI region, recording the coordinates (x, y) of 68 key points; and from the 68 key point coordinates and the 3D standard face template, 1 scale transformation value S and 2 translation transformation values T that align the detected key points onto the standard key points of the template are computed and stored in the transformation arrays set_S and set_T.
The first deep residual network (ResNet) and the second deep residual network (ResNet) are different: the first is trained on face ROI region samples, while the second is trained on face key point samples. The 3D standard face template can be obtained by averaging each dimensional feature over a plurality of 3D faces; it serves as the reference for comparing face key points in this application, and the dimensional features of the 3D faces are not limited here and follow the requirements of the actual application scene.
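For concreteness, the per-frame detection step can be sketched as follows. The text names two ResNet models inside Dlib but not the exact model files; the sketch below substitutes Dlib's stock CNN face detector and 68-point shape predictor, so the model file names and the input path are illustrative assumptions, not the patent's exact configuration:

```python
import cv2
import dlib

# Stand-ins for the two "preset depth models" named in the text; Dlib's
# stock CNN face detector and 68-point landmark predictor are used here.
detector = dlib.cnn_face_detection_model_v1("mmod_human_face_detector.dat")
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_face_keypoints(frame_bgr):
    """Return the first face ROI rectangle and its 68 (x, y) key points."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = detector(rgb, 1)          # upsample once to catch small faces
    if not detections:
        return None
    roi = detections[0].rect               # first detected face ROI
    shape = predictor(rgb, roi)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    return roi, points

# Framing: split the acquired video into consecutive frames (step 201).
cap = cv2.VideoCapture("input.mp4")        # hypothetical input path
results = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results.append(detect_face_keypoints(frame))
cap.release()
```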
To illustrate the specific implementation manner of step 202, as a preferred embodiment, the determining, by using the target keypoints and preset face standard keypoints, initial scale transformation data and initial translation transformation data for implementing face keypoint alignment specifically includes:
step 2021, calculating the corresponding relation between the target key point and the preset face standard key point by using a least square method.
Step 2022, obtaining initial scale transformation data and initial translation transformation data from the target key point to the preset face standard key point according to the corresponding relation.
In this embodiment, the initial transformation array is a two-dimensional sequence: one dimension indexes the video frames, and the other holds each frame's scale transformation data and translation transformation data for aligning the face key points. The initial scale and translation transformation data from the target key points to the preset face standard key points are therefore de-jittered and smoothed in video-frame order, yielding a target transformation array in which the scale transformation data tends to be stable.
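The least-squares fit of steps 2021 and 2022 has a closed form when the transformation is restricted to one scale value S and a 2D translation T, as in this embodiment. A minimal sketch (the function name and array shapes are illustrative assumptions):

```python
import numpy as np

def fit_scale_translation(src_pts, dst_pts):
    """Least-squares scale s and translation t = (tx, ty) that map the
    detected key points src_pts onto the standard key points dst_pts,
    minimizing sum_i || s * p_i + t - q_i ||^2."""
    P = np.asarray(src_pts, dtype=np.float64)      # shape (68, 2)
    Q = np.asarray(dst_pts, dtype=np.float64)      # shape (68, 2)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_mean, Q - q_mean                # centered point sets
    s = (Pc * Qc).sum() / (Pc * Pc).sum()          # closed-form optimal scale
    t = q_mean - s * p_mean                        # optimal translation
    return s, (t[0], t[1])
```

Running this per frame and appending the results yields the initial arrays set_S and set_T described above.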
203. And performing a first jitter removal process on the initial transformation array to obtain a first jitter removal array.
204. And performing a second jitter removal process on the first jitter removal array to obtain a second jitter removal array.
205. And performing smoothing filtering processing on the second jitter removal array to obtain a target transformation array.
To illustrate the specific implementation of steps 203 and 204, as a preferred embodiment, the first jitter removal process is a clipping filter process, and the second jitter removal process is a recursive average filter process.
In practical application scenes, the prior art focuses on improving the accuracy of face key point detection itself to address jitter or jumps between consecutive frames of a face. This embodiment instead applies de-jittering and smoothing correction to the initial conversion array used for face alignment, achieving anti-jitter face alignment in video on top of accurate key point detection. Moreover, even when a prior-art key point detector is accurate with small per-point errors, the final face alignment result can still carry a large error, because aligning a face involves many key points and transformation data over many video frames, and many small errors accumulate. Therefore, on top of accurate face key point detection, the application de-jitters and smooths the initial transformation array used for face alignment, greatly reducing face alignment jitter in video and bringing the aligned result much closer to the ideal.
Because the face in a video is continuous, its key point detections and transformation data should in theory also be continuously and smoothly distributed; in practice, however, interference from factors such as the background and illumination easily makes the distribution unsmooth. On this basis, the present embodiment applies clipping (amplitude-limiting) filtering to the initial transform arrays set_S and set_T, and then applies recursive average filtering to the clipped arrays, yielding the de-jittered versions of set_S and set_T. The two consecutive filtering passes together de-jitter the initial conversion array: the clipping filter eliminates outliers in the key point detection results, and the recursive average filter reduces large-amplitude jitter across consecutive video frames (face ROI regions) to small-amplitude jitter, which the subsequent smoothing step then brings close to the true transformation, achieving anti-jitter processing in the face alignment process.
Clipping filtering is applied to the initial conversion arrays set_S and set_T as follows: a maximum deviation A allowed between two consecutive samples is preset, and each detected sample is checked against its predecessor. If the difference between the current value and the previous value is at most A, the current value is accepted as valid; if the difference exceeds A, the current value is treated as invalid, discarded, and replaced by the previous value. This suppresses impulse interference caused by accidental factors and eliminates outliers in the key point detection results.
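A minimal sketch of this clipping (amplitude-limiting) filter, applied independently to set_S and to each component of set_T (the function name and threshold argument are illustrative):

```python
def clip_filter(values, max_dev):
    """Amplitude-limiting filter: a sample that deviates from its
    predecessor by more than max_dev is treated as an outlier and
    replaced by the previous (accepted) value."""
    out, prev = [], None
    for v in values:
        if prev is not None and abs(v - prev) > max_dev:
            v = prev                    # discard outlier, reuse previous value
        out.append(v)
        prev = v
    return out
```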
Further, recursive average filtering is applied to the clipped arrays set_S and set_T: the N most recently obtained samples are kept in a queue of fixed length N, each newly obtained group of transformation data (the scale and translation transformation data of the next video frame) is pushed onto the tail of the queue on a first-in-first-out basis, and the arithmetic mean of the N samples in the queue is taken as the new filtering result. Recursive average filtering of the clipped arrays suppresses periodic interference well and yields high smoothness, compensating for the clipping filter's weakness against periodic interference; the two filters complement each other, giving the best de-jittering of the video frames.
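A sketch of the recursive (moving) average filter over a fixed-length FIFO queue, under the same illustrative naming:

```python
from collections import deque

def recursive_average_filter(values, n):
    """Keep the last n samples in a FIFO queue and emit their arithmetic
    mean for each new sample; the oldest sample drops out automatically."""
    window, out = deque(maxlen=n), []
    for v in values:
        window.append(v)                  # FIFO: append evicts at maxlen
        out.append(sum(window) / len(window))
    return out
```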
To illustrate a specific implementation of step 205, as a preferred embodiment, step 205 may specifically include: performing smoothing filtering processing on the second jitter removal array using an adaptively set window size to obtain a target transformation array, wherein the adaptively set window size is determined according to the length of the video.
In this embodiment, smoothing filtering is applied to the de-jittering result to obtain the smoothed transform arrays set_S and set_T. Specifically, the de-jittering result (the second jitter removal array) is padded into a one-dimensional array, the value at the center of an adaptively sized window is updated as the window slides, and the padded region is then stripped, yielding the target transformation array. The window size is determined from acquired information such as the video length, making the window size adaptive.
An adaptive value is set according to the requirements of the actual application scene, and the current window size is computed from the acquired video length, for example win_size = len(input video) / 3, where 3 is the adaptive value. The de-jittered arrays set_S and set_T are padded according to this adaptively set window size, giving one-dimensional sequences set_S_padding and set_T_padding; each sequence is traversed from the beginning with the window, and the mean of all values inside each window updates the value at the window's center position; finally the padded portion is stripped, yielding the de-jittered and smoothed target transformation arrays set_S and set_T, i.e. a target transformation array in which the scale transformation data S tends to be stable.
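A sketch of this adaptive-window mean smoothing (the adaptive value 3 comes from the example above; edge padding is one reasonable choice the text leaves open):

```python
import numpy as np

def smooth_adaptive(arr, video_len, adaptive_value=3):
    """Pad arr at both ends, slide a window of size video_len // adaptive_value
    over it, and write each window's mean back to the window's center."""
    win = max(1, video_len // adaptive_value)
    pad = win // 2
    padded = np.pad(np.asarray(arr, dtype=np.float64), pad, mode="edge")
    out = np.empty(len(arr))
    for i in range(len(arr)):
        out[i] = padded[i:i + win].mean()   # window centered on sample i
    return out

# Applied to the de-jittered arrays to obtain the target transformation arrays,
# e.g. set_S = smooth_adaptive(set_S, video_len); likewise for set_T components.
```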
206. And performing face key point scaling processing on the extracted second face ROI region in the video frame by utilizing target scale transformation data in the target transformation array to obtain a scaled third face ROI region.
207. And carrying out face key point translation processing on the third face ROI by utilizing the target translation conversion data in the target conversion array to obtain a translated target face ROI.
In this embodiment, the second face ROI region is re-extracted from the video frame, and scale transformation and translation transformation are applied to it in turn using the de-jittered and smoothed target transformation arrays set_S and set_T, yielding the face-aligned target face ROI region. Depending on the actual application scene, the scale transformation may also include rotation, which is not specifically limited here. The second face ROI region may be re-extracted either by taking the center point of the face key points and extracting a region centered on it, or by expanding the first face ROI region detected in step 202 by a certain margin; the extraction manner of the second face ROI region is likewise not specifically limited.
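Steps 206 and 207 apply the scale and then the translation; since both are affine, they can be composed into one warp. A sketch using OpenCV (the single combined warp is an implementation convenience assumed here, not mandated by the text):

```python
import cv2
import numpy as np

def align_roi(roi_img, s, t):
    """Scale the re-extracted second face ROI by s (giving the third ROI),
    then translate it by t = (tx, ty) (giving the target ROI), as one
    affine warp."""
    h, w = roi_img.shape[:2]
    M = np.float32([[s, 0.0, t[0]],
                    [0.0, s, t[1]]])        # scale and translation combined
    return cv2.warpAffine(roi_img, M, (w, h))
```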
It should be noted that in this embodiment the video frame and the target face ROI region may be stored in a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
In summary, the artificial-intelligence-based face alignment method for video scenes in this embodiment computes, from the face key point detection result of each frame, the initial transformation data from the detected face key points to the standard face key points, and de-jitters and smooths that data before performing face alignment on each frame, ensuring continuity between video frames on top of per-frame alignment and avoiding face jitter or jumps across consecutive video frames.
By applying the technical scheme of this embodiment, an initial transformation array corresponding to the first face ROI region and used for aligning face key points is determined from the acquired first face ROI region in the video frame and the preset face standard key points, the initial transformation array comprising initial scale transformation data and initial translation transformation data; jitter removal and smoothing processing is performed on the initial transformation array to obtain a target transformation array; and face key point alignment is performed on the extracted second face ROI region in the video frame using the target transformation array, to obtain a target face ROI region with the face key points aligned. Compared with the existing MTCNN neural-network face key point detection method and with methods that align face key points by tracking detection stability between consecutive video frames, face alignment of the face ROI region is achieved after adaptive de-jittering and smoothing of the initial transformation array determined from the first face ROI region, which simplifies the implementation of face alignment while guaranteeing alignment accuracy in the video, and thereby reduces the consumption of hardware resources such as memory and video memory. In other words, the method avoids the large amount of time-consuming preparation work of existing approaches, such as manual labeling and model training, with their high up-front cost and GPU hardware requirements, and instead realizes de-jittering and smoothing adaptively, greatly improving the generalization ability of the face alignment operation; compared with existing methods that track detection stability between consecutive video frames, it greatly increases the processing speed of video data and effectively reduces time consumption, meeting the real-time requirements of various application systems based on face alignment.
Further, as a specific implementation of the method of fig. 1, an embodiment of the present application provides a face alignment device, as shown in fig. 3, where the device includes: an initial transform array module 31, a target transform array module 32, an alignment module 33.
The initial transformation array module 31 may be configured to determine an initial transformation array for implementing alignment of face key points corresponding to a first face ROI area according to the obtained first face ROI area in the video frame and a preset face standard key point, where the initial transformation array includes initial scale transformation data and initial translation transformation data.
The target transform array module 32 may be configured to perform a debounce and smoothing process on the initial transform array to obtain a target transform array.
The alignment module 33 may be configured to perform a face key point alignment process on the extracted second face ROI area in the video frame by using the target transformation array, so as to obtain a target face ROI area after the face key point alignment; the second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area.
In a specific application scenario, the initial transform array module 31 includes: framing unit 311, and conversion unit 312.
The framing unit 311 may be configured to perform framing processing on the acquired video to obtain continuous video frames.
The transforming unit 312 may be configured to perform the following processing for each video frame: performing face detection on a current image corresponding to the video frame by using a first preset depth model to obtain a first face ROI (region of interest); performing key point detection on the first face ROI by using a second preset depth model to obtain target key points; and determining initial scale transformation data and initial translation transformation data for realizing the alignment of the face key points by utilizing the target key points and preset face standard key points.
In a specific application scenario, the transformation unit 312 is specifically configured to: calculate the correspondence between the target key points and the preset face standard key points using a least squares method; and obtain, from the correspondence, the initial scale transformation data and initial translation transformation data from the target key points to the preset face standard key points.
In a specific application scenario, the target transformation array module 32 includes: a first debounce unit 321, a second debounce unit 322, and a smoothing filter unit 323.
The first debounce unit 321 may be configured to perform a first debounce process on the initial transform array to obtain a first debounce array.
The second debounce unit 322 may be configured to perform a second debounce process on the first debounce array to obtain a second debounce array.
The smoothing filter unit 323 may be configured to perform smoothing filtering on the second debounce array to obtain a target transform array.
In a specific application scenario, the first jitter removal process is clipping filtering process, and the second jitter removal process is recursive average filtering process.
In a specific application scenario, the smoothing filter unit 323 is specifically configured to: perform smoothing filtering on the second debounce array using an adaptively set window size to obtain the target transformation array, wherein the adaptively set window size is determined according to the length of the video.
In a specific application scenario, the alignment module 33 includes: a scale conversion unit 331, a translation conversion unit 332.
The scale transformation unit 331 may be configured to perform face key point scaling processing on the extracted second face ROI area in the video frame by using the target scale transformation data in the target transformation array, so as to obtain a scaled third face ROI area.
The translation transformation unit 332 may be configured to perform a face key point translation process on the third face ROI area by using the target translation transformation data in the target transformation array, so as to obtain a translated target face ROI area.
It should be noted that, for other corresponding descriptions of each functional unit related to the face alignment device provided by the embodiment of the present application, reference may be made to corresponding descriptions in fig. 1 and fig. 2, and details are not repeated here.
Based on the above-mentioned methods shown in fig. 1 and 2, correspondingly, the embodiment of the present application further provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the face alignment method based on face recognition shown in fig. 1 and 2.
Based on such understanding, the technical solution of the present application may be embodied in the form of a software product stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like), comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each implementation scenario of the present application.
Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3, in order to achieve the above objects, the embodiment of the present application further provides a computer device, which may specifically be a personal computer, a server, a network device, etc., where the entity device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the face alignment method based on face recognition as shown in fig. 1 and 2.
Optionally, the computer device may also include a user interface, a network interface, a camera, radio Frequency (RF) circuitry, sensors, audio circuitry, WI-FI modules, and the like. The user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., bluetooth interface, WI-FI interface), etc.
It will be appreciated by those skilled in the art that the architecture of a computer device provided in this embodiment is not limited to this physical device, but may include more or fewer components, or may be combined with certain components, or may be arranged in a different arrangement of components.
The storage medium may also include an operating system, a network communication module. An operating system is a program that manages the hardware and software resources of a computer device, supporting the execution of information handling programs, as well as other software and/or programs. The network communication module is used for realizing communication among all components in the storage medium and communication with other hardware and software in the entity equipment.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general hardware platform, or by hardware. By applying the technical scheme of the application, compared with the existing MTCNN neural-network face key point detection method and with methods that align face key points by tracking detection stability between consecutive video frames, the initial transformation array determined from the first face ROI region in the video frames is adaptively de-jittered and smoothed before face alignment of the face ROI region is performed, which simplifies the implementation of face alignment while guaranteeing alignment accuracy in the video and reduces the consumption of hardware resources such as memory and video memory; that is, the time-consuming preparation work of existing approaches, such as manual labeling and model training with their high up-front cost, is avoided, de-jittering and smoothing are realized adaptively, the generalization ability of the face alignment operation is greatly improved, and compared with existing methods that track detection stability between consecutive video frames, the processing speed of video data is greatly increased and time consumption is effectively reduced, meeting the real-time requirements of various application systems based on face alignment.
Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario and that the modules or flows in the drawing are not necessarily required to practice the application. Those skilled in the art will appreciate that modules in an apparatus in an implementation scenario may be distributed in an apparatus in an implementation scenario according to an implementation scenario description, or that corresponding changes may be located in one or more apparatuses different from the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The above-mentioned inventive sequence numbers are merely for description and do not represent advantages or disadvantages of the implementation scenario. The foregoing disclosure is merely illustrative of some embodiments of the application, and the application is not limited thereto, as modifications may be made by those skilled in the art without departing from the scope of the application.
Claims (9)
1. A face alignment method based on face recognition, characterized by comprising:
Determining an initial transformation array which corresponds to a first face ROI region and is used for realizing face key point alignment according to the acquired first face ROI region in a video frame and a preset face standard key point, wherein the initial transformation array comprises initial scale transformation data and initial translation transformation data;
Performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array;
Performing face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
the second face ROI area is different from the first face ROI area, and the second face ROI area further includes at least a background area compared to the first face ROI area;
The step of performing face key point alignment processing on the extracted second face ROI region in the video frame by using the target transformation array to obtain a target face ROI region with the face key points aligned includes:
performing face key point scaling processing on the extracted second face ROI region in the video frame by utilizing target scale transformation data in the target transformation array to obtain a scaled third face ROI region;
And carrying out face key point translation processing on the third face ROI by utilizing the target translation conversion data in the target conversion array to obtain a translated target face ROI.
2. The method according to claim 1, wherein determining an initial transformation array for achieving alignment of face key points corresponding to a first face ROI area according to the obtained first face ROI area in a video frame and a preset face standard key point, wherein the initial transformation array includes initial scale transformation data and initial translation transformation data, and includes:
carrying out framing treatment on the acquired video to obtain continuous video frames;
each video frame is processed as follows:
Performing face detection on a current image corresponding to the video frame by using a first preset depth model to obtain a first face ROI (region of interest);
performing key point detection on the first face ROI by using a second preset depth model to obtain target key points;
and determining initial scale transformation data and initial translation transformation data for realizing the alignment of the face key points by utilizing the target key points and preset face standard key points.
3. The method according to claim 2, wherein determining initial scale transformation data and initial translation transformation data for achieving face key point alignment using the target key point and a preset face standard key point, comprises:
calculating the corresponding relation between the target key point and the preset face standard key point by using a least square method;
and obtaining initial scale transformation data and initial translation transformation data from the target key point to the preset face standard key point according to the corresponding relation.
4. The method of claim 1, wherein the performing jitter removal and smoothing processing on the initial transformation array to obtain a target transformation array comprises:
Performing first jitter removal processing on the initial transformation array to obtain a first jitter removal array;
performing a second jitter removal process on the first jitter removal array to obtain a second jitter removal array;
And performing smoothing filtering processing on the second jitter removal array to obtain a target transformation array.
5. The method of claim 4, wherein the first jitter removal process is a clipping filtering process and the second jitter removal process is a recursive average filtering process.
6. The method of claim 4, wherein the performing smoothing filtering processing on the second jitter removal array to obtain a target transformation array comprises:
Performing smoothing filtering processing on the second jitter removal array using an adaptively set window size to obtain a target transformation array, wherein the adaptively set window size is determined according to the length of the video.
7. A face alignment device, comprising:
The initial transformation array module is used for determining an initial transformation array which corresponds to the first face ROI area and is used for realizing the alignment of the face key points according to the acquired first face ROI area in the video frame and the preset face standard key points, and the initial transformation array comprises initial scale transformation data and initial translation transformation data;
The target transformation array module is used for carrying out jitter removal and smoothing treatment on the initial transformation array to obtain a target transformation array;
the alignment module is used for carrying out face key point alignment processing on the extracted second face ROI region in the video frame by utilizing the target transformation array to obtain a target face ROI region with the face key points aligned;
the second face ROI area is different from the first face ROI area, the second face ROI area further including a background area as compared to the first face ROI area;
wherein, the alignment module includes:
The scale transformation unit is used for performing face key point scaling processing on the extracted second face ROI area in the video frame by utilizing the target scale transformation data in the target transformation array to obtain a scaled third face ROI area;
And the translation transformation unit is used for carrying out face key point translation processing on the third face ROI by utilizing the target translation transformation data in the target transformation array to obtain a target face ROI after the translation processing.
8. A storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the face alignment method based on face recognition of any one of claims 1 to 6.
9. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the face alignment method based on face recognition according to any of claims 1 to 6 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011626746.5A CN112733667B (en) | 2020-12-30 | 2020-12-30 | Face alignment method and device based on face recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011626746.5A CN112733667B (en) | 2020-12-30 | 2020-12-30 | Face alignment method and device based on face recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112733667A CN112733667A (en) | 2021-04-30 |
CN112733667B true CN112733667B (en) | 2024-06-28 |
Family
ID=75607964
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011626746.5A Active CN112733667B (en) | 2020-12-30 | 2020-12-30 | Face alignment method and device based on face recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112733667B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115601484B (en) * | 2022-11-07 | 2023-03-28 | 广州趣丸网络科技有限公司 | Virtual character face driving method and device, terminal equipment and readable storage medium |
CN116524418A (en) * | 2023-07-03 | 2023-08-01 | 平安银行股份有限公司 | Face and mouth recognition method, device and system and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111488774A (en) * | 2019-01-29 | 2020-08-04 | 北京搜狗科技发展有限公司 | Image processing method and device for image processing |
CN111523402A (en) * | 2020-04-01 | 2020-08-11 | 车智互联(北京)科技有限公司 | Video processing method, mobile terminal and readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978754A (en) * | 2017-12-28 | 2019-07-05 | 广东欧珀移动通信有限公司 | Image processing method, image processing device, storage medium and electronic equipment |
CN112036256A (en) * | 2020-08-07 | 2020-12-04 | 深圳数联天下智能科技有限公司 | Human face key point training device |
CN112017212B (en) * | 2020-08-26 | 2022-10-04 | 北京紫光展锐通信技术有限公司 | Training and tracking method and system of face key point tracking model |
- 2020-12-30: application CN202011626746.5A filed (CN); granted as patent CN112733667B, status active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111488774A (en) * | 2019-01-29 | 2020-08-04 | 北京搜狗科技发展有限公司 | Image processing method and device for image processing |
CN111523402A (en) * | 2020-04-01 | 2020-08-11 | 车智互联(北京)科技有限公司 | Video processing method, mobile terminal and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112733667A (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7490141B2 (en) | IMAGE DETECTION METHOD, MODEL TRAINING METHOD, IMAGE DETECTION APPARATUS, TRAINING APPARATUS, DEVICE, AND PROGRAM | |
CN110874594B (en) | Human body appearance damage detection method and related equipment based on semantic segmentation network | |
Garon et al. | Deep 6-DOF tracking | |
US10438631B2 (en) | Method for real-time video processing involving retouching of an object in the video | |
WO2016034059A1 (en) | Target object tracking method based on color-structure features | |
EP2808828B1 (en) | Image matching method, image matching device, model template generation method, model template generation device, and program | |
WO2017035966A1 (en) | Method and device for processing facial image | |
CN103279952B (en) | A kind of method for tracking target and device | |
CN112733667B (en) | Face alignment method and device based on face recognition | |
CN107316022B (en) | Dynamic gesture recognition method and device | |
US20120154638A1 (en) | Systems and Methods for Implementing Augmented Reality | |
CN109036522B (en) | Image processing method, device, equipment and readable storage medium | |
JP2021517281A (en) | Multi-gesture fine division method for smart home scenes | |
CN111882531A (en) | Automatic Analysis Method of Ultrasound Image of Hip Joint | |
CN110176021A (en) | In conjunction with the level set image segmentation method and system of the conspicuousness information of gamma correction | |
CN113886510B (en) | Terminal interaction method, device, equipment and storage medium | |
CN109034070B (en) | Blind separation method and device for replacement aliasing image | |
EP4009275B1 (en) | Golf ball top-view detection method and system, and storage medium | |
CN111798481A (en) | Image sequence segmentation method and device | |
CN113066059A (en) | Image definition detection method, device, equipment and storage medium | |
JP5051671B2 (en) | Information processing apparatus, information processing method, and program | |
CN116362975A (en) | Tongue picture skew correction method, device, equipment and storage medium | |
CN114219831B (en) | Target tracking method, device, terminal equipment and computer readable storage medium | |
CN111046727B (en) | Video feature extraction method and device, electronic equipment and storage medium | |
CN113052853B (en) | Method and device for video target tracking in complex environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |