Disclosure of Invention
The first object of the present invention is to provide a video-based face optimization method that detects a face thumbnail in a compressed video and, according to the position of the thumbnail, obtains a large face image from the original video, thereby reducing the system resources occupied by the face recognition process. The second object of the present invention is to provide a video-based face optimization method that crops and re-scores the same face several times within a set time period, thereby improving the accuracy and breadth of face recognition.
To solve the above technical problems, the invention adopts the following technical solution:
a video-based face optimization method comprises: performing face tracking detection on the video frames of a compressed video to obtain face thumbnails;
scoring each face thumbnail according to its sharpness and noise coefficient and, for thumbnails whose score reaches a set threshold, storing the face detection data of the thumbnail together with the timestamp of its video frame;
and acquiring the position information of the face thumbnail in the video frame of the compressed video, and obtaining a large face image from the corresponding position of the original video according to that position information.
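A minimal sketch of the scoring step follows (Python). The sharpness and noise estimators and the combining weights are illustrative assumptions; the method only requires that a score computed from sharpness and the noise coefficient be compared with a set threshold.

```python
import cv2
import numpy as np

def sharpness(gray):
    # Variance of the Laplacian, a common sharpness proxy (one possible
    # choice; the method does not fix a particular sharpness measure).
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def noise_coefficient(gray):
    # Rough noise estimate: mean absolute deviation from a median-blurred
    # copy of the image (again, only one possible estimator).
    denoised = cv2.medianBlur(gray, 3)
    return float(np.mean(np.abs(gray.astype(np.float32) -
                                denoised.astype(np.float32))))

def score_thumbnail(gray, w_sharp=1.0, w_noise=2.0):
    # Higher sharpness raises the score, higher noise lowers it;
    # the weights are illustrative assumptions.
    return w_sharp * sharpness(gray) - w_noise * noise_coefficient(gray)
```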
Preferably, performing face tracking detection on the video frames of the compressed video includes acquiring a visible-light stream and an infrared stream from the compressed video, wherein the visible-light stream is a real-time stream used for detection and the infrared stream is a real-time infrared stream used for binocular liveness detection;
and performing face tracking detection on the video frames of the visible-light stream and of the infrared stream respectively to obtain all face thumbnails contained in the video frames, and judging with a binocular liveness detection algorithm whether each detected face is live, so as to obtain all thumbnails of live faces contained in the video frames.
Preferably, after all the face thumbnails included in the video frame are obtained, the size of each face thumbnail is obtained, and the face thumbnails with the size smaller than a set threshold are filtered out.
Preferably, storing the face detection data of the face thumbnail together with the timestamp of the video frame includes: acquiring the total number of faces in the video frame; creating a plurality of face tracking identifiers according to that total, each face tracking identifier corresponding to one person; storing the face detection data of each thumbnail against its matching face tracking identifier; and storing the timestamp of the video frame in which the thumbnail appears against the thumbnail's face detection data.
Preferably, obtaining the large face image from the corresponding position of the original video according to the position information of the face thumbnail includes: selecting a face tracking identifier; obtaining the face thumbnails corresponding to the selected identifier; calculating the position ratio of each thumbnail within the video frame of the compressed video from the thumbnail's position and the resolution of the compressed video; and cropping from the corresponding position of the original video, according to the resolution of the original video and the calculated position ratio, to obtain the large face image.
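A minimal sketch of this ratio-based cropping, assuming frames are NumPy arrays (as returned by common decoders) and the box is a pixel-coordinate tuple; names are illustrative. Array row order (top-left origin) is assumed here, whereas the worked example later in the description uses a bottom-left origin; the ratio arithmetic is identical.

```python
import numpy as np

def crop_large_face(original_frame: np.ndarray, box, comp_w: int, comp_h: int):
    """Crop the large face image from the original frame.

    box = (x1, y1, x2, y2) is the thumbnail's face frame in the
    compressed video's pixel coordinates. The position *ratios* are
    resolution independent, so they index the original frame directly.
    """
    orig_h, orig_w = original_frame.shape[:2]
    x1, y1, x2, y2 = box
    # Pixel positions -> ratios of the compressed resolution.
    rx1, ry1 = x1 / comp_w, y1 / comp_h
    rx2, ry2 = x2 / comp_w, y2 / comp_h
    # Ratios -> pixel positions in the original resolution, then crop.
    return original_frame[int(ry1 * orig_h):int(ry2 * orig_h),
                          int(rx1 * orig_w):int(rx2 * orig_w)]
```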
Preferably, after the large face image is obtained from the corresponding position of the original video, face detection is performed on it, its size is obtained, and large face images smaller than the set threshold are filtered out.
Preferably, after the filtering, the large face image is scored according to its sharpness and noise coefficient; if its score exceeds the current highest score, the angular deviation of the face is detected and a blur score is calculated, so as to obtain a clear, frontal face picture, wherein the highest score is the highest among the scores of all large face images corresponding to the currently selected face tracking identifier.
Preferably, if the score of the large face image exceeds the highest score, the face attributes it contains are stored, the attributes comprising at least skin color, hair and gender information, and the stored attributes are compared against a face database to identify the face.
Preferably, after the large face image is acquired, the timestamp of the original-video frame in which it appears is acquired, and it is judged whether the interval between this timestamp and the earliest timestamp corresponding to the currently selected face tracking identifier reaches a preset time threshold; if so, the preferred face image is selected anew.
Another object of the present invention is to provide an apparatus using the video-based face optimization method, including:
an acquisition module for acquiring a visible-light stream and an infrared stream from a compressed video;
a face detection module, connected to the acquisition module, for performing face tracking detection on the video frames of the visible-light stream and of the infrared stream respectively;
a face scoring module, connected to the face detection module, for scoring each face picture according to its sharpness and noise coefficient;
a face feature extraction module, connected to the face scoring module, for extracting the face features contained in a face picture;
and a storage module, connected to the face detection module and the face feature extraction module, for storing the face detection data and the face features.
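A structural sketch of these five modules and their connections is given below (Python); the class and method names are illustrative assumptions, not an API defined by the invention.

```python
class AcquisitionModule:
    def acquire(self, compressed_video):
        # Returns (visible_stream, infrared_stream); body omitted.
        ...

class FaceDetectionModule:
    def __init__(self, acquisition: AcquisitionModule):
        self.acquisition = acquisition          # connected to acquisition
    def detect(self, visible_frame, infrared_frame):
        ...                                     # face tracking detection

class FaceScoringModule:
    def __init__(self, detector: FaceDetectionModule):
        self.detector = detector                # connected to detection
    def score(self, face_picture):
        ...                                     # sharpness + noise score

class FaceFeatureExtractionModule:
    def __init__(self, scorer: FaceScoringModule):
        self.scorer = scorer                    # connected to scoring
    def extract(self, face_picture):
        ...                                     # face feature extraction

class StorageModule:
    def __init__(self, detector: FaceDetectionModule,
                 extractor: FaceFeatureExtractionModule):
        self.detector, self.extractor = detector, extractor
    def save(self, detection_data, features):
        ...
```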
By adopting the above technical solution, the invention has the following beneficial effects compared with the prior art:
1. By detecting face pictures in the compressed video, the method and apparatus speed up face detection and quickly locate each face in the compressed video; the large face image is then obtained from the corresponding position of the original video to extract detailed face features, which reduces the system resources occupied by the face recognition process.
2. The invention identifies multiple faces in the same video separately, crops the same face several times within a set period, re-scores each crop, and compares the face picture with the highest total score against the face library, which effectively improves recognition accuracy and breadth.
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and the following embodiments are used for illustrating the present invention and are not intended to limit the scope of the present invention.
In the description of the present invention, it should be noted that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; a connection may be mechanical or electrical, and may be direct or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
As shown in fig. 1, the present invention introduces a video-based face optimization method, which includes:
a face detection and tracking process, in which the face thumbnails and the related face detection data contained in a compressed video are detected and each thumbnail is scored;
and a face selection process, in which a large face image is cropped from the corresponding position in the original video according to the position of the face thumbnail in the compressed video, the large face image is comprehensively scored, and finally the features of the highest-scoring crop are compared with the features of a face library to identify the face.
Through face detection and tracking, the invention detects and scores the face thumbnails contained in the compressed video; in the face selection process it crops the large face image from the original video, scores it again, and compares the face with the highest total score against the face library. This reduces the system resources occupied by face recognition and effectively improves its accuracy and breadth.
Example one
As shown in fig. 2, this embodiment introduces a video-based face optimization method in which, during the face detection and tracking process, each face thumbnail is scored and low-scoring faces are filtered out, reducing the system resources that would otherwise be consumed.
The face detection and tracking process comprises the following steps:
s101, acquiring a frame of visible code stream and infrared code stream.
S102, face tracking detection is respectively carried out on the visible code stream and the infrared code stream, edge faces with a certain threshold value and faces with undersize are filtered, and the number of the faces and handle information in the video picture are obtained.
And S103, matching the visible code stream with the face data detected by the infrared code stream.
And S104, judging whether the detected face is a living body or not by adopting a binocular living body algorithm, and abandoning the face information without wasting resources for face recognition if the detected face is not a living body.
And S105, for the detected face which is a living body, detecting the position of the face in the video frame and converting the position into the width-height ratio of 10000, wherein the width-height ratio is used for adapting to various resolution conversion.
S106, obtaining face tracking IDs for tracking a plurality of faces in the picture, wherein each face tracking ID corresponds to different people.
S107, 81 feature points of the face such as the periphery of the eyes, the nose and the mouth of the person are obtained, the definition and the noise coefficient are scored, the face with the score reaching a certain threshold value is placed into the chain table 1, and the face tracking ID is used as a key for optimization. In order to satisfy the condition that the same person has a plurality of pieces of face information passing the scoring threshold, the second parameter of the linked list 1 is the frame time, and the linked list 2 stores the information such as the face frame position detected at the frame time.
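The resolution-independent position of step S105 can be sketched as follows; integer arithmetic on a 0-10000 scale is assumed from the step's description. With the worked example used later (1280 × 720 to 1920 × 1080), normalize_box((128, 72, 256, 144), 1280, 720) gives (1000, 1000, 2000, 2000), and denormalizing at 1920 × 1080 returns (192, 108, 384, 216).

```python
SCALE = 10000  # positions stored as ten-thousandths of width/height

def normalize_box(box, frame_w, frame_h):
    # Pixel coordinates -> 0..10000 ratios of the frame's width/height.
    x1, y1, x2, y2 = box
    return (x1 * SCALE // frame_w, y1 * SCALE // frame_h,
            x2 * SCALE // frame_w, y2 * SCALE // frame_h)

def denormalize_box(nbox, frame_w, frame_h):
    # 0..10000 ratios -> pixel coordinates at any target resolution.
    nx1, ny1, nx2, ny2 = nbox
    return (nx1 * frame_w // SCALE, ny1 * frame_h // SCALE,
            nx2 * frame_w // SCALE, ny2 * frame_h // SCALE)
```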
The correspondence structure is defined as follows:
map<face tracking ID, map<frame timestamp, detected face information>>.
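In Python terms this is a dictionary of dictionaries; a minimal rendering (names illustrative):

```python
from collections import defaultdict

# linked list 1: face tracking ID -> linked list 2
# linked list 2: frame timestamp  -> detected face information
face_store: dict[int, dict[int, dict]] = defaultdict(dict)

def record_detection(track_id: int, frame_ts: int, detection: dict) -> None:
    # detection carries the face-frame position and related data
    # observed for this person at this frame time.
    face_store[track_id][frame_ts] = detection
```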
In step S101, a visible-light stream and an infrared stream are obtained from the compressed video. The visible-light stream is a real-time stream used for detection: a captured YUV stream that has not been encoded, with a resolution of 720P and a frame rate of 15 fps. The infrared stream is a real-time infrared stream used for binocular liveness detection, likewise a captured, unencoded YUV stream at 720P and 15 fps. In a surveillance video the resolution of the compressed video is lower than that of the original video; because it is lower, a video frame of the compressed video carries relatively little information, so the region containing a face can be found in it quickly. Cropping is then performed at the corresponding position of the original video. The original video's resolution is high, so very detailed face features, such as skin color, hair, and whether a mask is worn, can be obtained from it; these features are compared with a face database to identify the face and confirm identity.
In step S102, face tracking detection is performed on the video frames of the visible-light stream and the infrared stream to obtain all face thumbnails contained in the video frames. The size of each thumbnail is then obtained, and thumbnails smaller than a set threshold are filtered out.
In some scenes, for example an access control system, only a single face in the captured video needs to be recognized, but a camera cannot guarantee that only one face is captured; a video image often contains several faces of different sizes, the face closest to the camera being the largest. When a user authenticates at the access control point, the user stands close to the camera, so the user's face is the largest in the captured image, while the faces of other people captured at the same time are farther from the camera and therefore comparatively small. By filtering face pictures according to their size, the invention improves the accuracy of face recognition and avoids recognizing irrelevant faces.
The face thumbnails may include edge face pictures, that is, pictures at the edge of the video frame with incomplete face information: because the captured picture has a boundary, a face may be only partly captured. For example, an edge face picture may contain only the mouth and chin, with the region above the mouth, such as the nose and eyes, not captured. The invention also filters out these edge face pictures, so that small, incomplete faces are removed and face recognition efficiency is improved.
In step S104: in reality there are images such as posters and photographs that contain human faces, but these faces do not represent real, present persons, and such face information is generally not wanted in a surveillance scene. If such images are captured by the surveillance video, detecting the faces on them wastes resources; the user wants to detect the real persons captured in the video. Faces judged not to be live are therefore discarded directly, without face recognition, avoiding wasted resources. As shown in fig. 3, face tracking IDs are created according to the total number of people appearing in the video, each face tracking ID corresponding to one person. Linked list 1 is created, with the face tracking ID as its key. For example, if the video has 100 frames and a person appears throughout, that person should have 100 face pictures in the video, one per frame; if a person appears only in one segment of the video, the number of that person's face pictures will be fewer than 100. Each face tracking ID corresponds to a linked list 2, keyed by frame time, in which the face-frame position and related information detected at that time are stored.
When a surveillance video is shot, several persons are often captured at the same time, and each of them must be tracked. A video sequence comprises many frames, and the instances of the same person's face across those frames usually differ. The invention performs face detection on the several different instances of the same person's face in the video to obtain several sets of face detection data, and stores each set together with the timestamp of its video frame, which makes it easy to find the preferred face.
Example two
As shown in fig. 4, this embodiment introduces a video-based face optimization method that includes a face optimization process: a large face image is cropped from the corresponding position in the original video according to the position of the face thumbnail in the compressed video and is scored again, and the face with the highest total score is compared with the face library, which effectively improves recognition accuracy and breadth.
In the face optimization process, it must first be judged whether the interval between the earliest face-frame timestamp in the linked list and the current frame timestamp reaches 400 ms; face tracking IDs are then read from linked list 1 in a loop to find the highest-scoring crop for each ID.
Frame timestamps are used for this 400 ms comparison in order to avoid the inaccuracy that modifying the system time would cause. Comparing the earliest face-frame time in the linked list with the current frame time ensures that no excessive computation is spent on frames without faces and that faces are recognized quickly.
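The interval test can be sketched as below, assuming linked list 2 is a timestamp-keyed mapping as above and timestamps are in milliseconds:

```python
def window_reached(sub_list: dict, current_ts: int,
                   window_ms: int = 400) -> bool:
    # Compare frame timestamps, not wall-clock time, so that changing
    # the system time cannot distort the interval.
    if not sub_list:
        return False
    earliest_ts = min(sub_list)  # earliest face-frame timestamp for this ID
    return current_ts - earliest_ts >= window_ms
```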
As shown in fig. 4, the face optimization process specifically includes the following steps:
s201, reading a face tracking ID from the linked list 1, circularly reading face detection data from the linked list 2, and picking out a face image at a corresponding position from an original video according to position information.
S202, carrying out face detection on face matting, filtering out edge faces with a certain threshold value and faces with undersize sizes, and obtaining detected face information.
S203, detecting 81 feature points of the face such as the eye circumference, the nose, the mouth and the like, comprehensively scoring the definition and the noise coefficient, and continuing if the score of the face reaches a set threshold and is greater than the highest score of the current face tracking ID.
And S204, detecting the deviation of each angle of the human face X, Y, Z axis, calculating the ambiguity score, acquiring the human face with clear front, and continuing if the human face with clear front is passed.
S205, obtaining the attributes of the human face, including but not limited to age, gender, mask, complexion, hair and other information.
S206, saving the current score as the highest score of the current face tracking ID, and saving the relevant features of the face for comparing with the features of the face library.
And S207, comparing the highest scoring matting feature obtained by the current face tracking ID with the feature of a face library to find out the face.
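Steps S203 and S204 together form a gate that a crop must pass before its attributes and features are saved. A hedged sketch follows; the thresholds and the pose and blur estimators are assumptions for illustration, not values fixed by the method:

```python
MAX_ANGLE_DEG = 20.0   # assumed per-axis limit for a "frontal" face
MAX_BLUR = 0.35        # assumed ceiling for the blur score

def passes_gate(score: float, track_best: float,
                pose_deg: tuple, blur: float) -> bool:
    if score <= track_best:
        return False                    # S203: must beat the track's best
    x_dev, y_dev, z_dev = pose_deg      # S204: X/Y/Z-axis deviations
    if max(abs(x_dev), abs(y_dev), abs(z_dev)) > MAX_ANGLE_DEG:
        return False                    # not frontal enough
    return blur <= MAX_BLUR             # clear, frontal face passes
```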
In step S201, because the resolution of the compressed video differs from that of the original video, the position ratio of the face thumbnail within the video frame of the compressed video is calculated first, and cropping is then performed at the corresponding position of the original-video frame according to the calculated ratio and the original video's resolution to obtain the large face image. For example, in this embodiment the compressed video's resolution is 720P and its frame size is 1280 × 720; taking the bottom-left vertex of the video frame as the origin of coordinates, the frame width as the X axis and the frame height as the Y axis, the frame lies in the first quadrant of the coordinate system. After face detection, a rectangular face frame surrounding the face is obtained with bottom-left vertex (128, 72) and top-right vertex (256, 144). The original video's resolution is 1080P and its frame size is 1920 × 1080, so in the original video the face frame's bottom-left vertex is (192, 108) and its top-right vertex is (384, 216).
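The arithmetic of this worked example can be checked directly; the mapping is a uniform 1.5x scale, since 1920/1280 = 1080/720 = 1.5:

```python
comp_w, comp_h = 1280, 720    # compressed frame (720P)
orig_w, orig_h = 1920, 1080   # original frame (1080P)

for x, y in [(128, 72), (256, 144)]:
    X = x * orig_w // comp_w  # 128 -> 192, 256 -> 384
    Y = y * orig_h // comp_h  #  72 -> 108, 144 -> 216
    print((x, y), "->", (X, Y))
```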
In a surveillance video the face does not necessarily face the camera directly, so the captured face may be rotated away from frontal. The invention detects the angular deviation of the face so as to select a clear, frontal face.
In the face optimization process, the same face is cropped multiple times within a set period and each crop is re-scored, and the face picture with the highest total score is compared with the face library, which effectively improves the accuracy and breadth of face recognition.
Example three
An apparatus using the video-based face optimization method of the embodiments of the invention includes: an acquisition module for acquiring a visible-light stream and an infrared stream from a compressed video; a face detection module, connected to the acquisition module, for performing face tracking detection on the video frames of the visible-light stream and of the infrared stream respectively; a face scoring module, connected to the face detection module, for scoring each face picture according to its sharpness and noise coefficient; a face feature extraction module, connected to the face scoring module, for extracting the face features contained in a face picture; a storage module, connected to the face detection module and the face feature extraction module, for storing the face detection data and the face features; and a face recognition module, connected to the face feature extraction module and the storage module, for comparing the face features extracted from a face picture with the face features stored in the face database.
When performing real-time face tracking and recognition on a surveillance video, the surveillance video is shot with an infrared binocular camera. One frame each of the visible-light stream and the infrared stream is acquired. The visible-light stream is a real-time stream used for detection: a captured, unencoded YUV stream at 720P and 15 fps. The infrared stream is a real-time infrared stream used for binocular liveness detection, with the same resolution and frame rate as the visible-light stream.
Face tracking detection is performed on the video frames of the visible-light stream and of the infrared stream respectively to obtain all face thumbnails contained in the frames; the size of each thumbnail is obtained, and face pictures smaller than a set threshold are filtered out. Edge faces are filtered out at the same time. An edge face is a face at the edge of a video frame: because the camera can only capture within a certain range, a person at the shooting boundary produces an edge face containing only part of the face information. For example, for a person walking into the frame from outside it, one frame of the video may show only the chin, with the mouth appearing in the next frame.
All face thumbnails detected in the video frames of the visible-light stream and the infrared stream are matched, and a binocular liveness detection algorithm judges whether each detected face is live; if a face is judged not to be live, its information is discarded, yielding all thumbnails of live faces.
Face tracking IDs are created according to the total number of people appearing in the video, each ID corresponding to one person. A main linked list is created to store the correspondence between each face tracking ID and all face instances of the corresponding person in the video. For example, if the video has 100 frames and a person appears throughout, that person should have 100 face pictures, one per frame; if a person appears in only one segment of the video, the number of that person's face pictures will be fewer than 100. The face tracking ID serves as the key of the main linked list, and each ID corresponds to a sub linked list that stores the detected face picture data keyed by frame timestamp.
A number of facial feature points, such as around the eyes, nose and mouth, are obtained from each face thumbnail, and the thumbnail is scored according to its sharpness and noise coefficient. Thumbnails whose score exceeds the preset threshold are stored in the main linked list, whose second parameter is the sub linked list; the sub linked list stores the video frame's timestamp and the face detection data, which include the thumbnail's position in the video frame of the compressed video.
A face tracking ID is selected and the face thumbnails are read from the corresponding sub linked list in a loop; the large face image is cropped from the corresponding position in the original video according to the face position information. Because the compressed video and the original video have different resolutions, the position ratio of the thumbnail in the compressed frame is calculated first, and the crop is then taken from the corresponding position of the original frame according to that ratio and the original resolution. For example, in this embodiment the compressed video's resolution is 720P and its frame size is 1280 × 720; taking the bottom-left vertex of the frame as the origin of coordinates, the width as the X axis and the height as the Y axis, the frame lies in the first quadrant. After face detection, a rectangular face frame surrounding the face is obtained with bottom-left vertex (128, 72) and top-right vertex (256, 144); the original video is 1080P with frame size 1920 × 1080, so in the original video the face frame's bottom-left vertex is (192, 108) and its top-right vertex is (384, 216).
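The reading side of this embodiment can be sketched as follows, reusing the crop_large_face helper from the earlier sketch; the frame-lookup table original_frames is an assumption for illustration:

```python
def iter_large_faces(main_list: dict, track_id: int,
                     original_frames: dict, comp_w: int, comp_h: int):
    # Walk the selected ID's sub linked list in timestamp order and
    # crop the large face image at each stored position.
    for frame_ts in sorted(main_list[track_id]):
        detection = main_list[track_id][frame_ts]
        frame = original_frames[frame_ts]       # original-video frame
        yield frame_ts, crop_large_face(frame, detection["position"],
                                        comp_w, comp_h)
```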
Face detection is performed on the large face image to obtain all faces it contains; faces smaller than the preset size threshold and edge faces are filtered out.
A number of facial feature points, such as around the eyes, nose and mouth, are detected in the large face image, and the image is scored according to its sharpness and noise coefficient. If its score exceeds the highest score, the angular deviation of the face is detected and a blur score is calculated, so as to obtain a clear, frontal face picture; the highest score here is the highest among the scores of all large face images corresponding to the currently selected face tracking ID.
The face features contained in the large face image are acquired, comprising at least skin color, hair and gender information; other information may also be included. The acquired face features are compared with a face database to identify the face and confirm identity.
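Comparison with the face database can be pictured as nearest-neighbor matching over feature vectors; cosine similarity is one common metric, used here only as an assumption, and a similarity threshold would normally gate the final decision:

```python
import numpy as np

def match_face(feature: np.ndarray, gallery: dict):
    """Return the identity in `gallery` most similar to `feature`.

    gallery maps identity -> stored feature vector (illustrative layout).
    """
    probe = feature / np.linalg.norm(feature)
    best_id, best_sim = None, -1.0
    for identity, stored in gallery.items():
        sim = float(np.dot(probe, stored / np.linalg.norm(stored)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id, best_sim
```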
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments.
The above embodiments are only preferred embodiments of the present invention and are not intended to limit it in any way. Although the invention has been disclosed through preferred embodiments, those skilled in the art can, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make changes or modifications amounting to equivalent embodiments, and the features of the above embodiments may be further combined or replaced; any simple modification, equivalent change or refinement made to the above embodiments in accordance with the technical essence of the invention still falls within the scope of the technical solution of the invention.