
WO2019114036A1 - Face detection method and apparatus, computer apparatus, and computer-readable storage medium - Google Patents

Face detection method and apparatus, computer apparatus, and computer-readable storage medium Download PDF

Info

Publication number
WO2019114036A1
WO2019114036A1 (PCT/CN2017/119043)
Authority
WO
WIPO (PCT)
Prior art keywords
image
face
frame
shoulder
head
Prior art date
Application number
PCT/CN2017/119043
Other languages
English (en)
French (fr)
Inventor
张兆丰 (Zhang Zhaofeng)
牟永强 (Mou Yongqiang)
Original Assignee
深圳云天励飞技术有限公司 (Shenzhen Intellifusion Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司 (Shenzhen Intellifusion Technologies Co., Ltd.)
Publication of WO2019114036A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present invention relates to the field of computer vision technology, and in particular, to a face detection method and apparatus, a computer apparatus, and a computer readable storage medium.
  • Commonly used pedestrian capture methods include face detection, head-shoulder detection, and pedestrian detection. Because face features are distinctive and stable, face detection has the highest detection rate and the lowest false detection rate of the three detection methods. However, actual application scenes are more complicated: changes in face angle (raised head, lowered head, side face), changes in illumination (backlight, shadow), occlusion (sunglasses, masks, hats), and the like all reduce the face detection rate.
  • head-shoulder detection detects the head and shoulders as a whole. Because the head and shoulders are not as distinctive and unique as face features, its detection effect is slightly worse than that of face detection.
  • head-shoulder detection generally uses edge features (HOG) or texture features (LBP), which are relatively complex and computationally time-consuming.
  • Pedestrian detection generally requires detecting the whole body: pedestrians must appear in the picture in full to be detected, a condition that actual scenes often fail to satisfy.
  • a first aspect of the present application provides a face detection method, the method comprising: constructing an image pyramid of the image to be detected; extracting the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; sliding a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifying the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames; merging the candidate face frames to obtain a merged candidate face frame; sliding a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifying the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames; merging the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame; predicting a face from the merged candidate head-shoulder frame using the trained face frame prediction model to obtain a predicted face frame; and merging the merged candidate face frame and the predicted face frame to obtain a target face frame.
  • the number of layers of the image pyramid is determined by the following formula:
  • n = ⌊n_octave · log2(k_up · min(w_img / w_m, h_img / h_m))⌋ + 1
  • where n represents the number of layers of the image pyramid of the image to be detected; k_up represents the upscaling multiple of the image to be detected; w_img and h_img respectively represent the width and height of the image to be detected; w_m and h_m respectively represent the width and height of the face detection model; and n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • the extracting of the aggregate channel features of each layer image of the image pyramid includes: extracting color features, gradient magnitude features, and gradient direction histogram features of each layer image.
  • the face detection model and the head-shoulder detection model are classifiers formed by cascading a plurality of decision trees.
  • the method further includes acquiring training samples for the head-shoulder detection model, specifically as follows:
  • several decision trees are removed from the trained face detection model to obtain a new face detection model;
  • the trained face detection model and the new face detection model each perform detection on a preset image, and the faces detected by the new face detection model beyond those detected by the trained face detection model are selected;
  • the positions of these face frames in the preset image are marked, each face frame is extended to obtain a head-shoulder frame, and the position of the head-shoulder frame in the preset image is marked;
  • head-shoulder frame images are intercepted and scaled to a predetermined size as positive samples for training the head-shoulder detection model, and non-head-shoulder frame images are intercepted and scaled to the predetermined size as negative samples for training the head-shoulder detection model.
  • a second aspect of the present application provides a face detecting device, the device comprising:
  • a construction unit for constructing an image pyramid for the image to be detected
  • An extracting unit configured to extract an aggregate channel feature of each layer image of the image pyramid, to obtain a feature pyramid of the image to be detected
  • a first detecting unit, configured to slide a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and to classify the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames;
  • a first merging unit configured to merge the candidate face frames to obtain a merged candidate face frame
  • a second detecting unit, configured to slide a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and to classify the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames;
  • a second merging unit configured to merge the candidate head-shoulder frames to obtain a combined candidate head-shoulder frame
  • a prediction unit configured to predict a face from the merged candidate head-shoulder frame by using the trained face frame prediction model to obtain a predicted face frame
  • a third merging unit configured to merge the merged candidate face frame and the predicted face frame to obtain a target face frame.
  • the constructing unit determines the number of layers of the image pyramid according to the following formula:
  • n = ⌊n_octave · log2(k_up · min(w_img / w_m, h_img / h_m))⌋ + 1
  • where n represents the number of layers of the image pyramid of the image to be detected; k_up represents the upscaling multiple of the image to be detected; w_img and h_img respectively represent the width and height of the image to be detected; w_m and h_m respectively represent the width and height of the face detection model; and n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • the face detection model and the head-shoulder detection model are classifiers formed by cascading a plurality of decision trees.
  • a third aspect of the present application provides a computer apparatus including a processor; when the processor executes a computer program stored in a memory, the face detection method is implemented.
  • a fourth aspect of the present application provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the face detection method.
  • the invention constructs an image pyramid of the image to be detected; extracts the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; slides a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifies the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames; merges the candidate face frames to obtain a merged candidate face frame; slides a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifies the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames; merges the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame; predicts a face from the merged candidate head-shoulder frame using the trained face frame prediction model to obtain a predicted face frame; and merges the merged candidate face frame and the predicted face frame to obtain the target face frame.
  • the normal face detection (that is, face detection by the face detection model) has a high detection rate and a low false detection rate.
  • the present invention uses the usual face detection as a main detection scheme.
  • the usual face detection is sensitive to changes in angle (upward head, low head, side face), changes in illumination (backlighting, shadowing), occlusion (sunglasses, masks, caps), etc., and is prone to missed detection.
  • the present invention employs head-shoulder detection as an auxiliary detection scheme: after the head-shoulder area is detected, a face frame is predicted from it. Finally, the face frames obtained by the usual face detection and by head-shoulder detection are combined to form the final face frame output.
  • the present invention combines face detection and head-shoulder detection to improve the face detection rate.
  • the present invention adopts the same features (ie, the aggregate channel features) in face detection and head-shoulder detection, which reduces feature-extraction time and speeds up the detection process. Therefore, the present invention can realize fast, high-detection-rate face detection.
  • FIG. 1 is a flowchart of a face detection method according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic diagram of a face frame prediction model as a convolutional neural network.
  • FIG. 3 is a structural diagram of a face detecting apparatus according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram of a computer device according to Embodiment 3 of the present invention.
  • the face detection method of the present invention is applied in one or more computer devices.
  • the computer device is a device capable of automatically performing numerical calculation and/or information processing according to pre-set or pre-stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device.
  • FIG. 1 is a flowchart of a face detection method according to Embodiment 1 of the present invention.
  • the face detection method is applied to a computer device.
  • the face detection method can be applied to various video surveillance scenarios, such as intelligent transportation, access control systems, and urban security.
  • In intelligent transportation, the present invention can be used to perform face detection on pedestrians or drivers.
  • the present invention detects face regions from an image to be detected for face-based processing such as face recognition, expression analysis, and the like.
  • For example, a surveillance image captured by a camera near a zebra crossing on a road is the image to be detected, and the present invention detects face regions from the surveillance image for pedestrian recognition.
  • the face detection method specifically includes the following steps:
  • Step 101: Construct an image pyramid of the image to be detected.
  • the image to be detected is an image containing a human face, usually a surveillance image.
  • the image to be detected may include one face or multiple faces.
  • the image to be detected may be an image received from the outside, an image taken by the computer device, an image read from a memory of the computer device, or the like.
  • the image to be detected may be a grayscale image or a color image such as an RGB image, an LUV image, or an HSV image.
  • the image pyramid is obtained by scaling the image to be detected to different sizes (enlarging or reducing it) to obtain scaled images of different sizes; the image to be detected and its scaled images together constitute the image pyramid of the image to be detected.
  • For example, the image to be detected is scaled by 75% to obtain a first scaled image, scaled by 50% to obtain a second scaled image, and scaled by 25% to obtain a third scaled image; the image to be detected, the first scaled image, the second scaled image, and the third scaled image constitute an image pyramid.
  • the number of layers of the image pyramid of the image to be detected may be determined according to the size of the image to be detected and the size of the face detection model (see step 103) used in the present invention (ie, the size of the input image received by the face detection model).
  • the number of layers of the image pyramid of the image to be detected can be determined by the following formula:
  • n = ⌊n_octave · log2(k_up · min(w_img / w_m, h_img / h_m))⌋ + 1
  • where n represents the number of layers of the image pyramid of the image to be detected; k_up represents the upscaling multiple of the image to be detected (ie, the multiple by which the image to be detected is enlarged); w_img and h_img respectively represent the width and height of the image to be detected; w_m and h_m respectively represent the width and height of the face detection model (ie, the width and height of the input image received by the face detection model); and n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • the width and height of the image to be detected are known, and the width and height of the face detection model are also known.
  • k up can be set by the user as needed, or the system default (for example, the default is 2).
  • the n octave can be set by the user as needed, or the system default (for example, the default is 8).
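  • As a concrete illustration, the following minimal Python sketch computes the layer count under the formula above (the ⌊·⌋ + 1 form is reconstructed from the variable definitions; k_up = 2 and n_octave = 8 are the defaults the text mentions):

```python
import math

def pyramid_layers(w_img, h_img, w_m=32, h_m=32, k_up=2, n_octave=8):
    """Number of image-pyramid layers:
    n = floor(n_octave * log2(k_up * min(w_img / w_m, h_img / h_m))) + 1.
    w_m, h_m: width/height of the face detection model's input window.
    k_up: upscaling multiple of the image to be detected.
    n_octave: number of layers between each doubling of size."""
    return math.floor(n_octave * math.log2(k_up * min(w_img / w_m, h_img / h_m))) + 1

# Example: a 1920x1080 surveillance frame with a 32x32 face model.
print(pyramid_layers(1920, 1080))  # 49 layers with these defaults
```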
  • Step 102: Extract the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected.
  • the aggregate channel features can include color features, gradient magnitude features, and gradient direction histogram features.
  • the color features may include RGB color features, LUV color features, HSV color features, grayscale features, and the like.
  • the color feature can be obtained directly from the image to be detected. For example, if the image to be detected is an RGB image, the RGB color feature can be directly obtained; if the image to be detected is an LUV image, the LUV color feature can be directly obtained; if the image to be detected is an HSV image, the HSV color feature can be directly obtained; if the image to be detected is a grayscale image, the grayscale feature can be directly obtained.
  • the image to be detected may be converted to obtain the color feature.
  • For example, if the image to be detected is an RGB image, the RGB image may be converted into a grayscale image (ie, a gray value is calculated from the RGB values of each pixel) to obtain the grayscale feature of the image to be detected.
  • the gradient can be calculated in a variety of ways, for example using the Sobel, Prewitt, or Roberts operator to calculate the gradient of each pixel (including a horizontal gradient value and a vertical gradient value).
  • the gradient magnitude and gradient direction of each pixel point are determined according to the gradient of each pixel point.
  • the gradient magnitude of each pixel of the image is the gradient magnitude feature of the image.
  • the gradient direction histogram of the image is the gradient direction histogram feature of the image.
  • the image may be divided into a plurality of equal-sized blocks (for example, 4×4 blocks), the gradient direction histogram of each block is obtained, and the gradient direction histogram of the image is obtained from the gradient direction histograms of the blocks.
  • the gradient direction histogram of each block can be calculated as follows: according to the gradient direction of each pixel in the block, the pixels in the block are divided into a plurality of different angular ranges (for example, 6 angular ranges); the gradient amplitudes of the pixels in each angular range of the block are accumulated to obtain the gradient magnitude of each angular range in the block; and the gradient direction histogram of the block is obtained from the gradient magnitudes of the angular ranges in the block.
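  • For illustration, here is a minimal sketch of these aggregate channel features in Python, assuming a grayscale input, 4×4 aggregation blocks, and 6 angular ranges (the channel set and the block/bin sizes follow the examples above; any normalization is left out, as the text does not fix one):

```python
import numpy as np

def aggregate_channels(gray, n_bins=6, block=4):
    """One color (grayscale) channel, one gradient-magnitude channel, and
    n_bins gradient-direction histogram channels, each summed over
    block x block pixel blocks (the aggregation step)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)                    # gradient magnitude per pixel
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned direction in [0, pi)

    h, w = gray.shape
    hb, wb = h // block, w // block
    def pool(ch):  # sum each block x block cell into one feature value
        return ch[:hb * block, :wb * block].reshape(hb, block, wb, block).sum(axis=(1, 3))

    channels = [pool(gray.astype(np.float64)), pool(mag)]
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    for b in range(n_bins):  # accumulate gradient magnitude per angular range
        channels.append(pool(np.where(bins == b, mag, 0.0)))
    return np.stack(channels)  # shape: (2 + n_bins, h // block, w // block)

print(aggregate_channels(np.random.rand(64, 64)).shape)  # (8, 16, 16)
```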
  • To speed up construction of the feature pyramid, the aggregate channel features (referred to as real features) of some images in the image pyramid (referred to as real feature layers) may be calculated directly, while the aggregate channel features of the other images in the image pyramid (referred to as approximate feature layers) are obtained by interpolation from the real features, for example from the real features of the nearest real feature layer.
  • the real feature layer in the image pyramid can be specified by the user as needed, or it can be system default.
  • s represents the scaling ratio of the approximate feature layer relative to the real feature layer, and the approximate feature satisfies f_Ω(I_s) ≈ f_Ω(I) · s^(−λ_Ω).
  • λ_Ω is a constant for a given feature Ω, and its value can be estimated in the following manner: compute μ_s = (1/N) · Σ_i f_Ω(I_i^s) / f_Ω(I_i), where I_i^s indicates the image I_i scaled by the ratio s, f_Ω(I) represents the feature Ω of the image I averaged over the image, and N represents the number of images participating in the estimation; λ_Ω is then obtained by fitting μ_s ≈ s^(−λ_Ω) using the least squares method. In a specific embodiment, N is taken to be 50000.
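  • A sketch of this estimation, assuming the power-law reading above (feature_fn and resize_fn are placeholders for the channel-feature average and the image scaling, which the text does not fix):

```python
import numpy as np

def estimate_lambda(images, feature_fn, resize_fn, scales=(0.5, 0.25)):
    """Estimate lambda for one channel feature by least squares from
    mu_s ~= s**(-lambda), where mu_s averages f(I_scaled) / f(I) over N images."""
    log_s, log_mu = [], []
    for s in scales:
        ratios = [feature_fn(resize_fn(img, s)) / feature_fn(img) for img in images]
        log_s.append(np.log(s))
        log_mu.append(np.log(np.mean(ratios)))  # log(mu_s) over the N images
    log_s, log_mu = np.array(log_s), np.array(log_mu)
    # least-squares fit of log(mu_s) = -lambda * log(s) through the origin
    return -np.dot(log_s, log_mu) / np.dot(log_s, log_s)
```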
  • Step 103: Slide a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classify the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames.
  • the candidate face frame is a first detection frame classified as a face.
  • the size of the first sliding window is equal to the size of the input image received by the face detection model.
  • the size of the first sliding window is 32 ⁇ 32
  • the first preset step size is 2 (ie, 2 pixels).
  • the first sliding window size and the first preset step size may also take other values.
  • the first sliding window slides over each layer image of the image pyramid in a preset direction (for example, from top to bottom and from left to right), and a first detection frame is obtained at each position; the trained face detection model then classifies each first detection frame to determine whether it is a candidate face frame.
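  • The sliding-window scan of step 103 can be sketched as follows; the cascaded decision-tree classifier is abstracted as an assumed score function, and the bookkeeping between image coordinates and aggregated-feature coordinates is simplified away (window size 32 and step 2 follow the example values above):

```python
def sliding_detect(layer_features, score_fn, win=32, step=2, thresh=0.0):
    """Slide a win x win window with the given step over one pyramid layer's
    feature tensor (channels, H, W); score_fn is the trained classifier
    (assumed callable); windows scoring above thresh become candidate frames."""
    candidates = []
    _, h, w = layer_features.shape
    for y in range(0, h - win + 1, step):        # top to bottom
        for x in range(0, w - win + 1, step):    # left to right
            patch = layer_features[:, y:y + win, x:x + win]  # one detection frame
            score = score_fn(patch)
            if score > thresh:                   # classified as a face
                candidates.append((x, y, win, win, score))
    return candidates
```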
  • the face detection model may be a classifier formed by cascading a plurality of (for example, 512) decision trees, that is, a strong classifier formed by cascading a plurality of weak classifiers.
  • a decision tree, also known as a classification tree, is a tree structure applied to classification. Each internal node in the decision tree represents a test on an attribute, each edge represents a test result, each leaf node represents a class or a class distribution, and the top node is the root node.
  • the decision tree constituting the face detection model may have a depth of 8 or other values.
  • the face detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • the training samples required to train the face detection model include positive samples and negative samples.
  • the positive sample of the trained face detection model is a face frame image
  • the negative sample is a non-face frame image.
  • the face frame image may be intercepted from the monitoring image, and the captured face frame image is scaled to a first predetermined size (for example, 32 ⁇ 32) as a positive sample of the training face detection model;
  • the non-face frame image is intercepted, and the intercepted non-face frame image is scaled to a first predetermined size (for example, 32 ⁇ 32) as a negative sample of the training face detection model.
  • the intercepted non-face frame image is an image taken from an image area outside the area where the face frame is located.
  • the training of the face detection model can refer to the prior art, and details are not described herein again.
  • Step 104: Merge the candidate face frames to obtain a merged candidate face frame.
  • Merging the candidate face frames means deduplicating the candidate face frames.
  • the merged candidate face frame can be one or more. If the image to be detected includes a face, a merged candidate face frame may be obtained; if the image to be detected includes multiple faces, a merged candidate face frame may be obtained for each face.
  • Candidate face frames can be merged by the non-maximum suppression (NMS) algorithm, that is, the candidate face frames are merged according to the probability that each candidate face frame belongs to a face and the overlapping area ratio (Intersection over Union, IoU) between candidate face frames.
  • merging the candidate face frames by the NMS algorithm may include: sorting all candidate face frames in descending order of the probability of belonging to a face; selecting the candidate face frame with the highest probability, and determining, for each other candidate face frame, whether its overlapping area ratio with the selected candidate face frame is greater than a first preset threshold (for example, 0.25); if the overlapping area ratio is greater than the first preset threshold, deleting that other candidate face frame, and taking the selected candidate face frame as a merged candidate face frame; then selecting the candidate face frame with the highest probability from the remaining candidate face frames and repeating the above process until all merged candidate face frames are obtained.
  • the remaining candidate face frames refer to the candidate face frames left after excluding the deleted candidate face frames and the merged candidate face frames.
  • For example, suppose there are six candidate face frames, ranked as A, B, C, D, E, and F in ascending order of the probability of belonging to a face.
  • Select the candidate face frame F with the highest probability, and determine whether the overlapping area ratios of A through E with F are greater than the first preset threshold.
  • Assuming the overlapping area ratios of B and D with F exceed the first preset threshold, B and D are deleted, and F is marked as the first merged candidate face frame obtained.
  • From the remaining candidate face frames A, C, and E, select the candidate face frame E with the highest probability, and determine whether the overlapping area ratios of A and C with E are greater than the first preset threshold.
  • Assuming the overlapping area ratios of A and C with E exceed the first preset threshold, A and C are deleted, and E is marked as the second merged candidate face frame; the merged candidate face frames F and E are thus obtained by the NMS algorithm.
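  • The merging step can be sketched as the following greedy NMS, matching the procedure above (boxes carry their face probability; 0.25 is the example first preset threshold):

```python
def nms(boxes, iou_thresh=0.25):
    """boxes: (x, y, w, h, prob) candidate face frames. Repeatedly keep the
    highest-probability frame and delete frames whose IoU with it exceeds
    iou_thresh; the kept frames are the merged candidate face frames."""
    def iou(a, b):
        ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    merged = []
    while remaining:
        best = remaining.pop(0)  # highest remaining probability
        merged.append(best)
        remaining = [b for b in remaining if iou(best, b) <= iou_thresh]
    return merged
```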
  • Step 105: Slide a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classify the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames.
  • the candidate head-shoulder frame is a second detection frame classified into a head-shoulder frame.
  • the size of the second sliding window is equal to the size of the input image received by the head-shoulder detection model.
  • the size of the second sliding window may be 64 ⁇ 64, and the second preset step size may be 2.
  • the second sliding window size and the second preset step size may also take other values.
  • the second preset step size may be equal to the first preset step size.
  • the second preset step size may also be not equal to the first preset step size.
  • the first preset step size is 2, and the second preset step size is 4.
  • the second sliding window slides over each layer image of the image pyramid in a preset direction (for example, from top to bottom and from left to right), and a second detection frame is obtained at each position; the trained head-shoulder detection model then classifies each second detection frame to determine whether it is a candidate head-shoulder frame.
  • the head-shoulder detection model may be a classifier formed by cascading a plurality (eg, 512) of decision trees.
  • the number of decision trees included in the head-shoulder detection model may be the same as or different from the number of decision trees included in the face detection model.
  • the decision tree constituting the head-shoulder detection model may have a depth of 8, or may be other values.
  • a training sample of the head-shoulder detection model can be obtained from the trained face detection model.
  • several decision trees may be removed from the trained face detection model (a cascade of decision trees) to obtain a new face detection model; with fewer weak classifiers, the new model detects more faces than the original, at the cost of more false detections.
  • both the trained face detection model and the new face detection model perform detection on the monitoring image, and the faces detected by the new face detection model beyond those detected by the trained face detection model are selected.
  • the position of the face frame in the monitoring image is marked, and the face frame is extended to obtain the head-shoulder frame, and the position of the head-shoulder frame in the monitoring image is marked.
  • the position of the head-shoulder frame is marked as [x', y', w', h'], where x' and y' represent the coordinates of the top-left corner of the head-shoulder frame, w' represents the width of the head-shoulder frame, and h' represents the height of the head-shoulder frame.
  • the head-shoulder frame images may be intercepted from the surveillance image and scaled to a second predetermined size (eg, 64×64) as positive samples for training the head-shoulder detection model; non-head-shoulder frame images are intercepted from the surveillance image and scaled to the second predetermined size as negative samples for training the head-shoulder detection model.
  • the intercepted non-head-shoulder frame image is an image taken from an image area outside the area where the head-shoulder frame is located.
  • In this way, the training samples required by the head-shoulder detection model can be conveniently obtained with the help of the trained face detection model; moreover, since the training samples come from monitoring images, they better match actual monitoring scenes. One possible face-to-head-shoulder extension is sketched below.
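  • A sketch of extending a detected face frame to a head-shoulder frame when labeling these samples; the extension ratios here (one face width to each side, half a face height upward, 1.5 face heights downward) are illustrative assumptions, since the text does not fix concrete values:

```python
def face_to_head_shoulder(x, y, w, h, img_w, img_h):
    """Extend a detected face frame [x, y, w, h] to a head-shoulder frame
    [x', y', w', h'], clipped to the image boundary."""
    x1 = max(0, x - w)                     # widen to cover the shoulders
    y1 = max(0, y - h // 2)                # raise to cover the top of the head
    x2 = min(img_w, x + 2 * w)
    y2 = min(img_h, y + h + (3 * h) // 2)  # extend down past the shoulders
    return x1, y1, x2 - x1, y2 - y1
```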
  • the head-shoulder detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • Existing head-shoulder detection generally uses edge features (HOG) or texture features (LBP), which are relatively complex and computationally time-consuming.
  • the invention performs head-shoulder detection based on the feature pyramid of the image to be detected and needs no additional feature extraction, which saves feature-extraction time in the head-shoulder detection process and accelerates head-shoulder detection, thereby improving the efficiency of the face detection method of the invention.
  • Step 106: Merge the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame.
  • Merging the candidate head-shoulder frames means deduplicating the candidate head-shoulder frames.
  • the merged candidate head-shoulder frame may be one or more. If the image to be detected includes one head-shoulder, one merged candidate head-shoulder frame may be obtained; if the image to be detected includes a plurality of head-shoulders, a merged candidate head-shoulder frame may be obtained for each head-shoulder.
  • the candidate head-shoulder frames may be merged by a non-maximum suppression algorithm, that is, the candidate head-shoulder frames are merged according to the probability that the candidate head-shoulder frame belongs to the head-shoulder and the candidate head-shoulder frame overlap area ratio.
  • merging the candidate head-shoulder frames by the non-maximum suppression algorithm may include: sorting all candidate head-shoulder frames in descending order of the probability of belonging to a head-shoulder; selecting the candidate head-shoulder frame with the highest probability, and determining, for each other candidate head-shoulder frame, whether its overlapping area ratio with the selected candidate head-shoulder frame is greater than a second preset threshold (eg, 0.30); if the overlapping area ratio is greater than the second preset threshold, deleting that other candidate head-shoulder frame, and taking the selected candidate head-shoulder frame as a merged candidate head-shoulder frame; then selecting the candidate head-shoulder frame with the highest probability from the remaining candidate head-shoulder frames and repeating the above process until all merged candidate head-shoulder frames are obtained.
  • the remaining candidate head-shoulder frames refer to the candidate head-shoulder frames left after excluding the deleted candidate head-shoulder frames and the merged candidate head-shoulder frames.
  • For example, suppose there are six candidate head-shoulder frames, ranked as A', B', C', D', E', and F' in ascending order of the probability of belonging to a head-shoulder.
  • the candidate head-shoulder frame F' with the highest probability is selected, and whether the overlapping area ratios of A' through E' with F' are greater than the second preset threshold is determined. Assuming the overlapping area ratios of B' and D' with F' exceed the second preset threshold, B' and D' are deleted, and F' is marked as the first merged candidate head-shoulder frame obtained.
  • Step 107: Predict a face from the merged candidate head-shoulder frame by using the trained face frame prediction model to obtain a predicted face frame.
  • the face frame prediction model may be a convolutional neural network.
  • the face frame prediction model may be the convolutional neural network shown in FIG. 2; the convolutional neural network includes two 3×3 convolution layers, one 2×2 convolution layer, and one fully connected layer, and each of the first two convolution layers is followed by 3×3 max pooling.
  • the regression target is the position of the face frame, [x, y, w, h].
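  • A minimal sketch of this prediction network, assuming a 64×64 head-shoulder crop (the second predetermined size above) and illustrative channel counts and pooling strides; the text fixes only the kernel sizes, the layer order, and the [x, y, w, h] regression target:

```python
import torch
import torch.nn as nn

class FaceFramePredictor(nn.Module):
    """Two 3x3 conv layers, each followed by 3x3 max pooling, then one 2x2
    conv layer and one fully connected layer regressing [x, y, w, h]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=2), nn.ReLU(),       # 16 -> 15
        )
        self.fc = nn.Linear(64 * 15 * 15, 4)  # face frame position [x, y, w, h]

    def forward(self, head_shoulder_crop):
        z = self.features(head_shoulder_crop)
        return self.fc(z.flatten(1))

# A merged candidate head-shoulder frame, cropped and resized to 64x64:
print(FaceFramePredictor()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 4])
```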
  • Step 108: Merge the merged candidate face frame and the predicted face frame to obtain a target face frame.
  • the merged candidate face frames and the predicted face frames may be merged by the non-maximum suppression algorithm, that is, the face frames are merged according to the probability that each merged candidate face frame or predicted face frame belongs to a face and the overlapping area ratio between these face frames.
  • merging the merged candidate face frames and the predicted face frames by the non-maximum suppression algorithm may include: sorting all merged candidate face frames and predicted face frames in descending order of the probability of belonging to a face; selecting the face frame with the highest probability (which may be a merged candidate face frame or a predicted face frame), and determining, for each other face frame, whether its overlapping area ratio with the selected face frame is greater than a third preset threshold; if the overlapping area ratio is greater than the third preset threshold, deleting that other face frame, and taking the selected face frame as a target face frame; then selecting the face frame with the highest probability from the remaining face frames and repeating the above process until all target face frames are obtained.
  • the remaining face frames refer to the face frames left after excluding the deleted face frames and the target face frames.
  • the first preset threshold, the second preset threshold, and the third preset threshold may be the same or different.
  • the face detection method of the first embodiment constructs an image pyramid of the image to be detected; extracts the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; slides a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifies the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames; merges the candidate face frames to obtain a merged candidate face frame; slides a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifies the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames; merges the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame; predicts a face from the merged candidate head-shoulder frame using the trained face frame prediction model to obtain a predicted face frame; and merges the merged candidate face frame and the predicted face frame to obtain a target face frame.
  • the normal face detection (that is, the face detection by the face detection model) has a high detection rate and a low false detection rate.
  • the face detection method of the first embodiment uses the usual face detection as the main detection scheme. However, the usual face detection is sensitive to changes in angle (upward head, low head, side face), changes in illumination (backlighting, shadow), occlusion (sunglasses, masks, hats), etc., and is prone to missed detection.
  • the face detection method of the first embodiment adopts head-shoulder detection as an auxiliary detection scheme: after the head-shoulder area is detected, a face frame is predicted from it. Finally, the face frames obtained by the usual face detection and by head-shoulder detection are combined to form the final face frame output.
  • the face detection method of the first embodiment uses face detection and head-shoulder detection in combination to improve the face detection rate. Meanwhile, it adopts the same features (ie, the aggregate channel features, that is, the feature pyramid) in face detection and head-shoulder detection, which reduces feature-extraction time and speeds up detection. Therefore, the face detection method of the first embodiment can realize fast, high-detection-rate face detection.
  • FIG. 3 is a structural diagram of a face detecting apparatus according to Embodiment 2 of the present invention.
  • the face detecting apparatus 10 may include: a construction unit 301, an extraction unit 302, a first detection unit 303, a first merging unit 304, a second detection unit 305, a second merging unit 306, a prediction unit 307, and a third merging unit 308.
  • the construction unit 301 is configured to construct an image pyramid for the image to be detected.
  • the image to be detected is an image containing a human face, usually a surveillance image.
  • the image to be detected may include one face or multiple faces.
  • the image to be detected may be a grayscale image or a color image such as an RGB image, an LUV image, or an HSV image.
  • the color feature can be obtained directly from the image to be detected. For example, if the image to be detected is an RGB image, the RGB color feature can be directly obtained; if the image to be detected is an LUV image, the LUV color feature can be directly obtained; if the image to be detected is an HSV image, the HSV color feature can be directly obtained; if the image to be detected is a grayscale image, the grayscale feature can be directly obtained.
  • the image to be detected may be converted to obtain the color feature.
  • For example, if the image to be detected is an RGB image, the RGB image may be converted into a grayscale image (ie, a gray value is calculated from the RGB values of each pixel) to obtain the grayscale feature of the image to be detected.
  • the image pyramid is obtained by scaling the image to be detected to different sizes (enlarging or reducing it) to obtain scaled images of different sizes; the image to be detected and its scaled images together constitute the image pyramid of the image to be detected.
  • For example, the image to be detected is scaled by 75% to obtain a first scaled image, scaled by 50% to obtain a second scaled image, and scaled by 25% to obtain a third scaled image; the image to be detected, the first scaled image, the second scaled image, and the third scaled image constitute an image pyramid.
  • the number of layers of the image pyramid of the image to be detected may be determined according to the size of the image to be detected and the size of the face detection model (see step 103) used in the present invention (ie, the size of the input image received by the face detection model).
  • the number of layers of the image pyramid of the image to be detected can be determined by the following formula:
  • n = ⌊n_octave · log2(k_up · min(w_img / w_m, h_img / h_m))⌋ + 1
  • where n represents the number of layers of the image pyramid of the image to be detected; k_up represents the upscaling multiple of the image to be detected (ie, the multiple by which the image to be detected is enlarged); w_img and h_img respectively represent the width and height of the image to be detected; w_m and h_m respectively represent the width and height of the face detection model (ie, the width and height of the input image received by the face detection model); and n_octave represents the number of image layers between each doubling of size in the image pyramid.
  • the width and height of the image to be detected are known, and the width and height of the face detection model are also known.
  • k up can be set by the user as needed, or the system default (for example, the default is 2).
  • the n octave can be set by the user as needed, or the system default (for example, the default is 8).
  • the extracting unit 302 is configured to extract an aggregate channel feature of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected.
  • the aggregate channel features can include color features, gradient magnitude features, and gradient direction histogram features.
  • the color features may include RGB color features, LUV color features, HSV color features, grayscale features, and the like.
  • the gradient can be calculated in a variety of ways, for example using the Sobel, Prewitt, or Roberts operator to calculate the gradient of each pixel (including a horizontal gradient value and a vertical gradient value).
  • the gradient magnitude and gradient direction of each pixel point are determined according to the gradient of each pixel point.
  • the gradient magnitude of each pixel of the image is the gradient magnitude feature of the image.
  • the gradient direction histogram of the image is the gradient direction histogram feature of the image.
  • the image may be divided into a plurality of equal-sized blocks (for example, 4×4 blocks), the gradient direction histogram of each block is obtained, and the gradient direction histogram of the image is obtained from the gradient direction histograms of the blocks.
  • the gradient direction histogram of each block can be calculated as follows: according to the gradient direction of each pixel in the block, the pixels in the block are divided into a plurality of different angular ranges (for example, 6 angular ranges); the gradient amplitudes of the pixels in each angular range of the block are accumulated to obtain the gradient magnitude of each angular range in the block; and the gradient direction histogram of the block is obtained from the gradient magnitudes of the angular ranges in the block.
  • the gradient direction histogram of the image can be obtained from the gradient direction histogram of each block in the image.
  • the gradient direction histogram vectors of the respective blocks in the image may be connected in series to form a gradient direction histogram series vector, and the gradient direction histogram series vector is the gradient direction histogram feature of the image.
  • To speed up construction of the feature pyramid, the aggregate channel features (referred to as real features) of some images in the image pyramid (referred to as real feature layers) may be calculated directly, while the aggregate channel features of the other images in the image pyramid (referred to as approximate feature layers) are obtained by interpolation from the real features, for example from the real features of the nearest real feature layer.
  • the real feature layer in the image pyramid can be specified by the user as needed, or it can be system default.
  • s represents the scaling ratio of the approximate feature layer relative to the real feature layer, and the approximate feature satisfies f_Ω(I_s) ≈ f_Ω(I) · s^(−λ_Ω).
  • λ_Ω is a constant for a given feature Ω, and its value can be estimated in the following manner: compute μ_s = (1/N) · Σ_i f_Ω(I_i^s) / f_Ω(I_i), where I_i^s indicates the image I_i scaled by the ratio s, f_Ω(I) represents the feature Ω of the image I averaged over the image, and N represents the number of images participating in the estimation; λ_Ω is then obtained by fitting μ_s ≈ s^(−λ_Ω) using the least squares method. In a specific embodiment, N is taken to be 50000.
  • the first detecting unit 303 is configured to slide on each layer image of the image pyramid according to the first preset step by using the first sliding window to obtain a plurality of first detection frames, and use the trained face detection model according to the The feature pyramid classifies the first detection frame to obtain a plurality of candidate face frames.
  • the candidate face frame is a first detection frame classified as a face.
  • the size of the first sliding window is equal to the size of the input image received by the face detection model.
  • the size of the first sliding window is 32 ⁇ 32
  • the first preset step size is 2 (ie, 2 pixels).
  • the first sliding window size and the first preset step size may also take other values.
  • the first sliding window slides over each layer image of the image pyramid in a preset direction (for example, from top to bottom and from left to right), and a first detection frame is obtained at each position; the trained face detection model then classifies each first detection frame to determine whether it is a candidate face frame.
  • the face detection model may be a classifier formed by cascading a plurality of (for example, 512) decision trees, that is, a strong classifier formed by cascading a plurality of weak classifiers.
  • a decision tree, also known as a classification tree, is a tree structure applied to classification. Each internal node in the decision tree represents a test on an attribute, each edge represents a test result, each leaf node represents a class or a class distribution, and the top node is the root node.
  • the decision tree constituting the face detection model may have a depth of 8 or other values.
  • the face detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • the training samples required to train the face detection model include positive samples and negative samples.
  • the positive sample of the trained face detection model is a face frame image
  • the negative sample is a non-face frame image.
  • the face frame image may be intercepted from the monitoring image, and the captured face frame image is scaled to a first predetermined size (for example, 32 ⁇ 32) as a positive sample of the training face detection model;
  • the non-face frame image is intercepted, and the intercepted non-face frame image is scaled to a first predetermined size (for example, 32 ⁇ 32) as a negative sample of the training face detection model.
  • the intercepted non-face frame image is an image taken from an image area outside the area where the face frame is located.
  • the training of the face detection model can refer to the prior art, and details are not described herein again.
  • the first merging unit 304 is configured to merge the candidate face frames to obtain a merged candidate face frame.
  • Merging the candidate face frames means deduplicating the candidate face frames.
  • the merged candidate face frame can be one or more. If the image to be detected includes a face, a merged candidate face frame may be obtained; if the image to be detected includes multiple faces, a merged candidate face frame may be obtained for each face.
  • Candidate face frames can be merged by the non-maximum suppression (NMS) algorithm, that is, the candidate face frames are merged according to the probability that each candidate face frame belongs to a face and the overlapping area ratio (Intersection over Union, IoU) between candidate face frames.
  • merging the candidate face frames by the NMS algorithm may include: sorting all candidate face frames in descending order of the probability of belonging to a face; selecting the candidate face frame with the highest probability, and determining, for each other candidate face frame, whether its overlapping area ratio with the selected candidate face frame is greater than a first preset threshold (for example, 0.25); if the overlapping area ratio is greater than the first preset threshold, deleting that other candidate face frame, and taking the selected candidate face frame as a merged candidate face frame; then selecting the candidate face frame with the highest probability from the remaining candidate face frames and repeating the above process until all merged candidate face frames are obtained.
  • the remaining candidate face frames refer to the candidate face frames left after excluding the deleted candidate face frames and the merged candidate face frames.
  • For example, suppose there are six candidate face frames, ranked as A, B, C, D, E, and F in ascending order of the probability of belonging to a face.
  • Select the candidate face frame F with the highest probability, and determine whether the overlapping area ratios of A through E with F are greater than the first preset threshold.
  • Assuming the overlapping area ratios of B and D with F exceed the first preset threshold, B and D are deleted, and F is marked as the first merged candidate face frame obtained.
  • From the remaining candidate face frames A, C, and E, select the candidate face frame E with the highest probability, and determine whether the overlapping area ratios of A and C with E are greater than the first preset threshold.
  • Assuming the overlapping area ratios of A and C with E exceed the first preset threshold, A and C are deleted, and E is marked as the second merged candidate face frame; the merged candidate face frames F and E are thus obtained by the NMS algorithm.
  • a second detecting unit 305, configured to slide a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and to classify the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames.
  • the candidate head-shoulder frame is a second detection frame classified into a head-shoulder frame.
  • the size of the second sliding window is equal to the size of the input image received by the head-shoulder detection model.
  • the size of the second sliding window may be 64 ⁇ 64, and the second preset step size may be 2.
  • the second sliding window size and the second preset step size may also take other values.
  • the second preset step size may be equal to the first preset step size.
  • the second preset step size may also be not equal to the first preset step size.
  • the first preset step size is 2, and the second preset step size is 4.
  • the second sliding window slides over each layer image of the image pyramid in a preset direction (for example, from top to bottom and from left to right), and a second detection frame is obtained at each position; the trained head-shoulder detection model then classifies each second detection frame to determine whether it is a candidate head-shoulder frame.
  • the head-shoulder detection model may be a classifier formed by cascading a plurality (eg, 512) of decision trees.
  • the number of decision trees included in the head-shoulder detection model may be the same as or different from the number of decision trees included in the face detection model.
  • the decision tree constituting the head-shoulder detection model may have a depth of 8, or may be other values.
  • a training sample of the head-shoulder detection model can be obtained from the trained face detection model.
  • several decision trees may be removed from the trained face detection model (a cascade of decision trees) to obtain a new face detection model; with fewer weak classifiers, the new model detects more faces than the original, at the cost of more false detections.
  • both the trained face detection model and the new face detection model perform detection on the monitoring image, and the faces detected by the new face detection model beyond those detected by the trained face detection model are selected.
  • the position of the face frame in the monitoring image is marked, and the face frame is extended to obtain the head-shoulder frame, and the position of the head-shoulder frame in the monitoring image is marked.
  • the position of the head-shoulder frame is marked as [x', y', w', h'], where x' and y' represent the coordinates of the top-left corner of the head-shoulder frame, w' represents the width of the head-shoulder frame, and h' represents the height of the head-shoulder frame.
  • the head-shoulder frame images may be intercepted from the surveillance image and scaled to a second predetermined size (eg, 64×64) as positive samples for training the head-shoulder detection model; non-head-shoulder frame images are intercepted from the surveillance image and scaled to the second predetermined size as negative samples for training the head-shoulder detection model.
  • the intercepted non-head-shoulder frame image is an image taken from an image area outside the area where the head-shoulder frame is located.
  • In this way, the training samples required by the head-shoulder detection model can be conveniently obtained with the help of the trained face detection model; moreover, since the training samples come from monitoring images, they better match actual monitoring scenes.
  • the head-shoulder detection model formed by multiple decision trees can be trained using an AdaBoost method such as the Gentle AdaBoost method.
  • Existing head-shoulder detection generally uses edge features (HOG) or texture features (LBP), which are relatively complex and computationally time-consuming.
  • the invention performs head-shoulder detection based on the feature pyramid of the image to be detected and needs no additional feature extraction, which saves feature-extraction time in the head-shoulder detection process and accelerates head-shoulder detection, thereby improving the efficiency of the face detection method of the invention.
  • the second merging unit 306 is configured to combine the candidate head-shoulder frames to obtain a combined candidate head-shoulder frame.
  • Merging the candidate head-shoulder frames means deduplicating the candidate head-shoulder frames.
  • the combined candidate head-shoulder frame may be one or more. If the image to be detected includes a head-shoulder, a merged candidate head-shoulder frame can be obtained; if the image to be detected includes a plurality of head-shoulders, a merged candidate head-shoulder frame can be obtained for each head-shoulder. .
  • the candidate head-shoulder frames may be merged by a non-maximum suppression algorithm, that is, the candidate head-shoulder frames are merged according to the probability that the candidate head-shoulder frame belongs to the head-shoulder and the candidate head-shoulder frame overlap area ratio.
  • merging the candidate head-shoulder frames by the non-maximum suppression algorithm may include: sorting all candidate head-shoulder frames in descending order of the probability of belonging to a head-shoulder; selecting the candidate head-shoulder frame with the highest probability, and determining, for each other candidate head-shoulder frame, whether its overlapping area ratio with the selected candidate head-shoulder frame is greater than a second preset threshold (eg, 0.30); if the overlapping area ratio is greater than the second preset threshold, deleting that other candidate head-shoulder frame, and taking the selected candidate head-shoulder frame as a merged candidate head-shoulder frame; then selecting the candidate head-shoulder frame with the highest probability from the remaining candidate head-shoulder frames and repeating the above process until all merged candidate head-shoulder frames are obtained.
  • the remaining candidate head-shoulder frames refer to the candidate head-shoulder frames left after excluding the deleted candidate head-shoulder frames and the merged candidate head-shoulder frames.
  • For example, suppose there are six candidate head-shoulder frames, ranked as A', B', C', D', E', and F' in ascending order of the probability of belonging to a head-shoulder.
  • the candidate head-shoulder frame F' with the highest probability is selected, and whether the overlapping area ratios of A' through E' with F' are greater than the second preset threshold is determined. Assuming the overlapping area ratios of B' and D' with F' exceed the second preset threshold, B' and D' are deleted, and F' is marked as the first merged candidate head-shoulder frame obtained.
  • the prediction unit 307 is configured to predict a face from the merged candidate head-shoulder frame by using the trained face frame prediction model to obtain a predicted face frame.
  • the face frame prediction model may be a convolutional neural network.
  • the face frame prediction model may be the convolutional neural network shown in FIG. 2; the convolutional neural network includes two 3×3 convolution layers, one 2×2 convolution layer, and one fully connected layer, and each of the first two convolution layers is followed by 3×3 max pooling.
  • the regression target is the position of the face frame, [x, y, w, h].
  • the third merging unit 308 is configured to combine the merged candidate face frame and the predicted face frame to obtain a target face frame.
  • the merged candidate face frames and the predicted face frames may be merged by the non-maximum suppression algorithm, that is, the face frames are merged according to the probability that each merged candidate face frame or predicted face frame belongs to a face and the overlapping area ratio between these face frames.
  • merging the merged candidate face frames and the predicted face frames by the non-maximum suppression algorithm may include: sorting all merged candidate face frames and predicted face frames in descending order of the probability of belonging to a face; selecting the face frame with the highest probability (which may be a merged candidate face frame or a predicted face frame), and determining, for each other face frame, whether its overlapping area ratio with the selected face frame is greater than a third preset threshold; if the overlapping area ratio is greater than the third preset threshold, deleting that other face frame, and taking the selected face frame as a target face frame; then selecting the face frame with the highest probability from the remaining face frames and repeating the above process until all target face frames are obtained.
  • the remaining face frames refer to the face frames left after excluding the deleted face frames and the target face frames.
  • the face detecting device of the second embodiment constructs an image pyramid of the image to be detected; extracts the aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; slides a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifies the first detection frames according to the feature pyramid using the trained face detection model to obtain a plurality of candidate face frames; merges the candidate face frames to obtain a merged candidate face frame; slides a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifies the second detection frames according to the feature pyramid using the trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames; merges the candidate head-shoulder frames to obtain a merged candidate head-shoulder frame; predicts a face from the merged candidate head-shoulder frame using the trained face frame prediction model to obtain a predicted face frame; and merges the merged candidate face frame and the predicted face frame to obtain a target face frame.
Ordinary face detection (that is, face detection by the face detection model) has a high detection rate and a low false detection rate, so the face detecting device of the second embodiment uses ordinary face detection as the main detection scheme. However, ordinary face detection is sensitive to changes in face angle (raised head, lowered head, side face), changes in illumination (backlight, shadow), occlusion (sunglasses, masks, hats), and similar conditions, and is therefore prone to missed detections. To address these defects, the face detecting device of the second embodiment adopts head-shoulder detection as an auxiliary detection scheme: after a head-shoulder region is detected, the face frame is extracted from it. Finally, the face frames obtained by ordinary face detection and by head-shoulder detection are merged to form the final face frame output.

The face detecting device of the second embodiment thus uses face detection and head-shoulder detection jointly, which improves the face detection rate. Meanwhile, it uses the same features (namely, the aggregate channel features) for both face detection and head-shoulder detection, which reduces the time spent on feature extraction and speeds up the detection process. Therefore, the face detecting device of the second embodiment can realize fast face detection with a high detection rate.
FIG. 4 is a schematic diagram of a computer apparatus according to Embodiment 3 of the present invention. The computer device 1 includes a memory 20, a processor 30, and a computer program 40, such as a face detection program, that is stored in the memory 20 and operable on the processor 30. When the processor 30 executes the computer program 40, the steps in the embodiment of the face detection method described above are implemented, for example, steps 101-108 shown in FIG. 1. Alternatively, when the processor 30 executes the computer program 40, the functions of the modules/units in the above device embodiment are implemented, for example, the units 301-308 in FIG. 3.
Illustratively, the computer program 40 can be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, the instruction segments being used to describe the execution of the computer program 40 in the computer device 1. For example, the computer program 40 may be divided into the construction unit 301, the extraction unit 302, the first detection unit 303, the first merging unit 304, the second detection unit 305, the second merging unit 306, the prediction unit 307, and the third merging unit 308 in FIG. 3; for the specific functions of each unit, refer to the second embodiment.
The computer device 1 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. It will be understood by those skilled in the art that the schematic FIG. 4 is merely an example of the computer device 1 and does not constitute a limitation on the computer device 1, which may include more or fewer components than those illustrated, combine some components, or have different components; for example, the computer device 1 may also include input and output devices, network access devices, buses, and the like.
The processor 30 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor. The processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 by using various interfaces and lines.
The memory 20 can be used to store the computer program 40 and/or the modules/units. The processor 30 implements the various functions of the computer device 1 by running or executing the computer programs and/or modules/units stored in the memory 20 and by calling the data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required for at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the computer device 1 (such as audio data or a phone book). In addition, the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules/units integrated in the computer device 1 are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such understanding, the present invention implements all or part of the processes in the foregoing method embodiments, which may also be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when the program is executed by a processor, the steps of the various method embodiments described above may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed computer apparatus and method may be implemented in other manners. For example, the computer device embodiments described above are merely illustrative; the division of the units is only a logical function division, and there may be other division manners in actual implementation. In addition, the functional units in the embodiments of the present invention may be integrated in the same processing unit, each unit may exist physically separately, or two or more units may be integrated in the same unit. The above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Abstract

A face detection method, a face detection apparatus, a computer apparatus, and a readable storage medium. The method comprises: constructing an image pyramid for an image to be detected; extracting aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected; obtaining a plurality of first detection frames of the image to be detected by using a first sliding window, and classifying the first detection frames to obtain a plurality of candidate face frames; merging the candidate face frames; obtaining a plurality of second detection frames of the image to be detected by using a second sliding window, and classifying the second detection frames to obtain a plurality of candidate head-shoulder frames; merging the candidate head-shoulder frames; predicting a face from the merged candidate head-shoulder frames to obtain a predicted face frame; and merging the merged candidate face frames and the predicted face frame to obtain a target face frame. The method can realize fast face detection with a high detection rate.

Claims (10)

  1. A face detection method, wherein the method comprises:
    constructing an image pyramid for an image to be detected;
    extracting aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected;
    sliding a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and classifying the first detection frames according to the feature pyramid by using a trained face detection model to obtain a plurality of candidate face frames;
    merging the candidate face frames to obtain merged candidate face frames;
    sliding a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and classifying the second detection frames according to the feature pyramid by using a trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames;
    merging the candidate head-shoulder frames to obtain merged candidate head-shoulder frames;
    predicting a face from the merged candidate head-shoulder frames by using a trained face frame prediction model to obtain a predicted face frame; and
    merging the merged candidate face frames and the predicted face frame to obtain a target face frame.
  2. The method according to claim 1, wherein the number of layers of the image pyramid is determined by the following formula:
    Figure PCTCN2017119043-appb-100001
    where n represents the number of layers of the image pyramid of the image to be detected, k_up represents the upsampling multiple of the image to be detected, w_img and h_img respectively represent the width and height of the image to be detected, w_m and h_m respectively represent the width and height of the input image received by the face detection model, and n_octave represents the number of image layers between every two successive doublings of size in the image pyramid.
  3. The method according to claim 1, wherein the extracting aggregate channel features of each layer image of the image pyramid comprises:
    computing the aggregate channel features of some of the images in the image pyramid, and obtaining the aggregate channel features of the other images in the image pyramid by interpolation from the aggregate channel features of those images.
  4. The method according to any one of claims 1 to 3, wherein the face detection model and the head-shoulder detection model are classifiers formed by cascading a plurality of decision trees.
  5. The method according to claim 4, wherein the method further comprises obtaining training samples of the head-shoulder detection model as follows:
    reducing the trained face detection model by several decision trees to obtain a new face detection model;
    detecting faces on a preset image with the trained face detection model and the new face detection model, the new face detection model detecting more faces than the trained face detection model;
    for the faces detected by the new face detection model but not by the trained face detection model, marking the position of the face frame in the preset image, expanding the face frame to obtain a head-shoulder frame, and marking the position of the head-shoulder frame in the preset image; and
    cropping head-shoulder frame images from the preset image and scaling the cropped head-shoulder frame images to a predetermined size as positive samples for training the head-shoulder detection model, and cropping non-head-shoulder frame images from the preset image and scaling the cropped non-head-shoulder frame images to a predetermined size as negative samples for training the head-shoulder detection model.
  6. A face detection apparatus, wherein the apparatus comprises:
    a construction unit configured to construct an image pyramid for an image to be detected;
    an extraction unit configured to extract aggregate channel features of each layer image of the image pyramid to obtain a feature pyramid of the image to be detected;
    a first detection unit configured to slide a first sliding window over each layer image of the image pyramid according to a first preset step to obtain a plurality of first detection frames, and to classify the first detection frames according to the feature pyramid by using a trained face detection model to obtain a plurality of candidate face frames;
    a first merging unit configured to merge the candidate face frames to obtain merged candidate face frames;
    a second detection unit configured to slide a second sliding window over each layer of the image pyramid according to a second preset step to obtain a plurality of second detection frames, and to classify the second detection frames according to the feature pyramid by using a trained head-shoulder detection model to obtain a plurality of candidate head-shoulder frames;
    a second merging unit configured to merge the candidate head-shoulder frames to obtain merged candidate head-shoulder frames;
    a prediction unit configured to predict a face from the merged candidate head-shoulder frames by using a trained face frame prediction model to obtain a predicted face frame; and
    a third merging unit configured to merge the merged candidate face frames and the predicted face frame to obtain a target face frame.
  7. The apparatus according to claim 6, wherein the construction unit determines the number of layers of the image pyramid according to the following formula:
    Figure PCTCN2017119043-appb-100002
    where n represents the number of layers of the image pyramid of the image to be detected, k_up represents the upsampling multiple of the image to be detected, w_img and h_img respectively represent the width and height of the image to be detected, w_m and h_m respectively represent the width and height of the input image received by the face detection model, and n_octave represents the number of image layers between every two successive doublings of size in the image pyramid.
  8. The apparatus according to claim 6, wherein the face detection model and the head-shoulder detection model are classifiers formed by cascading a plurality of decision trees.
  9. A computer apparatus, wherein the computer apparatus comprises a processor, and the processor is configured to implement the face detection method according to any one of claims 1 to 5 when executing a computer program stored in a memory.
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the face detection method according to any one of claims 1 to 5.
PCT/CN2017/119043 2017-12-12 2017-12-27 Face detection method and apparatus, computer apparatus and computer-readable storage medium WO2019114036A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711319416.X 2017-12-12
CN201711319416.XA CN109918969B (zh) 2017-12-12 2017-12-12 Face detection method and apparatus, computer apparatus and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2019114036A1 true WO2019114036A1 (zh) 2019-06-20

Family

ID=66819559

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119043 WO2019114036A1 (zh) 2017-12-12 2017-12-27 人脸检测方法及装置、计算机装置和计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN109918969B (zh)
WO (1) WO2019114036A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264396B (zh) * 2019-06-27 2022-11-18 杨骥 Video face replacement method, system and computer-readable storage medium
CN113051960A (zh) * 2019-12-26 2021-06-29 深圳市光鉴科技有限公司 Depth map face detection method, system, device and storage medium
CN111985439B (zh) * 2020-08-31 2024-08-13 中移(杭州)信息技术有限公司 Face detection method, apparatus, device and storage medium
CN112507786B (zh) * 2020-11-03 2022-04-08 浙江大华技术股份有限公司 Method and apparatus for associating detection frames of multiple human body parts, electronic apparatus and storage medium
CN112714253B (zh) * 2020-12-28 2022-08-26 维沃移动通信有限公司 Video recording method and apparatus, electronic device and readable storage medium
CN113095257A (zh) * 2021-04-20 2021-07-09 上海商汤智能科技有限公司 Abnormal behavior detection method, apparatus, device and storage medium
CN113269761A (zh) * 2021-05-31 2021-08-17 广东联通通信建设有限公司 Reflection detection method, apparatus and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096801A (zh) * 2009-12-14 2011-06-15 北京中星微电子有限公司 Sitting posture detection method and apparatus
CN104361327B (zh) * 2014-11-20 2018-09-18 苏州科达科技股份有限公司 Pedestrian detection method and system
CN106650615B (zh) * 2016-11-07 2018-03-27 深圳云天励飞技术有限公司 Image processing method and terminal
CN106991688A (zh) * 2017-03-09 2017-07-28 广东欧珀移动通信有限公司 Human body tracking method, human body tracking apparatus and electronic apparatus

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101131728A (zh) * 2007-09-29 2008-02-27 东华大学 Face shape matching method based on Shape Context
CN102163283A (zh) * 2011-05-25 2011-08-24 电子科技大学 Face feature extraction method based on local ternary patterns
CN102254183A (zh) * 2011-07-18 2011-11-23 北京汉邦高科数字技术有限公司 Face detection method based on the AdaBoost algorithm
CN106529448A (zh) * 2016-10-27 2017-03-22 四川长虹电器股份有限公司 Method for multi-view face detection using aggregate channel features
CN107330390A (zh) * 2017-06-26 2017-11-07 上海远洲核信软件科技股份有限公司 People counting method based on image analysis and deep learning

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11670112B2 (en) 2019-11-21 2023-06-06 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and image capture apparatus
CN112825138A (zh) * 2019-11-21 2021-05-21 佳能株式会社 Image processing apparatus, image processing method, image capture apparatus and machine-readable medium
EP3826293A1 (en) * 2019-11-21 2021-05-26 Canon Kabushiki Kaisha Detection of head and face separately and selection of result for use in detection of facial features
CN111179218A (zh) * 2019-12-06 2020-05-19 深圳市派科斯科技有限公司 Conveyor belt material detection method and apparatus, storage medium and terminal device
CN111179218B (zh) * 2019-12-06 2023-07-04 深圳市燕麦科技股份有限公司 Conveyor belt material detection method and apparatus, storage medium and terminal device
CN111538861A (zh) * 2020-04-22 2020-08-14 浙江大华技术股份有限公司 Method, apparatus, device and medium for image retrieval based on surveillance video
CN111538861B (zh) * 2020-04-22 2023-08-15 浙江大华技术股份有限公司 Method, apparatus, device and medium for image retrieval based on surveillance video
CN111783601A (zh) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Training method and apparatus for a face recognition model, electronic device and storage medium
CN111783601B (zh) * 2020-06-24 2024-04-26 北京百度网讯科技有限公司 Training method and apparatus for a face recognition model, electronic device and storage medium
CN111832460A (zh) * 2020-07-06 2020-10-27 北京工业大学 Face image extraction method and system based on multi-feature fusion
CN111832460B (zh) * 2020-07-06 2024-05-21 北京工业大学 Face image extraction method and system based on multi-feature fusion
CN112183351A (zh) * 2020-09-28 2021-01-05 普联国际有限公司 Face detection method, apparatus and device combining skin color information, and readable storage medium
CN112183351B (zh) * 2020-09-28 2024-03-29 普联国际有限公司 Face detection method, apparatus and device combining skin color information, and readable storage medium
CN113095284A (zh) * 2021-04-30 2021-07-09 平安国际智慧城市科技股份有限公司 Face selection method, apparatus, device and computer-readable storage medium
CN113221812A (zh) * 2021-05-26 2021-08-06 广州织点智能科技有限公司 Training method for a face keypoint detection model and face keypoint detection method
CN113723274A (zh) * 2021-08-27 2021-11-30 上海科技大学 Improved target object detection method based on non-maximum suppression
CN113723274B (zh) * 2021-08-27 2023-09-22 上海科技大学 Improved target object detection method based on non-maximum suppression
CN113989881A (zh) * 2021-10-18 2022-01-28 奥比中光科技集团股份有限公司 Face detection method, apparatus and terminal
CN114444895A (zh) * 2021-12-31 2022-05-06 深圳云天励飞技术股份有限公司 Cleaning quality assessment method and related device

Also Published As

Publication number Publication date
CN109918969A (zh) 2019-06-21
CN109918969B (zh) 2021-03-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17934931; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17934931; Country of ref document: EP; Kind code of ref document: A1)