
WO2018188453A1 - Method for determining a face region, storage medium, and computer device - Google Patents

Method for determining a face region, storage medium, and computer device

Info

Publication number
WO2018188453A1
WO2018188453A1 (PCT/CN2018/079551)
Authority
WO
WIPO (PCT)
Prior art keywords
face
region
neural network
convolutional neural
area
Prior art date
Application number
PCT/CN2018/079551
Other languages
English (en)
French (fr)
Inventor
王亚彪
倪辉
赵艳丹
汪铖杰
李季檩
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018188453A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation

Definitions

  • the present application relates to the field of image processing, and in particular to a method for determining a face region, a storage medium, and a computer device.
  • Face recognition is a biometric recognition technology that identifies people based on facial feature information. Using a camera or video camera to capture an image or video stream containing a face, automatically detecting and tracking the face in the image, and then applying a series of related techniques to the detected face is usually called portrait recognition or face recognition.
  • Face detection has been widely studied as a basis for applications such as face recognition, face key point location, and face retrieval. Face detection determines whether there is a face in a given image; if one exists, the size and position of the face are given. As shown in FIG. 1, detection is performed on the left image to obtain the right image, in which the face region (i.e., the dotted region) is identified.
  • the main difficulty lies in the following two aspects: the face itself can vary in many details, such as changes in skin color, face shape, expression, and pose; and faces in images are also affected by a variety of external factors, such as lighting, camera shake, and occlusion by ornaments on the face.
  • the face detection methods are various and can be classified into a feature-based detection method and a statistical model-based detection method.
  • the feature-based face detection method is mainly based on some empirical rules and artificially constructed features for face detection, such as detection methods based on some facial organ structures and texture features.
  • the statistical model-based detection method also requires features to be extracted from samples first;
  • face detection based on statistical models is not purely based on a set of fixed rules; instead, a large number of samples are used to train a detector model.
  • typical examples include a face detection algorithm based on the support vector machine (SVM) and a face detection algorithm based on Adaboost.
  • the commonly used indicators for evaluating face detection methods are mainly the following: (1) detection rate, that is, for a given image set, the ratio of the number of correctly detected faces to the total number of faces in the images; (2) the number of false detections, that is, the number of non-face regions detected as face regions. An ideal face detector has a detection rate of 100% and zero false detections.
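  • For illustration, here is a minimal Python sketch of these two metrics; the rule that a prediction counts as correct when it overlaps a not-yet-matched ground-truth face with IoU of at least 0.5 is a common convention assumed here, not taken from the text:

```python
def iou(a, b):
    """IoU of two boxes, each given as (x, y, w, h) with (x, y) the top-left vertex."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))   # width of the overlap
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))   # height of the overlap
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def evaluate_detections(preds_per_image, gts_per_image, match_iou=0.5):
    """Return (detection rate, number of false detections) over an image set."""
    total_faces = detected = false_detections = 0
    for preds, gts in zip(preds_per_image, gts_per_image):
        total_faces += len(gts)
        matched = set()
        for p in preds:
            # greedily match each prediction to the best unmatched ground truth
            best_j = max(range(len(gts)),
                         key=lambda j: -1.0 if j in matched else iou(p, gts[j]),
                         default=-1)
            if best_j >= 0 and best_j not in matched and iou(p, gts[best_j]) >= match_iou:
                matched.add(best_j)
                detected += 1
            else:
                false_detections += 1
    rate = detected / total_faces if total_faces else 0.0
    return rate, false_detections
```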
  • if the feature-based detection method in the related art is used, the need to rely on empirical rules and hand-crafted features makes the method easily affected by subjective factors of the designer, which harms the detection rate and robustness of face recognition;
  • if the statistical model-based detection method in the related art is used, the commonly used models often have many layers to ensure recognition accuracy, which leads to large models, all exceeding 15 MB. Although the many layers ensure recognition accuracy, they slow down face detection (more than 300 ms on a mainstream PC), which cannot meet real-time requirements.
  • the embodiments of the present application provide a method for determining a face region, a storage medium, and a computer device, to at least solve the technical problem of poor real-time performance of face detection in the related art.
  • a method for determining a face region includes: receiving a positioning request, where the positioning request is used to request locating a face region in a target picture;
  • performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation;
  • and returning the positioning result when the positioning result indicates that a face region is located in the target picture.
  • a computer device (or a device for determining a face region) includes a memory and a processor, the processor being configured to execute a computer program saved in the memory to: receive a positioning request, where the positioning request is used to request locating a face region in the target picture; perform a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to invoke a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation; and return the positioning result when the positioning result indicates that a face region is located in the target picture.
  • a storage medium comprising a stored program, wherein the program is configured to execute any of the methods described above at runtime.
  • in the embodiments of the present application, the graphics processor is directly invoked, through the full convolutional network in the convolutional neural network, to perform the convolution operation; this hardware-accelerated approach, rather than a software approach of region-by-region scanning by the CPU, can solve the technical problem of poor real-time performance of face detection in the related art, thereby achieving the technical effect of improving the real-time performance of face detection.
  • FIG. 1 is a schematic diagram of an optional face region in the related art
  • FIG. 2 is a schematic diagram of a hardware environment of a method for determining a face region according to an embodiment of the present application
  • FIG. 3 is a flowchart of an optional method for determining a face region according to an embodiment of the present application
  • FIG. 4 is a schematic diagram of an optional degree of face coincidence according to an embodiment of the present application.
  • FIG. 5 is a schematic illustration of an alternative sample in accordance with an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an optional network structure according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an optional face region according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an optional face region according to an embodiment of the present application.
  • FIG. 9 is a flowchart of an optional method for determining a face region according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an alternative probability map in accordance with an embodiment of the present application.
  • FIG. 11 is a schematic diagram of an alternative computer device in accordance with an embodiment of the present application.
  • FIG. 12 is a structural block diagram of a terminal according to an embodiment of the present application.
  • the Convolutional Neural Network (CNN) is a feedforward neural network whose artificial neurons respond to part of the coverage area; it performs excellently for large-scale image processing, and mainly includes convolutional layers and pooling layers.
  • Adaboost An iterative algorithm that can be used to train different classifiers for the same training set, and then combine these classifiers to form a stronger classifier.
  • a method embodiment of a method for determining a face region is provided.
  • the method for determining the face region may be applied to a hardware environment formed by the server 202 and the terminal 204 as shown in FIG. 2 .
  • the server 202 is connected to the terminal 204 through a network.
  • the network includes but is not limited to a wide area network, a metropolitan area network, or a local area network.
  • the terminal 204 is not limited to a PC, a mobile phone, a tablet, or the like.
  • the face recognition function provided by the method of the present application may be directly integrated on the terminal, or a client for implementing the method of the present application may be installed on it, so that when the terminal receives a positioning request, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, the convolutional neural network being used to call the graphics processor to convolve the target picture;
  • the face positioning operation includes a convolution operation; and when the positioning result indicates that a face region exists in the target picture, the positioning result is returned.
  • the method provided by the present application may also be run on a server or the like and provided to applications in the form of a Software Development Kit (SDK), providing an interface for the face recognition function; other devices can achieve face region recognition through the provided interface.
  • when receiving a positioning request sent by another device through the interface, the server performs a face positioning operation on the target picture through the convolutional neural network to obtain a positioning result, the convolutional neural network being used to call the graphics processor to perform the convolution operation on the target picture;
  • the face positioning operation includes a convolution operation; when the positioning result indicates that a face region exists in the target picture, the positioning result is returned to the device that initiated the request.
  • FIG. 3 is a flowchart of an optional method for determining a face region according to an embodiment of the present application. As shown in FIG. 3, the method may include the following steps:
  • Step S302: the server receives a positioning request, where the positioning request is used to request locating a face region in the target picture;
  • Step S304: the server performs a face positioning operation on the target picture through the deployed convolutional neural network to obtain a positioning result, where the convolutional neural network is used to invoke a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation;
  • Step S306: when the positioning result indicates that a face region is located in the target picture, the server returns the positioning result to the object that initiated the positioning request or to the receiver indicated by the positioning request.
  • the computer device of the present application may be the above-mentioned server or terminal.
  • the above embodiment is schematically described with the server 202 performing the method for determining the face region.
  • the method for determining the face region of the present application may also be performed by the terminal 204; that is, the execution body of the above steps may be the terminal instead of the server, or the steps may be performed jointly by the server 202 and the terminal 204.
  • for example, the terminal initiates a positioning request,
  • and the server completes the positioning and returns the positioning result to the terminal.
  • the method for determining the face region performed by the terminal 204 in the embodiment of the present application may also be performed by a client installed on it. For uniformity of description, the method is described in detail below with the server as the executing body.
  • when a positioning request is received, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, and the positioning result is returned when it indicates that a face region exists in the target picture; the convolution operation is performed by directly calling the graphics processor through the full convolutional network in the convolutional neural network, and this hardware-accelerated approach, rather than a software approach of region-by-region scanning by the CPU, can solve the technical problem of poor real-time performance of face detection in the related art, thereby achieving the technical effect of improving the real-time performance of face detection.
  • the face detection algorithm in the related art has many problems in the general application scenario.
  • feature-based face detection has a fast detection speed, but for slightly complex scenes the detection rate of the algorithm is low and lacks robustness;
  • although the Adaboost face detection algorithm has a small model and fast detection speed, it is less robust in complex scenes, such as faces in extreme conditions, for example with masks or black-rimmed glasses, or in blurred images.
  • in this application there are mainly three convolutional neural networks, namely the first-level convolutional neural network net-1, the second-level convolutional neural network net-2, and the third-level convolutional neural network net-3.
  • these three networks can adopt a cascade structure: given an image, a candidate face frame set is output through net-1; the candidate set is input to net-2 to obtain a more accurate candidate face frame set; and the resulting candidate set is then input to net-3 to obtain the final face frame set, i.e., the final face positions. This is a coarse-to-fine positioning process, illustrated by the sketch below.
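  • As a rough illustration of this coarse-to-fine flow, the following Python sketch treats each stage as a callable; the function names and signatures are placeholders for whatever actually implements the three networks, not part of the patent:

```python
def detect_faces(image, net_1, net_2, net_3):
    """Cascade of three stages, each refining the previous candidate set.

    net_1 is assumed to propose candidate face frames from the whole image;
    net_2 and net_3 are assumed to map (image, boxes) to a filtered,
    position-adjusted subset of those boxes.
    """
    candidates = net_1(image)               # coarse candidate face frame set
    candidates = net_2(image, candidates)   # more accurate candidate set
    faces = net_3(image, candidates)        # final face frame set / positions
    return faces
```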
  • a convolutional neural network (CNN) is used to express facial features; compared with the face detection methods based on Adaboost or SVM in the related art, it is more robust for scenes such as side faces, dim light, and blur, and the convolutional neural network with a three-level cascade structure can ensure the accuracy of recognition;
  • the initial positioning and the precise positioning of the face frame (i.e., the face region) are performed by a classification branch and a regression branch respectively, and the two branches share the intermediate layers; compared with the models used by some face detection methods appearing in the related art (such as models based on deep learning), this reduces the size of the model, making detection faster;
  • the first-level network in the three-level cascade structure of the present application adopts a full convolutional neural network instead of the sliding window in the related art, and the full convolutional neural network directly calls the GPU for processing, so that the process of generating candidate face frames is greatly accelerated.
  • before the positioning request is received, the parameters in the convolutional neural network may be learned by training the convolutional neural network deployed on the server with the pictures in a picture set, to determine the values of the parameters in the convolutional neural network; the pictures in the picture set are images that include part or all of a face region.
  • the above learning process mainly includes two parts: selecting appropriate training data, and training to obtain the parameter values.
  • the training data of the above convolutional neural network can be divided into three categories: positive samples, regression samples, and negative samples. These three types of samples are distinguished by the IoU (Intersection over Union) between the face region identified in the sample (i.e., the sample face frame) and the real face region. IoU describes the degree of overlap between two frames: the ratio of the common area A∩B of the sample face region frame A and the real face region frame B (i.e., their overlapping part) to the total area A∪B that the two frames jointly occupy, namely

    IoU = area(A∩B) / area(A∪B)

    where A∩B is the common area of the sample face frame A and the real face frame B, and A∪B is the total area occupied by frames A and B.
  • in FIG. 5, the dotted frame is the ground truth (real face region) and the solid frame is a generated sample frame (i.e., a sample face region); during training, the sample data used for training can be obtained as in FIG. 5, for example by inputting the sample face regions into the convolutional neural network.
  • the three types of samples can be defined as follows: positive samples are samples with IoU greater than 0.7; regression samples are samples with IoU between 0.5 and 0.7; and negative samples are samples with IoU less than 0.25.
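  • A minimal sketch of this sample labeling, reusing the iou helper from the metrics sketch above; samples whose IoU falls between 0.25 and 0.5 are left unused here, an assumption since the text does not assign them a category:

```python
def label_sample(sample_box, gt_box):
    """Assign a training category from the IoU with the ground-truth box."""
    overlap = iou(sample_box, gt_box)  # iou() as defined in the metrics sketch
    if overlap > 0.7:
        return "positive"
    if 0.5 <= overlap <= 0.7:
        return "regression"
    if overlap < 0.25:
        return "negative"
    return None  # ambiguous overlap, discarded from training
```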
  • the neural network can then be trained using the prepared data.
  • the network structure used in the present application may be a dual branch structure.
  • one branch is face classification, which determines whether the current input contains a face, producing a candidate face frame set; the other branch is face box regression, which, after the classification branch gives initial face frame coordinates, adjusts the coordinates of the face region to obtain an accurate face frame position.
  • the goal of the face classification branch optimization of the three nets in FIG. 6 is to minimize the softmax loss; the softmax expression of the final classification neuron is

    h_j(x_i) = exp(θ_j^T x_i) / Σ_{l=1}^{k} exp(θ_l^T x_i)

    and the corresponding loss averaged over one forward pass is

    L_cls = -(1/M) Σ_{i=1}^{M} Σ_{j=1}^{k} 1{y_i = j} log h_j(x_i)

    where h is the output of the classification neuron, θ is the model parameter, k is the number of states (classes) to be estimated, M is the number of samples used in one forward pass, x_i is the i-th input, and the denominator of the softmax normalizes the probability distribution so that the probabilities sum to 1.
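  • A small NumPy sketch of this classification objective (binary face / non-face, so k = 2), assuming the standard averaged cross-entropy form of the softmax loss:

```python
import numpy as np

def softmax_loss(theta, X, y):
    """Softmax probabilities and averaged cross-entropy over M samples.

    theta: (k, d) model parameters for k classes; X: (M, d) inputs for one
    forward pass; y: (M,) integer class labels.
    """
    logits = X @ theta.T                          # scores theta_j^T x_i, shape (M, k)
    logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)  # rows now sum to 1
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    return probs, loss
```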
  • the candidate frame obtained by the face classification branch may include information in four dimensions, namely (x_i, y_i, w_i, h_i); as shown in FIG. 7, in one image the position of the face frame is shown as the solid-line frame, and the dotted-line frame is an example of a selected sample.
  • the Euclidean loss function to be optimized for the face box regression branch is

    L_reg = (1/2M) Σ_{i=1}^{M} ||ẑ_i - z_i||^2

    where z_i is the regression target of the i-th sample and ẑ_i is the network's prediction. The above dimension information uses relative quantities; taking the first component of z_i as an example,

    z_i^(1) = (x_i - x_i'') / w_i''

    where x'' is the vertex coordinate of the selected sample frame and w'' is its width (normalizing by the sample frame width is assumed here, as the exact form in the figure is not recoverable from the text).
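  • A sketch of the regression target and the Euclidean loss under the normalization assumption stated above (the exact component definitions in the patent's figure are not recoverable from the text):

```python
import numpy as np

def regression_target(gt_box, sample_box):
    """Relative regression target z of a sample frame w.r.t. the true face frame.

    Boxes are (x, y, w, h); normalizing the offsets by the sample frame's
    width and height is an assumed (standard) choice.
    """
    x, y, w, h = gt_box
    xs, ys, ws, hs = sample_box
    return np.array([(x - xs) / ws, (y - ys) / hs, w / ws, h / hs])

def euclidean_loss(z_pred, z_true):
    """Averaged squared Euclidean distance over a batch of M targets."""
    z_pred, z_true = np.asarray(z_pred), np.asarray(z_true)
    return 0.5 * np.mean(np.sum((z_pred - z_true) ** 2, axis=-1))
```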
  • during training, the parameters in the convolutional neural network can be initialized first, and then sample pictures are input into the convolutional neural network to obtain the output of the network (i.e., the face localization result); the error between the output result and the actual result is calculated with the above two formulas. If the error is within the allowable range, the current parameters are reasonable; if the error is not within the allowable range, the parameters are adjusted according to the error magnitude, the sample pictures are input again, and the error is calculated again with the above two formulas, until the error between the network's result and the actual result is within the allowable range.
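  • Schematically, the loop just described looks as follows; net and all of its methods are placeholders for whatever framework hosts the model, not an API from the patent:

```python
def train(net, samples, targets, tolerance, max_iters=10000):
    """Initialize, then iterate forward pass / error / adjustment until the
    error between the network's output and the actual result is acceptable."""
    net.initialize_parameters()
    for _ in range(max_iters):
        outputs = net.forward(samples)                # face localization outputs
        error = net.compute_error(outputs, targets)   # softmax + Euclidean losses
        if error <= tolerance:                        # current parameters reasonable
            break
        net.update_params(error)                      # adjust by error magnitude
    return net
```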
  • the face region can then be identified by the method provided by the present application. For example:
  • the positioning request mainly includes but is not limited to the following sources:
  • a face location request initiated by client B on the terminal may be received, where client B can be a client that needs to detect faces in real time, such as live-streaming beautification or face tracking;
  • when terminal B communicates with terminal A (for example, via WiFi, Bluetooth, or NFC), terminal A receives the face location request initiated by terminal B;
  • the three-level convolutional neural network of the present application works in a cascade manner: the face frame set 1 output by the first-level convolutional neural network can be used as the input of the second-level convolutional neural network net-2 (i.e., the second convolutional neural network) for further filtering and screening; the filtered output of the second-level convolutional neural network can be used as the input of the third-level convolutional neural network (the third convolutional neural network); and the filtered output of the third-level convolutional neural network is taken as the final result.
  • in this embodiment, the first convolutional neural network net-1 in the three-level convolutional neural network may be used to call the graphics processor to convolve the target picture to obtain a convolution result, where the convolutional neural network includes the first convolutional neural network; the confidence that a first region in the target picture is a face region is determined according to the convolution result; and the face region is determined in the first region according to the confidence.
  • when the server calls the graphics processor through the first convolutional neural network to convolve the target picture and obtain the convolution result, the convolution algorithm of the first convolutional neural network can be executed by calling the graphics processor to perform one class of feature recognition on each first region in the target picture, obtaining a convolution result that indicates the features of each first region under that class of features.
  • when determining, according to the convolution result, the confidence that a first region in the target picture is a face region, the confidence may be determined from the features of the first region under that class of features.
  • for net-1, the input picture has parameters 12*12*3, where "12*12" indicates that the input picture has a pixel size of at least 12*12 (i.e., the third threshold), in other words the minimum face region that can be recognized is 12*12, and "3" indicates a 3-channel image;
  • the first-level convolutional neural network performs recognition of relatively coarse-grained face features (i.e., the above class of features) for each region in the picture, and then uses a preset feature matching algorithm to determine the confidence that the region is a face region.
  • the first regions with confidence greater than the first threshold are placed in the candidate face frame set 1 (the regions in this set are recorded as second regions); a sketch of such a fully convolutional first-level network is given below.
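  • To make the fully convolutional idea concrete, here is a minimal PyTorch sketch of a stage-1 network in the spirit of net-1; the layer sizes and channel counts are illustrative assumptions, not taken from the patent:

```python
import torch.nn as nn

class Net1(nn.Module):
    """Tiny fully convolutional stage-1 net.

    Trained on 12*12*3 crops, it maps a 12*12 input to a 1*1 output; because
    every layer is convolutional, a whole image can be fed in and the face
    classification branch then emits a score at every window position in a
    single GPU pass, with no sliding window.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 10, 3), nn.ReLU(), nn.MaxPool2d(2),  # 12 -> 10 -> 5
            nn.Conv2d(10, 16, 3), nn.ReLU(),                  # 5 -> 3
            nn.Conv2d(16, 32, 3), nn.ReLU(),                  # 3 -> 1
        )
        self.cls = nn.Conv2d(32, 2, 1)  # face / non-face scores per window
        self.reg = nn.Conv2d(32, 4, 1)  # frame-coordinate adjustments per window

    def forward(self, x):
        shared = self.features(x)       # the two branches share these layers
        return self.cls(shared), self.reg(shared)
```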
  • optionally, according to the position of the face reference feature in the first region, the first region may be positionally adjusted so that the face reference feature is located at a preset position in the position-adjusted first region.
  • optionally, when the face region is determined in the first region according to the confidence, the face region can be determined in the position-adjusted first region according to the confidence.
  • optionally, the position adjustment may be applied only to the second regions among the first regions (i.e., the first regions whose confidence is greater than the first threshold), to avoid wasting resources.
  • optionally, when the face region is determined in the first region according to the confidence, the face region may be determined in the position-adjusted second region according to the confidence.
  • the above face reference feature may be one of the facial features (such as the nose, eyes, mouth, or eyebrows), whose position on a human face is relatively fixed; for example, the nose is generally located at the center of the face, i.e., after identifying the nose in the first region, the first region can be adjusted so that the nose is centered in the adjusted first region.
  • the convolutional neural network of the present application may be a three-level convolutional neural network, and the first-level convolutional neural network mainly performs preliminary identification of the facial region, and obtains the candidate face frame set 1 described above.
  • the above face frame set 1 can be used as the input of the second convolutional neural network, and the confidence that each second region in face frame set 1 is a face region is determined by the second convolutional neural network.
  • the second region is the region in the first region whose confidence is greater than the first threshold.
  • optionally, before the second convolutional neural network determines the confidence that a second region is a face region, the region size of the second region may be adjusted to a fourth threshold, the fourth threshold being greater than the third threshold; for example, the pixel size is adjusted to a "24*24" 3-channel image, and feature recognition is then performed on the size-adjusted second region through the second convolutional neural network, where the feature types identified here differ from those identified by the first-level convolutional neural network; after recognition, the confidence that the second region is a face region may be determined according to the identified features, for example by using a preset feature matching algorithm.
  • after the confidence determination, the second regions with confidence greater than the second threshold may be placed into face frame set 2 (the regions in this set are recorded as third regions).
  • the face region in the third region can then be identified by the third convolutional neural network.
  • before that, the third regions in face frame set 2 may be position-adjusted according to the foregoing position adjustment method for the second regions.
  • optionally, the region size of the third region may be adjusted to a fifth threshold, the fifth threshold being greater than the fourth threshold; for example, the third region is adjusted to a "48*48" image and used as the input of the third convolutional neural network, and feature recognition is performed on the size-adjusted third region through the third convolutional neural network, where the feature types identified here differ from those identified by the first-level and second-level convolutional neural networks; after recognition, the face region in the third region can be determined according to the identified features, for example by calculating the matching degree with a preset feature matching algorithm and taking the third region with the highest matching degree as the face region.
  • the features identified by the first-level convolutional neural network are relatively simple, and its discrimination threshold can be set loosely, so that a large number of non-face windows are eliminated while a high recall rate is maintained; the second-level and third-level convolutional neural networks can be designed to be more complex, but since only the remaining windows need to be processed, sufficient efficiency can still be ensured; cascading helps combine classifiers whose individual performance is weaker while obtaining a certain efficiency guarantee, and because the image pixel sizes input at each level differ, the network can learn multi-scale feature combinations, which facilitates the final recognition of the face.
  • the depth models in the related art are relatively large (with many layers); the models in the related art exceed 15 MB, so the face detection speed is relatively slow (more than 300 ms on a mainstream PC) and cannot meet real-time requirements.
  • the cascaded deep network architecture used in this application has a high detection rate, low false detection, fast speed (less than 40 ms on a mainstream PC), and a small model, which compensates for the shortcomings of the existing face detection methods.
  • the server returning the positioning result includes: returning location information of the face region located by the convolutional neural network, wherein the location information is used to indicate the location of the face region in the target image.
  • for example, the location information can be expressed as (x_i, y_i, w_i, h_i) for i = 1, ..., n, where n is the number of faces, (x_i, y_i) represents the image coordinates of the upper-left vertex of the i-th face frame, and w_i and h_i represent the width and height of the face frame, respectively.
  • the face area as shown in the right figure is obtained, and the position information is returned to the object that initiated the request.
  • the above location information is information that can uniquely identify a face region in the image
  • the above (x i , y i , w i , h i ) is a representative representation of the location information.
  • the representation can be adjusted as needed, such as returning the coordinates of any one of the corners together with the width and height of the face frame, or returning the coordinates of the center point of the region together with the width and height of the face frame; the coordinates of any two of the lower-left, upper-left, lower-right, and upper-right corners can also be returned. An illustration of these equivalent encodings follows.
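  • For illustration, the equivalent encodings mentioned above can all be derived from (x, y, w, h); the corner naming below is ours:

```python
def box_encodings(x, y, w, h):
    """Alternative representations of one face frame, (x, y) = upper-left vertex."""
    return {
        "upper_left": (x, y),
        "upper_right": (x + w, y),
        "lower_left": (x, y + h),
        "lower_right": (x + w, y + h),
        "center": (x + w / 2.0, y + h / 2.0),
        "size": (w, h),
    }
```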
  • after the face region is determined, applications such as face key point location, liveness detection, and face recognition and retrieval can be completed;
  • for example, features such as the eyes, nose, mouth, and eyebrows in the face region can be located according to the relevant algorithms.
  • Step S902: learn the values of the parameters in the convolutional neural network.
  • the convolutional neural network can be applied to products in scenarios such as the above-mentioned intelligent monitoring in public areas, hospital patient identification, and automatic identification at stations or airports.
  • the recognition of the face area is completed by the following steps:
  • Step S904: input an image P into the first-level convolutional neural network net-1 on the server; the face classification branch of net-1 outputs a probability map Prob (as shown in FIG. 10), in which each point corresponds to the likelihood (i.e., confidence) that a face appears at the corresponding location in image P.
  • Set a threshold cls-1 and retain the positions in Prob whose values are greater than cls-1; assuming there are m face frames in total, the resulting face frame set is R1 (that is, candidate face frame set 1). A sketch of this step follows.
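  • A sketch of this thresholding step; the window size and stride used to map a position in Prob back to image coordinates are assumptions that depend on the actual net-1 architecture:

```python
import numpy as np

def candidate_boxes(prob, cls_1, window=12, stride=2):
    """Keep positions of the net-1 probability map whose score exceeds cls-1.

    prob is the 2-D map from the face classification branch; each retained
    position becomes an (x, y, w, h, confidence) candidate face frame.
    """
    boxes = []
    for r, c in zip(*np.where(prob > cls_1)):
        boxes.append((c * stride, r * stride, window, window, float(prob[r, c])))
    return boxes  # the candidate face frame set R1
```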
  • Step S906: adjust the position of each face frame in R1 through the regression branch of net-1 to obtain a more accurate face frame set.
  • Step S908: perform non-maximum suppression (NMS) on the face frames; that is, when the IoU of two frames is greater than the threshold nms-1, the frame with the lower confidence is deleted, yielding the processed candidate set. A sketch of NMS is given below.
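  • A minimal sketch of the greedy NMS described in step S908, reusing the iou helper from the earlier metrics sketch:

```python
def nms(boxes, nms_thresh):
    """Keep high-confidence frames; drop any frame whose IoU with an already
    kept frame exceeds the threshold (i.e., delete the lower-confidence one).

    boxes: iterable of (x, y, w, h, confidence).
    """
    kept = []
    for b in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(b[:4], k[:4]) <= nms_thresh for k in kept):
            kept.append(b)
    return kept
```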
  • Step S910: scale each corresponding sub-image in the original image P (for example, to a length of 24 and a width of 24) and input them sequentially into the second-level convolutional neural network net-2, and set a threshold cls-2; through the face classification branch of net-2, the confidence of each candidate frame is obtained, and the face frames with confidence greater than cls-2 are retained, giving a new face frame set R2 (i.e., candidate face frame set 2).
  • Step S912: adjust the position of each face frame in R2 through the regression branch of net-2 to obtain a more accurate face frame set.
  • Step S914: perform non-maximum suppression (NMS) on the face frames with the threshold nms-2 to obtain the candidate box set.
  • Step S916: scale the corresponding sub-images to a length and width of 48 and input them into the third-level convolutional neural network net-3, and set a threshold cls-3; through the face classification branch of net-3, the confidence of each candidate frame is obtained, and the face frames with confidence greater than cls-3 are retained, giving a new face frame set R3.
  • Step S918: adjust the position of each face frame in R3 through the regression branch of net-3 to obtain a more accurate face frame set.
  • Step S920: perform non-maximum suppression (NMS) on the face frames with the threshold nms-3 to obtain the final face frame set, that is, each face position in the image P, such as the face region shown in the figure. After the identification of the face regions is completed, the matching software in the above-mentioned scenarios (intelligent monitoring in public areas, hospital patient identification, automatic identification at stations or airports, etc.) matches the identified face regions against the faces of recorded identities in the database to identify the identity of the person to whom each face region belongs.
  • the technical solution provided by the embodiments of the present application can provide services for various scenarios in the form of an SDK; it achieves a high detection rate and low false detection for face detection, making real-time face detection based on deep learning possible on mobile terminals.
  • the method according to the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the part of the technical solution of the present application that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc), including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present application.
  • a computer apparatus for implementing the above method for determining a face area is also provided.
  • FIG. 11 is a schematic diagram of an optional computer device (or a device for determining a face region) according to an embodiment of the present application, including a memory and a processor, the processor being configured to run a computer program saved in the memory to: receive a positioning request, where the positioning request is used to request locating a face region in the target picture; perform a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call the graphics processor to perform the convolution operation on the target picture, and the face positioning operation includes the convolution operation; and return the positioning result when the positioning result indicates that a face region exists in the target picture.
  • the above computer program may include the following software modules: a receiving unit 112, a positioning unit 114, and a return unit 116.
  • the receiving unit 112 is configured to receive a positioning request, where the positioning request is used to request to locate a face region in the target image;
  • the positioning unit 114 is configured to perform a face positioning operation on the target picture through the convolutional neural network to obtain a positioning result, where the convolutional neural network is used to invoke a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation;
  • the returning unit 116 is configured to return the positioning result when the positioning result indicates that a face region exists in the target picture.
  • the receiving unit 112 in this embodiment may be used to perform step S302 in the embodiment of the present application.
  • the positioning unit 114 in this embodiment may be used to perform step S304 in the embodiment of the present application.
  • the returning unit 116 can be used to perform step S306 in the embodiment of the present application.
  • the examples and application scenarios implemented by the foregoing modules are the same as those of the corresponding steps, but are not limited to the contents disclosed in the foregoing embodiments. It should be noted that the foregoing modules may run, as part of a computer device, in a hardware environment as shown in FIG. 2, and may be implemented by software or by hardware.
  • when a positioning request is received, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, and when the positioning result indicates that a face region exists in the target picture, the positioning result is returned; the convolution operation is performed by directly calling the graphics processor through the full convolutional network in the convolutional neural network, and this hardware-accelerated approach, rather than a software approach of region-by-region scanning by the CPU, can solve the technical problem of poor real-time performance of face detection in the related art, thereby achieving the technical effect of improving the real-time performance of face detection.
  • the face detection algorithm in the related art has many problems in the general application scenario.
  • feature-based face detection has a fast detection speed, but for slightly complex scenes the detection rate of the algorithm is low and lacks robustness;
  • although the Adaboost face detection algorithm has a small model and fast detection speed, it is less robust in complex scenes, such as faces in extreme conditions, for example with masks or black-rimmed glasses, or in blurred images.
  • given an image, a candidate face frame set is output through net-1, the candidate set is input to net-2 to obtain a more accurate candidate face frame set, and the resulting candidate set is then input to net-3 to obtain the final face frame set, i.e., the final face positions; this is a coarse-to-fine process.
  • the problem of poor real-time performance in related technologies can be solved under the premise of ensuring robustness, detection rate and accuracy, which are mainly embodied as follows:
  • a convolutional neural network (CNN) is used to express facial features; compared with the face detection methods based on Adaboost or SVM in the related art, it is more robust for scenes such as side faces, dim light, and blur, and the convolutional neural network with a three-level cascade structure can ensure the accuracy of recognition;
  • the initial positioning and the precise positioning of the face frame (i.e., the face region) are performed by a classification branch and a regression branch respectively, and the two branches share the intermediate layers; compared with the models used by some face detection methods appearing in the related art (such as models based on deep learning), this reduces the size of the model, making detection faster;
  • the first-level network in the three-level cascade structure of the present application adopts a full convolutional neural network instead of the traditional sliding window, and the full convolutional network directly calls the graphics processing unit (GPU) for processing, so that the process of generating candidate face frames is greatly accelerated.
  • the parameters in the convolutional neural network may be learned as follows: the processor, by running a computer program saved in the memory, trains the convolutional neural network with the pictures in a picture set before receiving the positioning request, to determine the values of the parameters in the convolutional neural network, where the pictures in the picture set are images that include part or all of a face region.
  • the above learning process mainly includes two parts: selecting appropriate training data, and training to obtain the parameter values.
  • the face region can then be identified by the solution provided by the present application. For example:
  • Location requests mainly include but are not limited to the following sources:
  • the solution of the present application can be integrated on the terminal, or installed on the terminal in the form of client A;
  • a face location request initiated by client B on the terminal may be received, where client B can be a client that needs to detect faces in real time, such as live-streaming beautification or face tracking;
  • when terminal B communicates with terminal A (for example, via WiFi, Bluetooth, or NFC), terminal A receives the face location request initiated by terminal B;
  • the three-level convolutional neural network of the present application works in a cascade manner: the face frame set 1 output by the first-level convolutional neural network can be used as the input of the second-level convolutional neural network net-2 (i.e., the second convolutional neural network) for further filtering and screening; the filtered output of the second-level convolutional neural network can be used as the input of the third-level convolutional neural network (the third convolutional neural network); and the filtered output of the third-level convolutional neural network is the final result.
  • the processor is further configured to run a computer program saved in the memory to: perform the convolution operation on the target picture by calling the graphics processor through the first convolutional neural network to obtain a convolution result, where the convolutional neural network includes the first convolutional neural network; determine, according to the convolution result, the confidence that a first region in the target picture is a face region; and determine the face region in the first region according to the confidence.
  • the processor is further configured to run a computer program saved in the memory to: positionally adjust the first region according to the position of the face reference feature in the first region, so that the face reference feature is located at a preset position in the position-adjusted first region; and determining the face region in the first region according to the confidence includes: determining the face region in the position-adjusted first region according to the confidence.
  • the processor is further configured to execute a computer program saved in the memory to: execute the convolution algorithm of the first convolutional neural network by calling the graphics processor to perform one class of feature recognition on each first region in the target picture, obtaining a convolution result, where the convolution result is used to indicate the features of each first region under that class of features; and determine the confidence that the first region is a face region according to the features of the first region under that class of features.
  • the convolutional neural network further comprises a second convolutional neural network and a third convolutional neural network, and the processor is further configured to run the following computer program saved in the memory: determining, by the second convolutional neural network, the confidence that the second region is a face region, where the second region is a region in the first region whose confidence is greater than the first threshold; and identifying, by the third convolutional neural network, the face region in the third region, where the third region is a region in the second region whose confidence is greater than the second threshold.
  • optionally, the region size of the second region is adjusted to a fourth threshold, the fourth threshold being greater than the third threshold; for example, the pixel size is adjusted to a "24*24" 3-channel image, and feature recognition is then performed on the size-adjusted second region through the second convolutional neural network, where the feature types identified here differ from those identified by the first-level convolutional neural network; the confidence that the second region is a face region may then be determined according to the identified features, for example by calculation with a preset feature matching algorithm.
  • the area size of the first area is not less than a third threshold
  • the processor is further configured to run a computer program saved in the memory to: before determining, by the second convolutional neural network, the confidence that the second region is a face region, adjust the region size of the second region to a fourth threshold, where the fourth threshold is greater than the third threshold; perform feature recognition on the size-adjusted second region through the second convolutional neural network, and determine, according to the identified features, the confidence that the second region is a face region; adjust the region size of the third region to a fifth threshold, where the fifth threshold is greater than the fourth threshold; and perform feature recognition on the size-adjusted third region through the third convolutional neural network, and determine the face region in the third region based on the identified features.
  • optionally, the region size of the third region may be adjusted to a fifth threshold, the fifth threshold being greater than the fourth threshold; for example, the third region is adjusted to a "48*48" image and used as the input of the third convolutional neural network, and feature recognition is performed on the size-adjusted third region through the third convolutional neural network, where the feature types identified here differ from those identified by the first-level and second-level convolutional neural networks; after recognition, the face region in the third region can be determined according to the identified features, for example by calculating the matching degree with a preset feature matching algorithm and taking the third region with the highest matching degree as the face region.
  • the features identified by the first-level convolutional neural network are relatively simple, and its discrimination threshold can be set loosely, so that a large number of non-face windows are eliminated while a high recall rate is maintained; the second-level and third-level convolutional neural networks can be designed to be more complex, but since only the remaining windows need to be processed, sufficient efficiency can still be ensured; cascading helps combine classifiers whose individual performance is weaker while obtaining a certain efficiency guarantee, and because the image pixel sizes input at each level differ, the network can learn multi-scale feature combinations, which facilitates the final recognition of the face.
  • the depth models in the related art are relatively large (with many layers); the models in the related art exceed 15 MB, so the face detection speed is relatively slow (more than 300 ms on a mainstream PC) and cannot meet real-time requirements.
  • the cascaded deep network architecture used in this application has a high detection rate, low false detection, fast speed (less than 40 ms on a mainstream PC), and a small model, which compensates for the shortcomings of the existing face detection methods.
  • the processor is further configured to run a computer program saved in the memory: returning location information of the face region located by the convolutional neural network, wherein the location information is used to indicate the location of the face region in the target image .
  • for example, the location information can be expressed as (x_i, y_i, w_i, h_i) for i = 1, ..., n, where n is the number of faces, (x_i, y_i) represents the image coordinates of the upper-left vertex of the i-th face frame, and w_i and h_i represent the width and height of the face frame, respectively.
  • the above location information is information that can uniquely identify a face region in the image
  • the above (x i , y i , w i , h i ) is a representative representation of the location information.
  • the representation can also be adjusted as needed, such as returning the coordinates of any one of the corners together with the width and height of the face frame, or returning the coordinates of the center point of the region together with the width and height of the face frame; the coordinates of any two of the lower-left, upper-left, lower-right, and upper-right corners can also be returned.
  • after the face region is determined, applications such as face key point location, liveness detection, and face recognition and retrieval can be completed;
  • for example, features such as the eyes, nose, mouth, and eyebrows in the face region can be located according to the relevant algorithms.
  • the examples and application scenarios implemented by the foregoing modules are the same as those of the corresponding steps, but are not limited to the contents disclosed in the foregoing embodiments. It should be noted that the foregoing modules may run, as part of the device, in a hardware environment as shown in FIG. 2, and may be implemented by software or by hardware, where the hardware environment includes a network environment.
  • according to an embodiment of the present application, a storage medium (also referred to as a memory) is further provided.
  • the storage medium comprising a stored program, wherein the program is configured to execute any of the methods described above at runtime.
  • according to an embodiment of the present application, a server or terminal (also referred to as a computer device) for implementing the above method for determining a face region is also provided.
  • FIG. 12 is a structural block diagram of a terminal according to an embodiment of the present application.
  • as shown in FIG. 12, the terminal may include one or more processors 1201 (only one is shown in FIG. 12), a memory 1203, and a transmission device 1205 (as in the computer device in the above embodiment); the terminal may further include an input/output device 1207.
  • the memory 1203 can be used to store software programs and modules, such as the program instructions/modules corresponding to the method for determining a face region in the embodiments of the present application; the processor 1201 implements the above method by running the software programs and modules stored in the memory 1203.
  • the memory 1203 may include a high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 1203 can further include memory remotely located relative to processor 1201, which can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the above-mentioned transmission device 1205 is used to receive or transmit data via a network, and can also be used for data transmission between the processor and the memory.
  • Specific examples of the above network may include a wired network and a wireless network.
  • the transmission device 1205 includes a Network Interface Controller (NIC) that can be connected to other network devices and routers via a network cable to communicate with the Internet or a local area network.
  • transmission device 1205 is a radio frequency (RF) module for communicating with the Internet wirelessly.
  • the memory 1203 is used to store an application.
  • the processor 1201 may invoke the application stored in the memory 1203 through the transmission device 1205 to perform the following steps: receiving a positioning request, where the positioning request is used to request locating a face region in the target picture; performing a face positioning operation on the target picture through the convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture and the face positioning operation includes the convolution operation; and returning the positioning result when the positioning result indicates that a face region exists in the target picture.
  • the processor 1201 is further configured to: perform a convolution operation on the target picture by calling the graphics processor through the first convolutional neural network to obtain a convolution result, where the convolutional neural network includes the first convolutional neural network; determine, according to the convolution result, the confidence that a first region in the target picture is a face region; and determine the face region in the first region according to the confidence.
  • with the embodiments of the present application, when a positioning request is received, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, and the positioning result is returned when it indicates that a face region exists in the target picture; in the face recognition process, the preliminary identification directly calls the graphics processor for the convolution operation through the full convolutional network in the convolutional neural network, and this hardware-accelerated approach, rather than a software approach of region-by-region scanning by the CPU, can solve the technical problem of poor real-time performance of face detection in the related art, thereby achieving the technical effect of improving the real-time performance of face detection.
  • the terminal can be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), a PAD, or other terminal equipment.
  • FIG. 12 does not limit the structure of the above computer device; the terminal may include more or fewer components (such as a network interface or a display device) than shown in FIG. 12, or have a different configuration from that shown in FIG. 12.
  • Embodiments of the present application also provide a storage medium.
  • the foregoing storage medium may be used to execute program code of a method for determining a face area.
  • the foregoing storage medium may be located on at least one of the plurality of network devices in the network shown in the foregoing embodiment.
  • the storage medium is arranged to store program code for performing the steps of the method described above; optionally, the storage medium is further arranged to store program code for performing the following steps:
  • the first convolutional neural network is used to invoke a graphics processor to perform a convolution operation on the target image to obtain a convolution result, and the convolutional neural network includes a first convolutional neural network;
  • The foregoing storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present application.
  • In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The computer device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and there may be other ways of dividing them in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

Abstract

The present application discloses a method for determining a face region, a storage medium, and a computer device. The method includes: receiving a positioning request, where the positioning request is used to request that a face region be located in a target picture; performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation; and returning the positioning result when the positioning result indicates that a face region is located in the target picture. The present application solves the technical problem in the related art that the real-time performance of face detection is poor.

Description

Method for determining a face region, storage medium, and computer device
This application claims priority to Chinese Patent Application No. 2017102335906, entitled "Method and apparatus for determining a face region", filed with the Chinese Patent Office on April 11, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of image processing, and in particular, to a method for determining a face region, a storage medium, and a computer device.
Background
Face recognition is a biometric technology that performs identity recognition based on a person's facial feature information. A camera is used to capture an image or video stream containing a face, the face is automatically detected and tracked in the image, and a series of face-related techniques are then applied to the detected face; this is also commonly called portrait recognition or facial recognition.
Face detection, as the basis of applications such as face recognition, facial keypoint localization, and face retrieval, has long been widely studied. Face detection determines, in a certain way, whether a face exists in a given image and, if one exists, gives the size and position of the face. As shown in FIG. 1, the image on the left is detected to obtain the image on the right, in which the face region (the dashed region) is marked.
Although humans can easily find a face in an image, it is still difficult for a computer to detect faces automatically. The main difficulties come from two aspects: the face itself can vary in many ways, such as changes brought by different skin colors, face shapes, expressions, and face poses; and faces in an image are also affected by many external factors, such as illumination, camera shake, and occlusion caused by ornaments on the face.
In the related art, face detection methods are diverse and can be divided into feature-based detection methods and statistical-model-based detection methods. Feature-based face detection methods mainly rely on empirical rules and hand-crafted features, for example methods based on the structure and texture features of facial organs. Statistical-model-based detection methods also need to extract features from samples first, but unlike feature-based methods they are not purely based on preset rules; instead, they use a large number of samples to train a detector model. Common examples include face detection algorithms based on the support vector machine (SVM) and face detection algorithms based on Adaboost.
Common metrics for evaluating a face detection method (also called a detector) mainly include the following: (1) detection rate, the ratio of the number of correctly detected faces to the total number of faces in a given image set; (2) number of false detections, the number of regions detected as face regions that are actually non-face regions (an ideal face detector should have a 100% detection rate and zero false detections); (3) detection speed, the time needed from the start of detection to correctly locating the face region; many current applications, such as live-streaming beautification and face tracking, need to detect faces in real time and therefore place high demands on detection speed, and with a high detection rate and few false detections, the faster the detection, the better the user experience; (4) robustness, which represents the detector's adaptability to the environment under various conditions; the more robust the detector, the higher the probability that it can accurately detect faces under changes in illumination, face pose, and expression, and under occlusion.
To overcome the problems mentioned above and accurately detect face regions: when the feature-based detection methods of the related art are used, the reliance on empirical rules and hand-crafted features makes them susceptible to the designer's subjective factors, so the detection rate and robustness of face recognition cannot be guaranteed; when the statistical-model-based detection methods of the related art are used, the commonly used models are often configured with many layers to guarantee recognition accuracy, which makes the models large (basically all such models exceed 15 MB). Although more layers guarantee recognition accuracy, the added layers slow face detection down (to more than 300 ms on a mainstream PC), which cannot meet real-time requirements.
No effective solution has yet been proposed for the technical problem in the related art that the real-time performance of face detection is poor.
Summary
Embodiments of the present application provide a method for determining a face region, a storage medium, and a computer device, to solve at least the technical problem in the related art that the real-time performance of face detection is poor.
According to one aspect of the embodiments of the present application, a method for determining a face region is provided. The method includes: receiving a positioning request, where the positioning request is used to request that a face region be located in a target picture; performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation; and returning the positioning result when the positioning result indicates that a face region is located in the target picture.
According to another aspect of the embodiments of the present application, a computer device (also called an apparatus for determining a face region) is further provided, including a memory and a processor, where the processor is configured to run the following computer program stored in the memory: receiving a positioning request, where the positioning request is used to request that a face region be located in a target picture; performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation; and returning the positioning result when the positioning result indicates that a face region is located in the target picture.
According to another aspect of the embodiments of the present application, a storage medium is further provided. The storage medium includes a stored program, where the program is configured to perform any one of the above methods when run.
In the embodiments of the present application, when a positioning request is received, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, and the positioning result is returned when it indicates that a face region exists in the target picture. In the face recognition process, the preliminary recognition directly calls the graphics processor to perform the convolution operation on the target picture through the fully convolutional network in the convolutional neural network. This hardware-accelerated approach, rather than the software approach of scanning region by region on a CPU, can solve the technical problem in the related art that face detection has poor real-time performance, thereby achieving the technical effect of improving the real-time performance of face detection.
Brief description of the drawings
The drawings described here are used to provide further understanding of the present application and constitute a part of it; the illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation on it. In the drawings:
FIG. 1 is a schematic diagram of an optional face region in the related art;
FIG. 2 is a schematic diagram of the hardware environment of a method for determining a face region according to an embodiment of the present application;
FIG. 3 is a flowchart of an optional method for determining a face region according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an optional degree of face overlap according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an optional sample according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an optional network structure according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an optional face region according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an optional face region according to an embodiment of the present application;
FIG. 9 is a flowchart of an optional method for determining a face region according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an optional probability map according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an optional computer device according to an embodiment of the present application; and
FIG. 12 is a structural block diagram of a terminal according to an embodiment of the present application.
Detailed description
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and so on in the specification, the claims, and the above drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present application described here can be implemented in orders other than those illustrated or described here. Moreover, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
First, some of the nouns or terms that appear in the description of the embodiments of the present application are explained as follows:
A convolutional neural network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within a partial coverage range. It performs excellently for large-scale image processing and mainly includes convolutional layers and pooling layers.
Adaboost: an iterative algorithm that can be used to train different classifiers on the same training set and then assemble these classifiers into a stronger classifier.
According to the embodiments of the present application, a method embodiment of a method for determining a face region is provided.
Optionally, in this embodiment, the above method for determining a face region may be applied to the hardware environment shown in FIG. 2, which consists of a server 202 and a terminal 204. As shown in FIG. 2, the server 202 is connected to the terminal 204 through a network. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network, and the terminal 204 is not limited to a PC, a mobile phone, a tablet computer, or the like.
For example, for a terminal that needs to perform face region recognition, the face recognition function provided by the method of the present application may be integrated directly on the terminal, or a client implementing the method of the present application may be installed on it. In this way, when the terminal receives a positioning request for locating a face region in a target picture, it performs a face positioning operation on the target picture through the convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call the graphics processor to perform a convolution operation on the target picture and the face positioning operation includes the convolution operation; when the positioning result indicates that a face region is located in the target picture, the positioning result is returned.
As another example, the method provided in the present application may also run on a server or similar device in the form of a software development kit (SDK) and be provided to applications as an SDK, offering an interface for the face region recognition function; other devices can then implement face region recognition through the provided interface. When the server receives a positioning request sent by another device through this interface, it performs the face positioning operation on the target picture through the convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call the graphics processor to perform the convolution operation on the target picture and the face positioning operation includes the convolution operation; when the positioning result indicates that a face region is located in the target picture, the positioning result is returned to the requesting device.
FIG. 3 is a flowchart of an optional method for determining a face region according to an embodiment of the present application. As shown in FIG. 3, the method may include the following steps:
Step S302: the server receives a positioning request, where the positioning request is used to request that a face region be located in a target picture.
Step S304: the server performs a face positioning operation on the target picture through the deployed convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation.
Step S306: when the positioning result indicates that a face region is located in the target picture, the server returns the positioning result to the object that initiated the positioning request or to the receiver indicated by the positioning request.
The computer device of the present application may be the above server or terminal. The above embodiment is schematically described with the method for determining a face region being performed by the server 202, but the method may also be performed by the terminal 204 (that is, with the executing subject of the above steps replaced accordingly), or performed jointly by the server 202 and the terminal 204; for example, the terminal initiates the positioning request, and the server completes the positioning and returns the positioning result to the terminal. The method for determining a face region performed by the terminal 204 in this embodiment may also be performed by a client installed on it. For uniformity of description, the following takes the server performing the above method as an example.
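As a schematic illustration of the flow of steps S302 to S306, a minimal Python sketch follows; the request layout and the locate_faces callable are illustrative assumptions standing in for the cascaded network described later, not part of the embodiments themselves:

    def handle_positioning_request(request, locate_faces):
        """Sketch of steps S302 to S306 (all names are illustrative)."""
        picture = request["target_picture"]   # S302: the request carries the target picture
        faces = locate_faces(picture)         # S304: CNN-based face positioning operation
        if faces:                             # S306: face regions were located
            return {"faces": faces}           # e.g. a list of (x, y, w, h) boxes
        return {"faces": []}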
Through the above steps S302 to S306, when a positioning request is received, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, and the positioning result is returned when it indicates that a face region exists in the target picture. In the face recognition process, the preliminary recognition directly calls the graphics processor to perform the convolution operation on the target picture through the fully convolutional network in the convolutional neural network. This hardware-accelerated approach, rather than the software approach of scanning region by region on a CPU, can solve the technical problem in the related art that face detection has poor real-time performance, thereby achieving the technical effect of improving the real-time performance of face detection.
Face detection algorithms in the related art have many problems in general application scenarios. For example, feature-based face detection is fast, but for slightly more complex scenes its detection rate is low and it lacks robustness. The Adaboost-based face detection algorithm has a small model and a relatively fast detection speed, but its robustness in complex scenes is poor, for example for face detection in extreme scenes such as wearing a mask, wearing black-framed glasses, or blurry images.
In the present application, three convolutional neural networks are mainly used: the first-level convolutional neural network net-1, the second-level convolutional neural network net-2, and the third-level convolutional neural network net-3. These three networks can adopt a cascaded structure: given an image, a set of candidate face boxes is output after net-1; the candidate set is input into net-2 to obtain a more accurate set of candidate face boxes; the resulting candidate set is then input into net-3 to obtain the final set of face boxes, that is, the final face positions. This is a coarse-to-fine localization process. Using the method of the present application, the problem of poor real-time performance in the related art can be solved while guaranteeing robustness, detection rate, and accuracy, mainly as follows:
(1) A convolutional neural network (CNN) is used to express face features. Compared with the Adaboost-based or SVM-based face detection methods in the related art, it is more robust for detecting profile faces, dim lighting, blur, and similar scenes; meanwhile, the three-level cascaded convolutional neural network guarantees recognition accuracy.
(2) The initial localization and the precise localization of the face box (that is, the face region) are handled by a classification branch and a regression branch respectively, and the two branches share intermediate layers. Compared with the models used by some face detection methods in the related art (such as deep-learning-based models), this reduces the model size and makes detection faster.
(3) The first-level network in the three-level cascade structure of the present application adopts a fully convolutional neural network instead of the sliding-window approach of the related art. The fully convolutional neural network directly calls the GPU for processing, which greatly speeds up the generation of candidate face boxes.
The embodiments of the present application are further detailed below with reference to FIG. 3:
Before the positioning request is received in step S302, the parameters in the convolutional neural network can be learned as follows: the deployed convolutional neural network is trained on the server with pictures from a picture set to determine the values of the parameters in the convolutional neural network, where the pictures in the picture set are images that include part or all of a face region. The above learning process mainly includes two parts: selecting suitable training data, and training to obtain the parameter values.
(1) Selecting suitable training data
To make the trained model parameters more accurate, the richer the data the better. In the present application, as an optional implementation, the data for training the above convolutional neural networks can be divided into three categories: positive samples, regression samples, and negative samples. These three categories are divided based on the IoU (Intersection over Union) between the face region marked in the sample (that is, the face box) and the real face region. The IoU defines the degree of overlap of two boxes: the ratio of the common area A∩B of the sample face region box A and the real face region box B (that is, the mutually overlapping part) to the total area A∪B covered by the sample face region and the real face region, that is:

$$\mathrm{IoU} = \frac{A \cap B}{A \cup B}$$

As shown in FIG. 4, in the two-dimensional plane formed by the X axis and the Y axis, A∩B is the common area of the sample face region box A and the real face region box B, and A∪B is the total area occupied by box A and box B.
As shown in FIG. 5, the dashed box is the real face box (the ground truth, that is, the real face region), and the solid box is a generated sample box (that is, the sample face region). During training, the sample data used for training can be obtained from FIG. 5, for example by inputting the sample face regions into the convolutional neural network.
To make the model robust to noise, the three categories of samples can be defined as follows: positive samples, those with IoU greater than 0.7; regression samples, those with IoU between 0.5 and 0.7; negative samples, those with IoU less than 0.25.
It should be noted that training with the above three categories of samples is only a schematic description. To make the learned parameters of the convolutional neural network more accurate, the amount of training samples can be increased; the samples can also be subdivided further, for example into five categories: those with IoU between 0.8 and 1.0 as one category, those with IoU between 0.6 and 0.8 as another, and so on.
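To make the above split concrete, the following is a minimal sketch of the IoU computation and the sample labelling; the (x, y, w, h) box representation and the helper names are illustrative assumptions, and the handling of samples whose IoU falls between 0.25 and 0.5 (which the three-way split above does not assign) is likewise an assumption:

    def iou(box_a, box_b):
        """Intersection over Union of two boxes given as (x, y, w, h),
        where (x, y) is the top-left corner."""
        ax1, ay1, aw, ah = box_a
        bx1, by1, bw, bh = box_b
        ax2, ay2 = ax1 + aw, ay1 + ah
        bx2, by2 = bx1 + bw, by1 + bh
        inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
        inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
        inter = inter_w * inter_h                           # |A ∩ B|
        union = aw * ah + bw * bh - inter                   # |A ∪ B|
        return inter / union if union > 0 else 0.0

    def label_sample(sample_box, gt_box):
        """Assign a training category according to the IoU thresholds above."""
        v = iou(sample_box, gt_box)
        if v > 0.7:
            return "positive"
        if 0.5 <= v <= 0.7:
            return "regression"
        if v < 0.25:
            return "negative"
        return "ignored"   # not assigned by the three-way split (an assumption here)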
After the training data is prepared, the prepared data can be used to train the convolutional neural network.
(2) Training process
As can be seen from the network structure in FIG. 6, the network structure adopted in the present application can be a two-branch structure. One branch is the face classification branch (Face Classification), used to judge whether the current input contains a face and obtain the set of candidate face boxes; the other branch is the face box regression branch (Face Box Regression), used to adjust the coordinates of the face region after the classification branch gives the initial face box coordinates, so as to obtain a precise face box position.
For the face classification branch, the optimization target of the face classification branches of the three Nets in FIG. 6 is to minimize the softmax loss. The softmax expression of the final classification neurons is:

$$h_\theta(x^{(i)}) = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^{\mathrm{T}} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{\mathrm{T}} x^{(i)}} \\ \vdots \\ e^{\theta_k^{\mathrm{T}} x^{(i)}} \end{bmatrix}$$

In the softmax expression, h is the result, θ is the model parameter, and k denotes the number of states to be estimated; in the present application these can be the two states distinguishing face from non-face, so k = 2 and i = 1…m, where m (written M below) is the number of samples used in one forward pass, x^{(i)} denotes the i-th input (that is, a training sample), and the superscript T denotes the transpose. The factor

$$\frac{1}{\sum_{j=1}^{k} e^{\theta_j^{\mathrm{T}} x^{(i)}}}$$

is used to normalize the probability distribution so that the probabilities sum to 1.
From the above expression, the cost function J(θ) to be optimized (the softmax loss) is obtained:

$$J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=0}^{1} 1\{ y^{(i)} = j \} \log \frac{e^{\theta_j^{\mathrm{T}} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{\mathrm{T}} x^{(i)}}} \right]$$

In this formula, 1{·} is the indicator function, whose value is 1 when the expression inside is true. y^{(i)} (also written y_i) is the label corresponding to sample x^{(i)}; during training, each sample is an image, labeled 0 if it contains no face and 1 if it contains a face. The remaining parameters are the same as in the softmax expression, and the parameter m is the same as M in that expression.
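As a numerical illustration of this classification objective, the sketch below evaluates the two-class softmax probabilities and the cost J(θ) for a small batch; the array shapes and variable names are assumptions made for illustration only:

    import numpy as np

    def softmax_loss(theta, X, y):
        """Two-class softmax (k = 2) cost matching J(theta) above.
        theta: (k, d) parameters, X: (m, d) inputs, y: (m,) labels in {0, 1}."""
        logits = X @ theta.T                           # theta_j^T x_i for each class j
        logits -= logits.max(axis=1, keepdims=True)    # stabilise the exponentials
        exp = np.exp(logits)
        probs = exp / exp.sum(axis=1, keepdims=True)   # rows now sum to 1
        m = X.shape[0]
        return -np.log(probs[np.arange(m), y]).mean()  # 1{y_i = j} picks the true class

    rng = np.random.default_rng(0)
    theta = rng.normal(size=(2, 4))                    # k = 2 states, 4-dim features
    X = rng.normal(size=(3, 4))                        # m = 3 samples
    y = np.array([0, 1, 1])                            # 0: no face, 1: face
    print(softmax_loss(theta, X, y))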
For the face box regression branch: the candidate boxes obtained by the face classification branch can contain information in four dimensions, namely (x_i, y_i, w_i, h_i). As shown in FIG. 7, the position of the face box in an image is shown by the solid box in FIG. 7, and the dashed box is an example of a selected sample.
The Euclidean distance loss function (Euclidean loss) to be optimized is:

$$L(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left\| \hat{z}_i - z_i \right\|_2^2$$

where z_i denotes the four dimensions of the face box, so $z_i \in \mathbb{R}^4$.
The information in each of the above dimensions uses relative quantities. Taking the first component of z_i as an example:

$$z_i^{(1)} = x' - x''$$

where x′ denotes the vertex coordinate of the real face box and x″ is the vertex coordinate of the selected sample box.
z_i is the supervision information input during training, and $\hat{z}_i \in \mathbb{R}^{p}$ is the actual output of the network; in the present application p = 4. The optimization target of the whole network is to minimize the above two losses.
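The sketch below shows one way to form the four-dimensional supervision z_i and to evaluate the Euclidean loss. The exact encoding of the relative quantities is not fixed by the text above; normalising the offset and size differences by the sample box size, as done here, is a common convention and should be read as an assumption:

    import numpy as np

    def regression_target(sample_box, gt_box):
        """Encode the ground-truth box relative to the sample box.
        Boxes are (x, y, w, h); the normalisation is an assumed convention."""
        sx, sy, sw, sh = sample_box
        gx, gy, gw, gh = gt_box
        return np.array([(gx - sx) / sw, (gy - sy) / sh,
                         (gw - sw) / sw, (gh - sh) / sh])

    def euclidean_loss(z_pred, z_true):
        """0.5 * mean squared L2 distance, matching the loss above.
        z_pred, z_true: arrays of shape (m, 4)."""
        diff = np.asarray(z_pred) - np.asarray(z_true)
        return 0.5 * np.mean(np.sum(diff ** 2, axis=-1))

    print(regression_target((10.0, 10.0, 20.0, 20.0), (12.0, 11.0, 22.0, 18.0)))
    # -> [ 0.1   0.05  0.1  -0.1 ]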
When the aforementioned three categories of samples are used for parameter training, the parameters in the convolutional neural network can first be initialized; sample pictures are then input into the convolutional neural network to obtain its output (that is, the face positioning result, including the recognized IoU and so on). The output is compared with the real result (such as the actual IoU) through the above two formulas to compute the error and related information. If the error is within the allowed range, the current parameters are reasonable; if not, the parameters are adjusted according to the magnitude of the error, the sample pictures are input again, and the output is again compared with the real result through the above two formulas, until the error of the results obtained by the adjusted convolutional neural network is within the allowed range.
After the parameters in the convolutional neural network are trained, face regions can be recognized through the method provided in the present application. For example:
In the technical solution provided in step S302, when a positioning request is received, the positioning request mainly includes, but is not limited to, the following sources:
(1) When the method of the present application is integrated on a terminal, or installed on a terminal in the form of client A, a face positioning request initiated to the terminal by client B on the terminal can be received; client B can be a client that needs to detect faces in real time, such as live-streaming beautification or face tracking;
(2) When the method of the present application is integrated on terminal A, or installed on terminal A in the form of a client, and terminal B is communicatively connected to terminal A (for example via WIFI, Bluetooth, or NFC), terminal A receives a face positioning request initiated by terminal B;
(3) When the method provided in the present application runs on a server in the form of a software development kit (SDK), the server receives face positioning requests initiated by other devices through the calling interface; the other devices can be mobile phones, computers, tablet computers, and similar devices.
In the technical solution provided in step S304, the three-level convolutional neural network of the present application works in a cascaded manner. Face box set 1 from the first-level convolutional neural network can serve as the input of the second-level convolutional neural network net-2 (that is, the second convolutional neural network) in the three-level convolutional neural network for further filtering and screening; the filtered output of the second-level convolutional neural network can in turn serve as the input of the third-level convolutional neural network (the third convolutional neural network), and the filtered output of the third-level convolutional neural network is taken as the final result. An optional implementation is as follows:
When the server performs the face positioning operation on the target picture through the convolutional neural network, the first convolutional neural network net-1 in the three-level convolutional neural network can call the graphics processor to perform the convolution operation on the target picture to obtain a convolution result, where the convolutional neural network includes the first convolutional neural network; the confidence that a first region in the target picture is a face region is determined according to the convolution result; and the face region is determined within the first region according to the confidence.
When the server calls the graphics processor through the first convolutional neural network to perform the convolution operation on the target picture and obtain the convolution result, it can execute the convolution algorithm on the first convolutional neural network by calling the graphics processor, so as to recognize one class of features in each first region of the target picture and obtain the convolution result, where the convolution result is used to indicate the features of this class that each first region has. In this way, when the confidence that a first region in the target picture is a face region is determined according to the convolution result, the confidence can be determined according to the features of this class that the first region has.
As shown in FIG. 6, for the first-level convolutional neural network net-1, the parameters of the input picture are 12*12*3: "12*12" indicates that the pixel size of the input picture is at least 12*12 (that is, the third threshold), meaning that the smallest face region supported for recognition is 12*12, and "3" indicates a 3-channel image. The first-level convolutional neural network is used to recognize relatively coarse-grained face features (the above-mentioned class of features). For each region in the picture (that is, each first region), including the recognized features, a preset feature matching algorithm is then used to determine the confidence that the region is a face region. Finally, the first regions whose confidence is greater than the first threshold are placed into candidate face box set 1 (the regions in this set are denoted second regions).
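Because net-1 is fully convolutional, each cell of its output map scores one 12*12 window of the input, so the candidate set can be read off the probability map directly. A minimal sketch follows; the stride (how far one output cell moves in the input image) depends on the actual net-1 architecture, which is not fixed here, so the value below is only an assumed example:

    import numpy as np

    def prob_map_to_boxes(prob_map, cls_threshold=0.5, window=12, stride=2):
        """Keep every map cell above the threshold and map it back to a
        12*12 window in the input image (the stride is an assumed example)."""
        ys, xs = np.where(prob_map > cls_threshold)
        boxes = [(x * stride, y * stride, window, window) for y, x in zip(ys, xs)]
        scores = prob_map[ys, xs].tolist()
        return boxes, scores

    prob = np.zeros((5, 5))
    prob[2, 3] = 0.9                  # one confident location
    print(prob_map_to_boxes(prob))    # ([(6, 4, 12, 12)], [0.9])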
Before the face region is determined within the first region according to the confidence, in order to make the face relatively centered in the first regions that may be face regions, the position of the first region can be adjusted according to the position of a face reference feature within the first region, so that the face reference feature is located at a preset position in the position-adjusted first region.
With the above adjustment, when the face region is determined within the first region according to the confidence, the face region can be determined within the position-adjusted first region according to the confidence.
Optionally, to improve processing efficiency, when adjusting the position of the first region according to the position of the face reference feature within it, the position adjustment can be applied only to the second regions within the first regions (that is, the first regions whose confidence is greater than the first threshold), avoiding a waste of resources.
With this adjustment, when the face region is determined within the first region according to the confidence, the face region can be determined within the position-adjusted second region according to the confidence.
The above face reference feature can be a facial feature of the face (such as the nose, eyes, mouth, or eyebrows). The position of a given facial feature on the face is relatively fixed; for example, the nose is generally located in the center of the face, so after the nose is recognized in the first region, the first region can be adjusted so that the nose is located at the center of the adjusted first region.
The convolutional neural network of the present application can be a three-level convolutional neural network. The first-level convolutional neural network mainly completes the preliminary recognition of face regions and obtains the above candidate face box set 1.
When the face region is determined within the first region according to the confidence, the above face box set 1 can serve as the input of the second convolutional neural network. The confidence that a second region in face box set 1 is a face region is determined through the second convolutional neural network, where a second region is a region in the first regions whose confidence is greater than the first threshold.
Before the confidence that the second region is a face region is determined through the second convolutional neural network, the region size of the second region can be adjusted to a fourth threshold, the fourth threshold being greater than the third threshold, for example by resizing to a 3-channel image of pixel size 24*24. Feature recognition is then performed on the resized second region through the second convolutional neural network. The type of feature recognized here differs from that recognized by the aforementioned first-level convolutional neural network. After recognition, the confidence that the second region is a face region can be determined according to the recognized features, for example computed through a preset feature matching algorithm.
After the confidence that the second region is a face region is determined through the second convolutional neural network, the regions in the second regions whose confidence is greater than the second threshold can be placed into face box set 2 (the regions in this set are denoted third regions). The face region within the third regions can then be recognized through the third convolutional neural network.
Optionally, after the screening of the third regions is completed, the positions of the third regions in face box set 2 can be adjusted in the same way as the aforementioned position adjustment for the second regions.
Optionally, before the face region within the third regions is recognized through the third convolutional neural network, the region size of the third regions can be adjusted to a fifth threshold, the fifth threshold being greater than the fourth threshold, for example by resizing the third regions to 48*48 images as the input of the third convolutional neural network. Feature recognition is performed on the resized third regions through the third convolutional neural network; the type of feature recognized here differs from those recognized by the aforementioned first-level and second-level convolutional neural networks. After recognition, the face region within the third regions can be determined according to the recognized features; for example, the matching degree can be computed through a preset feature matching algorithm, and the third region with the highest matching degree is taken as the face region.
In the above embodiment, the features recognized by the first-level convolutional neural network are relatively simple, and its decision threshold can be set loosely, so that a large number of non-face windows can be excluded while maintaining a high recall rate; the second-level and third-level convolutional neural networks can be designed to be more complex, but since they only need to process the windows that remain from the earlier stages, sufficient efficiency can be guaranteed.
The idea of cascading helps combine and exploit classifiers with weaker individual performance while obtaining a certain guarantee of efficiency. Since the image pixel sizes input at each level differ, the network can learn multi-scale feature combinations, which facilitates the final recognition of faces.
Deep models in the related art are relatively large (with many levels of convolutional neural networks); for example, models in the related art exceed 15 MB, making face detection slow (more than 300 ms on a mainstream PC) and unable to meet real-time requirements. The cascaded deep network architecture adopted by the present application has a high detection rate, low false detections, high speed (less than 40 ms on a mainstream PC), and a small model, fully making up for the shortcomings of existing face detection methods.
In the technical solution provided in step S306, the server returning the positioning result includes: returning the position information of the face region located by the convolutional neural network, where the position information is used to indicate the position of the face region in the target picture.
In related product applications, the present application can return the position information of the face box in the image, such as position information (x_i, y_i, w_i, h_i), i = 1…k, where k is the number of detected faces, (x_i, y_i) denotes the image coordinates of the top-left vertex of the face box, and w_i and h_i denote the width and height of the face box respectively. As shown in FIG. 8, after detection is completed on the left image in FIG. 8, the face region shown in the right image is obtained, and the position information is returned to the requesting object.
It should be noted that the above position information is information that can uniquely determine a face region in the image. The above (x_i, y_i, w_i, h_i) is one schematic representation of position information and can be adjusted as needed: for example, the coordinates of any one of the other corners can be returned together with the width and height of the face box; the coordinates of the center point of the region can be returned together with the width and height of the face box; or the coordinates of any two of the bottom-left, top-left, top-right, and bottom-right corners can be returned.
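For example, converting between the (x_i, y_i, w_i, h_i) form returned here and a two-corner form is a one-line computation each way; a small sketch (the representations are chosen for illustration):

    def to_corners(box):
        """(x, y, w, h) with top-left origin -> (x1, y1, x2, y2)."""
        x, y, w, h = box
        return (x, y, x + w, y + h)

    def to_xywh(corners):
        """(x1, y1, x2, y2) -> (x, y, w, h)."""
        x1, y1, x2, y2 = corners
        return (x1, y1, x2 - x1, y2 - y1)

    print(to_corners((10, 20, 30, 40)))   # (10, 20, 40, 60)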
After the face positions in the image are obtained, applications such as facial keypoint localization, liveness detection, and face recognition and retrieval can be carried out. For example, for facial keypoint localization, feature parts such as the eyes, nose, mouth, and eyebrows within the face region can be located according to related algorithms.
In the embodiments of the present application, a face detection method based on a convolutional neural network (CNN) is adopted. Since the convolutional network has a stronger feature representation capability for samples, the deep-learning-based face detection of the convolutional neural network can achieve better detection performance in a variety of complex scenes.
The technical solution of the present application can be applied to products in scenarios such as intelligent surveillance in public areas, patient identification in hospitals, and automatic identity recognition at stations or airports. The embodiments of the present application are further detailed below with reference to FIG. 9 and FIG. 10:
Step S902: learn the values of the parameters in the convolutional neural network.
After training is completed, the convolutional neural network can be applied to products in the above scenarios, such as intelligent surveillance in public areas, patient identification in hospitals, and automatic identity recognition at stations or airports. The recognition of face regions is completed through the following steps:
Step S904: input an image P into the first-level convolutional neural network net-1 on the server. The face classification branch of net-1 outputs a probability map Prob (as shown in FIG. 10), where each point in Prob corresponds to the possibility (that is, the confidence) that a face appears at a certain position in image P. Set a threshold cls-1 and keep the positions in Prob that are greater than cls-1; suppose the resulting face boxes are $r_1, r_2, \ldots, r_m$, m in total, and denote the set of face boxes by $R_1$ (that is, candidate face box set 1).
Step S906: adjust the position of each face box in $R_1$ through the face box regression branch of net-1 to obtain a more precise face box set $R_1'$.
Step S908: perform non-maximum suppression (NMS) on the face boxes in $R_1'$, that is, when the IoU of two sample boxes is greater than the threshold nms-1, delete the box with the lower confidence; denote the candidate face box set after this process by $\hat{R}_1$.
Step S910: scale each sub-image of the original image P corresponding to $\hat{R}_1$, for example to images of length and width 24, and input them one by one into the second-level convolutional neural network net-2. Set a threshold cls-2; through the face classification branch of net-2, the confidence of each candidate box in $\hat{R}_1$ can be obtained. Keep the face boxes whose confidence is greater than cls-2 to obtain a new face box set $R_2$ (that is, candidate face box set 2).
Step S912: adjust the position of each face box in $R_2$ through the face box regression branch of net-2 to obtain a more precise face box set $R_2'$.
Step S914: perform non-maximum suppression (NMS) with threshold nms-2 on the faces in $R_2'$ to obtain the candidate box set $\hat{R}_2$.
Step S916: scale the sub-images corresponding to $\hat{R}_2$ to images of length and width 48 and input them into the third-level convolutional neural network net-3. Set a threshold cls-3; through the face classification branch of net-3, the confidence of each candidate box in $\hat{R}_2$ can be obtained. Keep the face boxes whose confidence is greater than cls-3 to obtain a new face box set $R_3$.
Step S918: adjust the position of each face box in $R_3$ through the face box regression branch of net-3 to obtain a more precise face box set $R_3'$.
Step S920: perform non-maximum suppression (NMS) with threshold nms-3 on the faces in $R_3'$ to obtain the face box set $\hat{R}_3$, which gives the positions of the faces in image P, such as the face regions shown in FIG. 8. After the recognition of face regions is completed, the matching software in products for the above scenarios (intelligent surveillance in public areas, patient identification in hospitals, automatic identity recognition at stations or airports, and so on) matches the recognized face regions against the faces with recorded identities in a database, thereby identifying the identity of the person to whom each face region belongs.
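Condensing steps S904 to S920, the following sketch shows the NMS rule and the three-stage filtering loop. The run_stage callables stand in for evaluating net-1, net-2, or net-3 together with their regression branches and are illustrative placeholders; iou is the helper sketched earlier in the training-data discussion:

    def nms(boxes, scores, threshold):
        """Non-maximum suppression (steps S908/S914/S920): when two boxes
        overlap with IoU above the threshold, drop the less confident one."""
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        for i in order:
            if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
                keep.append(i)
        return [boxes[i] for i in keep], [scores[i] for i in keep]

    def cascade(image, stages):
        """stages: list of (run_stage, cls_thr, nms_thr) triples, where
        run_stage(image, boxes) -> (refined_boxes, scores) evaluates one
        network's classification and regression branches (a placeholder).
        For the first stage, boxes is None: the fully convolutional net-1
        scans the whole image."""
        boxes = None
        for run_stage, cls_thr, nms_thr in stages:
            boxes, scores = run_stage(image, boxes)
            kept = [(b, s) for b, s in zip(boxes, scores) if s > cls_thr]
            if not kept:
                return []                  # no face region located
            boxes = [b for b, _ in kept]
            scores = [s for _, s in kept]
            boxes, scores = nms(boxes, scores, nms_thr)
        return boxes                       # final face positions in the image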
Using the technical solutions provided by the embodiments of the present application, services can be provided for various scenarios in the form of an SDK, with a high detection rate and low false detections for face detection, making real-time face detection based on deep learning possible on mobile devices.
It should be noted that, for brevity, the foregoing method embodiments are all described as combinations of a series of actions, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps can be performed in other orders or simultaneously. Moreover, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
Through the description of the above implementations, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the related art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
According to the embodiments of the present application, a computer device for implementing the above method for determining a face region is also provided. FIG. 11 is a schematic diagram of an optional computer device (also called an apparatus for determining a face region) according to an embodiment of the present application, including a memory and a processor, where the processor is configured to run the following computer program stored in the memory: receiving a positioning request, where the positioning request is used to request that a face region be located in a target picture; performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation; and returning the positioning result when the positioning result indicates that a face region is located in the target picture.
As shown in FIG. 11, the above computer program can include the following software modules: a receiving unit 112, a positioning unit 114, and a returning unit 116.
The receiving unit 112 is configured to receive a positioning request, where the positioning request is used to request that a face region be located in a target picture;
the positioning unit 114 is configured to perform a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation;
the returning unit 116 is configured to return the positioning result when the positioning result indicates that a face region is located in the target picture.
It should be noted that the receiving unit 112 in this embodiment can be used to perform step S302 in the embodiments of the present application, the positioning unit 114 can be used to perform step S304, and the returning unit 116 can be used to perform step S306.
It should be noted here that the examples and application scenarios implemented by the above modules and the corresponding steps are the same, but are not limited to the content disclosed in the above embodiments. It should be noted that, as part of the computer device, the above modules can run in the hardware environment shown in FIG. 2 and can be implemented in software or in hardware.
Through the above solution, when a positioning request is received, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, and the positioning result is returned when it indicates that a face region exists in the target picture. In the face recognition process, the preliminary recognition directly calls the graphics processor to perform the convolution operation on the target picture through the fully convolutional network in the convolutional neural network. This hardware-accelerated approach, rather than the software approach of scanning region by region on a CPU, can solve the technical problem in the related art that face detection has poor real-time performance, thereby achieving the technical effect of improving the real-time performance of face detection.
Face detection algorithms in the related art have many problems in general application scenarios. For example, feature-based face detection is fast, but for slightly more complex scenes its detection rate is low and it lacks robustness. The Adaboost-based face detection algorithm has a small model and a relatively fast detection speed, but its robustness in complex scenes is poor, for example for face detection in extreme scenes such as wearing a mask, wearing black-framed glasses, or blurry images.
In the present application, three convolutional neural networks are mainly used: the first-level convolutional neural network net-1, the second-level convolutional neural network net-2, and the third-level convolutional neural network net-3, in a cascaded structure. Given an image, a set of candidate face boxes is output after net-1; the candidate set is input into net-2 to obtain a more accurate set of candidate face boxes; the resulting candidate set is then input into net-3 to obtain the final set of face boxes, that is, the final face positions. This is a coarse-to-fine process. Using the method of the present application, the problem of poor real-time performance in the related art can be solved while guaranteeing robustness, detection rate, and accuracy, mainly as follows:
(1) A convolutional neural network (CNN) is used to express face features. Compared with the Adaboost-based or SVM-based face detection methods in the related art, it is more robust for detecting profile faces, dim lighting, blur, and similar scenes; meanwhile, the three-level cascaded convolutional neural network guarantees recognition accuracy.
(2) The initial localization and the precise localization of the face box (that is, the face region) are handled by a classification branch and a regression branch respectively, and the two branches share intermediate layers. Compared with the models used by some face detection methods in the related art (such as deep-learning-based models), this reduces the model size and makes detection faster.
(3) The first-level network in the three-level cascade structure of the present application adopts a fully convolutional neural network instead of the traditional sliding-window approach. The fully convolutional neural network directly calls the graphics processing unit (GPU) for processing, which greatly speeds up the generation of candidate face boxes.
Optionally, before face recognition is performed, the parameters in the convolutional neural network can be learned as follows. For example, the processor implements this by running the following computer program stored in the memory: before the positioning request is received, training the convolutional neural network with pictures from a picture set to determine the values of the parameters in the convolutional neural network, where the pictures in the picture set are images that include part or all of a face region. The above learning process mainly includes two parts: selecting suitable training data, and training to obtain the parameter values.
After the parameters in the convolutional neural network are trained, face regions can be recognized through the solution provided in the present application. For example:
A positioning request is received, where the positioning request is used to request that a face region be located in a target picture. The positioning request mainly includes, but is not limited to, the following sources:
(1) When the solution of the present application is integrated on a terminal, or installed on a terminal in the form of client A, a face positioning request initiated to the terminal by client B on the terminal can be received; client B can be a client that needs to detect faces in real time, such as live-streaming beautification or face tracking;
(2) When the solution of the present application is integrated on terminal A, or installed on terminal A in the form of a client, and terminal B is communicatively connected to terminal A (for example via WIFI, Bluetooth, or NFC), terminal A receives a face positioning request initiated by terminal B;
(3) When the solution provided in the present application runs on a server in the form of a software development kit (SDK), the server receives face positioning requests initiated by other devices through the calling interface; the other devices can be mobile phones, computers, tablet computers, and similar devices.
The three-level convolutional neural network of the present application works in a cascaded manner. Face box set 1 from the first-level convolutional neural network can serve as the input of the second-level convolutional neural network net-2 (that is, the second convolutional neural network) for further filtering and screening; the filtered output of the second-level convolutional neural network can in turn serve as the input of the third-level convolutional neural network (the third convolutional neural network), and the filtered output of the third-level convolutional neural network is taken as the final result.
Optionally, the processor is further configured to run the following computer program stored in the memory: calling the graphics processor through the first convolutional neural network to perform the convolution operation on the target picture to obtain a convolution result, where the convolutional neural network includes the first convolutional neural network; determining, according to the convolution result, the confidence that a first region in the target picture is a face region; and determining the face region within the first region according to the confidence.
Optionally, the processor is further configured to run the following computer program stored in the memory: adjusting the position of the first region according to the position of a face reference feature within the first region, so that the face reference feature is located at a preset position in the position-adjusted first region; determining the face region within the first region according to the confidence then includes: determining the face region within the position-adjusted first region according to the confidence.
Optionally, the processor is further configured to run the following computer program stored in the memory: executing the convolution algorithm on the first convolutional neural network by calling the graphics processor, so as to recognize one class of features in each first region of the target picture and obtain the convolution result, where the convolution result is used to indicate the features of this class that each first region has; and determining the confidence that the first region is a face region according to the features of this class that the first region has.
Optionally, the convolutional neural network further includes a second convolutional neural network and a third convolutional neural network, where the processor is further configured to run the following computer program stored in the memory: determining, through the second convolutional neural network, the confidence that a second region is a face region, where the second region is a region in the first regions whose confidence is greater than the first threshold; and recognizing, through the third convolutional neural network, the face region within a third region, where the third region is a region in the second regions whose confidence is greater than the second threshold.
For example, before the confidence that the second region is a face region is determined through the second convolutional neural network, the region size of the second region can be adjusted to a fourth threshold, the fourth threshold being greater than the third threshold, for example by resizing to a 3-channel image of pixel size 24*24. Feature recognition is then performed on the resized second region through the second convolutional neural network; the type of feature recognized here differs from that recognized by the aforementioned first-level convolutional neural network. After recognition, the confidence that the second region is a face region can be determined according to the recognized features, for example computed through a preset feature matching algorithm.
Optionally, the region size of the first region is not less than the third threshold, where the processor is further configured to run the following computer program stored in the memory: before determining, through the second convolutional neural network, the confidence that the second region is a face region, adjusting the region size of the second region to the fourth threshold, where the fourth threshold is greater than the third threshold; performing feature recognition on the resized second region through the second convolutional neural network, and determining the confidence that the second region is a face region according to the recognized features; adjusting the region size of the third region to a fifth threshold, where the fifth threshold is greater than the fourth threshold; and performing feature recognition on the resized third region through the third convolutional neural network, and determining the face region within the third region according to the recognized features.
That is, before the face region within the third region is recognized through the third convolutional neural network, the region size of the third region can be adjusted to the fifth threshold, the fifth threshold being greater than the fourth threshold, for example by resizing the third regions to 48*48 images as the input of the third convolutional neural network. Feature recognition is performed on the resized third regions through the third convolutional neural network; the type of feature recognized here differs from those recognized by the aforementioned first-level and second-level convolutional neural networks. After recognition, the face region within the third regions can be determined according to the recognized features; for example, the matching degree can be computed through a preset feature matching algorithm, and the third region with the highest matching degree is taken as the face region.
In the above embodiment, the features recognized by the first-level convolutional neural network are relatively simple, and its decision threshold can be set loosely, so that a large number of non-face windows can be excluded while maintaining a high recall rate; the second-level and third-level convolutional neural networks can be designed to be more complex, but since they only need to process the windows that remain from the earlier stages, sufficient efficiency can be guaranteed.
The idea of cascading helps combine and exploit classifiers with weaker individual performance while obtaining a certain guarantee of efficiency. Since the image pixel sizes input at each level differ, the network can learn multi-scale feature combinations, which facilitates the final recognition of faces.
Deep models in the related art are relatively large (with many levels of convolutional neural networks); for example, models in the related art exceed 15 MB, making face detection slow (more than 300 ms on a mainstream PC) and unable to meet real-time requirements. The cascaded deep network architecture adopted by the present application has a high detection rate, low false detections, high speed (less than 40 ms on a mainstream PC), and a small model, fully making up for the shortcomings of existing face detection methods.
Optionally, the processor is further configured to run the following computer program stored in the memory: returning the position information of the face region located by the convolutional neural network, where the position information is used to indicate the position of the face region in the target picture.
In related product applications, the present application can return the position information of the face box in the image, such as position information (x_i, y_i, w_i, h_i), i = 1…k, where k is the number of detected faces, (x_i, y_i) denotes the image coordinates of the top-left vertex of the face box, and w_i and h_i denote the width and height of the face box respectively.
It should be noted that the above position information is information that can uniquely determine a face region in the image. The above (x_i, y_i, w_i, h_i) is one schematic representation of position information and can also be adjusted as needed: for example, the coordinates of any one of the other corners can be returned together with the width and height of the face box; the coordinates of the center point of the region can be returned together with the width and height of the face box; or the coordinates of any two of the bottom-left, top-left, top-right, and bottom-right corners can be returned.
After the face positions in the image are obtained, applications such as facial keypoint localization, liveness detection, and face recognition and retrieval can be carried out. For example, for facial keypoint localization, feature parts such as the eyes, nose, mouth, and eyebrows within the face region can be located according to related algorithms.
In the embodiments of the present application, a face detection method based on a convolutional neural network (CNN) is adopted. Since the convolutional network has a stronger feature representation capability for samples, the deep-learning-based face detection of the convolutional neural network can achieve better detection performance in a variety of complex scenes.
It should be noted here that the examples and application scenarios implemented by the above modules and the corresponding steps are the same, but are not limited to the content disclosed in the above embodiments. It should be noted that, as part of the apparatus, the above modules can run in the hardware environment shown in FIG. 2 and can be implemented in software or in hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiments of the present application, a storage medium (also called a memory) is further provided. The storage medium includes a stored program, where the program is configured to perform any one of the above methods when run.
According to the embodiments of the present application, a server or terminal (also called a computer device) for implementing the above method for determining a face region is also provided.
FIG. 12 is a structural block diagram of a terminal according to an embodiment of the present application. As shown in FIG. 12, the terminal can include one or more processors 1201 (only one is shown in FIG. 12), a memory 1203, and a transmission device 1205 (such as the computer device in the above embodiments); as shown in FIG. 12, the terminal can also include an input/output device 1207.
The memory 1203 can be used to store software programs and modules, such as the program instructions/modules corresponding to the method for determining a face region in the embodiments of the present application. By running the software programs and modules stored in the memory 1203, the processor 1201 performs various functional applications and data processing, that is, implements the above method for determining a face region. The memory 1203 can include high-speed random access memory, and can also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1203 can further include memory set remotely relative to the processor 1201, and such remote memory can be connected to the terminal through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The above transmission device 1205 is used to receive or send data via a network, and can also be used for data transmission between the processor and the memory. Specific examples of the network can include wired and wireless networks. In one example, the transmission device 1205 includes a network interface controller (NIC), which can be connected to other network devices and a router through a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device 1205 is a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
Specifically, the memory 1203 is used to store an application program.
The processor 1201 may invoke, through the transmission device 1205, the application stored in the memory 1203 to perform the following steps: receiving a positioning request, where the positioning request is used to request that a face region be located in a target picture; performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation; and returning the positioning result when the positioning result indicates that a face region is located in the target picture.
The processor 1201 is further configured to perform the following steps: calling the graphics processor through the first convolutional neural network to perform the convolution operation on the target picture to obtain a convolution result, where the convolutional neural network includes the first convolutional neural network; determining, according to the convolution result, the confidence that a first region in the target picture is a face region; and determining the face region within the first region according to the confidence.
With the embodiments of the present application, when a positioning request is received, a face positioning operation is performed on the target picture through the convolutional neural network to obtain a positioning result, and the positioning result is returned when it indicates that a face region exists in the target picture. In the face recognition process, the preliminary recognition directly calls the graphics processor to perform the convolution operation on the target picture through the fully convolutional network in the convolutional neural network. This hardware-accelerated approach, rather than the software approach of scanning region by region on a CPU, can solve the technical problem in the related art that face detection has poor real-time performance, thereby achieving the technical effect of improving the real-time performance of face detection.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, and details are not repeated here.
Those of ordinary skill in the art can understand that the structure shown in FIG. 12 is only illustrative. The terminal can be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), a PAD, or another terminal device. FIG. 12 does not limit the structure of the above computer device. For example, the terminal may include more or fewer components than shown in FIG. 12 (such as a network interface or a display device), or may have a configuration different from that shown in FIG. 12.
Those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments can be completed by a program instructing the hardware related to the terminal device. The program can be stored in a computer-readable storage medium, and the storage medium can include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the above storage medium can be used to store the program code for performing the method for determining a face region.
Optionally, in this embodiment, the above storage medium can be located on at least one of multiple network devices in the network shown in the above embodiments.
Optionally, in this embodiment, the storage medium is arranged to store program code for performing the following steps:
S11: receiving a positioning request, where the positioning request is used to request that a face region be located in a target picture;
S12: performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, where the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation includes the convolution operation;
S13: returning the positioning result when the positioning result indicates that a face region is located in the target picture.
Optionally, the storage medium is further arranged to store program code for performing the following steps:
S21: calling the graphics processor through the first convolutional neural network to perform the convolution operation on the target picture to obtain a convolution result, where the convolutional neural network includes the first convolutional neural network;
S22: determining, according to the convolution result, the confidence that a first region in the target picture is a face region;
S23: determining the face region within the first region according to the confidence.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, and details are not repeated here.
Optionally, in this embodiment, the above storage medium can include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The sequence numbers of the above embodiments of the present application are only for description and do not represent the superiority or inferiority of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the related art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present application.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The computer device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and there may be other ways of dividing them in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred implementations of the present application. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.

Claims (16)

  1. A method for determining a face region, comprising:
    receiving, by a computer device, a positioning request, wherein the positioning request is used to request that a face region be located in a target picture;
    performing, by the computer device, a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, wherein the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation comprises the convolution operation; and
    returning, by the computer device, the positioning result when the positioning result indicates that a face region is located in the target picture.
  2. The method according to claim 1, wherein performing the face positioning operation on the target picture through the convolutional neural network comprises:
    calling the graphics processor through a first convolutional neural network to perform the convolution operation on the target picture to obtain a convolution result, wherein the convolutional neural network comprises the first convolutional neural network;
    determining, according to the convolution result, a confidence that a first region in the target picture is the face region; and
    determining the face region within the first region according to the confidence.
  3. The method according to claim 2, wherein
    calling the graphics processor through the first convolutional neural network to perform the convolution operation on the target picture to obtain the convolution result comprises: executing a convolution algorithm on the first convolutional neural network by calling the graphics processor, so as to recognize one class of features in each first region of the target picture and obtain the convolution result, wherein the convolution result is used to indicate the features of the class that the first region has; and
    determining, according to the convolution result, the confidence that the first region in the target picture is the face region comprises: determining the confidence that the first region is the face region according to the features of the class that the first region has.
  4. The method according to claim 2, wherein the convolutional neural network further comprises a second convolutional neural network and a third convolutional neural network, and determining the face region within the first region according to the confidence comprises:
    determining, through the second convolutional neural network, a confidence that a second region is the face region, wherein the second region is a region in the first region whose confidence is greater than a first threshold; and
    recognizing, through the third convolutional neural network, the face region within a third region, wherein the third region is a region in the second region whose confidence is greater than a second threshold.
  5. The method according to claim 4, wherein a region size of the first region is not less than a third threshold, and wherein:
    before the confidence that the second region is the face region is determined through the second convolutional neural network, the method further comprises: adjusting, by the computer device, the region size of the second region to a fourth threshold, wherein the fourth threshold is greater than the third threshold;
    determining, through the second convolutional neural network, the confidence that the second region is the face region comprises: performing feature recognition on the size-adjusted second region through the second convolutional neural network, and determining, according to the recognized features, the confidence that the second region is the face region;
    before the face region within the third region is recognized through the third convolutional neural network, the method further comprises: adjusting, by the computer device, the region size of the third region to a fifth threshold, wherein the fifth threshold is greater than the fourth threshold; and
    recognizing, through the third convolutional neural network, the face region within the third region comprises: performing feature recognition on the size-adjusted third region through the third convolutional neural network, and determining the face region within the third region according to the recognized features.
  6. The method according to any one of claims 2 to 5, wherein:
    before the face region is determined within the first region according to the confidence, the method further comprises: adjusting, by the computer device, the position of the first region according to the position of a face reference feature within the first region, so that the face reference feature is located at a preset position in the position-adjusted first region; and
    determining the face region within the first region according to the confidence comprises: determining the face region within the position-adjusted first region according to the confidence.
  7. The method according to claim 1, wherein before the positioning request is received, the method further comprises:
    training, by the computer device, the convolutional neural network with pictures from a picture set to determine values of parameters in the convolutional neural network, wherein the pictures in the picture set are images comprising part or all of a face region.
  8. The method according to claim 1, wherein returning the positioning result comprises:
    returning position information of the face region located by the convolutional neural network, wherein the position information is used to indicate the position of the face region in the target picture.
  9. A computer device, comprising a memory and a processor, wherein the processor is configured to run the following computer program stored in the memory:
    receiving a positioning request, wherein the positioning request is used to request that a face region be located in a target picture;
    performing a face positioning operation on the target picture through a convolutional neural network to obtain a positioning result, wherein the convolutional neural network is used to call a graphics processor to perform a convolution operation on the target picture, and the face positioning operation comprises the convolution operation; and
    returning the positioning result when the positioning result indicates that a face region is located in the target picture.
  10. The computer device according to claim 9, wherein the processor is further configured to run the following computer program stored in the memory:
    calling the graphics processor through a first convolutional neural network to perform the convolution operation on the target picture to obtain a convolution result, wherein the convolutional neural network comprises the first convolutional neural network;
    determining, according to the convolution result, a confidence that a first region in the target picture is the face region; and
    determining the face region within the first region according to the confidence.
  11. The computer device according to claim 10, wherein the processor is further configured to run the following computer program stored in the memory:
    executing a convolution algorithm on the first convolutional neural network by calling the graphics processor, so as to recognize one class of features in each first region of the target picture and obtain the convolution result, wherein the convolution result is used to indicate the features of the class that the first region has; and
    determining the confidence that the first region is the face region according to the features of the class that the first region has.
  12. The computer device according to claim 10, wherein the convolutional neural network further comprises a second convolutional neural network and a third convolutional neural network, and the processor is further configured to run the following computer program stored in the memory:
    determining, through the second convolutional neural network, a confidence that a second region is the face region, wherein the second region is a region in the first region whose confidence is greater than a first threshold; and
    recognizing, through the third convolutional neural network, the face region within a third region, wherein the third region is a region in the second region whose confidence is greater than a second threshold.
  13. The computer device according to claim 12, wherein a region size of the first region is not less than a third threshold, and the processor is further configured to run the following computer program stored in the memory:
    before the confidence that the second region is the face region is determined through the second convolutional neural network, adjusting the region size of the second region to a fourth threshold, wherein the fourth threshold is greater than the third threshold; performing feature recognition on the size-adjusted second region through the second convolutional neural network, and determining, according to the recognized features, the confidence that the second region is the face region; and
    adjusting the region size of the third region to a fifth threshold, wherein the fifth threshold is greater than the fourth threshold; performing feature recognition on the size-adjusted third region through the third convolutional neural network, and determining the face region within the third region according to the recognized features.
  14. The computer device according to any one of claims 10 to 13, wherein the processor is further configured to run the following computer program stored in the memory:
    adjusting the position of the first region according to the position of a face reference feature within the first region, so that the face reference feature is located at a preset position in the position-adjusted first region; and
    determining the face region within the first region according to the confidence comprises: determining the face region within the position-adjusted first region according to the confidence.
  15. The computer device according to claim 9, wherein the processor is further configured to run the following computer program stored in the memory:
    before the positioning request is received, training the convolutional neural network with pictures from a picture set to determine values of parameters in the convolutional neural network, wherein the pictures in the picture set are images comprising part or all of a face region.
  16. A storage medium, wherein the storage medium stores a computer program, and the computer program is configured to perform the method according to any one of claims 1 to 8 when run.
PCT/CN2018/079551 2017-04-11 2018-03-20 Method for determining a face region, storage medium, and computer device WO2018188453A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710233590.6A CN107145833A (zh) 2017-04-11 2017-04-11 Method and apparatus for determining a face region
CN201710233590.6 2017-04-11

Publications (1)

Publication Number Publication Date
WO2018188453A1 true WO2018188453A1 (zh) 2018-10-18

Family

ID=59773604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079551 WO2018188453A1 (zh) 2017-04-11 2018-03-20 Method for determining a face region, storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN107145833A (zh)
WO (1) WO2018188453A1 (zh)


Also Published As

Publication number Publication date
CN107145833A (zh) 2017-09-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18784448; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18784448; Country of ref document: EP; Kind code of ref document: A1)