CN118097755A - Intelligent face identity recognition method based on YOLO network - Google Patents
Intelligent face identity recognition method based on YOLO network
- Publication number
- CN118097755A (application number CN202410289557.5A)
- Authority
- CN
- China
- Prior art keywords
- yolo
- face
- image
- bounding box
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an intelligent face identity recognition method based on a YOLO network, which comprises the following steps: acquiring video data, and obtaining a labeled face recognition image dataset based on the video data; building a YOLO intelligent face identity recognition model based on the YOLO v5 model, and training the YOLO intelligent face identity recognition model on the labeled face recognition image dataset to obtain a trained YOLO intelligent face identity recognition model; and inputting video data to be detected into the trained YOLO intelligent face identity recognition model for detection to obtain a target detection result. The invention not only improves the accuracy of evaluation, but also reduces the dependence on large amounts of training data, so that the system is easier to deploy and apply in educational environments with limited resources.
Description
Technical Field
The invention relates to the technical field of education, in particular to an intelligent face identity recognition method based on a YOLO network.
Background
Labor education occupies a unique and critical position in the primary and secondary education system, and aims to cultivate students' practical skills, spirit of teamwork and sense of social responsibility. However, in the traditional education system, the evaluation of labor education mainly depends on direct observation by teachers and self-reporting by students; because of its inherent subjectivity and lack of quantification, this approach often yields evaluation results that are inconsistent with the actual situation and unreliable, and often cannot accurately assess students' performance and progress in labor education. Such inaccurate assessment not only affects the improvement of educational quality, but may also lead to an imbalance in the development of students' labor skills and values. Therefore, establishing an objective, accurate and efficient assessment method for labor education is important for improving its outcomes and overall quality.
In recent years, with the development of computer vision and deep learning technology, intelligent data acquisition and analysis techniques have been increasingly applied in the field of education. In particular, face detection and identity recognition technology based on a YOLO network provides a new solution for the evaluation of labor education. Compared with traditional methods, this technology automatically identifies and records students' participation in labor education activities by analyzing video data, thereby improving the objectivity and efficiency of evaluation and reducing the consumption of human resources.
Although YOLO network-based face detection technology performs well on still images, its accuracy and stability still need to be improved for real-time video processing and dynamic scene recognition, especially in complex labor education environments. Furthermore, efficient operation of these systems typically requires training on large amounts of data, with high demands on the quality and diversity of the datasets. Therefore, in order to solve the above technical problems, the invention provides an intelligent face identity recognition method based on a YOLO network.
Disclosure of Invention
The invention aims to provide an intelligent face identity recognition method based on a YOLO network, so as to solve the problems in the prior art.
The invention provides an intelligent face identity recognition method based on a YOLO network, which comprises the following steps:
acquiring video data, and obtaining a labeled face recognition image dataset based on the video data;
Building a YOLO intelligent face identity recognition model based on a YOLO v5 model, and training the YOLO intelligent face identity recognition model on the labeled face recognition image dataset to obtain a trained YOLO intelligent face identity recognition model;
and inputting video data to be detected into the trained YOLO intelligent face identity recognition model for detection to obtain a target detection result.
Optionally, the process of obtaining the annotated face recognition image dataset based on the video data includes:
Capturing images from the video data frame by frame to obtain a plurality of face images;
Recognizing the plurality of face images based on a YOLO v5 model to obtain a face bounding box corresponding to each image;
Obtaining corresponding initial bounding box coordinates and face confidence scores based on the face bounding boxes;
Transforming the initial bounding box coordinates corresponding to each face bounding box to obtain transformed bounding box coordinates;
And carrying out category identification labeling on the face images based on LabelImg software to obtain labeled face images, wherein each labeled face image corresponds to an annotation file, and the annotation file comprises: the file name, face confidence score, class name and transformed bounding box coordinates of each labeled object.
Optionally, the process of transforming the initial bounding box coordinates corresponding to the face bounding box to obtain transformed bounding box coordinates includes:
normalizing the initial bounding box coordinates based on the original image size and the (generally different) image size input to the model, to obtain normalized coordinates;
Performing inverse normalization on the normalized coordinates to obtain the transformed bounding box coordinates;
The calculation formula for normalizing the initial bounding box coordinates is:
x_norm = (x_min + x_max) / (2·W_orig), y_norm = (y_min + y_max) / (2·H_orig), w_norm = (x_max - x_min) / W_orig, h_norm = (y_max - y_min) / H_orig
where W_orig denotes the width of the original image, H_orig denotes the height of the original image, x_min denotes the x-coordinate of the upper left corner of the bounding box, y_min denotes the y-coordinate of the upper left corner of the bounding box, x_max denotes the x-coordinate of the lower right corner of the bounding box, y_max denotes the y-coordinate of the lower right corner of the bounding box, x_norm denotes the normalized value of the x-coordinate of the center point of the bounding box with respect to the width of the image, y_norm denotes the normalized value of the y-coordinate of the center point of the bounding box with respect to the height of the image, w_norm denotes the normalized value of the width of the bounding box with respect to the width of the image, and h_norm denotes the normalized value of the height of the bounding box with respect to the height of the image.
Optionally, the calculation formula for performing inverse normalization on the normalized coordinates to obtain the transformed bounding box coordinates is:
x_min = (x_norm - w_norm/2)·W_orig, y_min = (y_norm - h_norm/2)·H_orig, x_max = (x_norm + w_norm/2)·W_orig, y_max = (y_norm + h_norm/2)·H_orig
Optionally, the YOLO intelligent face identity recognition model includes a Backbone module and a Head module, and the process of obtaining a prediction result based on the Backbone module and the Head module includes:
performing feature extraction on the labeled face recognition image based on the Backbone module to obtain a multi-layer feature map;
the Head module fuses the multi-layer feature maps based on a C3 layer to obtain a final output feature map;
Predicting based on the final output feature map to obtain a prediction result, wherein the prediction result comprises: the bounding boxes, categories and confidences of the objects predicted from the feature map.
Optionally, constructing a total loss function based on the prediction result and the true value of the annotation file, and performing model optimization based on the total loss function, wherein a mathematical model of the total loss function is as follows:
Ltotal=λCIoULCIoU+λFocalLFocal+λcross-entropyLcross-entropy
Where L CIoU denotes a bounding box loss function, L Focal denotes a background noise loss function, and L cross-entropy denotes a multi-class loss function.
Optionally, the YOLO intelligent face identity recognition model further includes a process of filtering the bounding boxes based on non-maximum suppression.
Optionally, the video data to be detected is input into the trained YOLO intelligent face identity recognition model for detection to obtain a class label and a confidence coefficient of the target person, and a target detection result is obtained based on the class label and the confidence coefficient of the target person;
The calculation formulas of the category labels and the confidence coefficient are as follows:
category = arg max_c P_class(c)
P=Pobj×Pclass(c)
Where arg max c represents the value of c for finding the maximization of P class (c), P class (c) represents the probability that the prediction box belongs to class c, P obj represents the confidence that the object exists, and P represents the integrated confidence.
The invention has the following technical effects:
the model of the invention shows higher accuracy and robustness when processing dynamic scenes and complex backgrounds. In addition, the model is particularly optimized for the labor education environment, so that different illumination conditions and diversified student behaviors can be processed more effectively. The model not only improves the accuracy of evaluation, but also reduces the dependence on a large amount of training data, so that the system is easier to deploy and apply in an educational environment with limited resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a schematic diagram of a network structure of a YOLO v5 model in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of each network module of YOLO v5 in an embodiment of the present invention;
FIG. 3 is a graph of a YOLO v5 model training and validation loss function and evaluation index in an embodiment of the invention;
Fig. 4 is a flowchart of a YOLO v5 intelligent face identification method in an embodiment of the invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
Example 1
A face detection and identity recognition method based on an improved YOLO network model is specially designed for a middle and primary school labor education scene. The main purpose of the method is to improve the objectivity and efficiency of labor education assessment through intelligent data acquisition and analysis technology. The method includes collecting and processing video data, automatically labeling with YOLO v5, generating an image dataset, and improving accuracy and stability of face detection by training and optimizing a YOLO model. At the same time, the model is particularly optimized for the labor education environment to deal with challenges of dynamic scenes and complex backgrounds. The invention provides a more accurate and efficient assessment method for middle and primary school labor education through an advanced computer vision technology.
As shown in fig. 4, the present embodiment is optimized for the labor education scenario of primary and secondary schools to improve the accuracy of face detection and identity recognition, and optimizes the data training and processing flow to better adapt to the complex educational environment. The present embodiment discloses an intelligent face identity recognition method based on a YOLO network, which comprises the following steps:
Acquiring video data, and acquiring a labeled face recognition image dataset based on the video data; the specific implementation process comprises the following steps:
s1, collecting video data related to face recognition, and intercepting images in each video according to the number of frames to form a picture sequence so as to construct a face recognition image data set; and (3) automatically marking by using YOLO, generating a file of the name of the png/. Jpg and the name of the txt, and checking and identifying missing by using Labelimg software on the face image dataset and improving the precision.
Based on the YOLO v5 model, a YOLO intelligent face identity recognition model is built, and based on the labeled face recognition image data set, the YOLO intelligent face identity recognition model is trained to obtain a trained YOLO intelligent face identity recognition model, and the specific implementation process comprises the following steps:
and S2, training and optimizing the YOLO v5 model by using the image data set in the S1, and storing the trained optimal weight file to obtain the YOLO intelligent face identification model.
Inputting video data to be detected into a trained YOLO intelligent face identity recognition model to detect to obtain a target detection result, wherein the specific implementation process comprises the following steps:
S3, inputting video data to be detected or pictures to be detected into the YOLO intelligent face identity recognition model, detecting them, and outputting the corresponding target detection result.
Further, step S1 includes:
S101: collecting video image data containing person images in multiple different forms; capturing images from each video frame by frame to form a picture sequence, and constructing a face image dataset after removing pictures that contain no faces.
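For illustration, a minimal frame-capture sketch corresponding to S101 is given below. It assumes the OpenCV (cv2) library and a user-chosen sampling interval frame_step, neither of which is specified by this embodiment; frames without faces would still need to be removed afterwards.

```python
import os
import cv2  # OpenCV is assumed available; it is not mandated by the embodiment

def extract_frames(video_path, out_dir, frame_step=10):
    """Cut a video into a picture sequence, saving every frame_step-th frame as a .jpg file."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:                      # end of video or read error
            break
        if idx % frame_step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```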
S102: automatic labeling with YOLO: YOLO v5 runs on the input image and outputs a series of detected face bounding boxes, each containing bounding box coordinates x_min, y_min, x_max, y_max together with a confidence score indicating whether a face is contained within the box. Assuming the original image size is W_orig, H_orig and the image size input to the model is W_input, H_input, the detected bounding box coordinates need to be adjusted according to the original image size so that the detection results fit the original image; the rescaled center point coordinates can be expressed as:
x_center = ((x_min + x_max)/2)·(W_orig/W_input), y_center = ((y_min + y_max)/2)·(H_orig/H_input)
s1021: the scaling of width and height can be expressed as:
s103: in order to enable the object detection model to process images with multiple resolutions independent of image sizes, the applicability and detection accuracy of the model on different image sizes are improved, and the coordinates of the object detection model are subjected to standardized processing, which can be expressed as follows:
S104: to ensure that the normalized output of the model maps precisely back to the original image size, so that the detection results are directly usable and accurate, coordinate inverse normalization is performed on the detection results, which can be expressed as:
x_min = (x_norm - w_norm/2)·W_orig, y_min = (y_norm - h_norm/2)·H_orig, x_max = (x_norm + w_norm/2)·W_orig, y_max = (y_norm + h_norm/2)·H_orig
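The forward and inverse transforms of S103 and S104 can be collected into two small helper functions; this is only a sketch of the formulas above, with the function and variable names chosen here for illustration.

```python
def normalize_box(x_min, y_min, x_max, y_max, w_orig, h_orig):
    """Corner coordinates on the original image -> normalized (center, size) values."""
    x_norm = (x_min + x_max) / 2.0 / w_orig
    y_norm = (y_min + y_max) / 2.0 / h_orig
    w_norm = (x_max - x_min) / w_orig
    h_norm = (y_max - y_min) / h_orig
    return x_norm, y_norm, w_norm, h_norm

def denormalize_box(x_norm, y_norm, w_norm, h_norm, w_orig, h_orig):
    """Inverse mapping: normalized values -> corner coordinates on the original image."""
    x_min = (x_norm - w_norm / 2.0) * w_orig
    y_min = (y_norm - h_norm / 2.0) * h_orig
    x_max = (x_norm + w_norm / 2.0) * w_orig
    y_max = (y_norm + h_norm / 2.0) * h_orig
    return x_min, y_min, x_max, y_max
```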
s105: converting each bounding box information calculated into a standardized data format and generating a file of png/. Jpg and txt, assuming that I is a annotated image, the save process can be expressed as:
SaveAsJPG(I,path)
S1051: assuming that there are N detection results, each of which, Detection_i, contains bounding box coordinates and a confidence, the process of outputting the .txt file may be expressed as:
OutputTXT({Detection1,Detection2,...,DetectionN},path)
Detectioni={Bi,Ci}
Bi=(xmin,i,ymin,i,xmax,i,ymax,i)
Wherein B_i is the bounding box coordinates of the i-th detection result; C_i is the confidence of the i-th detection result;
s106: carrying out detection and identification omission on the face image dataset by Labelimg software and improving the precision; each image (. Png/. Jpg) corresponds to a. Txt file; the file includes the following information: file name, confidence score, category name, and coordinate information of each labeling target.
Further, as shown in FIGS. 1-2, step S2 includes:
S201: the YOLO v5 model adopts a CSPDarknet-based structure and comprises a Backbone module and a Head module which are connected with each other.
S2011: the Backbone module is used for extracting features from the input image and converting them into a multi-layer feature map; the Head module is used for fusing the feature maps of different layers output by the Backbone module, enhancing the model's ability to detect objects of different scales; the feature maps are then used to predict and output the bounding boxes, categories and confidences of the objects.
S2012: the Backbone module and the Head module both comprise a C3 layer (CSP Bottleneck with 3 Convolutions), which is used to split, convolve and recombine the feature map to output a new feature map; assume that the input feature map is:
F ∈ R^(H×W×D)
wherein H, W and D are the height, width and depth, respectively; the C3 layer first partitions the input feature map in depth into two parts F_1 and F_2, such that:
F_1 ∈ R^(H×W×D_1), F_2 ∈ R^(H×W×D_2), with D_1 + D_2 = D
S2013: F_1 is selected for convolution, with this operation denoted by the function C(·), and the processed feature map is C(F_1); the convolved C(F_1) is then recombined with the unprocessed F_2, and the output feature map F_out can be expressed as:
Fout=Concat[C(F1),F2]
Wherein F_out is the concatenated feature map; to better accommodate the subsequent identity detection task, an additional convolutional layer C′(·) is applied to F_out for feature extraction enhancement, expressed as:
F′out=C′(Fout)
where F′_out is the final output feature map.
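The split-convolve-concatenate behaviour of S2012-S2013, together with the additional convolution C′(·), can be sketched as a PyTorch module. The channel counts, kernel sizes and activation below are illustrative assumptions and do not reproduce the exact YOLO v5 C3 configuration.

```python
import torch
import torch.nn as nn

class SplitConvConcat(nn.Module):
    """Sketch of the C3-style block of S2012-S2013 (illustrative only; channels must be even)."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.conv = nn.Sequential(              # C(.): convolution applied to F1
            nn.Conv2d(half, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )
        self.conv_out = nn.Sequential(          # C'(.): extra convolution on the concatenated map
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, x):
        f1, f2 = torch.chunk(x, 2, dim=1)       # split the feature map in depth into F1, F2
        f_out = torch.cat([self.conv(f1), f2], dim=1)   # F_out = Concat[C(F1), F2]
        return self.conv_out(f_out)             # F'_out = C'(F_out)
```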
S202: the fused feature map of the Head module is used for prediction; the prediction calculation formulas are:
bx=2σ(tx)-0.5+cx
by=2σ(ty)-0.5+cy
wherein b_x, b_y represent the x and y coordinates of the center of the prediction bounding box; t_x, t_y represent the raw outputs of the model for the center position offset within each grid cell; c_x, c_y denote the coordinates of the grid cell; b_w denotes the width of the prediction bounding box, and b_h denotes the height of the prediction bounding box; p_w, p_h denote the width and height of the prior (anchor) box; t_w, t_h represent the raw outputs of the model used for resizing; λ represents a scale factor adjustable according to the face data characteristics;
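In the sketch below, the width and height decode is assumed to follow the standard YOLO v5 form, b_w = p_w·(2σ(t_w))² and b_h = p_h·(2σ(t_h))², scaled by the factor λ described above; this exact placement of λ is an assumption of the sketch rather than a statement of the embodiment.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, lam=1.0):
    """Decode raw grid-cell outputs (t*) into a predicted box; lam is the face-specific scale factor."""
    bx = 2.0 * _sigmoid(tx) - 0.5 + cx
    by = 2.0 * _sigmoid(ty) - 0.5 + cy
    bw = lam * pw * (2.0 * _sigmoid(tw)) ** 2   # assumed form of the width decode
    bh = lam * ph * (2.0 * _sigmoid(th)) ** 2   # assumed form of the height decode
    return bx, by, bw, bh
```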
S203: to detect whether a face exists in a specific bounding box, an object confidence is set, and the probability of detecting an object within the bounding box is calculated as follows:
Object confidence=σ(to)
Where t o is the original output of the model.
S204: the Head module is used to predict the category to which the object in each bounding box may belong. Specifically, for multi-category detection tasks, the module outputs a corresponding probability score for each category, thereby effectively realizing accurate identification and classification of objects of each category. The class score vector is the set of raw scores output by the model for each detected object, each score representing the relative confidence that the object belongs to a particular class; assuming that the model detects K classes, then for each detected object the model outputs a class score vector of length K, expressed as:
t=[t1,t2,t3,...,tK]
where t i represents the original score of the model predictive object belonging to the ith class.
S205: to determine the probability that each detected object belongs to each predefined category (in face detection, each object belongs to only one category), a multi-category probability distribution is obtained for each bounding box from the class score vector output by the model; the function is expressed as:
where P i represents the predicted probability that the object belongs to the ith class.
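Assuming the probability distribution of S205 is the standard softmax over the raw class scores (an assumption of this sketch), the computation is short:

```python
import numpy as np

def class_probabilities(t):
    """Softmax over the raw class scores t = [t1, ..., tK]; P_i = exp(t_i) / sum_j exp(t_j)."""
    t = np.asarray(t, dtype=float)
    e = np.exp(t - t.max())     # subtract the maximum for numerical stability
    return e / e.sum()
```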
S206: setting the model loss functions.
S2061: a loss function is introduced to quantify the difference between the bounding box predicted by the model and the corresponding real bounding box; the bounding box loss function can be expressed as:
L_CIoU = 1 - IoU + ρ²(b_predict, b_gt)/c² + αv
wherein IoU denotes the intersection-over-union between the predicted box b_predict and the real box b_gt; ρ(b_predict, b_gt) represents the Euclidean distance between the center points of the predicted box and the real box; c represents the diagonal length of the smallest enclosing box containing the predicted box and the real box; α represents a trade-off parameter; v quantifies the aspect-ratio consistency of the predicted and real boxes.
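A possible computation of the CIoU loss for two axis-aligned boxes given as (x_min, y_min, x_max, y_max) is sketched below; it follows the formula above and is illustrative rather than the exact implementation of the embodiment.

```python
import math

def ciou_loss(pred, gt, eps=1e-7):
    """pred, gt: boxes as (x_min, y_min, x_max, y_max); returns 1 - IoU + rho^2/c^2 + alpha*v."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # intersection and union areas
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter + eps
    iou = inter / union
    # squared distance between box centers (rho^2) and squared diagonal of the enclosing box (c^2)
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2 + eps
    # aspect-ratio consistency term v and trade-off parameter alpha
    v = (4.0 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
                                - math.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v
```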
S2062: an object confidence loss function is added to improve the overall accuracy of object detection, reduce false-positive predictions, and effectively distinguish object regions from non-object regions; the object confidence loss function can be expressed as:
LBCE(O,y)=-[y·log(O)+(1-y)·log(1-O)]
wherein O represents the object confidence of model prediction; y is the true label, 1 if there is an object, and 0 if there is no object.
S2063: in the face detection task for identity recognition, the background area is usually far larger than the area containing objects, which leads to class imbalance; a Focal Loss term is therefore added, which can be expressed as:
LFocal(O,y)=-α·[y·(1-O)γ·log(O)+(1-y)·Oγ·log(1-O)]
wherein γ represents an index that adjusts the contribution of the easily classified sample to loss; alpha represents the weight that balances the positive and negative samples.
S2064: in the face recognition identity detection task, the differences between individuals may be very subtle; in order to ensure that each face is matched with the correct identity and that the probability distribution predicted by the model is consistent with the actual label distribution, a multi-class loss function is added, which can be expressed as:
L_cross-entropy = -(1/N) Σ_{i=1}^{N} Σ_{c=1}^{C} y_{i,c}·log(p_{i,c})
wherein N represents the number of predicted bounding boxes; C represents the number of possible identity categories; y_{i,c} is an indicator denoting whether the true class of the i-th bounding box is class c (1 if so, 0 otherwise); p_{i,c} represents the probability predicted by the model that the i-th bounding box belongs to category c.
S2065: in the face detection and identity recognition model, the total loss function forms the key evaluation index for model training; it ensures that performance along multiple detection dimensions is considered comprehensively during training, and drives the model toward the dual goals of efficiency and accuracy in precise object detection and classification. It is expressed as:
Ltotal=λCIoULCIoU+λFocalLFocal+λcross-entropyLcross-entropy
where λ represents a hyper-parameter for balancing the weights of the various parts of the loss function.
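Under the loss definitions of S2062-S2065, the combination could look like the following sketch; the λ values used here are illustrative placeholders, since the embodiment only states that they are hyper-parameters.

```python
import math

def focal_loss(o, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """L_Focal for one prediction: o is the predicted object confidence, y the true label (0 or 1)."""
    o = min(max(o, eps), 1.0 - eps)
    return -alpha * (y * (1.0 - o) ** gamma * math.log(o)
                     + (1.0 - y) * o ** gamma * math.log(1.0 - o))

def cross_entropy_loss(p, y, eps=1e-7):
    """L_cross-entropy: p[i][c] predicted probabilities, y[i][c] one-hot identity labels, N = len(p) boxes."""
    n = len(p)
    return -sum(y[i][c] * math.log(max(p[i][c], eps))
                for i in range(n) for c in range(len(p[i]))) / n

def total_loss(l_ciou, l_focal, l_ce, lam_ciou=0.05, lam_focal=1.0, lam_ce=0.5):
    """L_total = lam_CIoU*L_CIoU + lam_Focal*L_Focal + lam_cross-entropy*L_cross-entropy."""
    return lam_ciou * l_ciou + lam_focal * l_focal + lam_ce * l_ce
```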
S207: in the YOLO v5 architecture for face detection and identity recognition, the back-propagation algorithm is adopted as the optimization method to refine the parameter configuration of the neural network. The value of the loss function is first calculated through a forward propagation pass, and the gradient of the loss function with respect to each network parameter is then computed accurately by propagating backwards layer by layer using the chain rule. The network parameters are adjusted and updated by the gradient descent rule so as to reduce the total loss function value; further, through a continuous iterative optimization process, the network model attains, after repeated learning, a high recognition rate and localization accuracy for objects in the detection task, so that the generalization ability of the model to new data is remarkably improved; the update can be expressed as:
θ_new = θ_old - η·∂L_total/∂θ
wherein θ_old represents the value of the parameter before the current iteration; η represents the learning rate, a hyper-parameter used to control the scale of the gradient applied in each parameter update; ∂L_total/∂θ represents the gradient of the total loss function L_total with respect to the parameter θ, i.e., the rate of change of the loss function with respect to θ, which indicates the direction and extent to which the parameter θ needs to be adjusted so that the loss function value decreases.
S208: after the training phase of the YOLO v5 intelligent face identity recognition model is completed, non-maximum suppression (NMS) is adopted as a key post-processing technique. It is used to analyze the multiple face bounding boxes identified by the model, remove redundant boxes with a high degree of overlap, and retain only the box with the highest confidence, thereby optimizing the output quality of the model and ensuring its efficiency and reliability in complex image-processing tasks; for each of the other bounding boxes B_i, its intersection-over-union with the highest-confidence bounding box B_max is calculated as:
IoU(B_max, B_i) = area(B_max ∩ B_i) / area(B_max ∪ B_i)
wherein, if the IoU of B_i and B_max exceeds a preset threshold, i.e. IoU(B_max, B_i) > IoU_threshold, B_i is suppressed; the box with the highest confidence is then selected again from the remaining bounding boxes, and the above procedure is repeated until all bounding boxes have been considered.
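A greedy NMS pass matching S208 might be implemented as below; the IoU threshold value shown is an assumption, not a value fixed by the embodiment.

```python
def box_iou(a, b, eps=1e-7):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter + eps
    return inter / union

def nms(boxes, scores, iou_threshold=0.45):
    """Keep the highest-confidence box, suppress boxes overlapping it, and repeat; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if box_iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```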
S209: after the non-maximum suppression (NMS) step, a threshold filtering technique is applied to further refine the post-NMS detection results and ensure that the output bounding boxes have high confidence and accuracy; for each post-NMS bounding box B_i, its object confidence O_i and class probability P_i are computed, and the threshold filtering can be expressed as:
Reserved bounding boxes = { B_i | O_i > θ_confidence, P_i > θ_probability }
Where θ confidence and θ probability represent thresholds of explicitly set confidence and class probability, respectively.
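The threshold filtering of S209 reduces to a simple comparison per box; the dictionary layout and threshold values below are assumptions of this sketch.

```python
def filter_detections(detections, conf_threshold=0.5, prob_threshold=0.5):
    """Keep boxes whose object confidence O_i and class probability P_i both exceed their thresholds.

    detections: list of dicts such as {"box": (x1, y1, x2, y2), "obj_conf": O_i, "class_prob": P_i}.
    """
    return [d for d in detections
            if d["obj_conf"] > conf_threshold and d["class_prob"] > prob_threshold]
```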
S210: when YOLO v5 is used for the intelligent face identity recognition model, the bounding boxes finally retained after threshold filtering are regarded as positive sample blocks of the model; the YOLO v5 model is trained and optimized with the image dataset from S1 using these positive sample blocks, and the optimal weight file best.pt is saved after training, yielding the YOLO intelligent face identity recognition model and the training results.
S211: dividing a face image data set into a training set, a verification set and a test set; the training set is used for training the YOLO v5 model, and the verification set is used for verifying the YOLO v5 model after training is completed so as to evaluate the training result of the YOLO v5 model; the test set is used for testing the YOLO v5 model to judge the recognition accuracy of the YOLO v5 model.
S212: as shown in fig. 3, in step S2 the evaluation indexes of the YOLO face detection and identity recognition model include the precision (P), recall (R), mean average precision (mAP) and F1 score, calculated as follows:
P = TP/(TP + FP), R = TP/(TP + FN), F1 = 2·P·R/(P + R), mAP = (1/N)·Σ_{i=1}^{N} AP_i
where TP represents the number of correctly detected samples, FP represents the number of erroneously detected samples, FN represents the number of undetected samples, N represents the number of categories, and AP_i is the average precision of category i (the area under its precision-recall curve).
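The evaluation indexes of S212 can be computed from the per-class counts as sketched below; computing each AP_i (the area under the precision-recall curve) is left outside this sketch.

```python
def precision_recall_f1(tp, fp, fn, eps=1e-7):
    """P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)."""
    p = tp / (tp + fp + eps)
    r = tp / (tp + fn + eps)
    return p, r, 2.0 * p * r / (p + r + eps)

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class average precision values AP_1 ... AP_N."""
    return sum(ap_per_class) / len(ap_per_class)
```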
Further, step S3 includes:
S301: inputting video data to be detected into a YOLO intelligent face identity recognition model;
the YOLO intelligent face identity recognition model processes the video data to be detected and outputs the processed video; if a target person appears in a picture of the processed video, a prediction box identifies the target person and determines its category label, and a confidence coefficient is displayed for each prediction box; the calculation formulas for determining the category label and the confidence coefficient (P) are as follows:
category = arg max_c P_class(c)
P=Pobj×Pclass(c)
Combining the category label with the corresponding comprehensive confidence coefficient to obtain a final output:
Final output= (category, P)
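The final labelling step of S301 combines the object confidence with the best class probability; a short sketch follows (the class names in the example are purely hypothetical).

```python
def classify_detection(p_obj, class_probs, class_names):
    """category = argmax_c P_class(c); P = P_obj * P_class(c). Returns (category label, combined confidence)."""
    c = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return class_names[c], p_obj * class_probs[c]

# example with hypothetical values: a box with object confidence 0.92 and three candidate identities
label, p = classify_detection(0.92, [0.10, 0.85, 0.05], ["student_a", "student_b", "student_c"])
```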
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.
Claims (8)
1. An intelligent face identity recognition method based on a YOLO network, characterized by comprising the following steps:
acquiring video data, and obtaining a labeled face recognition image dataset based on the video data;
Building a YOLO intelligent face identity recognition model based on a YOLO v5 model, and training the YOLO intelligent face identity recognition model on the labeled face recognition image dataset to obtain a trained YOLO intelligent face identity recognition model;
and inputting video data to be detected into the trained YOLO intelligent face identity recognition model for detection to obtain a target detection result.
2. The YOLO network-based intelligent face identity recognition method of claim 1, wherein the process of obtaining the labeled face recognition image dataset based on the video data comprises:
Capturing images from the video data frame by frame to obtain a plurality of face images;
Recognizing the plurality of face images based on a YOLO v5 model to obtain a face bounding box corresponding to each image;
Obtaining corresponding initial bounding box coordinates and face confidence scores based on the face bounding boxes;
Transforming the initial bounding box coordinates corresponding to each face bounding box to obtain transformed bounding box coordinates;
And carrying out category identification labeling on the face images based on LabelImg software to obtain labeled face images, wherein each labeled face image corresponds to an annotation file, and the annotation file comprises: the file name, face confidence score, class name and transformed bounding box coordinates of each labeled object.
3. The intelligent face identity recognition method based on the YOLO network according to claim 2, wherein the process of transforming the initial bounding box coordinates corresponding to the face bounding box to obtain transformed bounding box coordinates comprises:
normalizing the initial bounding box coordinates based on the original image size and the image size input to the model, to obtain normalized coordinates;
Performing inverse normalization on the normalized coordinates to obtain the transformed bounding box coordinates;
The calculation formula for normalizing the initial bounding box coordinates is:
x_norm = (x_min + x_max) / (2·W_orig), y_norm = (y_min + y_max) / (2·H_orig), w_norm = (x_max - x_min) / W_orig, h_norm = (y_max - y_min) / H_orig
where W_orig denotes the width of the original image, H_orig denotes the height of the original image, x_min denotes the x-coordinate of the upper left corner of the bounding box, y_min denotes the y-coordinate of the upper left corner of the bounding box, x_max denotes the x-coordinate of the lower right corner of the bounding box, y_max denotes the y-coordinate of the lower right corner of the bounding box, x_norm denotes the normalized value of the x-coordinate of the center point of the bounding box with respect to the width of the image, y_norm denotes the normalized value of the y-coordinate of the center point of the bounding box with respect to the height of the image, w_norm denotes the normalized value of the width of the bounding box with respect to the width of the image, and h_norm denotes the normalized value of the height of the bounding box with respect to the height of the image.
4. The YOLO network-based intelligent face identity recognition method according to claim 3, wherein the calculation formula for obtaining the transformed bounding box coordinates by performing inverse normalization on the normalized coordinates is:
x_min = (x_norm - w_norm/2)·W_orig, y_min = (y_norm - h_norm/2)·H_orig, x_max = (x_norm + w_norm/2)·W_orig, y_max = (y_norm + h_norm/2)·H_orig
5. The YOLO network-based intelligent face identity recognition method of claim 4, wherein the YOLO intelligent face identity recognition model comprises a Backbone module and a Head module, and face recognition is performed based on the Backbone module and the Head module, the process of obtaining a prediction result comprising:
performing feature extraction on the labeled face recognition image based on the Backbone module to obtain a multi-layer feature map;
the Head module fuses the multi-layer feature maps based on a C3 layer to obtain a final output feature map;
Predicting based on the final output feature map to obtain a prediction result, wherein the prediction result comprises: the bounding boxes, categories and confidences of the objects predicted from the feature map.
6. The YOLO network-based intelligent face identification method according to claim 5, wherein a total loss function is constructed based on the prediction result and the actual value of the annotation file, and model optimization is performed based on the total loss function, wherein the mathematical model of the total loss function is as follows:
Ltotal=λCIoULCIoU+λFocalLFocal+λcross-entropyLcross-entropy
Where L CIoU denotes a bounding box loss function, L Focal denotes a background noise loss function, and L cross-entropy denotes a multi-class loss function.
7. The YOLO network-based intelligent face identification method of claim 6, wherein the YOLO intelligent face identification model further comprises a process of filtering the bounding box based on non-maximum suppression.
8. The intelligent face identity recognition method based on the YOLO network according to claim 1, wherein video data to be detected is input into the trained YOLO intelligent face identity recognition model for detection to obtain a class label and a confidence coefficient of the target person, and a target detection result is obtained based on the class label and the confidence coefficient of the target person;
The calculation formulas of the category labels and the confidence coefficient are as follows:
category = arg max_c P_class(c)
P=Pobj×Pclass(c)
Where arg max c represents the value of c for finding the maximization of P class (c), P class (c) represents the probability that the prediction box belongs to class c, P obj represents the confidence that the object exists, and P represents the integrated confidence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410289557.5A CN118097755A (en) | 2024-03-14 | 2024-03-14 | Intelligent face identity recognition method based on YOLO network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410289557.5A CN118097755A (en) | 2024-03-14 | 2024-03-14 | Intelligent face identity recognition method based on YOLO network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118097755A true CN118097755A (en) | 2024-05-28 |
Family
ID=91161516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410289557.5A Pending CN118097755A (en) | 2024-03-14 | 2024-03-14 | Intelligent face identity recognition method based on YOLO network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118097755A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118279573A (en) * | 2024-06-03 | 2024-07-02 | 广东师大维智信息科技有限公司 | Method for monitoring moving target based on YOLO network |
CN118366205A (en) * | 2024-06-17 | 2024-07-19 | 长城信息股份有限公司 | Attention mechanism-based light face tracking method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210224609A1 (en) * | 2019-03-07 | 2021-07-22 | Institute Of Automation, Chinese Academy Of Sciences | Method, system and device for multi-label object detection based on an object detection network |
CN113269142A (en) * | 2021-06-18 | 2021-08-17 | 中电科大数据研究院有限公司 | Method for identifying sleeping behaviors of person on duty in field of inspection |
CN113435330A (en) * | 2021-06-28 | 2021-09-24 | 平安科技(深圳)有限公司 | Micro-expression identification method, device, equipment and storage medium based on video |
CN115240240A (en) * | 2022-04-29 | 2022-10-25 | 清远蓄能发电有限公司 | Infrared face recognition method and system based on YOLO network |
-
2024
- 2024-03-14 CN CN202410289557.5A patent/CN118097755A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210224609A1 (en) * | 2019-03-07 | 2021-07-22 | Institute Of Automation, Chinese Academy Of Sciences | Method, system and device for multi-label object detection based on an object detection network |
CN113269142A (en) * | 2021-06-18 | 2021-08-17 | 中电科大数据研究院有限公司 | Method for identifying sleeping behaviors of person on duty in field of inspection |
CN113435330A (en) * | 2021-06-28 | 2021-09-24 | 平安科技(深圳)有限公司 | Micro-expression identification method, device, equipment and storage medium based on video |
CN115240240A (en) * | 2022-04-29 | 2022-10-25 | 清远蓄能发电有限公司 | Infrared face recognition method and system based on YOLO network |
Non-Patent Citations (1)
Title |
---|
HYWMJ: "Mutual conversion between VOC-format xml labels annotated with LabelImg and YOLO-format txt labels", 18 May 2021 (2021-05-18), pages 1 - 4 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118279573A (en) * | 2024-06-03 | 2024-07-02 | 广东师大维智信息科技有限公司 | Method for monitoring moving target based on YOLO network |
CN118366205A (en) * | 2024-06-17 | 2024-07-19 | 长城信息股份有限公司 | Attention mechanism-based light face tracking method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110287932B (en) | Road blocking information extraction method based on deep learning image semantic segmentation | |
CN111507370A (en) | Method and device for obtaining sample image of inspection label in automatic labeling image | |
CN109977191B (en) | Problem map detection method, device, electronic equipment and medium | |
CN104680542B (en) | Remote sensing image variation detection method based on on-line study | |
CN118097755A (en) | Intelligent face identity recognition method based on YOLO network | |
CN108875600A (en) | A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO | |
CN110264444B (en) | Damage detection method and device based on weak segmentation | |
CN111914642B (en) | Pedestrian re-identification method, device, equipment and medium | |
CN112418278A (en) | Multi-class object detection method, terminal device and storage medium | |
CN110633711B (en) | Computer device and method for training feature point detector and feature point detection method | |
CN114722958A (en) | Network training and target detection method and device, electronic equipment and storage medium | |
CN111738319B (en) | Clustering result evaluation method and device based on large-scale samples | |
CN116912796A (en) | Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device | |
CN113158777A (en) | Quality scoring method, quality scoring model training method and related device | |
CN111539456A (en) | Target identification method and device | |
CN116415020A (en) | Image retrieval method, device, electronic equipment and storage medium | |
CN113988222A (en) | Forest fire detection and identification method based on fast-RCNN | |
CN112348062A (en) | Meteorological image prediction method, meteorological image prediction device, computer equipment and storage medium | |
CN116152576B (en) | Image processing method, device, equipment and storage medium | |
CN117437555A (en) | Remote sensing image target extraction processing method and device based on deep learning | |
CN115082713A (en) | Method, system and equipment for extracting target detection frame by introducing space contrast information | |
CN115424000A (en) | Pointer instrument identification method, system, equipment and storage medium | |
CN112396648B (en) | Target identification method and system capable of positioning mass center of target object | |
CN114627534A (en) | Living body discrimination method, electronic device, and storage medium | |
CN111369532A (en) | Method and device for processing mammary gland X-ray image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||