CN112507897A - Cross-modal face recognition method, device, equipment and storage medium - Google Patents
- Publication number: CN112507897A
- Application number: CN202011467115.3A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/161 — Human faces: Detection; Localisation; Normalisation
- G06F18/214 — Pattern recognition: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks: Combinations of networks
- G06N3/08 — Neural networks: Learning methods
- G06V40/168 — Human faces: Feature extraction; Face representation
Abstract
The method performs face recognition on a face image to be recognized using a cross-modal face recognition model trained on a visible light face preprocessed image sequence and an infrared face preprocessed image sequence, and can thereby improve the accuracy of recognizing face images captured by cameras of different modalities.
Description
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to a cross-modality face recognition method, apparatus, device, and storage medium.
Background
The accuracy of face recognition is greatly affected by ambient illumination. Conventional face recognition technology mainly targets face images captured by near-infrared cameras, which are largely unaffected by ambient light. In real environments, however, uneven or poor illumination is common, so images captured by cameras of different modalities must be recognized, and current face recognition technology cannot recognize such images accurately. The prior art therefore suffers from low accuracy when recognizing face images captured by cameras of different modalities.
Disclosure of Invention
The present application provides a cross-modal face recognition method, apparatus, device, and storage medium, which can solve the problem of low accuracy in recognizing face images captured by cameras of different modalities.
In a first aspect, the present application provides a cross-modal face recognition method, including:
collecting a face image to be recognized;
inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the pre-trained training process of the cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
In an optional implementation manner, before the obtaining the first training sample set, the method further includes:
acquiring the visible light face image sequences of the first preset number;
and carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, before the obtaining the first training sample set, the method further includes:
acquiring the infrared human face image sequences of the second preset number;
and carrying out pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face pre-processing image sequence includes:
converting the visible light face image sequence into a gray image;
and carrying out normalization processing on the gray level image to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessed image sequence includes:
carrying out image contrast enhancement on the infrared human face image sequence to obtain an enhanced infrared human face image sequence;
and carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence, includes:
and carrying out histogram equalization processing on the infrared face image sequence, and enhancing the image contrast of the infrared face image sequence to obtain the enhanced infrared face image sequence.
In a second aspect, the present application provides a cross-modal face recognition apparatus, including:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is used for inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the pre-trained training process of the cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
In an optional implementation manner, the method further includes:
the first acquisition module is used for acquiring the visible light face image sequences of the first preset number;
and the first processing module is used for carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the method further includes:
the second acquisition module is used for acquiring the infrared human face image sequences of the second preset number;
and the second processing module is used for carrying out pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the first processing module includes:
the conversion unit is used for converting the visible light face image sequence into a gray image;
and the first processing unit is used for carrying out normalization processing on the gray level image to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the second processing module includes:
the enhancement module is used for carrying out image contrast enhancement on the infrared human face image sequence to obtain an enhanced infrared human face image sequence;
and the second processing unit is used for carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the enhancement module is specifically configured to:
and carrying out histogram equalization processing on the infrared face image sequence, and enhancing the image contrast of the infrared face image sequence to obtain the enhanced infrared face image sequence.
In a third aspect, the present application provides a cross-modality face recognition apparatus, where the cross-modality face recognition apparatus includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method of the first aspect as described above.
In the cross-modal face recognition method of the first aspect, the cross-modal face recognition model trained by the visible light face preprocessing image sequence and the infrared light face preprocessing image sequence is adopted to perform face recognition on the face image to be recognized, so that the accuracy of face image recognition obtained by cameras of different modalities can be improved.
It is understood that the beneficial effects of the second to fifth aspects can be found in the description of the first aspect and are not repeated here.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart illustrating an implementation of a cross-modal face recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training process of a pre-trained cross-modal face recognition model;
fig. 3 is a schematic diagram of a cross-modal face recognition apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a cross-modal face recognition device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when," "upon," "in response to determining," or "in response to detecting." Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining," "in response to determining," "upon detecting [the described condition or event]," or "in response to detecting [the described condition or event]."
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The cross-modal face recognition method provided by the present application is exemplarily described below with reference to specific embodiments. As shown in fig. 1, fig. 1 is a flowchart of an implementation of a cross-modal face recognition method provided in an embodiment of the present application. The implementation can be performed by a cross-modal face recognition device, which includes but is not limited to a self-service terminal, a monitoring device, an attendance device, a server, a robot, a wearable device or a mobile terminal in various application scenarios, and the like. The details are as follows:
and S101, acquiring a face image to be recognized.
In an embodiment of the present application, the face image to be recognized may be a face image acquired in the visible light modality or the infrared modality. Illustratively, a face image in either modality may be acquired by a camera of the cross-modal face recognition device, such as the camera of a mobile terminal or an attendance device.
S102, inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition.
In the embodiments of the present application, the pre-trained cross-modal face recognition model is a deep convolutional neural network model obtained as follows: a deep convolutional neural network is first pre-trained on face images in the visible light modality, which provides prior knowledge for training the cross-modal image deep convolutional neural network; face images in the visible light modality and the infrared modality are then combined into a binary training set according to a preset rule, and the pre-trained cross-modal network is fine-tuned on it, iterating repeatedly until the performance of the network no longer improves.
Fig. 2 is a schematic diagram of a training process of a cross-modal face recognition model which is trained in advance. As shown in fig. 2, the training process of the pre-trained cross-modal face recognition model includes the following steps:
s201, a first training sample set is obtained, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences.
It should be noted that a color camera or a multispectral camera may be used to capture a sequence of visible light face images that include a face. The visible light face image contains abundant texture features and is easily influenced by ambient light. Therefore, in some optional implementations, by acquiring a first preset number of visible light face image sequences; and carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessed image sequence may include: segmenting the acquired image, which contains both the face region and the background region, to obtain the visible light face image.
In some embodiments of the present application, before image segmentation is performed, an image detection model may first detect whether a face exists in the image to be processed. When the output of the detection model shows that no face exists, face segmentation is unnecessary and the face segmentation processing ends, which avoids unneeded work. When the output shows that a face exists, face screening can further be performed, that is, judging whether the image contains a face meeting a preset condition. The condition may be preset according to the position and/or size of the face; for example, a face region whose size meets a preset size may be considered to meet the requirement. When the face meets the preset condition, subsequent processing is executed, such as rotation correction of the image and object segmentation; when it does not, the image may be left unsegmented.
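The screening step described above can be sketched as a simple predicate. This is a minimal illustration only: the function and parameter names are hypothetical, and the embodiment may also check the face position, which is omitted here.

```python
def face_meets_requirement(face_box, min_width, min_height):
    """Judge whether a detected face region satisfies a preset size condition.

    face_box is a hypothetical (x, y, w, h) bounding box; the size-only
    criterion is an illustrative assumption from the passage above.
    """
    _x, _y, w, h = face_box
    return w >= min_width and h >= min_height
```

A face whose bounding box meets the preset size would pass screening and move on to correction and segmentation; otherwise segmentation is skipped.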
In some embodiments of the application, the visible light face image sequence can be obtained by processing a series of visible light face images as described above, and the visible light face preprocessed image sequence can be obtained by further preprocessing the visible light face image sequence. Optionally, the preprocessing the visible light face image sequence may include: and carrying out gray level conversion and normalization processing on the visible light face image sequence to obtain a visible light face preprocessing image sequence.
The face images in the visible light face image sequence are converted into grayscale images. Optionally, the conversion may be performed by a preset grayscale conversion formula, which may be expressed as:
Igray = 0.2989 × R + 0.5870 × G + 0.1140 × B
where Igray is the grayscale image output after conversion, and R, G, and B are the RGB channel values of the image before conversion.
The converted grayscale image is further normalized, illustratively, by a preset normalization processing formula.
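The grayscale conversion and normalization steps can be sketched as follows. The weighted-sum coefficients come from the formula above; the patent does not spell out the normalization formula, so min-max scaling to [0, 1] is assumed here as one common choice.

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB array using Igray = 0.2989*R + 0.5870*G + 0.1140*B."""
    return rgb[..., 0] * 0.2989 + rgb[..., 1] * 0.5870 + rgb[..., 2] * 0.1140

def normalize(gray):
    """Min-max normalize a grayscale image to [0, 1].

    Assumed normalization; the patent only says a "preset normalization
    processing formula" is applied.
    """
    g = gray.astype(np.float64)
    return (g - g.min()) / (g.max() - g.min())
```

Each image in the visible light face image sequence would be passed through `to_gray` and then `normalize` to build the visible light face preprocessed image sequence.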
S202, training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model.
Because faces in the visible light modality are complex, the cross-modal neural network can be pre-trained with the visible light face preprocessed image sequence. In one embodiment of the present application, the visible light face preprocessed image sequence is divided into a training set and a validation set that do not overlap. The preset cross-modal face recognition model is trained on the training set and validated on the validation set; meanwhile, a first classification loss function is established, and the network that minimizes the validation loss is continuously saved to determine the first cross-modal neural network finally obtained by training, which is the first cross-modal face recognition model.
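The patent names a "first classification loss function" without specifying it. As an assumption, a softmax cross-entropy loss, the usual choice for classification networks, is sketched below; the real embodiment may use a different loss.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch (assumed form of the loss).

    logits: batch x n_classes array of raw network outputs.
    labels: integer class indices, one per batch element.
    """
    # Numerically stable log-softmax: subtract the row max before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class, averaged over the batch.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

During pre-training, this loss would be computed on the validation set after each epoch, and the network weights with the smallest validation loss kept.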
S203, inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model.
It should be noted that the infrared face preprocessed image sequence is obtained by preprocessing the infrared face image sequence. Illustratively, before the first training sample set is obtained, the method includes: acquiring the second preset number of infrared face image sequences, and performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessed image sequence.
The pixel equalization processing of the infrared face image sequence includes image contrast enhancement and normalization. In some embodiments of the present application, histogram equalization may be performed on the infrared face image sequence to enhance image contrast; histogram equalization is a method of enhancing image contrast by stretching the range of the pixel intensity distribution. In other embodiments, the image contrast can be enhanced by transforming the infrared face image sequence with a logarithmic or power function.
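Histogram equalization as described above can be sketched for an 8-bit grayscale image. This is a standard textbook implementation, not the patent's exact procedure; it assumes the image is not constant.

```python
import numpy as np

def equalize_histogram(gray):
    """Stretch the intensity distribution of an 8-bit grayscale image.

    gray: 2-D uint8 array. Returns a uint8 array whose cumulative
    histogram is approximately linear, i.e. contrast is maximally spread.
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()  # first nonzero CDF value
    # Remap intensities so the CDF covers the full [0, 255] range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    return lut.astype(np.uint8)[gray]
```

A low-contrast infrared frame whose intensities occupy only a narrow band would, after this mapping, span the full 0 to 255 range.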
In addition, the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In the embodiments of the present application, the normalization processing of the infrared face image sequence is the same as that of the visible light face images and is not repeated here.
In some embodiments of the present application, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers, and any number of convolutional layers may precede the fully-connected layers. Illustratively, the cross-modal face recognition model includes five convolutional blocks and fully-connected layers: the first and second blocks each comprise two convolutional layers for feature extraction and a max pooling layer for dimension reduction, the third to fifth blocks each comprise three convolutional layers for feature extraction and a max pooling layer for dimension reduction, and the feature map of each layer is passed through a nonlinear activation function after the operation. The visible light and infrared preprocessed image sequences are convolved by the convolutional layers to extract feature values, and face feature vectors are then output through the fully-connected layers.
Illustratively, the first block may include two convolutional layers with 3 × 3 kernels, 1 × 1 stride, and 64 kernels, plus one max pooling layer with a 2 × 2 kernel and 2 × 2 stride; the second block includes two convolutional layers with 3 × 3 kernels, 1 × 1 stride, and 128 kernels, plus one 2 × 2 max pooling layer with 2 × 2 stride; the third block includes three convolutional layers with 3 × 3 kernels, 1 × 1 stride, and 256 kernels, plus one 2 × 2 max pooling layer with 2 × 2 stride; the fourth block includes three convolutional layers with 3 × 3 kernels, 1 × 1 stride, and 512 kernels, plus one 2 × 2 max pooling layer with 2 × 2 stride; and the fifth block includes three convolutional layers with 3 × 3 kernels, 1 × 1 stride, and 512 kernels, plus one 2 × 2 max pooling layer with 2 × 2 stride. Each of the two fully-connected layers has 4096 nodes. It should be appreciated that the cross-modal neural network may take any configuration; the above example is not limiting.
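The spatial dimension reduction through the five blocks above can be traced with a small sketch. The patent does not state the input resolution or the padding scheme; 224 × 224 input and same-padding for the 3 × 3 convolutions are assumptions made here for illustration.

```python
# One (n_conv_layers, n_kernels) tuple per block, matching the example above.
VGG_LIKE_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def output_spatial_size(h, w, blocks):
    """Compute the feature-map height/width after all convolutional blocks."""
    for _n_convs, _n_kernels in blocks:
        # 3x3 convs with 1x1 stride (and assumed same-padding) keep H and W;
        # the block's 2x2 max pool with 2x2 stride halves each dimension.
        h, w = h // 2, w // 2
    return h, w
```

Under these assumptions, a 224 × 224 input leaves the fifth block as a 7 × 7 feature map with 512 channels, which the fully-connected layers then flatten into a face feature vector.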
In the training process of the cross-modal face recognition model, the convolution kernels and weights are initialized randomly, and the bias terms are set to 0. Network parameters are updated and the gradient of the cross-modal neural network is optimized with the stochastic gradient descent (SGD) algorithm; when the number of network iterations reaches a preset value, training stops and the trained cross-modal neural network is saved.
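The initialization and update rules just described can be sketched as follows. The Gaussian initialization scale and the learning rate are assumed values, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_conv(n_kernels, in_channels, k=3, scale=0.01):
    """Randomly initialize one convolutional layer as the text describes:
    kernels/weights drawn randomly, bias terms set to 0."""
    w = rng.standard_normal((n_kernels, in_channels, k, k)) * scale
    b = np.zeros(n_kernels)
    return w, b

def sgd_step(param, grad, lr=0.01):
    """One plain SGD parameter update: p <- p - lr * grad."""
    return param - lr * grad
```

In a full training loop, `sgd_step` would be applied to every weight and bias after each mini-batch until the preset iteration count is reached.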
According to the cross-modal face recognition method, a cross-modal face recognition model trained on the visible light face preprocessed image sequence and the infrared face preprocessed image sequence performs face recognition on the face image to be recognized, which can improve the accuracy of recognizing face images captured by cameras of different modalities.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the cross-modal face recognition method described in the foregoing embodiment, fig. 3 shows a structural block diagram of a cross-modal face recognition apparatus provided in the embodiment of the present application, and for convenience of description, only the relevant portions of the embodiment of the present application are shown.
As shown in fig. 3, which is a schematic diagram of a cross-modal face recognition apparatus according to an embodiment of the present application, the cross-modal face recognition apparatus 300 includes:
the acquisition module 301 is used for acquiring a face image to be recognized;
the recognition module 302 is configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
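The two-stage procedure above — pretrain on the visible light sequence, then retrain the resulting model on both modalities with the same classification loss — can be sketched as follows. The model, loss, and synthetic data are illustrative stand-ins, not the network from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_softmax(X, y, W=None, b=None, lr=0.1, iters=300, n_classes=5):
    """One training stage: SGD with a cross-entropy (classification) loss."""
    if W is None:                 # stage 1: random weight init, zero biases
        W = rng.normal(0.0, 0.01, size=(X.shape[1], n_classes))
        b = np.zeros(n_classes)
    else:                         # stage 2: continue from stage-1 parameters
        W, b = W.copy(), b.copy()
    for _ in range(iters):
        idx = rng.integers(0, len(X), size=16)
        xb, yb = X[idx], y[idx]
        z = xb @ W + b
        z -= z.max(axis=1, keepdims=True)
        p = np.exp(z); p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(idx)), yb] -= 1.0        # cross-entropy gradient
        W -= lr * (xb.T @ p) / len(idx)
        b -= lr * p.mean(axis=0)
    return W, b

# Illustrative stand-ins for the preprocessed image sequences.
visible_X = rng.normal(size=(200, 32)); visible_y = rng.integers(0, 5, 200)
infrared_X = rng.normal(size=(200, 32)); infrared_y = rng.integers(0, 5, 200)

# Stage 1: train on the visible light sequence -> first model.
W1, b1 = train_softmax(visible_X, visible_y)

# Stage 2: retrain the first model on both sequences with the same loss
# -> second (final) cross-modal model.
both_X = np.concatenate([visible_X, infrared_X])
both_y = np.concatenate([visible_y, infrared_y])
W2, b2 = train_softmax(both_X, both_y, W=W1, b=b1)
```

The stage-2 call copies the parameters before updating, so the stage-1 model `(W1, b1)` remains available for comparison.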
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
In an optional implementation manner, the method further includes:
the first acquisition module is used for acquiring the visible light face image sequences of the first preset number;
and the first processing module is used for carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the method further includes:
the second acquisition module is used for acquiring the infrared face image sequences of the second preset number;
and the second processing module is used for carrying out pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the first processing module includes:
the conversion unit is used for converting the visible light face image sequence into a grayscale image;
and the first processing unit is used for carrying out normalization processing on the grayscale image to obtain the visible light face preprocessing image sequence.
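A minimal sketch of this visible-light preprocessing step. The text does not fix the grayscale coefficients or the normalization range, so the standard ITU-R BT.601 luminance weights and min-max scaling of 8-bit intensities into [0, 1] are assumed here:

```python
import numpy as np

def preprocess_visible(frames):
    """Convert a sequence of RGB face images to normalized grayscale.

    frames: uint8 array of shape (N, H, W, 3).
    Returns a float array of shape (N, H, W) with values in [0, 1].
    """
    # ITU-R BT.601 luminance weights (an assumed choice; the text only
    # says the sequence is converted to grayscale).
    gray = frames.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    # Normalization: scale 8-bit intensities into [0, 1].
    return gray / 255.0

frames = np.random.default_rng(2).integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
out = preprocess_visible(frames)
```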
In an optional implementation manner, the second processing module includes:
the enhancement module is used for carrying out image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
and the second processing unit is used for carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the enhancement module is specifically configured to:
and carrying out histogram equalization processing on the infrared face image sequence to enhance its image contrast, thereby obtaining the enhanced infrared face image sequence.
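Histogram equalization spreads the intensity distribution of a low-contrast infrared frame across the full dynamic range. A minimal per-image sketch for 8-bit frames, using the standard CDF-based formulation (the text does not fix a particular variant):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize one 8-bit grayscale image (H, W) -> uint8 (H, W)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first nonzero CDF value
    if cdf[-1] == cdf_min:             # flat image: nothing to equalize
        return img.copy()
    # Map each gray level through the normalized cumulative distribution.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]
```

Applied per frame of the infrared sequence, this stretches a narrow band of intensities (e.g. levels 100-150) out to the full 0-255 range before normalization.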
It should be noted that the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application; for their specific functions and technical effects, reference may be made to the method embodiments, and details are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 4 is a schematic structural diagram of a cross-modality face recognition device according to an embodiment of the present application. As shown in fig. 4, the cross-modality face recognition device 4 of this embodiment includes: at least one processor 40 (only one is shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40; when the processor 40 executes the computer program 42, the steps in the method embodiment of fig. 1 are implemented.
The cross-modal face recognition device 4 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The cross-modal face recognition device 4 may include, but is not limited to, a processor 40 and a memory 41. It will be understood by those skilled in the art that fig. 4 is merely an example of the cross-modality face recognition device 4 and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine some components, or use different components, such as an input-output device, a network access device, and the like.
The processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may in some embodiments be an internal storage unit of the cross-modality face recognition device 4, such as a hard disk or a memory of the cross-modality face recognition device 4. In other embodiments, the memory 41 may also be an external storage device of the cross-modality face recognition device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are equipped on the cross-modality face recognition device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the cross-modality face recognition device 4. The memory 41 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 41 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a network device, where the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a cross-modal face recognition device, causes the cross-modal face recognition device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/electronic device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunications signal.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.
Claims (10)
1. A cross-modal face recognition method, characterized by comprising the following steps:
collecting a face image to be recognized;
inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
2. The method of claim 1, wherein the cross-modality face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
3. The method of claim 1, wherein prior to said obtaining the first set of training samples, further comprising:
acquiring the visible light face image sequences of the first preset number;
and carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
4. The method of claim 1, wherein prior to said obtaining the first set of training samples, further comprising:
acquiring the infrared face image sequences of the second preset number;
and carrying out pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
5. The method of claim 3, wherein performing pixel equalization processing on the visible light face image sequence to obtain the visible light face pre-processing image sequence comprises:
converting the visible light face image sequence into a grayscale image;
and carrying out normalization processing on the grayscale image to obtain the visible light face preprocessing image sequence.
6. The method of claim 4, wherein performing pixel equalization processing on the infrared face image sequence to obtain the infrared face pre-processing image sequence comprises:
carrying out image contrast enhancement on the infrared human face image sequence to obtain an enhanced infrared human face image sequence;
and carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
7. The cross-modality face recognition method of claim 6, wherein the image contrast enhancement of the infrared face image sequence to obtain an enhanced infrared face image sequence comprises:
and carrying out histogram equalization processing on the infrared face image sequence, and enhancing the image contrast of the infrared face image sequence to obtain the enhanced infrared face image sequence.
8. A cross-modality face recognition apparatus, comprising:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is used for inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
9. A cross-modality face recognition device, comprising: memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011467115.3A CN112507897A (en) | 2020-12-14 | 2020-12-14 | Cross-modal face recognition method, device, equipment and storage medium |
PCT/CN2021/107933 WO2022127111A1 (en) | 2020-12-14 | 2021-07-22 | Cross-modal face recognition method, apparatus and device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011467115.3A CN112507897A (en) | 2020-12-14 | 2020-12-14 | Cross-modal face recognition method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112507897A true CN112507897A (en) | 2021-03-16 |
Family
ID=74973029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011467115.3A Pending CN112507897A (en) | 2020-12-14 | 2020-12-14 | Cross-modal face recognition method, device, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112507897A (en) |
WO (1) | WO2022127111A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743379A (en) * | 2021-11-03 | 2021-12-03 | 杭州魔点科技有限公司 | Light-weight living body identification method, system, device and medium for multi-modal characteristics |
WO2022127111A1 (en) * | 2020-12-14 | 2022-06-23 | 奥比中光科技集团股份有限公司 | Cross-modal face recognition method, apparatus and device, and storage medium |
CN115147679A (en) * | 2022-06-30 | 2022-10-04 | 北京百度网讯科技有限公司 | Multi-modal image recognition method and device and model training method and device |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115565215B (en) * | 2022-07-01 | 2023-09-15 | 北京瑞莱智慧科技有限公司 | Face recognition algorithm switching method and device and storage medium |
CN116167278B (en) * | 2023-01-09 | 2024-09-03 | 支付宝(杭州)信息技术有限公司 | Pipeline simulation method, division device, medium and equipment |
CN117373138B (en) * | 2023-11-02 | 2024-11-08 | 厦门熵基科技有限公司 | Cross-modal living fusion detection method and device, storage medium and computer equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608450A (en) * | 2016-03-01 | 2016-05-25 | 天津中科智能识别产业技术研究院有限公司 | Heterogeneous face identification method based on deep convolutional neural network |
CN108520220A (en) * | 2018-03-30 | 2018-09-11 | 百度在线网络技术(北京)有限公司 | model generating method and device |
US20190258885A1 (en) * | 2018-02-19 | 2019-08-22 | Avigilon Corporation | Method and system for object classification using visible and invisible light images |
CN112149635A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Cross-modal face recognition model training method, device, device and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507897A (en) * | 2020-12-14 | 2021-03-16 | 奥比中光科技集团股份有限公司 | Cross-modal face recognition method, device, equipment and storage medium |
- 2020-12-14: CN application CN202011467115.3A published as CN112507897A (status: Pending)
- 2021-07-22: WO application PCT/CN2021/107933 published as WO2022127111A1 (status: Application Filing)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105608450A (en) * | 2016-03-01 | 2016-05-25 | 天津中科智能识别产业技术研究院有限公司 | Heterogeneous face identification method based on deep convolutional neural network |
US20190258885A1 (en) * | 2018-02-19 | 2019-08-22 | Avigilon Corporation | Method and system for object classification using visible and invisible light images |
CN108520220A (en) * | 2018-03-30 | 2018-09-11 | 百度在线网络技术(北京)有限公司 | model generating method and device |
CN112149635A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Cross-modal face recognition model training method, device, device and storage medium |
Non-Patent Citations (1)
Title |
---|
ZHANG, Dian et al.: "Heterogeneous face recognition based on fusion of near-infrared and visible light with a lightweight network", Journal of Chinese Computer Systems *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022127111A1 (en) * | 2020-12-14 | 2022-06-23 | 奥比中光科技集团股份有限公司 | Cross-modal face recognition method, apparatus and device, and storage medium |
CN113743379A (en) * | 2021-11-03 | 2021-12-03 | 杭州魔点科技有限公司 | Light-weight living body identification method, system, device and medium for multi-modal characteristics |
CN113743379B (en) * | 2021-11-03 | 2022-07-12 | 杭州魔点科技有限公司 | Light-weight living body identification method, system, device and medium for multi-modal characteristics |
CN115147679A (en) * | 2022-06-30 | 2022-10-04 | 北京百度网讯科技有限公司 | Multi-modal image recognition method and device and model training method and device |
CN115147679B (en) * | 2022-06-30 | 2023-11-14 | 北京百度网讯科技有限公司 | Multi-mode image recognition method and device, model training method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2022127111A1 (en) | 2022-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112507897A (en) | Cross-modal face recognition method, device, equipment and storage medium | |
CN112528866A (en) | Cross-modal face recognition method, device, equipment and storage medium | |
CN110334605B (en) | Gesture recognition method, device, storage medium and equipment based on neural network | |
CN111079764B (en) | Low-illumination license plate image recognition method and device based on deep learning | |
JP6341650B2 (en) | Image processing apparatus, image processing method, and program | |
Wang et al. | Blur image classification based on deep learning | |
CN110163111A (en) | Method, apparatus of calling out the numbers, electronic equipment and storage medium based on recognition of face | |
CN112651380B (en) | Face recognition method, face recognition device, terminal equipment and storage medium | |
CN110717497B (en) | Image similarity matching method, device and computer readable storage medium | |
CN110852311A (en) | Three-dimensional human hand key point positioning method and device | |
CN112464803A (en) | Image comparison method and device | |
CN111814682A (en) | Face living body detection method and device | |
CN114359289A (en) | An image processing method and related device | |
Lahiani et al. | Hand pose estimation system based on Viola-Jones algorithm for android devices | |
CN112508017A (en) | Intelligent digital display instrument reading identification method, system, processing equipment and storage medium | |
CN112084874A (en) | Object detection method and device and terminal equipment | |
CN111126250A (en) | Pedestrian re-identification method and device based on PTGAN | |
CN113569707A (en) | Living body detection method, living body detection device, electronic apparatus, and storage medium | |
CN111144374B (en) | Facial expression recognition method and device, storage medium and electronic equipment | |
CN115116111B (en) | Anti-disturbance human face living body detection model training method and device and electronic equipment | |
Selvi et al. | FPGA implementation of a face recognition system | |
CN117218037A (en) | Image definition evaluation method and device, equipment and storage medium | |
CN113239738B (en) | Image blurring detection method and blurring detection device | |
CN114882308A (en) | Biological feature extraction model training method and image segmentation method | |
CN113240723A (en) | Monocular depth estimation method and device and depth evaluation equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |
Application publication date: 20210316 |