
CN112507897A - Cross-modal face recognition method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112507897A
CN112507897A
Authority
CN
China
Prior art keywords
face
cross
image sequence
face recognition
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011467115.3A
Other languages
Chinese (zh)
Inventor
陈碧辉
高通
钱贝贝
黄源浩
肖振中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orbbec Inc
Original Assignee
Orbbec Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orbbec Inc filed Critical Orbbec Inc
Priority to CN202011467115.3A priority Critical patent/CN112507897A/en
Publication of CN112507897A publication Critical patent/CN112507897A/en
Priority to PCT/CN2021/107933 priority patent/WO2022127111A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The method uses a cross-modal face recognition model, trained with a visible light face preprocessing image sequence and an infrared face preprocessing image sequence, to perform face recognition on a face image to be recognized, and can improve the accuracy of recognizing face images captured by cameras of different modalities.

Description

Cross-modal face recognition method, device, equipment and storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to a cross-modality face recognition method, apparatus, device, and storage medium.
Background
The accuracy of face recognition is greatly affected by ambient illumination. Conventional face recognition technology mainly targets face images captured by near-infrared cameras, which are not affected by ambient light. In real environments, however, uneven or poor illumination often occurs, which requires recognizing images acquired by cameras of different modalities, and current face recognition technology cannot accurately recognize such images. The prior art therefore suffers from low accuracy when recognizing face images captured by cameras of different modalities.
Disclosure of Invention
The present application provides a cross-modal face recognition method, apparatus, device, and storage medium, which can solve the problem of low accuracy in recognizing face images captured by cameras of different modalities.
In a first aspect, the present application provides a cross-modal face recognition method, including:
collecting a face image to be recognized;
inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
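A minimal sketch of this two-stage procedure, using a toy linear model and a squared-error loss as stand-ins for the unspecified network and first classification loss (all names, shapes, and hyperparameters here are illustrative assumptions, not from the application):

```python
import numpy as np

def classification_loss(weights, images, labels):
    """Stand-in for the first classification loss: squared error of a
    linear scorer (the application does not specify the loss)."""
    return float(np.mean((images @ weights - labels) ** 2))

def train(weights, images, labels, lr=0.01, steps=200):
    """Stand-in training routine: plain gradient descent on the loss."""
    for _ in range(steps):
        grad = 2 * images.T @ (images @ weights - labels) / len(images)
        weights = weights - lr * grad
    return weights

rng = np.random.default_rng(0)

# Stage 1: pre-train on the visible light sequences only.
visible, vis_labels = rng.normal(size=(32, 8)), rng.normal(size=32)
first_model = train(np.zeros(8), visible, vis_labels)

# Stage 2: retrain on visible + infrared sequences with the same loss;
# the result plays the role of the second (final) cross-modal model.
infrared, ir_labels = rng.normal(size=(32, 8)), rng.normal(size=32)
both = np.concatenate([visible, infrared])
both_labels = np.concatenate([vis_labels, ir_labels])
second_model = train(first_model, both, both_labels)
```

The point of the staging is that stage 1 supplies prior knowledge (initial weights) for stage 2, rather than training on the mixed-modality set from scratch.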
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
In an optional implementation manner, before obtaining the first training sample set, the method further includes:
acquiring the visible light face image sequences of the first preset number;
and carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, before obtaining the first training sample set, the method further includes:
acquiring the second preset number of infrared face image sequences;
and performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence includes:
converting the visible light face image sequence into grayscale images;
and performing normalization processing on the grayscale images to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence includes:
performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
and performing normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence includes:
and performing histogram equalization processing on the infrared face image sequence to enhance its image contrast, thereby obtaining the enhanced infrared face image sequence.
In a second aspect, the present application provides a cross-modal face recognition apparatus, including:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is used for inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
In an optional implementation manner, the method further includes:
the first acquisition module is used for acquiring the visible light face image sequences of the first preset number;
and the first processing module is used for carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the method further includes:
the second acquisition module is used for acquiring the second preset number of infrared face image sequences;
and the second processing module is used for performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the first processing module includes:
the conversion unit is used for converting the visible light face image sequence into a gray image;
and the first processing unit is used for carrying out normalization processing on the gray level image to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the second processing module includes:
the enhancement module is used for performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
and the second processing unit is used for carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the enhancement module is specifically configured to:
and performing histogram equalization processing on the infrared face image sequence to enhance its image contrast, thereby obtaining the enhanced infrared face image sequence.
In a third aspect, the present application provides a cross-modality face recognition apparatus, where the cross-modality face recognition apparatus includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method of the first aspect as described above.
In the cross-modal face recognition method of the first aspect, the cross-modal face recognition model trained with the visible light face preprocessing image sequence and the infrared face preprocessing image sequence is used to perform face recognition on the face image to be recognized, so that the accuracy of recognizing face images captured by cameras of different modalities can be improved.
It is understood that the beneficial effects of the second to fifth aspects can be seen from the description of the first aspect, and are not repeated herein.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating an implementation of a cross-modal face recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training process of a pre-trained cross-modal face recognition model;
fig. 3 is a schematic diagram of a cross-modal face recognition apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a cross-modal face recognition device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The cross-modal face recognition method provided by the present application is exemplarily described below with reference to specific embodiments. As shown in fig. 1, fig. 1 is a flowchart of an implementation of a cross-modal face recognition method provided in an embodiment of the present application. The implementation can be performed by a cross-modal face recognition device, which includes but is not limited to a self-service terminal, a monitoring device, an attendance device, a server, a robot, a wearable device or a mobile terminal in various application scenarios, and the like. The details are as follows:
and S101, acquiring a face image to be recognized.
In an embodiment of the present application, the face image to be recognized may be a face image acquired in the visible light modality or the infrared modality. Illustratively, the face image in either modality may be captured by a camera of a cross-modal face recognition device, such as the camera of a mobile terminal or an attendance device.
S102, inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition.
In the embodiments of the present application, the pre-trained cross-modal face recognition model is a deep convolutional neural network obtained as follows. A deep convolutional neural network is first pre-trained on face images in the visible light modality, which yields a pre-trained cross-modal deep convolutional neural network and provides prior knowledge for its subsequent training. Face images in the visible light modality and face images in the infrared modality are then combined into a binary training set according to a preset rule, and the pre-trained cross-modal deep convolutional neural network is fine-tuned on this set, iterating repeatedly until its performance no longer improves.
Fig. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model. As shown in fig. 2, the training process of the pre-trained cross-modal face recognition model includes the following steps:
s201, a first training sample set is obtained, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences.
It should be noted that a color camera or a multispectral camera may be used to capture a visible light face image sequence containing faces. Visible light face images contain rich texture features but are easily affected by ambient light. Therefore, in some optional implementations, a first preset number of visible light face image sequences are acquired, and pixel equalization processing is performed on the visible light face image sequences to obtain the visible light face preprocessing image sequences.
In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence may include: segmenting each acquired image containing a visible light face into a face region and a background region to obtain the visible light face image.
In some embodiments of the present application, before performing image segmentation, an image detection model may be used to detect whether a face exists in the image to be processed. When the output of the detection model shows that no face exists in the image, face segmentation is unnecessary and the segmentation process ends, reducing unnecessary workload. When the output shows that a face exists in the image, face screening may further be performed, that is, judging whether a face meeting preset conditions exists in the image. For example, the requirements may be preset according to the position and/or size of the face; a face region whose size meets a preset size may be considered to meet the requirements. When the face meets the preset conditions, subsequent processing is performed, such as rotation correction and object segmentation of the image; when the face does not meet the preset conditions, the image may be left unsegmented.
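As one illustration of the screening step just described, a minimal size-and-position check might look as follows (the thresholds and the bounding-box convention are hypothetical; the application does not specify them):

```python
def face_meets_requirements(bbox, image_size, min_face=80, margin=10):
    """Hypothetical screening rule: the face box must be at least
    min_face pixels on each side and lie inside the frame with a
    margin. Thresholds and the (x, y, w, h) convention are illustrative."""
    x, y, w, h = bbox
    img_w, img_h = image_size
    big_enough = w >= min_face and h >= min_face
    inside = (x >= margin and y >= margin
              and x + w <= img_w - margin and y + h <= img_h - margin)
    return big_enough and inside

# A 120x140 face well inside a 640x480 frame passes; a 40x40 face is rejected.
```

Images whose faces fail such a check would simply skip the segmentation and preprocessing steps that follow.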
In some embodiments of the application, the visible light face image sequence can be obtained by processing a series of visible light face images as described above, and the visible light face preprocessing image sequence can be obtained by further preprocessing this sequence. Optionally, preprocessing the visible light face image sequence may include performing grayscale conversion and normalization processing on it to obtain the visible light face preprocessing image sequence.
The face images in the visible light face image sequence are converted into grayscale images. Optionally, the grayscale conversion may be performed by a preset formula, which may be expressed as:
Igray = 0.2989 × R + 0.5870 × G + 0.1140 × B
wherein Igray is the grayscale image output after conversion, and R, G, and B are the RGB channel values of the image before conversion.
The converted grayscale image is further normalized, illustratively, by a preset normalization processing formula.
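Putting the two steps together, a sketch of the visible light preprocessing might be (the min-max normalization to [0, 1] is an assumed variant, since the application does not give its normalization formula):

```python
import numpy as np

def preprocess_visible(rgb):
    """Grayscale conversion with the coefficients given in the text,
    followed by min-max normalization to [0, 1] (an assumed variant,
    since the application does not give its normalization formula)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    lo, hi = gray.min(), gray.max()
    return (gray - lo) / (hi - lo + 1e-8)

rgb = np.random.default_rng(1).integers(0, 256, size=(4, 4, 3)).astype(float)
out = preprocess_visible(rgb)   # a 4x4 grayscale image with values in [0, 1]
```

Applying this function to every frame of the visible light face image sequence yields the visible light face preprocessing image sequence used for training.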
S202, training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model.
Because faces in the visible light modality are complex, the cross-modal neural network can be pre-trained with the visible light face preprocessing image sequence. In one embodiment of the application, the visible light face preprocessing image sequence is divided into a training set and a validation set that do not overlap. The preset cross-modal face recognition model is trained on the training set and validated on the validation set; meanwhile, a first classification loss function is established, and the network that minimizes the validation loss is continuously saved, so as to determine the first cross-modal neural network finally obtained by training, which is the first cross-modal face recognition model.
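The "save the network that minimizes validation loss" strategy can be sketched as follows, with stand-in callables for the training step and validation loss (the real network and loss are unspecified in the application):

```python
def train_with_best_checkpoint(epochs, train_step, val_loss):
    """Keep the weights that minimize validation loss: after every epoch,
    checkpoint whenever the validation loss improves. train_step and
    val_loss are stand-in callables for the unspecified network."""
    best_loss, best_weights, weights = float("inf"), None, 0.0
    for _ in range(epochs):
        weights = train_step(weights)
        loss = val_loss(weights)
        if loss < best_loss:
            best_loss, best_weights = loss, weights
    return best_weights, best_loss

# Toy run: the weights walk 1, 2, ..., 6 and validation loss is minimized
# at 3, so the kept checkpoint is the epoch-3 model even though training
# continues past it.
best_w, best_l = train_with_best_checkpoint(
    epochs=6,
    train_step=lambda w: w + 1,
    val_loss=lambda w: (w - 3) ** 2,
)
```

Because the training and validation sets do not overlap, the saved checkpoint reflects generalization rather than training-set fit.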
S203, inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model.
It should be noted that the infrared face preprocessing image sequence is obtained by preprocessing the infrared face image sequence. Illustratively, before the training sample set is obtained, the method includes: acquiring the second preset number of infrared face image sequences, and performing pixel equalization processing on them to obtain the infrared face preprocessing image sequences.
The pixel equalization processing of the infrared face image sequence includes image contrast enhancement and normalization processing. In some embodiments of the present application, histogram equalization may be performed on the infrared face image sequence to enhance image contrast; histogram equalization is a method of enhancing image contrast by stretching the range of the pixel intensity distribution. In other embodiments, image contrast can be enhanced by transforming the infrared face image sequence with a logarithmic or power function.
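A minimal NumPy implementation of histogram equalization for an 8-bit image, illustrating the contrast stretching described above:

```python
import numpy as np

def equalize_histogram(gray_u8):
    """Classic histogram equalization for an 8-bit image: map each
    intensity through the normalized cumulative histogram so the
    pixel intensity distribution is stretched over 0..255."""
    hist = np.bincount(gray_u8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray_u8]

# A low-contrast image (intensities 100..119) gets stretched to the full range.
low_contrast = np.random.default_rng(2).integers(100, 120, size=(8, 8)).astype(np.uint8)
stretched = equalize_histogram(low_contrast)
```

The same effect is available as `cv2.equalizeHist` in OpenCV; the explicit version above just makes the cumulative-histogram mapping visible.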
In addition, the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In the embodiments of the present application, the normalization processing of the infrared face image sequence is the same as the normalization processing of the visible light face images, and the details are not repeated here.
In some embodiments of the present application, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers, and any number of convolutional layers may precede the fully connected layer. Illustratively, the cross-modal face recognition model includes five convolutional blocks and one fully connected layer; the first and second blocks each comprise two convolutional layers for feature extraction and one max pooling layer for dimension reduction, the third to fifth blocks each comprise three convolutional layers for feature extraction and one max pooling layer for dimension reduction, and the feature map of each layer is passed through a nonlinear activation function after the operation. The visible light face preprocessing image sequence and the infrared face preprocessing image sequence are convolved by the convolutional layers to extract feature values, and face feature vectors are then output through the fully connected layer.
Illustratively, the first block may include two convolutional layers with 3×3 kernels, 1×1 stride, and 64 kernels each, plus one max pooling layer with a 2×2 kernel and 2×2 stride; the second block includes two convolutional layers with 3×3 kernels, 1×1 stride, and 128 kernels each, plus one 2×2 max pooling layer with 2×2 stride; the third block includes three convolutional layers with 3×3 kernels, 1×1 stride, and 256 kernels each, plus one 2×2 max pooling layer with 2×2 stride; the fourth and fifth blocks each include three convolutional layers with 3×3 kernels, 1×1 stride, and 512 kernels each, plus one 2×2 max pooling layer with 2×2 stride; each of the two fully connected layers has 4096 nodes. It should be appreciated that the cross-modal neural network described above may take any configuration, and the above example is not limiting.
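Assuming 3×3 convolutions with padding 1 (so spatial size is preserved) and a 224×224 input — both assumptions, since the application states neither — the spatial sizes through this five-block stack can be traced as follows:

```python
def feature_map_sizes(input_hw=(224, 224)):
    """Trace spatial sizes through the five example blocks above: 3x3
    stride-1 convolutions (assumed padding 1, so size is preserved) and
    one 2x2 stride-2 max pool per block, which halves the size. The
    224x224 input is also an assumption, not stated in the application."""
    h, w = input_hw
    sizes = [(h, w)]
    kernels_per_block = [64, 128, 256, 512, 512]  # kernel counts from the text
    for _ in kernels_per_block:
        h, w = h // 2, w // 2                     # one max pool per block
        sizes.append((h, w))
    return sizes, kernels_per_block

sizes, kernels = feature_map_sizes()
# 224 -> 112 -> 56 -> 28 -> 14 -> 7: the final 7x7 map of 512 channels is
# flattened and fed to the 4096-node fully connected layers.
```

This kernel progression (64, 128, 256, 512, 512 with 3×3 kernels and 2×2 pooling) matches the familiar VGG-style design, which may be why the example uses it.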
During training of the cross-modal face recognition model, the convolution kernels and weights are randomly initialized and the bias terms are set to 0. A stochastic gradient descent (SGD) algorithm is used to update the network parameters and optimize the gradients of the cross-modal neural network; when the number of network iterations reaches a preset value, training stops and the trained cross-modal neural network is saved.
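The initialization and update rule just described can be sketched as follows (the parameter shapes, scale, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random initialization of a convolution kernel and a weight vector,
# zero bias, as described above (shapes and scale are illustrative).
kernel = rng.normal(scale=0.01, size=(3, 3))
weight = rng.normal(scale=0.01, size=(4,))
bias = np.zeros(4)

def sgd_step(param, grad, lr=0.01):
    """One plain SGD update: move the parameter against its gradient."""
    return param - lr * grad

grad = np.ones_like(weight)     # in practice the gradient comes from backprop
updated = sgd_step(weight, grad)
```

A full training run would apply `sgd_step` to every parameter after each mini-batch until the preset iteration count is reached.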
According to the cross-modal face recognition method described above, the cross-modal face recognition model trained with the visible light face preprocessing image sequence and the infrared face preprocessing image sequence is used to perform face recognition on the face image to be recognized, which can improve the accuracy of recognizing face images captured by cameras of different modalities.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the cross-modal face recognition method described in the foregoing embodiment, fig. 3 shows a structural block diagram of a cross-modal face recognition apparatus provided in the embodiment of the present application, and for convenience of description, only the relevant portions of the embodiment of the present application are shown.
As shown in fig. 3, fig. 3 is a schematic diagram of a cross-modal face recognition apparatus according to an embodiment of the present application. The cross-modal face recognition apparatus 300 includes:
the acquisition module 301 is used for acquiring a face image to be recognized;
the recognition module 302 is configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
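The two-stage training described above can be sketched as follows. This is a minimal illustrative example in PyTorch, not the implementation of the present application; cross-entropy stands in for the "first classification loss function", and the optimizer, hyperparameters, and data loaders are assumptions:

```python
import torch
import torch.nn as nn

def train_stage(model, loader, epochs=1, lr=1e-3):
    """Train `model` for one stage with a classification loss.

    `loader` yields (images, labels) batches; cross-entropy stands in
    for the 'first classification loss function' of the method.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

# Stage 1: train the preset model on visible-light face sequences only,
# yielding the first cross-modal face recognition model.
# Stage 2: retrain that model on mixed visible + infrared sequences
# with the same loss, yielding the second (final) model:
#   model = train_stage(model, visible_loader)
#   model = train_stage(model, mixed_loader)
```

Here `visible_loader` and `mixed_loader` are hypothetical loaders over the two training sample sets; any iterable of batches works.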
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
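As an illustrative sketch of such a stack (convolutional layers for feature extraction, max pooling layers for dimension reduction, and a fully-connected classification layer), assuming single-channel 112×112 inputs and arbitrary layer counts and widths that the present application does not specify:

```python
import torch
import torch.nn as nn

class CrossModalFaceNet(nn.Module):
    """Toy stack: convolutional layers extract features, max pooling
    layers reduce spatial dimensions, and a fully-connected layer
    maps the features to identity classes."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # dimension reduction: 112 -> 56
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # dimension reduction: 56 -> 28
        )
        self.classifier = nn.Linear(32 * 28 * 28, num_classes)

    def forward(self, x):      # x: (N, 1, 112, 112) grayscale faces
        return self.classifier(self.features(x).flatten(1))
```

The single input channel matches the grayscale/normalized preprocessing described below.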
In an optional implementation manner, the method further includes:
the first acquisition module is configured to acquire the first preset number of visible light face image sequences;
the first processing module is configured to perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the method further includes:
the second acquisition module is configured to acquire the second preset number of infrared face image sequences;
the second processing module is configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the first processing module includes:
the conversion unit is configured to convert the visible light face image sequence into grayscale images;
the first processing unit is configured to normalize the grayscale images to obtain the visible light face preprocessing image sequence.
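A minimal NumPy sketch of this visible-light preprocessing (grayscale conversion followed by normalization to [0, 1]); the luminance weights are the standard ITU-R BT.601 coefficients, an assumption rather than something the present application specifies:

```python
import numpy as np

def preprocess_visible(rgb):
    """Convert an (H, W, 3) uint8 RGB face image to a grayscale
    image normalized to [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # ITU-R BT.601 luminance weights (an assumption; the text only
    # says the sequence is converted to grayscale and normalized).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    return gray / 255.0
```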
In an optional implementation manner, the second processing module includes:
the enhancement module is configured to perform image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
the second processing unit is configured to normalize the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the enhancement module is specifically configured to:
performing histogram equalization processing on the infrared face image sequence to enhance its image contrast, thereby obtaining the enhanced infrared face image sequence.
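Histogram equalization remaps each gray level through the normalized cumulative histogram so a low-contrast infrared image spreads across the full intensity range. A minimal NumPy sketch of the standard algorithm (not necessarily the exact variant used here):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale infrared image: remap
    each gray level through the normalized cumulative histogram so
    the output uses the full [0, 255] range."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first occupied gray level
    scale = max(int(cdf[-1] - cdf_min), 1)
    lut = np.clip(np.round((cdf - cdf_min) / scale * 255), 0, 255)
    return lut.astype(np.uint8)[img]
```

For example, an image whose pixels all lie in a narrow band such as 100-109 is stretched to span 0-255.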
It should be noted that the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application; for their specific functions and technical effects, reference may be made to the method embodiments, and details are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 4 is a schematic structural diagram of a cross-modal face recognition device according to an embodiment of the present application. As shown in fig. 4, the cross-modal face recognition device 4 of this embodiment includes: at least one processor 40 (only one is shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40; when the processor 40 executes the computer program 42, the steps in the method embodiment of fig. 1 are implemented.
The cross-modal face recognition device 4 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or other computing device. The cross-modal face recognition device 4 may include, but is not limited to, the processor 40 and the memory 41. It will be understood by those skilled in the art that fig. 4 is merely an example of the cross-modal face recognition device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components, such as an input-output device, a network access device, and the like.
The processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 41 may, in some embodiments, be an internal storage unit of the cross-modal face recognition device 4, such as a hard disk or memory of the device. In other embodiments, the memory 41 may also be an external storage device of the cross-modal face recognition device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device. Further, the memory 41 may include both an internal storage unit and an external storage device of the cross-modal face recognition device 4. The memory 41 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 41 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a network device, where the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a cross-modal face recognition device, enables the device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/electronic device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A cross-modal face recognition method, characterized by comprising the following steps:
collecting a face image to be recognized;
inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
wherein the training process of the pre-trained cross-modal face recognition model comprises: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
2. The method of claim 1, wherein the cross-modality face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
3. The method of claim 1, wherein prior to said obtaining the first set of training samples, further comprising:
acquiring the first preset number of visible light face image sequences;
and carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
4. The method of claim 1, wherein prior to said obtaining the first set of training samples, further comprising:
acquiring the second preset number of infrared face image sequences;
and carrying out pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
5. The method of claim 3, wherein performing pixel equalization processing on the visible light face image sequence to obtain the visible light face pre-processing image sequence comprises:
converting the visible light face image sequence into a gray image;
and carrying out normalization processing on the gray level image to obtain the visible light face preprocessing image sequence.
6. The method of claim 4, wherein performing pixel equalization processing on the infrared face image sequence to obtain the infrared face pre-processing image sequence comprises:
carrying out image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
and carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
7. The cross-modality face recognition method of claim 6, wherein the image contrast enhancement of the infrared face image sequence to obtain an enhanced infrared face image sequence comprises:
and carrying out histogram equalization processing on the infrared face image sequence, and enhancing the image contrast of the infrared face image sequence to obtain the enhanced infrared face image sequence.
8. A cross-modal face recognition apparatus, characterized by comprising:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
wherein the training process of the pre-trained cross-modal face recognition model comprises: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
9. A cross-modal face recognition device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202011467115.3A 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium Pending CN112507897A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011467115.3A CN112507897A (en) 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium
PCT/CN2021/107933 WO2022127111A1 (en) 2020-12-14 2021-07-22 Cross-modal face recognition method, apparatus and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011467115.3A CN112507897A (en) 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112507897A true CN112507897A (en) 2021-03-16

Family

ID=74973029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011467115.3A Pending CN112507897A (en) 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112507897A (en)
WO (1) WO2022127111A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743379A (en) * 2021-11-03 2021-12-03 杭州魔点科技有限公司 Light-weight living body identification method, system, device and medium for multi-modal characteristics
WO2022127111A1 (en) * 2020-12-14 2022-06-23 奥比中光科技集团股份有限公司 Cross-modal face recognition method, apparatus and device, and storage medium
CN115147679A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Multi-modal image recognition method and device and model training method and device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565215B (en) * 2022-07-01 2023-09-15 北京瑞莱智慧科技有限公司 Face recognition algorithm switching method and device and storage medium
CN116167278B (en) * 2023-01-09 2024-09-03 支付宝(杭州)信息技术有限公司 Pipeline simulation method, division device, medium and equipment
CN117373138B (en) * 2023-11-02 2024-11-08 厦门熵基科技有限公司 Cross-modal living fusion detection method and device, storage medium and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608450A (en) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on deep convolutional neural network
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
US20190258885A1 (en) * 2018-02-19 2019-08-22 Avigilon Corporation Method and system for object classification using visible and invisible light images
CN112149635A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Cross-modal face recognition model training method, device, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507897A (en) * 2020-12-14 2021-03-16 奥比中光科技集团股份有限公司 Cross-modal face recognition method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608450A (en) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on deep convolutional neural network
US20190258885A1 (en) * 2018-02-19 2019-08-22 Avigilon Corporation Method and system for object classification using visible and invisible light images
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
CN112149635A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Cross-modal face recognition model training method, device, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Dian et al.: "Heterogeneous face recognition based on fusion of near-infrared and visible light with a lightweight network", Journal of Chinese Computer Systems *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022127111A1 (en) * 2020-12-14 2022-06-23 奥比中光科技集团股份有限公司 Cross-modal face recognition method, apparatus and device, and storage medium
CN113743379A (en) * 2021-11-03 2021-12-03 杭州魔点科技有限公司 Light-weight living body identification method, system, device and medium for multi-modal characteristics
CN113743379B (en) * 2021-11-03 2022-07-12 杭州魔点科技有限公司 Light-weight living body identification method, system, device and medium for multi-modal characteristics
CN115147679A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Multi-modal image recognition method and device and model training method and device
CN115147679B (en) * 2022-06-30 2023-11-14 北京百度网讯科技有限公司 Multi-mode image recognition method and device, model training method and device

Also Published As

Publication number Publication date
WO2022127111A1 (en) 2022-06-23

Similar Documents

Publication Publication Date Title
CN112507897A (en) Cross-modal face recognition method, device, equipment and storage medium
CN112528866A (en) Cross-modal face recognition method, device, equipment and storage medium
CN110334605B (en) Gesture recognition method, device, storage medium and equipment based on neural network
CN111079764B (en) Low-illumination license plate image recognition method and device based on deep learning
JP6341650B2 (en) Image processing apparatus, image processing method, and program
Wang et al. Blur image classification based on deep learning
CN110163111A (en) Method, apparatus of calling out the numbers, electronic equipment and storage medium based on recognition of face
CN112651380B (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN110717497B (en) Image similarity matching method, device and computer readable storage medium
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN112464803A (en) Image comparison method and device
CN111814682A (en) Face living body detection method and device
CN114359289A (en) An image processing method and related device
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
CN112508017A (en) Intelligent digital display instrument reading identification method, system, processing equipment and storage medium
CN112084874A (en) Object detection method and device and terminal equipment
CN111126250A (en) Pedestrian re-identification method and device based on PTGAN
CN113569707A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN111144374B (en) Facial expression recognition method and device, storage medium and electronic equipment
CN115116111B (en) Anti-disturbance human face living body detection model training method and device and electronic equipment
Selvi et al. FPGA implementation of a face recognition system
CN117218037A (en) Image definition evaluation method and device, equipment and storage medium
CN113239738B (en) Image blurring detection method and blurring detection device
CN114882308A (en) Biological feature extraction model training method and image segmentation method
CN113240723A (en) Monocular depth estimation method and device and depth evaluation equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210316