
CN112507897A - Cross-modal face recognition method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112507897A
CN112507897A
Authority
CN
China
Prior art keywords
face
cross
image sequence
face recognition
modal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011467115.3A
Other languages
Chinese (zh)
Inventor
陈碧辉
高通
钱贝贝
黄源浩
肖振中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orbbec Inc
Original Assignee
Orbbec Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orbbec Inc filed Critical Orbbec Inc
Priority to CN202011467115.3A priority Critical patent/CN112507897A/en
Publication of CN112507897A publication Critical patent/CN112507897A/en
Priority to PCT/CN2021/107933 priority patent/WO2022127111A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The method uses a cross-modal face recognition model, trained with a visible light face preprocessing image sequence and an infrared face preprocessing image sequence, to perform face recognition on a face image to be recognized, and can improve the accuracy of recognizing face images captured by cameras of different modalities.

Description

Cross-modal face recognition method, device, equipment and storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to a cross-modality face recognition method, apparatus, device, and storage medium.
Background
The accuracy of face recognition is greatly affected by ambient illumination. Conventional face recognition technology mainly targets face images captured by near-infrared cameras, which are not affected by ambient light. In real environments, however, uneven or poor illumination often occurs, which requires recognizing images acquired by cameras of different modalities, and current face recognition technology cannot accurately recognize such images. The prior art therefore suffers from low accuracy when recognizing face images captured by cameras of different modalities.
Disclosure of Invention
The present application provides a cross-modal face recognition method, apparatus, device, and storage medium, which can solve the problem of low accuracy in recognizing face images captured by cameras of different modalities.
In a first aspect, the present application provides a cross-modal face recognition method, including:
collecting a face image to be recognized;
inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
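A minimal sketch of this two-stage procedure, using a toy linear model and a squared-error loss as stand-ins for the unspecified network and first classification loss (all names, shapes, and hyperparameters here are illustrative assumptions, not from the application):

```python
import numpy as np

def classification_loss(weights, images, labels):
    """Stand-in for the first classification loss: squared error of a
    linear scorer (the application does not specify the loss)."""
    return float(np.mean((images @ weights - labels) ** 2))

def train(weights, images, labels, lr=0.01, steps=200):
    """Stand-in training routine: plain gradient descent on the loss."""
    for _ in range(steps):
        grad = 2 * images.T @ (images @ weights - labels) / len(images)
        weights = weights - lr * grad
    return weights

rng = np.random.default_rng(0)

# Stage 1: pre-train on the visible light sequences only.
visible, vis_labels = rng.normal(size=(32, 8)), rng.normal(size=32)
first_model = train(np.zeros(8), visible, vis_labels)

# Stage 2: retrain on visible + infrared sequences with the same loss;
# the result plays the role of the second (final) cross-modal model.
infrared, ir_labels = rng.normal(size=(32, 8)), rng.normal(size=32)
both = np.concatenate([visible, infrared])
both_labels = np.concatenate([vis_labels, ir_labels])
second_model = train(first_model, both, both_labels)
```

The point of the staging is that stage 1 supplies prior knowledge (initial weights) for stage 2, rather than training on the mixed-modality set from scratch.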
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
In an optional implementation manner, before obtaining the first training sample set, the method further includes:
acquiring the visible light face image sequences of the first preset number;
and carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, before obtaining the first training sample set, the method further includes:
acquiring the second preset number of infrared face image sequences;
and performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence includes:
converting the visible light face image sequence into grayscale images;
and performing normalization processing on the grayscale images to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence includes:
performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
and performing normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence includes:
and performing histogram equalization processing on the infrared face image sequence to enhance its image contrast, thereby obtaining the enhanced infrared face image sequence.
In a second aspect, the present application provides a cross-modal face recognition apparatus, including:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is used for inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
In an optional implementation manner, the method further includes:
the first acquisition module is used for acquiring the visible light face image sequences of the first preset number;
and the first processing module is used for carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the method further includes:
the second acquisition module is used for acquiring the second preset number of infrared face image sequences;
and the second processing module is used for performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the first processing module includes:
the conversion unit is used for converting the visible light face image sequence into a gray image;
and the first processing unit is used for carrying out normalization processing on the gray level image to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the second processing module includes:
the enhancement module is used for performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
and the second processing unit is used for carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the enhancement module is specifically configured to:
and performing histogram equalization processing on the infrared face image sequence to enhance its image contrast, thereby obtaining the enhanced infrared face image sequence.
In a third aspect, the present application provides a cross-modality face recognition apparatus, where the cross-modality face recognition apparatus includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, performs the steps of the method of the first aspect as described above.
In the cross-modal face recognition method of the first aspect, the cross-modal face recognition model trained with the visible light face preprocessing image sequence and the infrared face preprocessing image sequence is used to perform face recognition on the face image to be recognized, so that the accuracy of recognizing face images captured by cameras of different modalities can be improved.
It is understood that the beneficial effects of the second to fifth aspects can be seen from the description of the first aspect, and are not repeated herein.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart illustrating an implementation of a cross-modal face recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training process of a pre-trained cross-modal face recognition model;
fig. 3 is a schematic diagram of a cross-modal face recognition apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a cross-modal face recognition device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The cross-modal face recognition method provided by the present application is exemplarily described below with reference to specific embodiments. As shown in fig. 1, fig. 1 is a flowchart of an implementation of a cross-modal face recognition method provided in an embodiment of the present application. The implementation can be performed by a cross-modal face recognition device, which includes but is not limited to a self-service terminal, a monitoring device, an attendance device, a server, a robot, a wearable device or a mobile terminal in various application scenarios, and the like. The details are as follows:
and S101, acquiring a face image to be recognized.
In an embodiment of the present application, the face image to be recognized may be a face image acquired in the visible light modality or the infrared modality. Illustratively, the face image in either modality may be captured by a camera of a cross-modal face recognition device, such as the camera of a mobile terminal or an attendance device.
S102, inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition.
In the embodiments of the present application, the pre-trained cross-modal face recognition model is a deep convolutional neural network obtained as follows. A deep convolutional neural network is first pre-trained on face images in the visible light modality, which yields a pre-trained cross-modal deep convolutional neural network and provides prior knowledge for its subsequent training. Face images in the visible light modality and face images in the infrared modality are then combined into a binary training set according to a preset rule, and the pre-trained cross-modal deep convolutional neural network is fine-tuned on this set, iterating repeatedly until its performance no longer improves.
Fig. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model. As shown in fig. 2, the training process of the pre-trained cross-modal face recognition model includes the following steps:
s201, a first training sample set is obtained, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences.
It should be noted that a color camera or a multispectral camera may be used to capture a visible light face image sequence containing faces. Visible light face images contain rich texture features but are easily affected by ambient light. Therefore, in some optional implementations, a first preset number of visible light face image sequences are acquired, and pixel equalization processing is performed on the visible light face image sequences to obtain the visible light face preprocessing image sequences.
In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence may include: segmenting each acquired image containing a visible light face into a face region and a background region to obtain the visible light face image.
In some embodiments of the present application, before performing image segmentation, an image detection model may be used to detect whether a face exists in the image to be processed. When the output of the detection model shows that no face exists in the image, face segmentation is unnecessary and the segmentation process ends, reducing unnecessary workload. When the output shows that a face exists in the image, face screening may further be performed, that is, judging whether a face meeting preset conditions exists in the image. For example, the requirements may be preset according to the position and/or size of the face; a face region whose size meets a preset size may be considered to meet the requirements. When the face meets the preset conditions, subsequent processing is performed, such as rotation correction and object segmentation of the image; when the face does not meet the preset conditions, the image may be left unsegmented.
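As one illustration of the screening step just described, a minimal size-and-position check might look as follows (the thresholds and the bounding-box convention are hypothetical; the application does not specify them):

```python
def face_meets_requirements(bbox, image_size, min_face=80, margin=10):
    """Hypothetical screening rule: the face box must be at least
    min_face pixels on each side and lie inside the frame with a
    margin. Thresholds and the (x, y, w, h) convention are illustrative."""
    x, y, w, h = bbox
    img_w, img_h = image_size
    big_enough = w >= min_face and h >= min_face
    inside = (x >= margin and y >= margin
              and x + w <= img_w - margin and y + h <= img_h - margin)
    return big_enough and inside

# A 120x140 face well inside a 640x480 frame passes; a 40x40 face is rejected.
```

Images whose faces fail such a check would simply skip the segmentation and preprocessing steps that follow.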
In some embodiments of the application, the visible light face image sequence can be obtained by processing a series of visible light face images as described above, and the visible light face preprocessing image sequence can be obtained by further preprocessing this sequence. Optionally, preprocessing the visible light face image sequence may include performing grayscale conversion and normalization processing on it to obtain the visible light face preprocessing image sequence.
The face images in the visible light face image sequence are converted into grayscale images. Optionally, the grayscale conversion may be performed by a preset formula, which may be expressed as:
Igray = 0.2989 × R + 0.5870 × G + 0.1140 × B
wherein Igray is the grayscale image output after conversion, and R, G, and B are the RGB channel values of the image before conversion.
The converted grayscale image is further normalized, illustratively, by a preset normalization processing formula.
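Putting the two steps together, a sketch of the visible light preprocessing might be (the min-max normalization to [0, 1] is an assumed variant, since the application does not give its normalization formula):

```python
import numpy as np

def preprocess_visible(rgb):
    """Grayscale conversion with the coefficients given in the text,
    followed by min-max normalization to [0, 1] (an assumed variant,
    since the application does not give its normalization formula)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    lo, hi = gray.min(), gray.max()
    return (gray - lo) / (hi - lo + 1e-8)

rgb = np.random.default_rng(1).integers(0, 256, size=(4, 4, 3)).astype(float)
out = preprocess_visible(rgb)   # a 4x4 grayscale image with values in [0, 1]
```

Applying this function to every frame of the visible light face image sequence yields the visible light face preprocessing image sequence used for training.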
S202, training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model.
Because faces in the visible light modality are complex, the cross-modal neural network can be pre-trained with the visible light face preprocessing image sequence. In one embodiment of the application, the visible light face preprocessing image sequence is divided into a training set and a validation set that do not overlap. The preset cross-modal face recognition model is trained on the training set and validated on the validation set; meanwhile, a first classification loss function is established, and the network that minimizes the validation loss is continuously saved, so as to determine the first cross-modal neural network finally obtained by training, which is the first cross-modal face recognition model.
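The "save the network that minimizes validation loss" strategy can be sketched as follows, with stand-in callables for the training step and validation loss (the real network and loss are unspecified in the application):

```python
def train_with_best_checkpoint(epochs, train_step, val_loss):
    """Keep the weights that minimize validation loss: after every epoch,
    checkpoint whenever the validation loss improves. train_step and
    val_loss are stand-in callables for the unspecified network."""
    best_loss, best_weights, weights = float("inf"), None, 0.0
    for _ in range(epochs):
        weights = train_step(weights)
        loss = val_loss(weights)
        if loss < best_loss:
            best_loss, best_weights = loss, weights
    return best_weights, best_loss

# Toy run: the weights walk 1, 2, ..., 6 and validation loss is minimized
# at 3, so the kept checkpoint is the epoch-3 model even though training
# continues past it.
best_w, best_l = train_with_best_checkpoint(
    epochs=6,
    train_step=lambda w: w + 1,
    val_loss=lambda w: (w - 3) ** 2,
)
```

Because the training and validation sets do not overlap, the saved checkpoint reflects generalization rather than training-set fit.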
S203, inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model.
It should be noted that the infrared face preprocessing image sequence is obtained by preprocessing the infrared face image sequence. Illustratively, before the training sample set is obtained, the method includes: acquiring the second preset number of infrared face image sequences, and performing pixel equalization processing on them to obtain the infrared face preprocessing image sequences.
The pixel equalization processing of the infrared face image sequence includes image contrast enhancement and normalization processing. In some embodiments of the present application, histogram equalization may be performed on the infrared face image sequence to enhance image contrast; histogram equalization is a method of enhancing image contrast by stretching the range of the pixel intensity distribution. In other embodiments, image contrast can be enhanced by transforming the infrared face image sequence with a logarithmic or power function.
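A minimal NumPy implementation of histogram equalization for an 8-bit image, illustrating the contrast stretching described above:

```python
import numpy as np

def equalize_histogram(gray_u8):
    """Classic histogram equalization for an 8-bit image: map each
    intensity through the normalized cumulative histogram so the
    pixel intensity distribution is stretched over 0..255."""
    hist = np.bincount(gray_u8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray_u8]

# A low-contrast image (intensities 100..119) gets stretched to the full range.
low_contrast = np.random.default_rng(2).integers(100, 120, size=(8, 8)).astype(np.uint8)
stretched = equalize_histogram(low_contrast)
```

The same effect is available as `cv2.equalizeHist` in OpenCV; the explicit version above just makes the cumulative-histogram mapping visible.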
In addition, the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In the embodiments of the present application, the normalization processing of the infrared face image sequence is the same as the normalization processing of the visible light face images, and the details are not repeated here.
In some embodiments of the present application, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers, and any number of convolutional layers may precede the fully connected layer. Illustratively, the cross-modal face recognition model includes five convolutional blocks and one fully connected layer; the first and second blocks each comprise two convolutional layers for feature extraction and one max pooling layer for dimension reduction, the third to fifth blocks each comprise three convolutional layers for feature extraction and one max pooling layer for dimension reduction, and the feature map of each layer is passed through a nonlinear activation function after the operation. The visible light face preprocessing image sequence and the infrared face preprocessing image sequence are convolved by the convolutional layers to extract feature values, and face feature vectors are then output through the fully connected layer.
Illustratively, the first block may include two convolutional layers with 3×3 kernels, 1×1 stride, and 64 kernels each, plus one max pooling layer with a 2×2 kernel and 2×2 stride; the second block includes two convolutional layers with 3×3 kernels, 1×1 stride, and 128 kernels each, plus one 2×2 max pooling layer with 2×2 stride; the third block includes three convolutional layers with 3×3 kernels, 1×1 stride, and 256 kernels each, plus one 2×2 max pooling layer with 2×2 stride; the fourth and fifth blocks each include three convolutional layers with 3×3 kernels, 1×1 stride, and 512 kernels each, plus one 2×2 max pooling layer with 2×2 stride; each of the two fully connected layers has 4096 nodes. It should be appreciated that the cross-modal neural network described above may take any configuration, and the above example is not limiting.
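Assuming 3×3 convolutions with padding 1 (so spatial size is preserved) and a 224×224 input — both assumptions, since the application states neither — the spatial sizes through this five-block stack can be traced as follows:

```python
def feature_map_sizes(input_hw=(224, 224)):
    """Trace spatial sizes through the five example blocks above: 3x3
    stride-1 convolutions (assumed padding 1, so size is preserved) and
    one 2x2 stride-2 max pool per block, which halves the size. The
    224x224 input is also an assumption, not stated in the application."""
    h, w = input_hw
    sizes = [(h, w)]
    kernels_per_block = [64, 128, 256, 512, 512]  # kernel counts from the text
    for _ in kernels_per_block:
        h, w = h // 2, w // 2                     # one max pool per block
        sizes.append((h, w))
    return sizes, kernels_per_block

sizes, kernels = feature_map_sizes()
# 224 -> 112 -> 56 -> 28 -> 14 -> 7: the final 7x7 map of 512 channels is
# flattened and fed to the 4096-node fully connected layers.
```

This kernel progression (64, 128, 256, 512, 512 with 3×3 kernels and 2×2 pooling) matches the familiar VGG-style design, which may be why the example uses it.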
During training of the cross-modal face recognition model, the convolution kernels and weights are randomly initialized and the bias terms are set to 0. A stochastic gradient descent (SGD) algorithm is used to update the network parameters and optimize the gradients of the cross-modal neural network; when the number of network iterations reaches a preset value, training stops and the trained cross-modal neural network is saved.
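The initialization and update rule just described can be sketched as follows (the parameter shapes, scale, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random initialization of a convolution kernel and a weight vector,
# zero bias, as described above (shapes and scale are illustrative).
kernel = rng.normal(scale=0.01, size=(3, 3))
weight = rng.normal(scale=0.01, size=(4,))
bias = np.zeros(4)

def sgd_step(param, grad, lr=0.01):
    """One plain SGD update: move the parameter against its gradient."""
    return param - lr * grad

grad = np.ones_like(weight)     # in practice the gradient comes from backprop
updated = sgd_step(weight, grad)
```

A full training run would apply `sgd_step` to every parameter after each mini-batch until the preset iteration count is reached.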
According to the cross-modal face recognition method described above, the cross-modal face recognition model trained with the visible light face preprocessing image sequence and the infrared face preprocessing image sequence is used to perform face recognition on the face image to be recognized, which can improve the accuracy of recognizing face images captured by cameras of different modalities.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the cross-modal face recognition method described in the foregoing embodiment, fig. 3 shows a structural block diagram of a cross-modal face recognition apparatus provided in the embodiment of the present application, and for convenience of description, only the relevant portions of the embodiment of the present application are shown.
As shown in fig. 3, fig. 3 is a schematic diagram of a cross-modal face recognition apparatus according to an embodiment of the present application. The cross-modal face recognition apparatus 300 includes:
the acquisition module 301 is used for acquiring a face image to be recognized;
the recognition module 302 is configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
the training process of the pre-trained cross-modal face recognition model comprises the following steps: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
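The two-stage training described above can be sketched as follows. This is a minimal illustrative example in PyTorch, not the implementation of the present application; cross-entropy stands in for the "first classification loss function", and the optimizer, hyperparameters, and data loaders are assumptions:

```python
import torch
import torch.nn as nn

def train_stage(model, loader, epochs=1, lr=1e-3):
    """Train `model` for one stage with a classification loss.

    `loader` yields (images, labels) batches; cross-entropy stands in
    for the 'first classification loss function' of the method.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model

# Stage 1: train the preset model on visible-light face sequences only,
# yielding the first cross-modal face recognition model.
# Stage 2: retrain that model on mixed visible + infrared sequences
# with the same loss, yielding the second (final) model:
#   model = train_stage(model, visible_loader)
#   model = train_stage(model, mixed_loader)
```

Here `visible_loader` and `mixed_loader` are hypothetical loaders over the two training sample sets; any iterable of batches works.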
In an optional implementation manner, the cross-modal face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
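As an illustrative sketch of such a stack (convolutional layers for feature extraction, max pooling layers for dimension reduction, and a fully-connected classification layer), assuming single-channel 112×112 inputs and arbitrary layer counts and widths that the present application does not specify:

```python
import torch
import torch.nn as nn

class CrossModalFaceNet(nn.Module):
    """Toy stack: convolutional layers extract features, max pooling
    layers reduce spatial dimensions, and a fully-connected layer
    maps the features to identity classes."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # dimension reduction: 112 -> 56
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # dimension reduction: 56 -> 28
        )
        self.classifier = nn.Linear(32 * 28 * 28, num_classes)

    def forward(self, x):      # x: (N, 1, 112, 112) grayscale faces
        return self.classifier(self.features(x).flatten(1))
```

The single input channel matches the grayscale/normalized preprocessing described below.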
In an optional implementation manner, the method further includes:
the first acquisition module is configured to acquire the first preset number of visible light face image sequences;
the first processing module is configured to perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
In an optional implementation manner, the method further includes:
the second acquisition module is configured to acquire the second preset number of infrared face image sequences;
the second processing module is configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the first processing module includes:
the conversion unit is configured to convert the visible light face image sequence into grayscale images;
the first processing unit is configured to normalize the grayscale images to obtain the visible light face preprocessing image sequence.
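A minimal NumPy sketch of this visible-light preprocessing (grayscale conversion followed by normalization to [0, 1]); the luminance weights are the standard ITU-R BT.601 coefficients, an assumption rather than something the present application specifies:

```python
import numpy as np

def preprocess_visible(rgb):
    """Convert an (H, W, 3) uint8 RGB face image to a grayscale
    image normalized to [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # ITU-R BT.601 luminance weights (an assumption; the text only
    # says the sequence is converted to grayscale and normalized).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    return gray / 255.0
```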
In an optional implementation manner, the second processing module includes:
the enhancement module is configured to perform image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
the second processing unit is configured to normalize the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation manner, the enhancement module is specifically configured to:
performing histogram equalization processing on the infrared face image sequence to enhance its image contrast, thereby obtaining the enhanced infrared face image sequence.
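Histogram equalization remaps each gray level through the normalized cumulative histogram so a low-contrast infrared image spreads across the full intensity range. A minimal NumPy sketch of the standard algorithm (not necessarily the exact variant used here):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale infrared image: remap
    each gray level through the normalized cumulative histogram so
    the output uses the full [0, 255] range."""
    img = np.asarray(img, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # first occupied gray level
    scale = max(int(cdf[-1] - cdf_min), 1)
    lut = np.clip(np.round((cdf - cdf_min) / scale * 255), 0, 255)
    return lut.astype(np.uint8)[img]
```

For example, an image whose pixels all lie in a narrow band such as 100-109 is stretched to span 0-255.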
It should be noted that the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application; for their specific functions and technical effects, reference may be made to the method embodiments, and details are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 4 is a schematic structural diagram of a cross-modal face recognition device according to an embodiment of the present application. As shown in fig. 4, the cross-modal face recognition device 4 of this embodiment includes: at least one processor 40 (only one is shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40; when the processor 40 executes the computer program 42, the steps in the method embodiment of fig. 1 are implemented.
The cross-modal face recognition device 4 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or other computing device. The cross-modal face recognition device 4 may include, but is not limited to, the processor 40 and the memory 41. It will be understood by those skilled in the art that fig. 4 is merely an example of the cross-modal face recognition device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components, such as an input-output device, a network access device, and the like.
The processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 41 may, in some embodiments, be an internal storage unit of the cross-modal face recognition device 4, such as a hard disk or memory of the device. In other embodiments, the memory 41 may also be an external storage device of the cross-modal face recognition device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device. Further, the memory 41 may include both an internal storage unit and an external storage device of the cross-modal face recognition device 4. The memory 41 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 41 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a network device, where the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a cross-modal face recognition device, enables the device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/electronic device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A cross-modal face recognition method, characterized by comprising the following steps:
collecting a face image to be recognized;
inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
wherein the training process of the pre-trained cross-modal face recognition model comprises: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
2. The method of claim 1, wherein the cross-modality face recognition model includes a preset number of convolutional layers and fully-connected layers; the convolutional layers include convolutional layers for feature extraction and max pooling layers for dimension reduction.
3. The method of claim 1, wherein prior to said obtaining the first set of training samples, further comprising:
acquiring the first preset number of visible light face image sequences;
and carrying out pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
4. The method of claim 1, wherein prior to said obtaining the first set of training samples, further comprising:
acquiring the second preset number of infrared face image sequences;
and carrying out pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
5. The method of claim 3, wherein performing pixel equalization processing on the visible light face image sequence to obtain the visible light face pre-processing image sequence comprises:
converting the visible light face image sequence into a gray image;
and carrying out normalization processing on the gray level image to obtain the visible light face preprocessing image sequence.
6. The method of claim 4, wherein performing pixel equalization processing on the infrared face image sequence to obtain the infrared face pre-processing image sequence comprises:
carrying out image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
and carrying out normalization processing on the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
7. The cross-modality face recognition method of claim 6, wherein the image contrast enhancement of the infrared face image sequence to obtain an enhanced infrared face image sequence comprises:
and carrying out histogram equalization processing on the infrared face image sequence, and enhancing the image contrast of the infrared face image sequence to obtain the enhanced infrared face image sequence.
8. A cross-modal face recognition apparatus, characterized by comprising:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
wherein the training process of the pre-trained cross-modal face recognition model comprises: acquiring a first training sample set, wherein the first training sample set comprises a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible light face preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, wherein the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
9. A cross-modal face recognition device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202011467115.3A 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium Pending CN112507897A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011467115.3A CN112507897A (en) 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium
PCT/CN2021/107933 WO2022127111A1 (en) 2020-12-14 2021-07-22 Cross-modal face recognition method, apparatus and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011467115.3A CN112507897A (en) 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112507897A true CN112507897A (en) 2021-03-16

Family

ID=74973029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011467115.3A Pending CN112507897A (en) 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112507897A (en)
WO (1) WO2022127111A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743379A (en) * 2021-11-03 2021-12-03 杭州魔点科技有限公司 Light-weight living body identification method, system, device and medium for multi-modal characteristics
WO2022127111A1 (en) * 2020-12-14 2022-06-23 奥比中光科技集团股份有限公司 Cross-modal face recognition method, apparatus and device, and storage medium
CN115147679A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Multi-modal image recognition method and device and model training method and device

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565215B (en) * 2022-07-01 2023-09-15 北京瑞莱智慧科技有限公司 Face recognition algorithm switching method and device and storage medium
CN116167278B (en) * 2023-01-09 2024-09-03 支付宝(杭州)信息技术有限公司 Pipeline simulation method, division device, medium and equipment
CN117373138B (en) * 2023-11-02 2024-11-08 厦门熵基科技有限公司 Cross-modal living fusion detection method and device, storage medium and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608450A (en) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on deep convolutional neural network
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
US20190258885A1 (en) * 2018-02-19 2019-08-22 Avigilon Corporation Method and system for object classification using visible and invisible light images
CN112149635A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Cross-modal face recognition model training method, device, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507897A (en) * 2020-12-14 2021-03-16 奥比中光科技集团股份有限公司 Cross-modal face recognition method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608450A (en) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on deep convolutional neural network
US20190258885A1 (en) * 2018-02-19 2019-08-22 Avigilon Corporation Method and system for object classification using visible and invisible light images
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
CN112149635A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Cross-modal face recognition model training method, device, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Dian et al.: "Heterogeneous face recognition based on fusion of near-infrared and visible light with a lightweight network", Journal of Chinese Computer Systems *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022127111A1 (en) * 2020-12-14 2022-06-23 奥比中光科技集团股份有限公司 Cross-modal face recognition method, apparatus and device, and storage medium
CN113743379A (en) * 2021-11-03 2021-12-03 杭州魔点科技有限公司 Light-weight living body identification method, system, device and medium for multi-modal characteristics
CN113743379B (en) * 2021-11-03 2022-07-12 杭州魔点科技有限公司 Light-weight living body identification method, system, device and medium for multi-modal characteristics
CN115147679A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Multi-modal image recognition method and device and model training method and device
CN115147679B (en) * 2022-06-30 2023-11-14 北京百度网讯科技有限公司 Multi-mode image recognition method and device, model training method and device

Also Published As

Publication number Publication date
WO2022127111A1 (en) 2022-06-23

Similar Documents

Publication Publication Date Title
CN112507897A (en) Cross-modal face recognition method, device, equipment and storage medium
CN112528866A (en) Cross-modal face recognition method, device, equipment and storage medium
CN110334605B (en) Gesture recognition method, device, storage medium and equipment based on neural network
CN111079764B (en) Low-illumination license plate image recognition method and device based on deep learning
JP6341650B2 (en) Image processing apparatus, image processing method, and program
Wang et al. Blur image classification based on deep learning
CN110163111A (en) Method, apparatus of calling out the numbers, electronic equipment and storage medium based on recognition of face
CN112651380B (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN110717497B (en) Image similarity matching method, device and computer readable storage medium
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN112464803A (en) Image comparison method and device
CN111814682A (en) Face living body detection method and device
CN114359289A (en) An image processing method and related device
Lahiani et al. Hand pose estimation system based on Viola-Jones algorithm for android devices
CN112508017A (en) Intelligent digital display instrument reading identification method, system, processing equipment and storage medium
CN112084874A (en) Object detection method and device and terminal equipment
CN111126250A (en) Pedestrian re-identification method and device based on PTGAN
CN113569707A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN111144374B (en) Facial expression recognition method and device, storage medium and electronic equipment
CN115116111B (en) Anti-disturbance human face living body detection model training method and device and electronic equipment
Selvi et al. FPGA implementation of a face recognition system
CN117218037A (en) Image definition evaluation method and device, equipment and storage medium
CN113239738B (en) Image blurring detection method and blurring detection device
CN114882308A (en) Biological feature extraction model training method and image segmentation method
CN113240723A (en) Monocular depth estimation method and device and depth evaluation equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210316