CN111274915B - Deep local aggregation descriptor extraction method and system for finger vein image - Google Patents
- Publication number
- CN111274915B (application CN202010050908.9A)
- Authority
- CN
- China
- Prior art keywords
- finger vein
- module
- image
- vlad
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 210000003462 vein Anatomy 0.000 title claims abstract description 125
- 230000002776 aggregation Effects 0.000 title claims abstract description 30
- 238000004220 aggregation Methods 0.000 title claims abstract description 30
- 238000000605 extraction Methods 0.000 title claims abstract description 26
- 238000012549 training Methods 0.000 claims abstract description 69
- 239000013598 vector Substances 0.000 claims abstract description 48
- 238000000034 method Methods 0.000 claims abstract description 20
- 238000007781 pre-processing Methods 0.000 claims abstract description 18
- 238000010586 diagram Methods 0.000 claims abstract description 13
- 238000005065 mining Methods 0.000 claims abstract description 5
- 238000010276 construction Methods 0.000 claims description 21
- 239000011159 matrix material Substances 0.000 claims description 16
- 238000004364 calculation method Methods 0.000 claims description 8
- 238000012937 correction Methods 0.000 claims description 5
- 230000009466 transformation Effects 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 3
- 238000012417 linear regression Methods 0.000 claims description 3
- 238000012512 characterization method Methods 0.000 abstract description 5
- 238000009826 distribution Methods 0.000 abstract description 2
- 238000012360 testing method Methods 0.000 description 17
- 238000012795 verification Methods 0.000 description 15
- 230000006870 function Effects 0.000 description 8
- 238000005516 engineering process Methods 0.000 description 6
- 238000002474 experimental method Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000036544 posture Effects 0.000 description 2
- 238000004321 preservation Methods 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000013145 classification model Methods 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000003709 image segmentation Methods 0.000 description 1
- 238000000691 measurement method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000010998 test method Methods 0.000 description 1
- 238000009827 uniform distribution Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/1347—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/14—Vascular patterns
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The invention discloses a deep local aggregation descriptor extraction method and system for finger vein images. The method comprises the following steps: constructing a basic network module; constructing a VLAD coding module; setting K cluster center vectors as trainable parameters of the network; and training the network in batches, where each training step comprises: preprocessing the finger vein images; passing the finger vein images through the basic network module to obtain multi-channel feature maps; combining the multi-channel feature maps with the cluster center vectors in the VLAD coding module to complete the VLAD coding; and mining hard negative samples to obtain triplets, computing the loss function, and back-propagating to update the network weights. Finally, the trained network is used to extract the local aggregation descriptor of a finger vein image under test. The method yields descriptors of fixed dimension that are independent of the spatial arrangement of the original image blocks, which solves the failure of matching between finger vein images of different sizes and between same-class finger vein images that differ in finger pose, and produces deep local aggregation descriptors with stronger representational power.
Description
Technical Field
The invention relates to the technical field of finger vein feature extraction, and in particular to a deep local aggregation descriptor extraction method and system for finger vein images.
Background
Finger vein recognition is a new-generation biometric technology. Compared with traditional biometrics, it offers non-contact acquisition, inherent liveness detection, low equipment cost, and other advantages. Finger vein recognition captures finger images with an infrared CCD camera and extracts finger vein features for identity authentication and recognition. The captured finger vein images often suffer from noise interference, so extracting robust features is a research focus in finger vein recognition. The representational power of traditional feature descriptors such as LBP (Local Binary Pattern) and LDC (Local Directional Code) is strongly affected by image quality, and feature maps that retain spatial information require complex template matching for recognition.
In recent years, various deep learning solutions have been proposed for finger vein recognition. For the finger vein verification problem, pixel-level classification of the finger vein image based on the idea of image segmentation is slow and hard to apply in practical scenarios, while convolution-network-based methods that reuse existing image classification models lead to large model sizes. Furthermore, the extracted features are sensitive to finger pose, and finger vein images taken in different finger poses require complex preprocessing or template matching schemes for recognition. A lightweight convolutional network model that extracts finger vein feature descriptors more robust to finger pose changes therefore has clear advantages in practical applications.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a deep local aggregation descriptor extraction method and system for finger vein images. Concatenating the cluster-center-based vectors yields descriptors of fixed dimension that are independent of the spatial arrangement of the original image blocks, which solves both the matching problem between finger vein images of different sizes and the matching failure between same-class finger vein images caused by differences in finger pose. On this basis, VLAD (Vector of Locally Aggregated Descriptors) coding of the local descriptors produces deep local aggregation descriptors with stronger representational power, which perform excellently in finger vein identification and verification tasks; the network model is only 1.1M in size and thus better satisfies the lightweight requirements of engineering applications.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a deep local aggregation descriptor extraction method of a finger vein image, which comprises the following steps:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
setting K clustering center vectors as trainable parameters of the network;
inputting finger vein images to train the network in batches, wherein the training steps comprise:
preprocessing the finger vein image;
passing the finger vein image samples through the basic network module to obtain multi-channel feature maps;
combining the multi-channel feature map with a clustering center vector in a VLAD coding module to finish VLAD coding;
mining hard negative samples to obtain triplets, computing the loss function, and back-propagating to update the network weights until the iterative training ends;
and extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
As a preferred technical solution, the preprocessing of the finger vein image specifically includes:
extraction of the region of interest: extracting a region of interest of the finger vein training image, and completing finger inclination correction through affine transformation;
normalizing the region of interest to obtain the final finger vein training sample image;
resizing the finger vein training sample image according to the ratio between the receptive field and the original image.
As a preferred technical solution, the extracting of the region of interest specifically includes:
using two Sobel operators Mask_u and Mask_d to detect the upper and lower finger edges of the finger vein training image respectively, fitting the finger midline by linear regression, computing the angle between the midline and the horizontal direction, rotating the finger vein training image by an affine transformation to complete the tilt correction, and finally cropping the circumscribed rectangle along the finger edges to obtain the region of interest, where Mask_u and Mask_d denote the two Sobel operators extended to 3×9.
As a preferred technical solution, completing the VLAD coding by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module comprises the specific steps of:

converting the multi-channel feature map into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$, and inputting them to the VLAD coding module for coding, where the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

flattening the matrix $V$ into a one-dimensional vector and applying $L_2$ normalization to obtain the local aggregation descriptor of length $K\times C_{out}$.
As a preferred technical solution, mining hard negative samples to obtain triplets comprises the specific steps of:

selecting two local aggregation descriptors $f_a$ and $f_p$ of the same class to form a positive sample pair $(f_a, f_p)$;

for each positive sample pair $(f_a, f_p)$, selecting from the other classes the negative sample $f_n$ that makes $\|f_a-f_n\|_2^2$ smallest (hard-negative mining), forming the triplet $(f_a, f_p, f_n)$, where $margin$ denotes a preset threshold parameter of the triplet loss.
As a preferred technical solution, the loss function over the triplets of one batch is computed as:

$$L=\sum_{(f_a,f_p,f_n)}\max\left(\|f_a-f_p\|_2^2-\|f_a-f_n\|_2^2+margin,\ 0\right)$$

where m denotes the number of image classes in the batch and n the number of samples per class, the sum running over the triplets constructed from the m×n samples.
The invention also provides a deep local aggregation descriptor extraction system of the finger vein image, which comprises the following steps: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
the basic network module construction unit is used for constructing a basic network module, and the basic network module is used for extracting local features of the finger vein image;
the VLAD encoding module constructing unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module;
the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network;
the training unit is used for inputting finger vein images to train the network in batches, and comprises: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
the image preprocessing module is used for preprocessing the finger vein image;
the multi-channel characteristic map acquisition module is used for acquiring a multi-channel characteristic map from a finger vein image sample through the basic network module;
the combined coding module is used for combining the multi-channel characteristic diagram with the clustering center vector in the VLAD coding module to finish VLAD coding;
the triplet construction module is used for mining hard negative samples to obtain triplets;
the iteration updating module is used for computing the loss function and back-propagating to update the network weights until the iterative training ends;
the extraction unit is used for extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
As a preferred technical solution, the basic network module adopts 6 serially connected convolution modules, denoted conv_i, i={1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer, and a ReLU activation layer; the numbers of convolution kernels of the modules are 32, 64, and 128 respectively; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2, and those of conv_1, conv_2, conv_4, and conv_6 to 1.
As a preferred technical solution, all convolution layers are initialized with orthogonal matrices and their biases fixed to 0, and the weight and bias of each BN layer are fixed to 1 and 0 respectively.
As a preferred solution, the VLAD coding module sets a 1×1 convolutional layer on the network structure.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) According to the invention, the descriptors are obtained through CNN network end-to-end learning, the network model is only 1.1M in size, and the extracted descriptors can be further used for tasks such as finger vein verification and identification, so that the method is flexible in use and wide in application.
(2) For finger vein images with any size, K clustering centers are obtained through network automatic learning, and the clustering center vectors are connected in series to form descriptor vectors for representing the features of the finger vein images, so that the matching problem between the finger vein images with different sizes is solved.
(3) The invention carries out VLAD coding on the characteristics of the finger vein image, and fully utilizes the information of the characteristic map under the condition that only 1X 1 convolution parameters are additionally introduced, thereby obtaining the finger vein image descriptor with more characterization force.
(4) According to the invention, the triplet samples are constructed during the network training period, so that the requirement on the number of finger vein training images is reduced, and the number of positive and negative samples for training is ensured to be equal; a difficult-to-separate negative sample mining strategy is adopted when a sample is constructed, so that network convergence is quickened; the triplet loss function is adopted to train the network, so that the network is promoted to learn the differences among different finger vein images rather than the label information, and the generalization performance of the method is improved.
Drawings
FIG. 1 is a diagram showing an example of an image of a finger vein data set according to the present embodiment;
FIG. 2 is a schematic diagram of a division manner of the venous data set in the present embodiment;
FIG. 3 is a schematic diagram of a batch training process of the network model according to the present embodiment;
FIG. 4 is a diagram showing training and testing images obtained by extracting a region of interest according to the present embodiment;
FIG. 5 is a flow chart of the network model test according to the present embodiment;
fig. 6 is a schematic structural diagram of a deep local aggregation descriptor extraction system for a finger vein image according to the present embodiment;
fig. 7 is a schematic structural diagram of a basic network module according to the present embodiment;
fig. 8 is a schematic diagram of the VLAD encoding module according to the present embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
Training and testing are carried out on three finger vein data sets: SDUMLA, FV-USM, and MMCBNU_6000. The SDUMLA data set, from Shandong University, contains finger vein images of 636 fingers from 106 subjects; 6 gray-scale BMP images are acquired from each index, middle, and ring finger, with an image resolution of 320×240. The FV-USM data set, from Universiti Sains Malaysia, consists of vein images of the index and middle fingers of both hands of 123 subjects; the images come from two separate acquisition sessions, 12 images per finger. The MMCBNU_6000 data set, from Chonbuk National University in Korea, consists of finger vein images of 100 volunteers; each finger is imaged 10 times, giving 6000 images in total.
As shown in fig. 1, example images of the above public finger vein data sets are given. As shown in fig. 2, in this embodiment the finger images of the MMCBNU_6000 data set are divided into 600 classes by finger, each class containing 10 sample images; 300 classes are randomly taken as the training set (3000 samples in total), and the rest serve as the test set, which is further divided into an enrolled template library and a set of samples to be tested.
the embodiment is mainly realized based on a deep learning framework Pytorch, a display card used in an experiment is GTX1080Ti, and a finger vein descriptor extracted from a test image is used for identifying and verifying tasks.
The embodiment provides a deep local aggregation descriptor extraction method of a finger vein image, which comprises the following steps:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
setting K cluster center vectors as trainable parameters of the network: this embodiment sets K cluster center vectors $\{c_k, k=1,2,\dots,K\}$ of dimension $C_{out}$, randomly initializes them from a uniform distribution, and lets them be determined by network learning. For finger vein images of arbitrary size, the K cluster centers are learned automatically by the network, and concatenating the per-center aggregation vectors forms the descriptor vector representing the finger vein image, whose length is fixed at 128×K; this solves the matching problem between finger vein images of different sizes. In particular, since the descriptor vector is built from the cluster centers rather than from feature vectors of local image blocks, two finger vein images can still be matched correctly when their spatial positions differ, which solves the matching failure caused by finger pose differences between same-class finger vein images. In this embodiment, so that the extracted descriptors have strong representational power while remaining reasonably compact, the number K of cluster center vectors is set between 8 and 15, preferably K=10.
As shown in fig. 3, the input finger vein image trains the network in batches, and the training steps include:
preprocessing the finger vein image: segmenting the finger region from the background and cropping the circumscribed rectangular region along the finger edges to obtain the region of interest for training, which removes background noise while keeping as much of the original information as possible;
in this embodiment, the specific steps of preprocessing the finger vein image include:
extraction of the region of interest: the two Sobel operators Mask_u and Mask_d detect the upper and lower finger edges of the finger vein training image respectively; the finger midline is fitted by linear regression and the angle it forms with the horizontal direction is computed; the finger vein training image is rotated by an affine transformation to complete the tilt correction; finally, the circumscribed rectangle of the whole finger is cropped according to the outermost finger edge points to obtain the region of interest. As shown in fig. 4, training and test images are obtained from the region of interest. In this embodiment, Mask_u and Mask_d are the two Sobel operators extended to 3×9;
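As an illustration of this step, the following is a minimal Python/OpenCV sketch. Since the kernel formulas in the original are given as images that are not reproduced here, the exact 3×9 kernel values below (a 3×3 vertical Sobel kernel with each column triplicated, and Mask_d = -Mask_u) are an assumption, as are the helper names.

```python
import cv2
import numpy as np

# Assumed 3x9 extension of the vertical Sobel kernel; Mask_d is its negation.
MASK_U = np.array([[-1, -1, -1, -2, -2, -2, -1, -1, -1],
                   [ 0,  0,  0,  0,  0,  0,  0,  0,  0],
                   [ 1,  1,  1,  2,  2,  2,  1,  1,  1]], dtype=np.float32)
MASK_D = -MASK_U

def extract_roi(img: np.ndarray) -> np.ndarray:
    """Detect finger edges, correct tilt, and crop the circumscribed rectangle."""
    h, w = img.shape
    resp_u = cv2.filter2D(img.astype(np.float32), -1, MASK_U)
    resp_d = cv2.filter2D(img.astype(np.float32), -1, MASK_D)
    # Per column, take the strongest response as the edge position
    # (upper edge in the top half, lower edge in the bottom half).
    upper = resp_u[: h // 2].argmax(axis=0)
    lower = resp_d[h // 2 :].argmax(axis=0) + h // 2
    # Fit the finger midline by linear regression, then rotate to horizontal.
    cols = np.arange(w)
    slope, _ = np.polyfit(cols, (upper + lower) / 2.0, 1)
    angle = np.degrees(np.arctan(slope))
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))
    # Crop the circumscribed rectangle of the finger region.
    return rotated[int(upper.min()) : int(lower.max()), :]
```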
the method comprises the steps of adjusting the sizes of finger vein training images according to the original proportion of the receptive field and the images, calculating the average aspect ratio of all training images to be 2, calculating the local area with the receptive field of 23 x 23 of each characteristic point of an output characteristic image according to the structural parameters of a basic network module, adjusting the height of an input image to be h=64 and the width to be w=128, wherein the receptive field of each characteristic point in the characteristic image is about 1/3 of the height of the input image, and reflecting richer finger vein information;
each training image is normalized by subtracting the mean and dividing by the standard deviation, reducing the influence of uneven illumination and yielding the final finger vein training sample images;
building the training batch sampler: the training images are loaded in batches; each batch randomly selects m classes and loads n samples per class, i.e. m×n samples in total; in this embodiment m=16 and n=6;
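A minimal sketch of such an m-classes × n-samples batch sampler in PyTorch follows; the `labels` input and the class bookkeeping are assumptions of this illustration.

```python
import random
from collections import defaultdict
from torch.utils.data import Sampler

class PKBatchSampler(Sampler):
    """Yields batches of m randomly chosen classes with n samples each.
    `labels` is assumed to be a list of per-sample class ids."""
    def __init__(self, labels, m=16, n=6):
        self.m, self.n = m, n
        self.by_class = defaultdict(list)
        for idx, lab in enumerate(labels):
            self.by_class[lab].append(idx)
        self.classes = list(self.by_class)
        self.batches_per_epoch = len(self.classes) // m

    def __iter__(self):
        random.shuffle(self.classes)
        for b in range(self.batches_per_epoch):
            batch = []
            for lab in self.classes[b * self.m : (b + 1) * self.m]:
                batch.extend(random.sample(self.by_class[lab], self.n))
            yield batch   # 16 x 6 = 96 sample indices per batch

    def __len__(self):
        return self.batches_per_epoch
```

Such a sampler can be passed to a `DataLoader` via its `batch_sampler` argument.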
initializing the trainable parameters in the network structure: for the basic network module, the convolution layer weights are initialized with orthogonal matrices and the biases fixed to 0, and the BN layer weights and biases are fixed to 1 and 0; for the cluster centers, the number of descriptor clusters K is set to 10 and the cluster centers $c_k, k=1,2,\dots,K$ are randomly initialized from a uniform distribution; for the VLAD coding module, the 1×1 convolution is initialized from the cluster centers, with kernel weights $w_k=2c_k$ and biases $b_k=-\|c_k\|^2$;
the margin hyperparameter of the triplet loss is set to 1;
the number of training iterations is set to 200 and the learning rate fixed at 0.01, using the Adam (Adaptive Moment Estimation) optimization method; a minimum validation loss and a model save path are initialized so that the model with the smallest validation loss can be saved later; the initial iteration count and initial sample batch are set to 0; the batch sampler gives a training batch size of 16×6=96;
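A sketch of this training configuration follows; `model`, `train_loader`, `val_loader` and `evaluate` are hypothetical names, and the `mine_hard_triplets`/`triplet_loss` helpers are sketched further below.

```python
import torch

# Assumed setup: `model` = basic network + VLAD coding module.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
best_val_loss = float('inf')

for epoch in range(200):
    model.train()
    for images, labels in train_loader:        # 16 classes x 6 samples = 96
        feats = model(images)                  # local aggregation descriptors
        triplets, dist = mine_hard_triplets(feats, labels)
        loss = triplet_loss(dist, triplets, margin=1.0)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    val_loss = evaluate(model, val_loader)     # hypothetical validation helper
    if val_loss < best_val_loss:               # keep the checkpoint with the
        best_val_loss = val_loss               # smallest validation loss
        torch.save(model.state_dict(), 'best_model.pt')
```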
the iteration counter is incremented by 1 and training of the model continues; the training batch counter is incremented by 1, and loading of the next batch of samples begins or continues;
according to the batch sampler settings, 96 preprocessed training images are loaded from the training set;
after a finger vein training sample image passes through the six 3×3 convolution modules of the basic network module, and because two of the convolutions use stride 2 (giving a pooling-like effect), a feature map with 128 channels and 16×32 resolution is obtained;
the multi-channel feature map is combined with the cluster center vectors in the VLAD coding module to complete the VLAD coding, using the K cluster center vectors $\{c_k, k=1,2,\dots,K\}$ of dimension $C_{out}$ that were uniformly initialized and are determined through network learning.

The specific steps of the VLAD coding are as follows:

the feature map of resolution $w_{out}\times h_{out}$ with $C_{out}$ channels is converted into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$; in this embodiment these are 16×32 = 512 local descriptors of dimension 128, written $\{x_i, i=1,2,\dots,512\}$, which yield a 1280-dimensional local aggregation descriptor after the VLAD coding module;

the descriptors are input to the VLAD coding module for coding, i.e. the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

the matrix $V$ is flattened into a one-dimensional vector and $L_2$-normalized, giving the local aggregation descriptor of length $K\times C_{out}$;
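A PyTorch sketch of this VLAD coding module follows: soft assignment by a 1×1 convolution initialized with $w_k=2c_k$ and $b_k=-\|c_k\|^2$ as stated, residuals to the K cluster centers, then flattening and L2 normalization. The module and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADCoding(nn.Module):
    """VLAD coding over a CNN feature map (sketch of the described module)."""
    def __init__(self, K=10, C_out=128):
        super().__init__()
        self.centers = nn.Parameter(torch.rand(K, C_out))   # uniform init
        self.assign = nn.Conv2d(C_out, K, kernel_size=1)
        # Initialization from the cluster centers: w_k = 2 c_k, b_k = -||c_k||^2
        with torch.no_grad():
            self.assign.weight.copy_(2 * self.centers[..., None, None])
            self.assign.bias.copy_(-(self.centers ** 2).sum(dim=1))

    def forward(self, x):                      # x: (B, C_out, h_out, w_out)
        a = F.softmax(self.assign(x), dim=1)   # (B, K, H, W) soft assignment
        x = x.flatten(2)                       # (B, C_out, N) local descriptors
        a = a.flatten(2)                       # (B, K, N)
        # V(k, j) = sum_i a_k(x_i) * (x_i(j) - c_k(j))
        V = torch.einsum('bkn,bcn->bkc', a, x) \
            - a.sum(dim=2)[..., None] * self.centers[None]
        return F.normalize(V.flatten(1), p=2, dim=1)   # length K * C_out
```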
the training images of one batch are processed by the basic network module and the VLAD coding module to obtain m×n local aggregation descriptors $\{f_i, i=1,2,\dots,m\times n\}$, where $f_i$ denotes the local aggregation descriptor of the i-th training sample in the batch; in this embodiment, the 96 training samples of one batch yield 96 descriptors through the network, which form a matrix from which the Euclidean distance matrix with elements $(i,j)=\|f_i-f_j\|_2$ between every two descriptors is obtained by matrix computation; the matrix has size 96×96 with zeros on the diagonal;
hard negative samples are mined to obtain triplets: two local aggregation descriptors $f_a$ and $f_p$ of the same class are selected to form a positive sample pair $(f_a, f_p)$; for each positive sample pair, the negative sample $f_n$ that makes $\|f_a-f_n\|_2^2$ smallest is selected from the other classes, forming the triplet $(f_a, f_p, f_n)$; in this embodiment, the threshold margin used in the gap between same-class and different-class descriptor distances is set to 1, so that same-class and different-class descriptors can be distinguished well;
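A sketch of the distance matrix and hard-negative mining just described; the function and variable names are assumptions, and the hardest negative is taken as the one minimizing the anchor-negative distance.

```python
import torch

def mine_hard_triplets(feats, labels):
    """feats: (N, D) descriptor tensor; labels: (N,) class-id tensor.
    For every positive pair (a, p), pick the hardest negative n for anchor a."""
    dist = torch.cdist(feats, feats)                 # (N, N), zero diagonal
    same = labels[:, None] == labels[None, :]        # same-class mask
    triplets = []
    N = feats.size(0)
    for a in range(N):
        neg_d = dist[a].masked_fill(same[a], float('inf'))
        n = int(neg_d.argmin())                      # hardest negative for a
        for p in range(N):
            if p != a and same[a, p]:
                triplets.append((a, p, n))
    return triplets, dist
```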
the loss function is computed and the network weights updated by back-propagation; whether all samples have completed one pass of training is checked; if so, the validation-loss step is entered, otherwise training returns to continue;
whether the current validation loss is smaller than the minimum loss is checked; if so, the model is saved (or the saved model updated) and the minimum loss value updated; otherwise, the check of whether the set number of iterations is complete is entered;
whether the 200 iterations are complete is checked; if so, training ends; otherwise the flow returns to the model training iteration step, i.e. the iteration counter is incremented by 1, training continues, the batch counter is incremented by 1, and loading of the next batch of samples begins or continues;
in this embodiment, the loss function over the triplets of one batch is computed as:

$$L=\sum_{(f_a,f_p,f_n)}\max\left(\|f_a-f_p\|_2^2-\|f_a-f_n\|_2^2+margin,\ 0\right)$$

where m denotes the number of image classes in the batch and n the number of samples per class, the sum running over the triplets constructed from the m×n samples. In this embodiment, constructing triplet samples during network training reduces the required number of finger vein training images and guarantees equal numbers of positive and negative training samples; the hard-negative mining strategy used when constructing samples accelerates network convergence; and training with the triplet loss pushes the network to learn the differences between different finger vein images rather than label information, improving the generalization of the method.
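Building on the mining sketch above, the batch loss with margin = 1 can be computed as follows (a sketch using the squared distances, matching the formula above):

```python
import torch

def triplet_loss(dist, triplets, margin=1.0):
    """Hinge loss over the mined triplets:
    max(margin + d(a, p)^2 - d(a, n)^2, 0), averaged over the batch."""
    losses = [torch.clamp(margin + dist[a, p] ** 2 - dist[a, n] ** 2, min=0.0)
              for (a, p, n) in triplets]
    return torch.stack(losses).mean()
```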
As shown in fig. 5, the trained network is used to extract the local aggregation descriptor of a finger vein image under test; the network structure in the test stage is identical to that in the training stage. The specific steps are as follows:
image preprocessing is applied to the finger vein images under test, including region-of-interest extraction, normalization and resizing; after preprocessing, all test-set images are resized to 64×128; the preprocessed finger vein images are then input into the trained network to obtain their deep local aggregation descriptors, which can be further used for finger vein identification or verification.
In this embodiment, the finger vein recognition and verification tasks are tested separately;
for the finger vein recognition task:
all images of the enrolled template library are input into the network to obtain their feature descriptors;
each image in the set of samples to be tested obtains its descriptor through the network, and the Euclidean distances between this descriptor and the feature descriptors of all enrolled templates are computed;
the Euclidean distances are sorted, and the current test sample is identified as the class of the enrolled template with the smallest Euclidean distance;
for the finger vein verification task:
testing uses the enrolled template library, and a suitable classification threshold is selected, set to 1 in this embodiment;
each sample to be tested forms positive sample pairs with the other samples of its class in the test set, and an equal number of different-class samples are randomly selected to form negative sample pairs, giving 300×5×5×2=15000 pairs in total;
the sample pairs obtain their descriptors through the network one by one, and the Euclidean distance between the two descriptors of each pair is computed;
if the Euclidean distance between the two descriptors is below 1, the two samples are judged to be of the same class and the verification succeeds; otherwise the verification fails;
finally, the test results are saved. With the deep local aggregation descriptor of this embodiment, finger vein images in different finger poses need no special handling, and matching and recognition can be realized directly through simple similarity measures such as Euclidean distance or cosine similarity.
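A sketch of both matching decisions with Euclidean distance (the names are illustrative):

```python
import torch

def identify(query_feat, template_feats, template_labels):
    """1:N identification: assign the label of the nearest enrolled template."""
    d = torch.cdist(query_feat[None], template_feats)[0]   # (M,) distances
    return template_labels[int(d.argmin())]

def verify(feat_a, feat_b, threshold=1.0):
    """1:1 verification: same finger iff descriptor distance is below the
    threshold (set to 1 in this embodiment)."""
    return torch.dist(feat_a, feat_b).item() < threshold
```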
In this embodiment, the test method for the other two public databases is essentially the same as the above steps. As shown in table 1 below, the evaluation index for the finger vein 1:1 verification experiments of this embodiment is the EER (Equal Error Rate), i.e. the FAR value at the point where FAR (False Accept Rate) and FRR (False Reject Rate) are equal; in the experiments, the FAR value at |FAR-FRR|<0.0001 is taken as the EER.
Table 1. EER results of the finger vein 1:1 verification tests
|  | SDUMLA | FV-USM | MMCBNU_6000 |
|---|---|---|---|
| EER | 0.95% | 0.38% | 0.10% |
As shown in table 2 below, the experimental results for finger vein 1:N identification in this embodiment use the evaluation index IR(k):

$$IR(k)=\frac{\left|\{\,b\in B:\mathrm{rank}(b)\le k\,\}\right|}{U_B}$$

where B denotes the set of all samples to be tested, b is a sample to be tested, rank(b) is the rank of the similarity between the sample and its same-class samples in the enrolled template library, and $U_B$ denotes the number of samples to be tested. IR(1) indicates the proportion of test samples for which a same-class template in the library ranks first in similarity.
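A sketch of the IR(k) computation from the per-sample ranks (names are illustrative):

```python
import numpy as np

def ir_at_k(ranks, k=1):
    """IR(k): fraction of test samples whose same-class template appears
    within the top-k most similar enrolled templates. `ranks` holds rank(b)
    for every sample b in the test set B."""
    return (np.asarray(ranks) <= k).mean()

# Example: IR(1) = 0.75 for ranks [1, 1, 2, 1]
print(ir_at_k([1, 1, 2, 1], k=1))
```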
Table 2. IR(k) results of the finger vein 1:N identification tests
|  | SDUMLA | FV-USM | MMCBNU_6000 |
|---|---|---|---|
| IR(k) | 99.50% | 99.87% | 100% |
As shown in table 3 below, the parameter size of the model of this embodiment and the finger-vein-recognition-related times measured on a CPU are given.
Table 3. Model size and time consumption
| Model size | Feature extraction time | Euclidean distance calculation time |
|---|---|---|
| 1.1M | 0.0144s | 0.00037s |
As can be seen from tables 1 to 3 above, the network proposed in this embodiment is effective in both the finger vein identification and verification tasks; the network model is only 1.1M, the feature extraction time is short, and the similarity computation is fast even though the descriptor dimension in this embodiment is 1280. In this embodiment, VLAD coding of the finger vein image features makes full use of the feature map information while introducing only the extra 1×1 convolution parameters, yielding finger vein image descriptors with stronger representational power.
As shown in fig. 6, this embodiment further provides a deep local aggregation descriptor extraction system for a finger vein image, including: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
in this embodiment, the basic network module construction unit is configured to construct a basic network module, where the basic network module is configured to extract local features of the finger vein image; the VLAD encoding module construction unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module; the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network; the training unit is used for inputting the finger vein images to train the network in batches, and the extracting unit is used for extracting the local aggregation descriptors of the finger vein images to be detected by adopting the trained network;
as shown in fig. 7, the basic network module adopts 6 serially connected convolution modules, denoted conv_i, i={1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer and a ReLU activation layer; the numbers of convolution kernels of the convolution modules are 32, 64, 128 and 128 respectively; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2 and those of conv_1, conv_2, conv_4 and conv_6 to 1; all convolution layers are initialized with orthogonal matrices and their biases fixed to 0; the weights and biases of the BN layers are fixed to 1 and 0 respectively and are not updated, which reduces the number of trainable model parameters with little influence on the results;
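A PyTorch sketch of the basic network module under these settings follows. Since the translated text lists four kernel counts for six modules, the per-module channel progression below is an assumption, chosen to end at the stated 128-channel output.

```python
import torch.nn as nn

def conv_block(c_in, c_out, stride):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# Six serial 3x3 convolution modules; conv_3 and conv_5 use stride 2.
channels = [32, 32, 64, 64, 128, 128]   # assumed progression (see lead-in)
strides  = [1, 1, 2, 1, 2, 1]
layers, c_prev = [], 1                  # single-channel input image
for c, s in zip(channels, strides):
    layers.append(conv_block(c_prev, c, s))
    c_prev = c
base_network = nn.Sequential(*layers)   # 1x64x128 input -> 128x16x32 output

# Orthogonal initialization, biases fixed to 0; BN weight/bias fixed to 1/0.
for m in base_network.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.orthogonal_(m.weight)
        nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)
        m.weight.requires_grad_(False)  # fixed, not updated during training
        m.bias.requires_grad_(False)
```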
as shown in fig. 8, the VLAD coding module places a 1×1 convolution layer in the network structure, whose weights $w_k=2c_k$ and biases $b_k=-\|c_k\|^2$ represent the simplified soft assignment $a_k(x_i)$:

$$a_k(x_i)=\frac{e^{w_k^{T}x_i+b_k}}{\sum_{k'} e^{w_{k'}^{T}x_i+b_{k'}}}=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$
In this embodiment, the training unit includes: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
in this embodiment, the image preprocessing module is configured to preprocess the finger vein images; the multi-channel feature map acquisition module is configured to obtain a multi-channel feature map from a finger vein image sample through the basic network module; the combined coding module is configured to combine the multi-channel feature map with the cluster center vectors in the VLAD coding module to complete the VLAD coding; the triplet construction module is configured to mine hard negative samples to obtain triplets; the iteration updating module is configured to compute the loss function and back-propagate to update the network weights until the iterative training ends.
The above examples are preferred embodiments of the invention, but the embodiments of the invention are not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the invention shall be an equivalent replacement and is included in the protection scope of the invention.
Claims (9)
1. The deep local aggregation descriptor extraction method for the finger vein image is characterized by comprising the following steps of:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
completing the VLAD coding by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module, which specifically comprises:

converting the multi-channel feature map into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$, and inputting them to the VLAD coding module for coding, where the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

flattening the matrix $V$ into a one-dimensional vector and applying $L_2$ normalization to obtain the local aggregation descriptor of length $K\times C_{out}$;
setting K clustering center vectors as trainable parameters of the network;
inputting finger vein images to train the network in batches, wherein the training steps comprise:
preprocessing the finger vein image;
passing the preprocessed finger vein image through the basic network module to obtain a multi-channel feature map;
combining the multi-channel feature map with a clustering center vector in a VLAD coding module to finish VLAD coding;
mining hard negative samples to obtain triplets, computing the loss function, and back-propagating to update the network weights until the iterative training ends;
and extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
2. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 1, wherein the preprocessing of the finger vein image comprises the specific steps of:
extraction of the region of interest: extracting a region of interest of the finger vein training image, and completing finger inclination correction through affine transformation;
normalizing the region of interest to obtain the final finger vein training sample image;
resizing the finger vein training sample image according to the ratio between the receptive field and the original image.
3. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 2, wherein the extraction of the region of interest comprises the specific steps of:

using two Sobel operators Mask_u and Mask_d to detect the upper and lower finger edges of the finger vein training image respectively, fitting the finger midline by linear regression, computing the angle between the midline and the horizontal direction, rotating the finger vein training image by an affine transformation to complete the tilt correction, and finally cropping the circumscribed rectangle along the finger edges to obtain the region of interest, where Mask_u and Mask_d denote the two Sobel operators extended to 3×9.
4. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 1, wherein mining hard negative samples to obtain triplets comprises the specific steps of:

selecting two local aggregation descriptors $f_a$ and $f_p$ of the same class to form a positive sample pair $(f_a, f_p)$; and, for each positive sample pair, selecting from the other classes the negative sample $f_n$ that makes $\|f_a-f_n\|_2^2$ smallest, forming the triplet $(f_a, f_p, f_n)$.
5. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 4, wherein the loss function over the triplets of one batch is computed as:

$$L=\sum_{(f_a,f_p,f_n)}\max\left(\|f_a-f_p\|_2^2-\|f_a-f_n\|_2^2+margin,\ 0\right)$$

where m denotes the number of image classes in the batch and n the number of samples per class.
6. A depth local aggregation descriptor extraction system for a finger vein image, comprising: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
the basic network module construction unit is used for constructing a basic network module, and the basic network module is used for extracting local features of the finger vein image;
the VLAD encoding module constructing unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module;
the VLAD coding is completed by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module, which specifically comprises:

converting the multi-channel feature map into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$, and inputting them to the VLAD coding module for coding, where the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

flattening the matrix $V$ into a one-dimensional vector and applying $L_2$ normalization to obtain the local aggregation descriptor of length $K\times C_{out}$;
the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network;
the training unit is used for inputting finger vein images to train the network in batches, and comprises: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
the image preprocessing module is used for preprocessing the finger vein image;
the multi-channel characteristic map acquisition module is used for acquiring a multi-channel characteristic map from a finger vein image sample through the basic network module;
the combined coding module is used for combining the multi-channel characteristic diagram with the clustering center vector in the VLAD coding module to finish VLAD coding;
the triplet construction module is used for mining hard negative samples to obtain triplets;
the iteration updating module is used for calculating a loss function and back-propagating and updating a network weight coefficient until the iteration training is finished;
the extraction unit is used for extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
7. The system of claim 6, wherein the basic network module adopts 6 serially connected convolution modules, denoted conv_i, i={1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer and a ReLU activation layer; the numbers of convolution kernels of the modules are 32, 64 and 128 respectively; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2, and those of conv_1, conv_2, conv_4 and conv_6 to 1.
8. The deep local aggregation descriptor extraction system for a finger vein image according to claim 7, wherein all convolution layers are initialized with orthogonal matrices and their biases fixed to 0, and the weights and biases of the BN layers are fixed to 1 and 0 respectively.
9. The deep local aggregation descriptor extraction system for a finger vein image according to claim 6 or 7, wherein the VLAD coding module is configured with a 1×1 convolution layer in the network structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010050908.9A CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010050908.9A CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274915A CN111274915A (en) | 2020-06-12 |
CN111274915B true CN111274915B (en) | 2023-04-28 |
Family
ID=71001095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010050908.9A Active CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274915B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112200156B (en) * | 2020-11-30 | 2021-04-30 | 四川圣点世纪科技有限公司 | Vein recognition model training method and device based on clustering assistance |
CN112733627B (en) * | 2020-12-28 | 2024-02-09 | 杭州电子科技大学 | Finger vein recognition method based on fusion local and global feature network |
CN112580590B (en) * | 2020-12-29 | 2024-04-05 | 杭州电子科技大学 | Finger vein recognition method based on multi-semantic feature fusion network |
CN112926516B (en) * | 2021-03-26 | 2022-06-14 | 长春工业大学 | Robust finger vein image region-of-interest extraction method |
CN113312989B (en) * | 2021-05-11 | 2023-06-20 | 华南理工大学 | Finger vein feature extraction network based on aggregated descriptors and attention |
CN115018056B (en) * | 2022-06-17 | 2024-09-06 | 华中科技大学 | Training method for local description subnetwork for natural scene image matching |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169415A (en) * | 2017-04-13 | 2017-09-15 | 西安电子科技大学 | Human motion recognition method based on convolutional neural networks feature coding |
CN107977609A (en) * | 2017-11-20 | 2018-05-01 | 华南理工大学 | A kind of finger vein identity verification method based on CNN |
CN109598311A (en) * | 2019-01-23 | 2019-04-09 | 中山大学 | A kind of sub- partial polymerization vector approach of description that space sub-space learning is cut based on symmetric positive definite matrix manifold |
CN110263659A (en) * | 2019-05-27 | 2019-09-20 | 南京航空航天大学 | A kind of finger vein identification method and system based on triple loss and lightweight network |
CN110427832A (en) * | 2019-07-09 | 2019-11-08 | 华南理工大学 | A kind of small data set finger vein identification method neural network based |
- 2020-01-17: Application filed as CN202010050908.9A; patent CN111274915B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN111274915A (en) | 2020-06-12 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | PB01 | Publication |  |
|  | SE01 | Entry into force of request for substantive examination |  |
|  | GR01 | Patent grant |  |