
CN111274915B - Deep local aggregation descriptor extraction method and system for finger vein image - Google Patents


Info

Publication number
CN111274915B
Authority
CN
China
Prior art keywords
finger vein
module
image
vlad
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010050908.9A
Other languages
Chinese (zh)
Other versions
CN111274915A (en)
Inventor
胡永健
文东霞
刘琲贝
王宇飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Sino Singapore International Joint Research Institute
Original Assignee
South China University of Technology SCUT
Sino Singapore International Joint Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Sino Singapore International Joint Research Institute filed Critical South China University of Technology SCUT
Priority to CN202010050908.9A priority Critical patent/CN111274915B/en
Publication of CN111274915A publication Critical patent/CN111274915A/en
Application granted granted Critical
Publication of CN111274915B publication Critical patent/CN111274915B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1347 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/14 Vascular patterns
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a deep local aggregation descriptor extraction method and system for finger vein images. The method comprises the following steps: constructing a basic network module; constructing a VLAD coding module; setting K cluster center vectors as trainable parameters of the network; and training the network in batches, where each training pass preprocesses the finger vein images, passes them through the basic network module to obtain multi-channel feature maps, combines the feature maps with the cluster center vectors in the VLAD coding module to complete VLAD coding, mines hard negative samples to form triplets, and computes the loss function and back-propagates it to update the network weight coefficients; finally, the trained network extracts the local aggregation descriptor of the finger vein image under test. The method obtains descriptors of fixed dimension that are independent of the spatial arrangement of the original image patches, solving both the matching problem between finger vein images of different sizes and the matching failures between same-class finger vein images caused by differences in finger posture, and yields deep local aggregation descriptors with stronger representational power.

Description

Deep local aggregation descriptor extraction method and system for finger vein image
Technical Field
The invention relates to the technical field of finger vein feature extraction, in particular to a deep local aggregation descriptor extraction method and system for finger vein images.
Background
Finger vein recognition is a new-generation biometric technology. Compared with traditional biometrics, it offers non-contact acquisition, inherent liveness detection, low equipment cost and other advantages. Finger vein recognition captures finger images with an infrared CCD camera and extracts features of the finger veins for identity authentication and recognition. The acquired finger vein images often suffer from noise, so extracting robust features is a central research problem in finger vein recognition: the representational power of traditional feature descriptors such as LBP (Local Binary Pattern) and LDC (Local Directional Code) depends strongly on image quality, and feature maps that retain spatial information require complex template matching for recognition.
In recent years, various deep-learning solutions have been proposed for finger vein recognition. Methods based on the idea of image segmentation classify finger vein images at the pixel level for the verification problem, but classification is slow and hard to deploy in real usage scenarios; methods based on convolutional networks reuse existing image classification models, so the models are large. Furthermore, the extracted features are sensitive to finger posture, and finger vein images captured under different postures require complex preprocessing or template matching schemes for recognition. A lightweight convolutional network that extracts finger vein feature descriptors robust to posture changes is therefore more attractive in practical applications.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a deep local aggregation descriptor extraction method and system for finger vein images. Cluster center vectors are concatenated to obtain descriptors of fixed dimension that are independent of the spatial arrangement of the original image patches, which solves both the problem of matching finger vein images of different sizes and the problem of failed matches between same-class finger vein images caused by differences in finger posture. On this basis, VLAD (Vector of Locally Aggregated Descriptors) coding is applied to the descriptors to obtain deep local aggregation descriptors with stronger representational power; the descriptors perform well in both finger vein identification and verification tasks, and the network model is only 1.1M in size, so it meets the lightweight requirements of engineering applications well.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a deep local aggregation descriptor extraction method of a finger vein image, which comprises the following steps:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
setting K clustering center vectors as trainable parameters of the network;
inputting finger vein images to train the network in batches, wherein the training steps comprise:
preprocessing the finger vein image;
the finger vein image samples pass through the basic network module to obtain multi-channel feature maps;
combining the multi-channel feature map with a clustering center vector in a VLAD coding module to finish VLAD coding;
mining hard negative samples to obtain triplets, calculating the loss function and back-propagating to update the network weight coefficients until iterative training finishes;
and extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
As a preferred technical solution, the preprocessing of the finger vein image specifically includes:
extraction of the region of interest: extracting a region of interest of the finger vein training image, and completing finger inclination correction through affine transformation;
normalizing the region of interest to obtain the final finger vein training sample image;
resizing the finger vein training sample image according to the ratio of the receptive field to the original image.
As a preferred technical solution, the extraction of the region of interest specifically comprises:
two Sobel operators Mask_u and Mask_d detect the upper and lower edges of the finger in the finger vein training image respectively; the finger midline is fitted by linear regression and the angle between the midline and the horizontal direction is computed; the finger vein training image is rotated by an affine transformation to complete tilt correction; finally the circumscribed rectangle is cropped according to the finger edges to obtain the region of interest, where Mask_u and Mask_d are the two Sobel operators extended to 3×9 (their coefficients appear as equation images in the original publication).
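For illustration, the region-of-interest extraction above can be sketched in Python with OpenCV and NumPy. Since the exact 3×9 kernel coefficients appear only as equation images in the original publication, this sketch assumes a column-replicated 3×3 Sobel kernel for Mask_u and its negation for Mask_d; the kernels and the helper name extract_roi are illustrative assumptions, not the patented operators.

```python
# Minimal ROI-extraction sketch, assuming column-replicated Sobel kernels.
import cv2
import numpy as np

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=np.float32)
Mask_u = np.repeat(sobel_y, 3, axis=1)  # 3x9, ASSUMED extension of the 3x3 kernel
Mask_d = -Mask_u                        # 3x9, opposite polarity for the lower edge

def extract_roi(img):
    """img: grayscale finger vein image as a 2-D uint8 array."""
    f = img.astype(np.float32)
    up = cv2.filter2D(f, -1, Mask_u)
    dn = cv2.filter2D(f, -1, Mask_d)
    y_up = up.argmax(axis=0)            # strongest upper-edge response per column
    y_dn = dn.argmax(axis=0)            # strongest lower-edge response per column
    cols = np.arange(img.shape[1])
    slope, _ = np.polyfit(cols, (y_up + y_dn) / 2.0, 1)  # fit the finger midline
    angle = np.degrees(np.arctan(slope))                 # angle to the horizontal
    h, w = img.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))             # affine tilt correction
    # circumscribed rectangle (bounds taken from pre-rotation edges for brevity)
    return rotated[int(y_up.min()):int(y_dn.max()), :]
```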
As a preferred technical solution, combining the multi-channel feature map with the cluster center vectors in the VLAD coding module to complete the VLAD coding specifically comprises:
the multi-channel feature map is converted into $w_{out} \times h_{out}$ local descriptors of dimension $C_{out}$ describing the original image, $\{x_i, i = 1, 2, \ldots, w_{out} \times h_{out}\}$, which are input to the VLAD coding module for coding; the element at position $(k, j)$ of the matrix $V$ with $K$ rows and $C_{out}$ columns is computed as

$$V(k, j) = \sum_{i=1}^{w_{out} \times h_{out}} a_k(x_i) \bigl( x_i(j) - c_k(j) \bigr)$$

$$a_k(x_i) = \frac{e^{-\lVert x_i - c_k \rVert^2}}{\sum_{k'} e^{-\lVert x_i - c_{k'} \rVert^2}}$$

where $x_i(j)$ and $c_k(j)$ denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$ respectively, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ denotes the cluster center vectors other than the $k$-th one;
the matrix $V$ is flattened into a one-dimensional vector and $L_2$-normalized to obtain a local aggregation descriptor of length $K \times C_{out}$.
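For illustration, a minimal PyTorch sketch of this VLAD coding step: local descriptors are soft-assigned to K trainable cluster centers, residuals are accumulated into V, and the flattened result is L2-normalized. Class and variable names are assumptions; the soft assignment is computed here directly from squared distances rather than through the 1×1 convolution introduced later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADCoding(nn.Module):
    """Sketch of the VLAD coding module described above."""
    def __init__(self, K=10, C_out=128):
        super().__init__()
        # K trainable cluster centers of dimension C_out, uniform random init
        self.centers = nn.Parameter(torch.rand(K, C_out))

    def forward(self, fmap):                         # fmap: (B, C_out, h_out, w_out)
        x = fmap.flatten(2).transpose(1, 2)          # (B, N, C): N = w_out * h_out descriptors x_i
        resid = x.unsqueeze(2) - self.centers        # (B, N, K, C): residuals x_i - c_k
        a = F.softmax(-resid.pow(2).sum(-1), dim=2)  # (B, N, K): soft assignments a_k(x_i)
        V = (a.unsqueeze(3) * resid).sum(dim=1)      # (B, K, C): matrix V(k, j)
        return F.normalize(V.flatten(1), p=2, dim=1) # (B, K * C_out), L2-normalized
```

With K = 10 and C_out = 128 this produces the 1280-dimensional descriptor used in the embodiment below.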
As a preferred technical solution, mining hard negative samples to obtain triplets specifically comprises:
selecting two local aggregation descriptors $D^a$ and $D^p$ of the same class to form a positive sample pair $(D^a, D^p)$;
for each positive sample pair $(D^a, D^p)$, selecting from the other classes the negative sample $D^n$ that minimizes $\lVert D^a - D^n \rVert_2^2$ to form the triplet $(D^a, D^p, D^n)$, where margin denotes a preset threshold parameter used in the triplet loss.
As a preferred technical solution, the loss function over the triplets of one batch is calculated as

$$L = \sum_{(D^a, D^p, D^n)} \max\bigl( \lVert D^a - D^p \rVert_2^2 - \lVert D^a - D^n \rVert_2^2 + margin,\; 0 \bigr)$$

where m denotes the number of image classes in the batch and n the number of samples per class.
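For illustration, this loss reduces to a short PyTorch function over the distances of the mined triplets (tensor names are assumptions):

```python
import torch

def triplet_loss(d_ap, d_an, margin=1.0):
    """d_ap / d_an: squared anchor-positive / anchor-negative distances,
    one entry per mined triplet; margin is the threshold parameter above."""
    return torch.clamp(d_ap - d_an + margin, min=0).sum()
```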
The invention also provides a deep local aggregation descriptor extraction system of the finger vein image, which comprises the following steps: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
the basic network module construction unit is used for constructing a basic network module, and the basic network module is used for extracting local features of the finger vein image;
the VLAD encoding module constructing unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module;
the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network;
the training unit is used for inputting finger vein images to train the network in batches, and comprises: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
the image preprocessing module is used for preprocessing the finger vein image;
the multi-channel characteristic map acquisition module is used for acquiring a multi-channel characteristic map from a finger vein image sample through the basic network module;
the combined coding module is used for combining the multi-channel characteristic diagram with the clustering center vector in the VLAD coding module to finish VLAD coding;
the triplet construction module is used for mining hard negative samples to obtain triplets;
the iteration updating module is used for calculating a loss function and back-propagating and updating a network weight coefficient until the iteration training is finished;
the extraction unit is used for extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
As a preferred technical solution, the basic network module consists of 6 serially connected convolution modules, denoted conv_i, i = {1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer and a ReLU activation layer; the numbers of convolution kernels of the convolution modules are 32, 64, 128; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2 and those of conv_1, conv_2, conv_4 and conv_6 to 1.
As a preferred technical solution, all convolution layers are initialized with orthogonal matrices and their biases fixed to 0; the weights and biases of the BN layers are fixed to 1 and 0 respectively.
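For illustration, a sketch of such a basic network module in PyTorch. The description lists kernel counts 32, 64, 128 for six modules and a 128-channel output, so the per-module widths (32, 32, 64, 64, 128, 128) used here are an assumption; strides, padding and initialization follow the text.

```python
import torch.nn as nn

def conv_block(c_in, c_out, stride):
    conv = nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1)
    nn.init.orthogonal_(conv.weight)       # orthogonal matrix initialization
    nn.init.zeros_(conv.bias)              # bias fixed to 0
    bn = nn.BatchNorm2d(c_out)
    nn.init.ones_(bn.weight)               # BN weight fixed to 1 ...
    nn.init.zeros_(bn.bias)                # ... and BN bias fixed to 0
    bn.weight.requires_grad_(False)        # frozen, not updated during training
    bn.bias.requires_grad_(False)
    return nn.Sequential(conv, bn, nn.ReLU(inplace=True))

base_net = nn.Sequential(
    conv_block(1,   32, 1),   # conv_1
    conv_block(32,  32, 1),   # conv_2   (widths after conv_1 are ASSUMED)
    conv_block(32,  64, 2),   # conv_3, stride 2
    conv_block(64,  64, 1),   # conv_4
    conv_block(64, 128, 2),   # conv_5, stride 2
    conv_block(128, 128, 1),  # conv_6
)  # a 64x128 grayscale input yields a 128-channel 16x32 feature map
```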
As a preferred technical solution, the VLAD coding module adds a 1×1 convolution layer to the network structure.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) According to the invention, the descriptors are obtained through CNN network end-to-end learning, the network model is only 1.1M in size, and the extracted descriptors can be further used for tasks such as finger vein verification and identification, so that the method is flexible in use and wide in application.
(2) For finger vein images with any size, K clustering centers are obtained through network automatic learning, and the clustering center vectors are connected in series to form descriptor vectors for representing the features of the finger vein images, so that the matching problem between the finger vein images with different sizes is solved.
(3) The invention carries out VLAD coding on the characteristics of the finger vein image, and fully utilizes the information of the characteristic map under the condition that only 1X 1 convolution parameters are additionally introduced, thereby obtaining the finger vein image descriptor with more characterization force.
(4) According to the invention, the triplet samples are constructed during the network training period, so that the requirement on the number of finger vein training images is reduced, and the number of positive and negative samples for training is ensured to be equal; a difficult-to-separate negative sample mining strategy is adopted when a sample is constructed, so that network convergence is quickened; the triplet loss function is adopted to train the network, so that the network is promoted to learn the differences among different finger vein images rather than the label information, and the generalization performance of the method is improved.
Drawings
FIG. 1 is a diagram showing an example of an image of a finger vein data set according to the present embodiment;
FIG. 2 is a schematic diagram of the division of the finger vein data set in the present embodiment;
FIG. 3 is a schematic diagram of a batch training process of the network model according to the present embodiment;
FIG. 4 is a diagram showing training and testing images obtained by extracting a region of interest according to the present embodiment;
FIG. 5 is a flow chart of the network model test according to the present embodiment;
fig. 6 is a schematic structural diagram of a deep local aggregation descriptor extraction system for a finger vein image according to the present embodiment;
fig. 7 is a schematic structural diagram of a basic network module according to the present embodiment;
fig. 8 is a schematic diagram of the VLAD encoding module according to the present embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
Training and testing are performed on three finger vein data sets: SDUMLA, FV-USM and MMCBNU_6000. The SDUMLA data set comes from Shandong University; its finger vein images come from 636 fingers of 106 subjects, with 6 grayscale BMP images acquired from the index, middle and ring fingers of both hands, at an image resolution of 320×240. The FV-USM data set comes from Universiti Sains Malaysia and consists of vein images of the index and middle fingers of the left and right hands of 123 subjects; the images of this database come from two separate sessions, with 12 images per finger. The MMCBNU_6000 data set comes from Chonbuk National University in Korea and consists of finger vein images of 100 volunteers; each finger was captured 10 times, for a total of 6000 images.
As shown in fig. 1, image examples of the public finger vein data sets above are given. As shown in fig. 2, in this embodiment the finger images of the 100 volunteers in the MMCBNU_6000 finger vein data set are divided into 600 classes by finger, each class containing 10 sample images; 300 classes are randomly taken as the training set, which contains 3000 samples in total, and the rest serve as the test set; in this embodiment the test set is further divided into a registered template library and a set of samples under test;
the embodiment is mainly realized based on a deep learning framework Pytorch, a display card used in an experiment is GTX1080Ti, and a finger vein descriptor extracted from a test image is used for identifying and verifying tasks.
The embodiment provides a deep local aggregation descriptor extraction method of a finger vein image, which comprises the following steps:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
setting K cluster center vectors as trainable parameters of the network: this embodiment sets K cluster center vectors $\{c_k, k = 1, 2, \ldots, K\}$ of dimension $C_{out}$, randomly initializes them from a uniform distribution, and lets $c_k$ be determined through network learning. For finger vein images of arbitrary size, this embodiment automatically learns K cluster centers through the network and concatenates the cluster center vectors into a descriptor vector representing the finger vein image features, whose length is fixed at 128×K; this solves the matching problem between finger vein images of different sizes. In particular, since the descriptor vector is composed of cluster center vectors rather than feature vectors of local image patches, two finger vein images can still be matched correctly when a spatial position difference exists, which solves the matching failures between same-class finger vein images caused by finger posture differences. To give the extracted descriptors strong representational power while keeping a relatively simple form, the number K of cluster center vectors is set between 8 and 15; this embodiment preferably sets K to 10.
As shown in fig. 3, the input finger vein image trains the network in batches, and the training steps include:
preprocessing the finger vein image: the finger region is segmented from the background and the circumscribed rectangular region of the finger edges is cropped to obtain the region of interest for training, which removes background noise while keeping as much of the original information as possible;
in this embodiment, the specific steps of preprocessing the finger vein image include:
extraction of the region of interest: mask by two Sobel operators u ,Mask d Detecting the upper edge and the lower edge of the finger vein training image respectively, fitting the midline of the finger by a linear regression method, calculating the angle formed by the midline and the horizontal direction, rotating the finger vein training image by affine transformation to finish inclination correction, and finally intercepting the circumscribed rectangle of the whole finger according to the edge point at the outermost side of the finger to obtain a region of interest, wherein as shown in fig. 4, training and testing images are obtained according to the region of interest;
in this embodiment, the two Sobel operators are respectively expressed as:
Figure RE-GDA0002398155150000081
Figure RE-GDA0002398155150000082
wherein, mask u and Maskd Representing two Sobel operators extended to 3×9;
the method comprises the steps of adjusting the sizes of finger vein training images according to the original proportion of the receptive field and the images, calculating the average aspect ratio of all training images to be 2, calculating the local area with the receptive field of 23 x 23 of each characteristic point of an output characteristic image according to the structural parameters of a basic network module, adjusting the height of an input image to be h=64 and the width to be w=128, wherein the receptive field of each characteristic point in the characteristic image is about 1/3 of the height of the input image, and reflecting richer finger vein information;
each training image is normalized by subtracting the mean and dividing by the variance, which reduces the influence of uneven illumination and yields the final finger vein training sample images;
building a training batch sampler: the training images are loaded in batches; m classes are randomly selected for each batch and n samples are loaded per class, i.e. m × n samples in total; in this embodiment m = 16, n = 6;
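For illustration, a minimal sketch of such a class-balanced batch sampler; labels is assumed to map each dataset index to its finger class:

```python
import random
from collections import defaultdict

def sample_batch(labels, m=16, n=6):
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    classes = random.sample(list(by_class), m)  # m random classes per batch
    batch = []
    for c in classes:
        batch += random.sample(by_class[c], n)  # n samples per selected class
    return batch                                # m * n = 96 dataset indices
```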
initializing the trainable parameters of the network structure: the parameters of the basic network module are initialized with the convolution weights drawn as orthogonal matrices and the biases fixed to 0, and the weights and biases of the BN layers fixed to 1 and 0; the cluster center vectors are initialized by setting the number of descriptor clusters K to 10 and randomly initializing the cluster centers $c_k, k = 1, 2, \ldots, K$ from a uniform distribution; the trainable parameters of the VLAD coding module, i.e. the 1×1 convolution, are initialized from the cluster centers, with the convolution weights $w_k = 2c_k$ and the biases $b_k = -\lVert c_k \rVert^2$;
setting the margin hyperparameter of the triplet loss to 1;
setting the number of training iterations of the model to 200 and fixing the learning rate at 0.01, using the Adam (Adaptive Moment Estimation) optimization method; initializing a minimum validation loss and a model save path so that the model with the smallest validation loss can be saved later; setting the initial iteration count and the initial sample batch of the model to 0; the batch sampler sets the training batch size to 16 × 6 = 96;
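For illustration, the training configuration above condenses to a loop of the following shape; model, train_batches, compute_triplet_loss and validate are assumed helpers, not names from the patent:

```python
import torch

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=0.01)
best_val = float("inf")
for epoch in range(200):                           # 200 training iterations
    for batch in train_batches():                  # batches of 16 x 6 = 96 samples
        loss = compute_triplet_loss(model, batch)  # forward pass, mining, loss
        optimizer.zero_grad()
        loss.backward()                            # back-propagation
        optimizer.step()                           # update network weights
    val_loss = validate(model)
    if val_loss < best_val:                        # keep the model with the
        best_val = val_loss                        # minimum validation loss
        torch.save(model.state_dict(), "best_model.pt")
```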
the model training iteration count is incremented by 1 and training of the model continues; the training sample batch is incremented by 1, and loading of a batch of samples starts or continues with the next batch;
according to the batch sampler settings, 96 preprocessed training images are loaded from the training set;
after a finger vein training sample image passes through the six 3×3 convolution modules of the basic network module, the stride-2 convolutions of two of the layers act as pooling, finally yielding a feature map with 128 channels and 16×32 spatial resolution;
the multi-channel feature map is combined with the cluster center vectors in the VLAD coding module to complete the VLAD coding: K cluster center vectors $\{c_k, k = 1, 2, \ldots, K\}$ of dimension $C_{out}$ are set, randomly initialized from a uniform distribution, and $c_k$ is determined through network learning;
the specific steps of the VLAD coding are as follows:
the $C_{out}$-channel feature map of resolution $w_{out} \times h_{out}$ is converted into $w_{out} \times h_{out}$ local descriptors of dimension $C_{out}$ describing the original image, $\{x_i, i = 1, 2, \ldots, w_{out} \times h_{out}\}$; in this embodiment these are 16×32 = 512 local descriptors of dimension 128, $\{x_i, i = 1, 2, \ldots, 512\}$, which yield a 1280-dimensional local aggregation descriptor after the VLAD coding module;
the descriptors are input to the VLAD coding module for coding, i.e. the element at position $(k, j)$ of the matrix $V$ with $K$ rows and $C_{out}$ columns is computed as

$$V(k, j) = \sum_{i=1}^{w_{out} \times h_{out}} a_k(x_i) \bigl( x_i(j) - c_k(j) \bigr)$$

$$a_k(x_i) = \frac{e^{-\lVert x_i - c_k \rVert^2}}{\sum_{k'} e^{-\lVert x_i - c_{k'} \rVert^2}}$$

where $x_i(j)$ and $c_k(j)$ denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$ respectively, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ denotes the cluster center vectors other than the $k$-th one;
the matrix $V$ is flattened into a one-dimensional vector and $L_2$-normalized to obtain a local aggregation descriptor of length $K \times C_{out}$;
the training images of one batch pass through the basic network module and the VLAD coding module to obtain m × n local aggregation descriptors $\{D_i, i = 1, 2, \ldots, m \times n\}$, where $D_i$ denotes the local aggregation descriptor of the i-th training sample of the batch; in this embodiment the 96 training samples of a batch yield 96 descriptors through the network, which are stacked into a matrix; the element $(i, j) = \lVert D_i - D_j \rVert_2$ of the pairwise Euclidean distance matrix is then obtained by matrix computation; this matrix has size 96×96 and zero diagonal;
hard negative samples are mined to obtain triplets: two local aggregation descriptors $D^a$ and $D^p$ of the same class are selected to form a positive sample pair $(D^a, D^p)$; for each positive sample pair, the negative sample $D^n$ that minimizes $\lVert D^a - D^n \rVert_2^2$ is selected from the other classes to form the triplet $(D^a, D^p, D^n)$; in this embodiment, the threshold margin used when comparing the distance gap between same-class and different-class descriptors is set to 1, which distinguishes same-class from different-class descriptors well, as sketched in the code below;
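A minimal PyTorch sketch of the distance matrix and hard negative mining above, with descriptors (96×1280) and labels (96,) as assumed inputs; for brevity one hardest negative is mined per anchor and reused for all its positive pairs:

```python
import torch

def mine_triplets(descriptors, labels):
    d = torch.cdist(descriptors, descriptors)          # (96, 96), zero diagonal
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class mask
    triplets = []
    for a in range(len(labels)):
        negs = torch.where(~same[a])[0]
        n = int(negs[d[a, negs].argmin()])             # hardest negative for anchor a
        for p in torch.where(same[a])[0]:
            if int(p) != a:
                triplets.append((a, int(p), n))        # (anchor, positive, negative)
    return d, triplets
```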
the loss function is calculated and back-propagated to update the network weight coefficients; whether all samples have completed one training pass is checked; if so, the validation loss step is entered, otherwise training continues;
whether the current validation loss is smaller than the minimum loss is checked; if so, the model is saved or the saved model is updated and the minimum loss value is updated; otherwise the check for the set number of iterations is entered;
whether 200 iterations have been completed is checked; if so, training ends; otherwise control returns to the model training iteration step: the iteration count is incremented by 1, training of the model continues, the training sample batch is incremented by 1, and loading of a batch of samples starts or continues;
in this embodiment, the loss function over the triplets of a batch is calculated as

$$L = \sum_{(D^a, D^p, D^n)} \max\bigl( \lVert D^a - D^p \rVert_2^2 - \lVert D^a - D^n \rVert_2^2 + margin,\; 0 \bigr)$$

where the triplets are mined from a batch containing m image classes with n samples per class. Constructing triplet samples during network training reduces the required number of finger vein training images and guarantees equal numbers of positive and negative training samples; the hard negative mining strategy used when constructing samples accelerates network convergence; training with the triplet loss drives the network to learn the differences between different finger vein images rather than label information, improving the generalization of the method.
As shown in fig. 5, the trained network is used to extract the local aggregation descriptor of the finger vein image under test; the network structure of the test stage is identical to that of the training stage. The specific steps are as follows:
the finger vein image under test is preprocessed (region-of-interest extraction, normalization and resizing); after preprocessing, all test-set images are resized to 64×128; the preprocessed finger vein image is then input to the trained network to obtain the deep local aggregation descriptor of the finger vein image, which can further be used for finger vein identification or verification.
In this embodiment, the finger vein recognition and verification tasks are tested separately;
for the finger vein recognition task:
inputting all images of the registered template library into a network to obtain a feature descriptor;
obtaining descriptors from each image in a sample set to be detected through a network respectively, and calculating Euclidean distances between the descriptors and feature descriptors of all registration templates;
sorting the Euclidean distances and identifying the current sample under test as the registered template with the smallest Euclidean distance, as sketched below;
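For illustration, a minimal sketch of this nearest-template identification; names are assumptions. Verification (1:1, below) uses the same distance, accepting a pair when it falls under the threshold of 1.

```python
import torch

def identify(probe, templates, template_labels):
    """probe: (1280,) descriptor; templates: (N, 1280) enrolled descriptors."""
    dists = torch.cdist(probe.unsqueeze(0), templates)[0]  # (N,) Euclidean distances
    return template_labels[int(dists.argmin())]            # class of nearest template
```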
for the finger vein verification task:
the registered template library is used for testing, and a suitable classification threshold of 1 is selected;
for each sample under test, positive sample pairs are formed with the other same-class samples of the test set, and an equal number of different-class samples are randomly selected to form negative sample pairs, giving 300×5×5×2 = 15000 pairs in total;
obtaining descriptors from the sample pairs one by one through a network, and calculating Euclidean distance between every two descriptors;
if the Euclidean distance between two descriptors is below 1, the two samples are judged to be of the same class and verification succeeds; otherwise verification fails;
finally the test results are saved; with the deep local aggregation descriptors of this embodiment, no special processing is needed for finger vein images of different finger postures, and matching and recognition can be achieved directly with simple metrics such as the Euclidean distance or cosine similarity.
In this embodiment, the test procedure for the other two public databases is essentially the same as the steps above. As shown in table 1 below, the evaluation index for the finger vein 1:1 verification experiments of this method is the EER (Equal Error Rate), i.e. the FAR value at the point where FAR (False Accept Rate) and FRR (False Reject Rate) are equal; in the experiments, the FAR value where |FAR - FRR| < 0.0001 is taken as the EER.
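For illustration, the EER can be computed from genuine-pair and impostor-pair distance arrays by a simple threshold sweep (a generic sketch, not code from the patent):

```python
import numpy as np

def eer(genuine, impostor):
    """genuine / impostor: 1-D arrays of descriptor distances for same-class
    and different-class pairs; returns the FAR where |FAR - FRR| is smallest."""
    best_gap, best_far = 1.0, None
    for t in np.sort(np.concatenate([genuine, impostor])):
        frr = (genuine >= t).mean()   # false reject: genuine pair over threshold
        far = (impostor < t).mean()   # false accept: impostor pair under threshold
        if abs(far - frr) < best_gap:
            best_gap, best_far = abs(far - frr), far
    return best_far
```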
Table 1 Finger vein 1:1 verification EER results

Dataset   SDUMLA   FV-USM   MMCBNU_6000
EER       0.95%    0.38%    0.10%
As shown in table 2 below, the evaluation index for the finger vein 1:N identification experiments in this embodiment is IR(k):

$$IR(k) = \frac{\bigl|\{\, b \in B : rank(b) \le k \,\}\bigr|}{U_B}$$

where B denotes the set of all samples under test, b a sample under test, rank(b) the rank of the similarity between the sample under test and its same-class samples in the registered template library, and $U_B$ the number of samples under test. IR(1) is the proportion of test samples whose most similar sample in the template library belongs to the same class.
Table 2 Finger vein 1:N identification IR(k) results

Dataset   SDUMLA   FV-USM   MMCBNU_6000
IR(k)     99.50%   99.87%   100%
As shown in table 3 below, the parameters of the model of this embodiment and the finger-vein-recognition-related times measured on the CPU are as follows.

Table 3 Model size and time consumption

Model size   Feature extraction time   Euclidean distance calculation time
1.1M         0.0144s                   0.00037s
As can be seen from tables 1 to 3 above, the network proposed in this embodiment is effective in both finger vein identification and verification tasks; the network model is only 1.1M, feature extraction is fast, and similarity computation is fast even though the descriptor dimension is 1280 in this embodiment. VLAD coding of the finger vein image features makes full use of the feature map information while introducing only the extra 1×1 convolution parameters, yielding finger vein image descriptors with stronger representational power.
As shown in fig. 6, this embodiment further provides a deep local aggregation descriptor extraction system for a finger vein image, including: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
in this embodiment, the basic network module construction unit is configured to construct a basic network module, where the basic network module is configured to extract local features of the finger vein image; the VLAD encoding module construction unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module; the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network; the training unit is used for inputting the finger vein images to train the network in batches, and the extracting unit is used for extracting the local aggregation descriptors of the finger vein images to be detected by adopting the trained network;
As shown in fig. 7, the basic network module consists of 6 serially connected convolution modules, denoted conv_i, i = {1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer and a ReLU activation layer; the numbers of convolution kernels of the convolution modules are 32, 64, 128 and 128; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2 and those of conv_1, conv_2, conv_4 and conv_6 to 1; all convolution layers are initialized with orthogonal matrices and their biases fixed to 0; the weights and biases of the BN layers are fixed to 1 and 0 respectively and not updated, which reduces the number of trainable parameters with little effect on the results;
As shown in fig. 8, the VLAD coding module adds a 1×1 convolution layer to the network structure, with weights $w_k = 2c_k$ and biases $b_k = -\lVert c_k \rVert^2$, which implements a simplified form of $a_k(x_i)$:

$$a_k(x_i) = \frac{e^{w_k^{\mathsf{T}} x_i + b_k}}{\sum_{k'} e^{w_{k'}^{\mathsf{T}} x_i + b_{k'}}} = \frac{e^{-\lVert x_i - c_k \rVert^2}}{\sum_{k'} e^{-\lVert x_i - c_{k'} \rVert^2}}$$
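For illustration, a sketch of this 1×1-convolution initialization in PyTorch, with centers assumed to be the (K, C_out) tensor of cluster centers:

```python
import torch
import torch.nn as nn

def make_assignment_conv(centers):
    K, C = centers.shape
    conv = nn.Conv2d(C, K, kernel_size=1)
    with torch.no_grad():
        conv.weight.copy_(2 * centers.view(K, C, 1, 1))   # w_k = 2 * c_k
        conv.bias.copy_(-(centers ** 2).sum(dim=1))       # b_k = -||c_k||^2
    return conv  # softmax(conv(fmap), dim=1) yields a_k(x_i) at each location
```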
In this embodiment, the training unit includes: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
in this embodiment, the image preprocessing module is used for preprocessing the finger vein image; the multi-channel feature map acquisition module is used for obtaining a multi-channel feature map from the finger vein image samples through the basic network module; the combined coding module is used for combining the multi-channel feature map with the cluster center vectors in the VLAD coding module to complete the VLAD coding; the triplet construction module is used for mining hard negative samples to obtain triplets; the iteration updating module is used for calculating the loss function and back-propagating to update the network weight coefficients until iterative training finishes.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.

Claims (9)

1. The deep local aggregation descriptor extraction method for the finger vein image is characterized by comprising the following steps of:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
the VLAD coding is completed by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module, specifically:
the multi-channel feature map is converted into $w_{out} \times h_{out}$ local descriptors of dimension $C_{out}$ describing the original image, $\{x_i, i = 1, 2, \ldots, w_{out} \times h_{out}\}$, which are input to the VLAD coding module for coding; the element at position $(k, j)$ of the matrix $V$ with $K$ rows and $C_{out}$ columns is computed as

$$V(k, j) = \sum_{i=1}^{w_{out} \times h_{out}} a_k(x_i) \bigl( x_i(j) - c_k(j) \bigr)$$

$$a_k(x_i) = \frac{e^{-\lVert x_i - c_k \rVert^2}}{\sum_{k'} e^{-\lVert x_i - c_{k'} \rVert^2}}$$

where $x_i(j)$ and $c_k(j)$ denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$ respectively, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ denotes the cluster center vectors other than the $k$-th one;
the matrix $V$ is flattened into a one-dimensional vector and $L_2$-normalized to obtain a local aggregation descriptor of length $K \times C_{out}$;
setting K clustering center vectors as trainable parameters of the network;
inputting finger vein images to train the network in batches, wherein the training steps comprise:
preprocessing the finger vein image;
the preprocessed finger vein image is used for obtaining a multichannel characteristic diagram through a basic network module;
combining the multi-channel feature map with a clustering center vector in a VLAD coding module to finish VLAD coding;
mining hard negative samples to obtain triplets, calculating the loss function and back-propagating to update the network weight coefficients until iterative training finishes;
and extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
2. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 1, wherein the preprocessing of the finger vein image specifically comprises:
extraction of the region of interest: extracting the region of interest of the finger vein training image and completing finger tilt correction through an affine transformation;
normalizing the region of interest to obtain the final finger vein training sample image;
resizing the finger vein training sample image according to the ratio of the receptive field to the original image.
3. The method for extracting deep local aggregation descriptors of finger vein images according to claim 2, wherein the extraction of the region of interest specifically comprises:
two Sobel operators Mask_u and Mask_d detect the upper and lower edges of the finger vein training image respectively; the finger midline is fitted by linear regression and the angle between the midline and the horizontal direction is computed; the finger vein training image is rotated by an affine transformation to complete tilt correction; finally the circumscribed rectangle is cropped according to the finger edges to obtain the region of interest, where Mask_u and Mask_d are the two Sobel operators extended to 3×9 (their coefficients appear as equation images in the original publication).
4. The method for extracting deep local aggregation descriptors of finger vein images according to claim 1, wherein mining hard negative samples to obtain triplets specifically comprises:
selecting two local aggregation descriptors $D^a$ and $D^p$ of the same class to form a positive sample pair $(D^a, D^p)$;
for each positive sample pair $(D^a, D^p)$, selecting from the other classes the negative sample $D^n$ that minimizes $\lVert D^a - D^n \rVert_2^2$ to form the triplet $(D^a, D^p, D^n)$, where margin denotes a preset threshold parameter.
5. The method for extracting deep local aggregation descriptors of a finger vein image according to claim 4, wherein the loss function over the triplets of one batch is calculated as

$$L = \sum_{(D^a, D^p, D^n)} \max\bigl( \lVert D^a - D^p \rVert_2^2 - \lVert D^a - D^n \rVert_2^2 + margin,\; 0 \bigr)$$

where m denotes the number of image classes in the batch and n the number of samples per class.
6. A depth local aggregation descriptor extraction system for a finger vein image, comprising: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
the basic network module construction unit is used for constructing a basic network module, and the basic network module is used for extracting local features of the finger vein image;
the VLAD encoding module constructing unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module;
the VLAD coding is completed by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module, specifically:
the multi-channel feature map is converted into $w_{out} \times h_{out}$ local descriptors of dimension $C_{out}$ describing the original image, $\{x_i, i = 1, 2, \ldots, w_{out} \times h_{out}\}$, which are input to the VLAD coding module for coding; the element at position $(k, j)$ of the matrix $V$ with $K$ rows and $C_{out}$ columns is computed as

$$V(k, j) = \sum_{i=1}^{w_{out} \times h_{out}} a_k(x_i) \bigl( x_i(j) - c_k(j) \bigr)$$

$$a_k(x_i) = \frac{e^{-\lVert x_i - c_k \rVert^2}}{\sum_{k'} e^{-\lVert x_i - c_{k'} \rVert^2}}$$

where $x_i(j)$ and $c_k(j)$ denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$ respectively, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ denotes the cluster center vectors other than the $k$-th one;
the matrix $V$ is flattened into a one-dimensional vector and $L_2$-normalized to obtain a local aggregation descriptor of length $K \times C_{out}$;
the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network;
the training unit is used for inputting finger vein images to train the network in batches, and comprises: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
the image preprocessing module is used for preprocessing the finger vein image;
the multi-channel characteristic map acquisition module is used for acquiring a multi-channel characteristic map from a finger vein image sample through the basic network module;
the combined coding module is used for combining the multi-channel characteristic diagram with the clustering center vector in the VLAD coding module to finish VLAD coding;
the triplet construction module is used for excavating the difficult-to-separate negative samples to obtain triples;
the iteration updating module is used for calculating a loss function and back-propagating and updating a network weight coefficient until the iteration training is finished;
the extraction unit is used for extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
7. The system of claim 6, wherein the basic network module is configured with 6 serially connected convolution modules, respectively denoted conv_i, i= {1,2,3,4,5,6}, each convolution module comprises a 3×3 Conv2d layer, BN layer, and Relu activation layer, the number of convolution kernels of each convolution module is 32, 64, 128, the padding of all convolution layers is set to 1, the convolution steps of conv_3 and conv_5 are set to 2, conv_1, conv_2, conv_4, and conv_6 are set to 1.
8. The deep local aggregation descriptor extraction system for a finger vein image according to claim 7, wherein all convolution layers are initialized with orthogonal matrices with biases fixed at 0, and the weights and biases of the BN layers are fixed at 1 and 0, respectively.
9. The depth localized aggregated descriptor extraction system of claim 6 or 7, wherein said VLAD encoding module is configured with a 1 x1 convolutional layer on a network structure.
CN202010050908.9A 2020-01-17 2020-01-17 Deep local aggregation descriptor extraction method and system for finger vein image Active CN111274915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010050908.9A CN111274915B (en) 2020-01-17 2020-01-17 Deep local aggregation descriptor extraction method and system for finger vein image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010050908.9A CN111274915B (en) 2020-01-17 2020-01-17 Deep local aggregation descriptor extraction method and system for finger vein image

Publications (2)

Publication Number Publication Date
CN111274915A CN111274915A (en) 2020-06-12
CN111274915B (en) 2023-04-28

Family

ID=71001095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010050908.9A Active CN111274915B (en) 2020-01-17 2020-01-17 Deep local aggregation descriptor extraction method and system for finger vein image

Country Status (1)

Country Link
CN (1) CN111274915B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200156B (en) * 2020-11-30 2021-04-30 四川圣点世纪科技有限公司 Vein recognition model training method and device based on clustering assistance
CN112733627B (en) * 2020-12-28 2024-02-09 杭州电子科技大学 Finger vein recognition method based on fusion local and global feature network
CN112580590B (en) * 2020-12-29 2024-04-05 杭州电子科技大学 Finger vein recognition method based on multi-semantic feature fusion network
CN112926516B (en) * 2021-03-26 2022-06-14 长春工业大学 Robust finger vein image region-of-interest extraction method
CN113312989B (en) * 2021-05-11 2023-06-20 华南理工大学 Finger vein feature extraction network based on aggregated descriptors and attention
CN115018056B (en) * 2022-06-17 2024-09-06 华中科技大学 Training method for local description subnetwork for natural scene image matching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169415A (en) * 2017-04-13 2017-09-15 西安电子科技大学 Human motion recognition method based on convolutional neural networks feature coding
CN107977609A (en) * 2017-11-20 2018-05-01 华南理工大学 A kind of finger vein identity verification method based on CNN
CN109598311A (en) * 2019-01-23 2019-04-09 中山大学 A kind of sub- partial polymerization vector approach of description that space sub-space learning is cut based on symmetric positive definite matrix manifold
CN110263659A (en) * 2019-05-27 2019-09-20 南京航空航天大学 A kind of finger vein identification method and system based on triple loss and lightweight network
CN110427832A (en) * 2019-07-09 2019-11-08 华南理工大学 A kind of small data set finger vein identification method neural network based

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169415A (en) * 2017-04-13 2017-09-15 西安电子科技大学 Human motion recognition method based on convolutional neural networks feature coding
CN107977609A (en) * 2017-11-20 2018-05-01 华南理工大学 A kind of finger vein identity verification method based on CNN
CN109598311A (en) * 2019-01-23 2019-04-09 中山大学 A kind of sub- partial polymerization vector approach of description that space sub-space learning is cut based on symmetric positive definite matrix manifold
CN110263659A (en) * 2019-05-27 2019-09-20 南京航空航天大学 A kind of finger vein identification method and system based on triple loss and lightweight network
CN110427832A (en) * 2019-07-09 2019-11-08 华南理工大学 A kind of small data set finger vein identification method neural network based

Also Published As

Publication number Publication date
CN111274915A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN111274915B (en) Deep local aggregation descriptor extraction method and system for finger vein image
CN111178432B (en) Weak supervision fine granularity image classification method of multi-branch neural network model
CN109800648B (en) Face detection and recognition method and device based on face key point correction
CN107977609B (en) Finger vein identity authentication method based on CNN
CN106529468B (en) A kind of finger vein identification method and system based on convolutional neural networks
CN111368683B (en) Face image feature extraction method and face recognition method based on modular constraint CenterFace
CN109815801A (en) Face identification method and device based on deep learning
CN111027464B (en) Iris recognition method for jointly optimizing convolutional neural network and sequence feature coding
CN110543822A (en) finger vein identification method based on convolutional neural network and supervised discrete hash algorithm
CN110909618B (en) Method and device for identifying identity of pet
US8666122B2 (en) Assessing biometric sample quality using wavelets and a boosted classifier
CN112580590A (en) Finger vein identification method based on multi-semantic feature fusion network
CN111401145B (en) Visible light iris recognition method based on deep learning and DS evidence theory
CN109583379A (en) A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian
CN107967442A (en) A kind of finger vein identification method and system based on unsupervised learning and deep layer network
CN108564040B (en) Fingerprint activity detection method based on deep convolution characteristics
CN106529397B (en) A kind of man face characteristic point positioning method in unconstrained condition and system
CN105512599A (en) Face identification method and face identification system
CN113158955B (en) Pedestrian re-recognition method based on clustering guidance and paired measurement triplet loss
CN111401303A (en) Cross-visual angle gait recognition method with separated identity and visual angle characteristics
CN114998995A (en) Cross-view-angle gait recognition method based on metric learning and space-time double-flow network
CN114973307A (en) Finger vein identification method and system for generating countermeasure and cosine ternary loss function
CN112364974B (en) YOLOv3 algorithm based on activation function improvement
CN111339932B (en) Palm print image preprocessing method and system
CN109145704A (en) A kind of human face portrait recognition methods based on face character

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant