CN111274915B - Deep local aggregation descriptor extraction method and system for finger vein image - Google Patents
- Publication number
- CN111274915B (application CN202010050908.9A)
- Authority
- CN
- China
- Prior art keywords
- finger vein
- module
- image
- vlad
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 210000003462 vein Anatomy 0.000 title claims abstract description 125
- 230000002776 aggregation Effects 0.000 title claims abstract description 30
- 238000004220 aggregation Methods 0.000 title claims abstract description 30
- 238000000605 extraction Methods 0.000 title claims abstract description 26
- 238000012549 training Methods 0.000 claims abstract description 69
- 239000013598 vector Substances 0.000 claims abstract description 48
- 238000000034 method Methods 0.000 claims abstract description 20
- 238000007781 pre-processing Methods 0.000 claims abstract description 18
- 238000010586 diagram Methods 0.000 claims abstract description 13
- 238000005065 mining Methods 0.000 claims abstract description 5
- 238000010276 construction Methods 0.000 claims description 21
- 239000011159 matrix material Substances 0.000 claims description 16
- 238000004364 calculation method Methods 0.000 claims description 8
- 238000012937 correction Methods 0.000 claims description 5
- 230000009466 transformation Effects 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 3
- 238000012417 linear regression Methods 0.000 claims description 3
- 238000012512 characterization method Methods 0.000 abstract description 5
- 238000009826 distribution Methods 0.000 abstract description 2
- 238000012360 testing method Methods 0.000 description 17
- 238000012795 verification Methods 0.000 description 15
- 230000006870 function Effects 0.000 description 8
- 238000005516 engineering process Methods 0.000 description 6
- 238000002474 experimental method Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000036544 posture Effects 0.000 description 2
- 238000004321 preservation Methods 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000013145 classification model Methods 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000003709 image segmentation Methods 0.000 description 1
- 238000000691 measurement method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000010998 test method Methods 0.000 description 1
- 238000009827 uniform distribution Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/1347—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/14—Vascular patterns
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The invention discloses a deep local aggregation descriptor extraction method and system for finger vein images. The method comprises the following steps: constructing a basic network module; constructing a VLAD coding module; setting K cluster center vectors as trainable parameters of the network; and training the network in batches, where each training step comprises: preprocessing the finger vein images; passing the finger vein images through the basic network module to obtain multi-channel feature maps; combining the multi-channel feature maps with the cluster center vectors in the VLAD coding module to complete the VLAD coding; and mining hard negative samples to obtain triplets, computing the loss function, and back-propagating to update the network weights. Finally, the trained network is used to extract the local aggregation descriptor of a finger vein image under test. The method yields descriptors of fixed dimension that are independent of the spatial arrangement of the original image blocks, which solves the failure of matching between finger vein images of different sizes and between same-class finger vein images that differ in finger pose, and produces deep local aggregation descriptors with stronger representational power.
Description
Technical Field
The invention relates to the technical field of finger vein feature extraction, and in particular to a deep local aggregation descriptor extraction method and system for finger vein images.
Background
Finger vein recognition is a new-generation biometric technology. Compared with traditional biometrics, it offers non-contact acquisition, inherent liveness detection, low equipment cost, and other advantages. Finger vein recognition captures finger images with an infrared CCD camera and extracts finger vein features for identity authentication and recognition. The captured finger vein images often suffer from noise interference, so extracting robust features is a research focus in finger vein recognition. The representational power of traditional feature descriptors such as LBP (Local Binary Pattern) and LDC (Local Directional Code) is strongly affected by image quality, and feature maps that retain spatial information require complex template matching for recognition.
In recent years, various deep learning solutions have been proposed for finger vein recognition. For the finger vein verification problem, pixel-level classification of the finger vein image based on the idea of image segmentation is slow and hard to apply in practical scenarios, while convolution-network-based methods that reuse existing image classification models lead to large model sizes. Furthermore, the extracted features are sensitive to finger pose, and finger vein images taken in different finger poses require complex preprocessing or template matching schemes for recognition. A lightweight convolutional network model that extracts finger vein feature descriptors more robust to finger pose changes therefore has clear advantages in practical applications.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a deep local aggregation descriptor extraction method and system for finger vein images. Concatenating the cluster-center-based vectors yields descriptors of fixed dimension that are independent of the spatial arrangement of the original image blocks, which solves both the matching problem between finger vein images of different sizes and the matching failure between same-class finger vein images caused by differences in finger pose. On this basis, VLAD (Vector of Locally Aggregated Descriptors) coding of the local descriptors produces deep local aggregation descriptors with stronger representational power, which perform excellently in finger vein identification and verification tasks; the network model is only 1.1M in size and thus better satisfies the lightweight requirements of engineering applications.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a deep local aggregation descriptor extraction method of a finger vein image, which comprises the following steps:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
setting K clustering center vectors as trainable parameters of the network;
inputting finger vein images to train the network in batches, wherein the training steps comprise:
preprocessing the finger vein image;
passing the finger vein image samples through the basic network module to obtain multi-channel feature maps;
combining the multi-channel feature map with a clustering center vector in a VLAD coding module to finish VLAD coding;
mining hard negative samples to obtain triplets, computing the loss function, and back-propagating to update the network weights until the iterative training ends;
and extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
As a preferred technical solution, the preprocessing of the finger vein image specifically includes:
extraction of the region of interest: extracting a region of interest of the finger vein training image, and completing finger inclination correction through affine transformation;
normalizing the region of interest to obtain the final finger vein training sample image;
resizing the finger vein training sample image according to the ratio between the receptive field and the original image.
As a preferred technical solution, the extracting of the region of interest specifically includes:
using two Sobel operators Mask_u and Mask_d to detect the upper and lower finger edges of the finger vein training image respectively, fitting the finger midline by linear regression, computing the angle between the midline and the horizontal direction, rotating the finger vein training image by an affine transformation to complete the tilt correction, and finally cropping the circumscribed rectangle along the finger edges to obtain the region of interest, where Mask_u and Mask_d denote the two Sobel operators extended to 3×9.
As a preferred technical solution, completing the VLAD coding by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module comprises the specific steps of:

converting the multi-channel feature map into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$, and inputting them to the VLAD coding module for coding, where the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

flattening the matrix $V$ into a one-dimensional vector and applying $L_2$ normalization to obtain the local aggregation descriptor of length $K\times C_{out}$.
As a preferred technical solution, mining hard negative samples to obtain triplets comprises the specific steps of:

selecting two local aggregation descriptors $f_a$ and $f_p$ of the same class to form a positive sample pair $(f_a, f_p)$;

for each positive sample pair $(f_a, f_p)$, selecting from the other classes the negative sample $f_n$ that makes $\|f_a-f_n\|_2^2$ smallest (hard-negative mining), forming the triplet $(f_a, f_p, f_n)$, where $margin$ denotes a preset threshold parameter of the triplet loss.
As a preferred technical solution, the loss function over the triplets of one batch is computed as:

$$L=\sum_{(f_a,f_p,f_n)}\max\left(\|f_a-f_p\|_2^2-\|f_a-f_n\|_2^2+margin,\ 0\right)$$

where m denotes the number of image classes in the batch and n the number of samples per class, the sum running over the triplets constructed from the m×n samples.
The invention also provides a deep local aggregation descriptor extraction system of the finger vein image, which comprises the following steps: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
the basic network module construction unit is used for constructing a basic network module, and the basic network module is used for extracting local features of the finger vein image;
the VLAD encoding module constructing unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module;
the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network;
the training unit is used for inputting finger vein images to train the network in batches, and comprises: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
the image preprocessing module is used for preprocessing the finger vein image;
the multi-channel characteristic map acquisition module is used for acquiring a multi-channel characteristic map from a finger vein image sample through the basic network module;
the combined coding module is used for combining the multi-channel characteristic diagram with the clustering center vector in the VLAD coding module to finish VLAD coding;
the triplet construction module is used for mining hard negative samples to obtain triplets;
the iteration updating module is used for computing the loss function and back-propagating to update the network weights until the iterative training ends;
the extraction unit is used for extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
As a preferred technical solution, the basic network module adopts 6 serially connected convolution modules, denoted conv_i, i={1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer, and a ReLU activation layer; the numbers of convolution kernels of the modules are 32, 64, and 128 respectively; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2, and those of conv_1, conv_2, conv_4, and conv_6 to 1.
As a preferred technical solution, all convolution layers are initialized with orthogonal matrices and their biases fixed to 0, and the weight and bias of each BN layer are fixed to 1 and 0 respectively.
As a preferred solution, the VLAD coding module sets a 1×1 convolutional layer on the network structure.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) According to the invention, the descriptors are obtained through CNN network end-to-end learning, the network model is only 1.1M in size, and the extracted descriptors can be further used for tasks such as finger vein verification and identification, so that the method is flexible in use and wide in application.
(2) For finger vein images with any size, K clustering centers are obtained through network automatic learning, and the clustering center vectors are connected in series to form descriptor vectors for representing the features of the finger vein images, so that the matching problem between the finger vein images with different sizes is solved.
(3) The invention carries out VLAD coding on the characteristics of the finger vein image, and fully utilizes the information of the characteristic map under the condition that only 1X 1 convolution parameters are additionally introduced, thereby obtaining the finger vein image descriptor with more characterization force.
(4) According to the invention, the triplet samples are constructed during the network training period, so that the requirement on the number of finger vein training images is reduced, and the number of positive and negative samples for training is ensured to be equal; a difficult-to-separate negative sample mining strategy is adopted when a sample is constructed, so that network convergence is quickened; the triplet loss function is adopted to train the network, so that the network is promoted to learn the differences among different finger vein images rather than the label information, and the generalization performance of the method is improved.
Drawings
FIG. 1 is a diagram showing an example of an image of a finger vein data set according to the present embodiment;
FIG. 2 is a schematic diagram of a division manner of the venous data set in the present embodiment;
FIG. 3 is a schematic diagram of a batch training process of the network model according to the present embodiment;
FIG. 4 is a diagram showing training and testing images obtained by extracting a region of interest according to the present embodiment;
FIG. 5 is a flow chart of the network model test according to the present embodiment;
fig. 6 is a schematic structural diagram of a deep local aggregation descriptor extraction system for a finger vein image according to the present embodiment;
fig. 7 is a schematic structural diagram of a basic network module according to the present embodiment;
fig. 8 is a schematic diagram of the VLAD encoding module according to the present embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
Training and testing are carried out on three finger vein data sets: SDUMLA, FV-USM, and MMCBNU_6000. The SDUMLA data set, from Shandong University, contains finger vein images of 636 fingers from 106 subjects; 6 gray-scale BMP images are acquired from each index, middle, and ring finger, with an image resolution of 320×240. The FV-USM data set, from Universiti Sains Malaysia, consists of vein images of the index and middle fingers of both hands of 123 subjects; the images come from two separate acquisition sessions, 12 images per finger. The MMCBNU_6000 data set, from Chonbuk National University in Korea, consists of finger vein images of 100 volunteers; each finger is imaged 10 times, giving 6000 images in total.
As shown in fig. 1, example images of the above public finger vein data sets are given. As shown in fig. 2, in this embodiment the finger images of the MMCBNU_6000 data set are divided into 600 classes by finger, each class containing 10 sample images; 300 classes are randomly taken as the training set (3000 samples in total), and the rest serve as the test set, which is further divided into an enrolled template library and a set of samples to be tested.
the embodiment is mainly realized based on a deep learning framework Pytorch, a display card used in an experiment is GTX1080Ti, and a finger vein descriptor extracted from a test image is used for identifying and verifying tasks.
The embodiment provides a deep local aggregation descriptor extraction method of a finger vein image, which comprises the following steps:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
setting K cluster center vectors as trainable parameters of the network: this embodiment sets K cluster center vectors $\{c_k, k=1,2,\dots,K\}$ of dimension $C_{out}$, randomly initializes them from a uniform distribution, and lets them be determined by network learning. For finger vein images of arbitrary size, the K cluster centers are learned automatically by the network, and concatenating the per-center aggregation vectors forms the descriptor vector representing the finger vein image, whose length is fixed at 128×K; this solves the matching problem between finger vein images of different sizes. In particular, since the descriptor vector is built from the cluster centers rather than from feature vectors of local image blocks, two finger vein images can still be matched correctly when their spatial positions differ, which solves the matching failure caused by finger pose differences between same-class finger vein images. In this embodiment, so that the extracted descriptors have strong representational power while remaining reasonably compact, the number K of cluster center vectors is set between 8 and 15, preferably K=10.
As shown in fig. 3, the input finger vein image trains the network in batches, and the training steps include:
preprocessing the finger vein image: segmenting the finger region from the background and cropping the circumscribed rectangular region along the finger edges to obtain the region of interest for training, which removes background noise while keeping as much of the original information as possible;
in this embodiment, the specific steps of preprocessing the finger vein image include:
extraction of the region of interest: the two Sobel operators Mask_u and Mask_d detect the upper and lower finger edges of the finger vein training image respectively; the finger midline is fitted by linear regression and the angle it forms with the horizontal direction is computed; the finger vein training image is rotated by an affine transformation to complete the tilt correction; finally, the circumscribed rectangle of the whole finger is cropped according to the outermost finger edge points to obtain the region of interest. As shown in fig. 4, training and test images are obtained from the region of interest. In this embodiment, Mask_u and Mask_d are the two Sobel operators extended to 3×9;
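As an illustration of this step, the following is a minimal Python/OpenCV sketch. Since the kernel formulas in the original are given as images that are not reproduced here, the exact 3×9 kernel values below (a 3×3 vertical Sobel kernel with each column triplicated, and Mask_d = -Mask_u) are an assumption, as are the helper names.

```python
import cv2
import numpy as np

# Assumed 3x9 extension of the vertical Sobel kernel; Mask_d is its negation.
MASK_U = np.array([[-1, -1, -1, -2, -2, -2, -1, -1, -1],
                   [ 0,  0,  0,  0,  0,  0,  0,  0,  0],
                   [ 1,  1,  1,  2,  2,  2,  1,  1,  1]], dtype=np.float32)
MASK_D = -MASK_U

def extract_roi(img: np.ndarray) -> np.ndarray:
    """Detect finger edges, correct tilt, and crop the circumscribed rectangle."""
    h, w = img.shape
    resp_u = cv2.filter2D(img.astype(np.float32), -1, MASK_U)
    resp_d = cv2.filter2D(img.astype(np.float32), -1, MASK_D)
    # Per column, take the strongest response as the edge position
    # (upper edge in the top half, lower edge in the bottom half).
    upper = resp_u[: h // 2].argmax(axis=0)
    lower = resp_d[h // 2 :].argmax(axis=0) + h // 2
    # Fit the finger midline by linear regression, then rotate to horizontal.
    cols = np.arange(w)
    slope, _ = np.polyfit(cols, (upper + lower) / 2.0, 1)
    angle = np.degrees(np.arctan(slope))
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h))
    # Crop the circumscribed rectangle of the finger region.
    return rotated[int(upper.min()) : int(lower.max()), :]
```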
the method comprises the steps of adjusting the sizes of finger vein training images according to the original proportion of the receptive field and the images, calculating the average aspect ratio of all training images to be 2, calculating the local area with the receptive field of 23 x 23 of each characteristic point of an output characteristic image according to the structural parameters of a basic network module, adjusting the height of an input image to be h=64 and the width to be w=128, wherein the receptive field of each characteristic point in the characteristic image is about 1/3 of the height of the input image, and reflecting richer finger vein information;
each training image is normalized by subtracting the mean and dividing by the standard deviation, reducing the influence of uneven illumination and yielding the final finger vein training sample images;
building the training batch sampler: the training images are loaded in batches; each batch randomly selects m classes and loads n samples per class, i.e. m×n samples in total; in this embodiment m=16 and n=6;
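A minimal sketch of such an m-classes × n-samples batch sampler in PyTorch follows; the `labels` input and the class bookkeeping are assumptions of this illustration.

```python
import random
from collections import defaultdict
from torch.utils.data import Sampler

class PKBatchSampler(Sampler):
    """Yields batches of m randomly chosen classes with n samples each.
    `labels` is assumed to be a list of per-sample class ids."""
    def __init__(self, labels, m=16, n=6):
        self.m, self.n = m, n
        self.by_class = defaultdict(list)
        for idx, lab in enumerate(labels):
            self.by_class[lab].append(idx)
        self.classes = list(self.by_class)
        self.batches_per_epoch = len(self.classes) // m

    def __iter__(self):
        random.shuffle(self.classes)
        for b in range(self.batches_per_epoch):
            batch = []
            for lab in self.classes[b * self.m : (b + 1) * self.m]:
                batch.extend(random.sample(self.by_class[lab], self.n))
            yield batch   # 16 x 6 = 96 sample indices per batch

    def __len__(self):
        return self.batches_per_epoch
```

Such a sampler can be passed to a `DataLoader` via its `batch_sampler` argument.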
initializing the trainable parameters in the network structure: for the basic network module, the convolution layer weights are initialized with orthogonal matrices and the biases fixed to 0, and the BN layer weights and biases are fixed to 1 and 0; for the cluster centers, the number of descriptor clusters K is set to 10 and the cluster centers $c_k, k=1,2,\dots,K$ are randomly initialized from a uniform distribution; for the VLAD coding module, the 1×1 convolution is initialized from the cluster centers, with kernel weights $w_k=2c_k$ and biases $b_k=-\|c_k\|^2$;
the margin hyperparameter of the triplet loss is set to 1;
the number of training iterations is set to 200 and the learning rate fixed at 0.01, using the Adam (Adaptive Moment Estimation) optimization method; a minimum validation loss and a model save path are initialized so that the model with the smallest validation loss can be saved later; the initial iteration count and initial sample batch are set to 0; the batch sampler gives a training batch size of 16×6=96;
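A sketch of this training configuration follows; `model`, `train_loader`, `val_loader` and `evaluate` are hypothetical names, and the `mine_hard_triplets`/`triplet_loss` helpers are sketched further below.

```python
import torch

# Assumed setup: `model` = basic network + VLAD coding module.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
best_val_loss = float('inf')

for epoch in range(200):
    model.train()
    for images, labels in train_loader:        # 16 classes x 6 samples = 96
        feats = model(images)                  # local aggregation descriptors
        triplets, dist = mine_hard_triplets(feats, labels)
        loss = triplet_loss(dist, triplets, margin=1.0)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    val_loss = evaluate(model, val_loader)     # hypothetical validation helper
    if val_loss < best_val_loss:               # keep the checkpoint with the
        best_val_loss = val_loss               # smallest validation loss
        torch.save(model.state_dict(), 'best_model.pt')
```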
the iteration counter is incremented by 1 and training of the model continues; the training batch counter is incremented by 1, and loading of the next batch of samples begins or continues;
according to the batch sampler settings, 96 preprocessed training images are loaded from the training set;
after a finger vein training sample image passes through the six 3×3 convolution modules of the basic network module, and because two of the convolutions use stride 2 (giving a pooling-like effect), a feature map with 128 channels and 16×32 resolution is obtained;
the multi-channel feature map is combined with the cluster center vectors in the VLAD coding module to complete the VLAD coding, using the K cluster center vectors $\{c_k, k=1,2,\dots,K\}$ of dimension $C_{out}$ that were uniformly initialized and are determined through network learning.

The specific steps of the VLAD coding are as follows:

the feature map of resolution $w_{out}\times h_{out}$ with $C_{out}$ channels is converted into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$; in this embodiment these are 16×32 = 512 local descriptors of dimension 128, written $\{x_i, i=1,2,\dots,512\}$, which yield a 1280-dimensional local aggregation descriptor after the VLAD coding module;

the descriptors are input to the VLAD coding module for coding, i.e. the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

the matrix $V$ is flattened into a one-dimensional vector and $L_2$-normalized, giving the local aggregation descriptor of length $K\times C_{out}$;
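A PyTorch sketch of this VLAD coding module follows: soft assignment by a 1×1 convolution initialized with $w_k=2c_k$ and $b_k=-\|c_k\|^2$ as stated, residuals to the K cluster centers, then flattening and L2 normalization. The module and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADCoding(nn.Module):
    """VLAD coding over a CNN feature map (sketch of the described module)."""
    def __init__(self, K=10, C_out=128):
        super().__init__()
        self.centers = nn.Parameter(torch.rand(K, C_out))   # uniform init
        self.assign = nn.Conv2d(C_out, K, kernel_size=1)
        # Initialization from the cluster centers: w_k = 2 c_k, b_k = -||c_k||^2
        with torch.no_grad():
            self.assign.weight.copy_(2 * self.centers[..., None, None])
            self.assign.bias.copy_(-(self.centers ** 2).sum(dim=1))

    def forward(self, x):                      # x: (B, C_out, h_out, w_out)
        a = F.softmax(self.assign(x), dim=1)   # (B, K, H, W) soft assignment
        x = x.flatten(2)                       # (B, C_out, N) local descriptors
        a = a.flatten(2)                       # (B, K, N)
        # V(k, j) = sum_i a_k(x_i) * (x_i(j) - c_k(j))
        V = torch.einsum('bkn,bcn->bkc', a, x) \
            - a.sum(dim=2)[..., None] * self.centers[None]
        return F.normalize(V.flatten(1), p=2, dim=1)   # length K * C_out
```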
the training images of one batch are processed by the basic network module and the VLAD coding module to obtain m×n local aggregation descriptors $\{f_i, i=1,2,\dots,m\times n\}$, where $f_i$ denotes the local aggregation descriptor of the i-th training sample in the batch; in this embodiment, the 96 training samples of one batch yield 96 descriptors through the network, which form a matrix from which the Euclidean distance matrix with elements $(i,j)=\|f_i-f_j\|_2$ between every two descriptors is obtained by matrix computation; the matrix has size 96×96 with zeros on the diagonal;
hard negative samples are mined to obtain triplets: two local aggregation descriptors $f_a$ and $f_p$ of the same class are selected to form a positive sample pair $(f_a, f_p)$; for each positive sample pair, the negative sample $f_n$ that makes $\|f_a-f_n\|_2^2$ smallest is selected from the other classes, forming the triplet $(f_a, f_p, f_n)$; in this embodiment, the threshold margin used in the gap between same-class and different-class descriptor distances is set to 1, so that same-class and different-class descriptors can be distinguished well;
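A sketch of the distance matrix and hard-negative mining just described; the function and variable names are assumptions, and the hardest negative is taken as the one minimizing the anchor-negative distance.

```python
import torch

def mine_hard_triplets(feats, labels):
    """feats: (N, D) descriptor tensor; labels: (N,) class-id tensor.
    For every positive pair (a, p), pick the hardest negative n for anchor a."""
    dist = torch.cdist(feats, feats)                 # (N, N), zero diagonal
    same = labels[:, None] == labels[None, :]        # same-class mask
    triplets = []
    N = feats.size(0)
    for a in range(N):
        neg_d = dist[a].masked_fill(same[a], float('inf'))
        n = int(neg_d.argmin())                      # hardest negative for a
        for p in range(N):
            if p != a and same[a, p]:
                triplets.append((a, p, n))
    return triplets, dist
```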
the loss function is computed and the network weights updated by back-propagation; whether all samples have completed one pass of training is checked; if so, the validation-loss step is entered, otherwise training returns to continue;
whether the current validation loss is smaller than the minimum loss is checked; if so, the model is saved (or the saved model updated) and the minimum loss value updated; otherwise, the check of whether the set number of iterations is complete is entered;
whether the 200 iterations are complete is checked; if so, training ends; otherwise the flow returns to the model training iteration step, i.e. the iteration counter is incremented by 1, training continues, the batch counter is incremented by 1, and loading of the next batch of samples begins or continues;
in this embodiment, the loss function over the triplets of one batch is computed as:

$$L=\sum_{(f_a,f_p,f_n)}\max\left(\|f_a-f_p\|_2^2-\|f_a-f_n\|_2^2+margin,\ 0\right)$$

where m denotes the number of image classes in the batch and n the number of samples per class, the sum running over the triplets constructed from the m×n samples. In this embodiment, constructing triplet samples during network training reduces the required number of finger vein training images and guarantees equal numbers of positive and negative training samples; the hard-negative mining strategy used when constructing samples accelerates network convergence; and training with the triplet loss pushes the network to learn the differences between different finger vein images rather than label information, improving the generalization of the method.
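Building on the mining sketch above, the batch loss with margin = 1 can be computed as follows (a sketch using the squared distances, matching the formula above):

```python
import torch

def triplet_loss(dist, triplets, margin=1.0):
    """Hinge loss over the mined triplets:
    max(margin + d(a, p)^2 - d(a, n)^2, 0), averaged over the batch."""
    losses = [torch.clamp(margin + dist[a, p] ** 2 - dist[a, n] ** 2, min=0.0)
              for (a, p, n) in triplets]
    return torch.stack(losses).mean()
```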
As shown in fig. 5, the trained network is used to extract the local aggregation descriptor of a finger vein image under test; the network structure in the test stage is identical to that in the training stage. The specific steps are as follows:
image preprocessing is applied to the finger vein images under test, including region-of-interest extraction, normalization and resizing; after preprocessing, all test-set images are resized to 64×128; the preprocessed finger vein images are then input into the trained network to obtain their deep local aggregation descriptors, which can be further used for finger vein identification or verification.
In this embodiment, the finger vein recognition and verification tasks are tested separately;
for the finger vein recognition task:
all images of the enrolled template library are input into the network to obtain their feature descriptors;
each image in the set of samples to be tested obtains its descriptor through the network, and the Euclidean distances between this descriptor and the feature descriptors of all enrolled templates are computed;
the Euclidean distances are sorted, and the current test sample is identified as the class of the enrolled template with the smallest Euclidean distance;
for the finger vein verification task:
testing uses the enrolled template library, and a suitable classification threshold is selected, set to 1 in this embodiment;
each sample to be tested forms positive sample pairs with the other samples of its class in the test set, and an equal number of different-class samples are randomly selected to form negative sample pairs, giving 300×5×5×2=15000 pairs in total;
the sample pairs obtain their descriptors through the network one by one, and the Euclidean distance between the two descriptors of each pair is computed;
if the Euclidean distance between the two descriptors is below 1, the two samples are judged to be of the same class and the verification succeeds; otherwise the verification fails;
finally, the test results are saved. With the deep local aggregation descriptor of this embodiment, finger vein images in different finger poses need no special handling, and matching and recognition can be realized directly through simple similarity measures such as Euclidean distance or cosine similarity.
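A sketch of both matching decisions with Euclidean distance (the names are illustrative):

```python
import torch

def identify(query_feat, template_feats, template_labels):
    """1:N identification: assign the label of the nearest enrolled template."""
    d = torch.cdist(query_feat[None], template_feats)[0]   # (M,) distances
    return template_labels[int(d.argmin())]

def verify(feat_a, feat_b, threshold=1.0):
    """1:1 verification: same finger iff descriptor distance is below the
    threshold (set to 1 in this embodiment)."""
    return torch.dist(feat_a, feat_b).item() < threshold
```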
In this embodiment, the test method for the other two public databases is essentially the same as the above steps. As shown in table 1 below, the evaluation index for the finger vein 1:1 verification experiments of this embodiment is the EER (Equal Error Rate), i.e. the FAR value at the point where FAR (False Accept Rate) and FRR (False Reject Rate) are equal; in the experiments, the FAR value at |FAR-FRR|<0.0001 is taken as the EER.
Table 1. EER results of the finger vein 1:1 verification tests
|  | SDUMLA | FV-USM | MMCBNU_6000 |
|---|---|---|---|
| EER | 0.95% | 0.38% | 0.10% |
As shown in table 2 below, the experimental results for finger vein 1:N identification in this embodiment use the evaluation index IR(k):

$$IR(k)=\frac{\left|\{\,b\in B:\mathrm{rank}(b)\le k\,\}\right|}{U_B}$$

where B denotes the set of all samples to be tested, b is a sample to be tested, rank(b) is the rank of the similarity between the sample and its same-class samples in the enrolled template library, and $U_B$ denotes the number of samples to be tested. IR(1) indicates the proportion of test samples for which a same-class template in the library ranks first in similarity.
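A sketch of the IR(k) computation from the per-sample ranks (names are illustrative):

```python
import numpy as np

def ir_at_k(ranks, k=1):
    """IR(k): fraction of test samples whose same-class template appears
    within the top-k most similar enrolled templates. `ranks` holds rank(b)
    for every sample b in the test set B."""
    return (np.asarray(ranks) <= k).mean()

# Example: IR(1) = 0.75 for ranks [1, 1, 2, 1]
print(ir_at_k([1, 1, 2, 1], k=1))
```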
Table 2. IR(k) results of the finger vein 1:N identification tests
|  | SDUMLA | FV-USM | MMCBNU_6000 |
|---|---|---|---|
| IR(k) | 99.50% | 99.87% | 100% |
As shown in table 3 below, the parameter size of the model of this embodiment and the finger-vein-recognition-related times measured on a CPU are given.
Table 3. Model size and time consumption
| Model size | Feature extraction time | Euclidean distance calculation time |
|---|---|---|
| 1.1M | 0.0144s | 0.00037s |
As can be seen from tables 1 to 3 above, the network proposed in this embodiment is effective in both the finger vein identification and verification tasks; the network model is only 1.1M, the feature extraction time is short, and the similarity computation is fast even though the descriptor dimension in this embodiment is 1280. In this embodiment, VLAD coding of the finger vein image features makes full use of the feature map information while introducing only the extra 1×1 convolution parameters, yielding finger vein image descriptors with stronger representational power.
As shown in fig. 6, this embodiment further provides a deep local aggregation descriptor extraction system for a finger vein image, including: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
in this embodiment, the basic network module construction unit is configured to construct a basic network module, where the basic network module is configured to extract local features of the finger vein image; the VLAD encoding module construction unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module; the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network; the training unit is used for inputting the finger vein images to train the network in batches, and the extracting unit is used for extracting the local aggregation descriptors of the finger vein images to be detected by adopting the trained network;
as shown in fig. 7, the basic network module adopts 6 serially connected convolution modules, denoted conv_i, i={1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer and a ReLU activation layer; the numbers of convolution kernels of the convolution modules are 32, 64, 128 and 128 respectively; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2 and those of conv_1, conv_2, conv_4 and conv_6 to 1; all convolution layers are initialized with orthogonal matrices and their biases fixed to 0; the weights and biases of the BN layers are fixed to 1 and 0 respectively and are not updated, which reduces the number of trainable model parameters with little influence on the results;
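A PyTorch sketch of the basic network module under these settings follows. Since the translated text lists four kernel counts for six modules, the per-module channel progression below is an assumption, chosen to end at the stated 128-channel output.

```python
import torch.nn as nn

def conv_block(c_in, c_out, stride):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# Six serial 3x3 convolution modules; conv_3 and conv_5 use stride 2.
channels = [32, 32, 64, 64, 128, 128]   # assumed progression (see lead-in)
strides  = [1, 1, 2, 1, 2, 1]
layers, c_prev = [], 1                  # single-channel input image
for c, s in zip(channels, strides):
    layers.append(conv_block(c_prev, c, s))
    c_prev = c
base_network = nn.Sequential(*layers)   # 1x64x128 input -> 128x16x32 output

# Orthogonal initialization, biases fixed to 0; BN weight/bias fixed to 1/0.
for m in base_network.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.orthogonal_(m.weight)
        nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)
        m.weight.requires_grad_(False)  # fixed, not updated during training
        m.bias.requires_grad_(False)
```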
as shown in fig. 8, the VLAD coding module places a 1×1 convolution layer in the network structure, whose weights $w_k=2c_k$ and biases $b_k=-\|c_k\|^2$ represent the simplified soft assignment $a_k(x_i)$:

$$a_k(x_i)=\frac{e^{w_k^{T}x_i+b_k}}{\sum_{k'} e^{w_{k'}^{T}x_i+b_{k'}}}=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$
In this embodiment, the training unit includes: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
in this embodiment, the image preprocessing module is configured to preprocess the finger vein images; the multi-channel feature map acquisition module is configured to obtain a multi-channel feature map from a finger vein image sample through the basic network module; the combined coding module is configured to combine the multi-channel feature map with the cluster center vectors in the VLAD coding module to complete the VLAD coding; the triplet construction module is configured to mine hard negative samples to obtain triplets; the iteration updating module is configured to compute the loss function and back-propagate to update the network weights until the iterative training ends.
The above examples are preferred embodiments of the invention, but the embodiments of the invention are not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the invention shall be an equivalent replacement and is included in the protection scope of the invention.
Claims (9)
1. The deep local aggregation descriptor extraction method for the finger vein image is characterized by comprising the following steps of:
constructing a basic network module for extracting local features of the finger vein image;
constructing a VLAD coding module for performing VLAD coding on the feature map obtained by the basic network module;
completing the VLAD coding by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module, which specifically comprises:

converting the multi-channel feature map into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$, and inputting them to the VLAD coding module for coding, where the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

flattening the matrix $V$ into a one-dimensional vector and applying $L_2$ normalization to obtain the local aggregation descriptor of length $K\times C_{out}$;
setting K clustering center vectors as trainable parameters of the network;
inputting finger vein images to train the network in batches, wherein the training steps comprise:
preprocessing the finger vein image;
passing the preprocessed finger vein image through the basic network module to obtain a multi-channel feature map;
combining the multi-channel feature map with a clustering center vector in a VLAD coding module to finish VLAD coding;
mining hard negative samples to obtain triplets, computing the loss function, and back-propagating to update the network weights until the iterative training ends;
and extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
2. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 1, wherein the preprocessing of the finger vein image comprises the specific steps of:
extraction of the region of interest: extracting a region of interest of the finger vein training image, and completing finger inclination correction through affine transformation;
normalizing the region of interest to obtain the final finger vein training sample image;
resizing the finger vein training sample image according to the ratio between the receptive field and the original image.
3. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 2, wherein the extraction of the region of interest comprises the specific steps of:

using two Sobel operators Mask_u and Mask_d to detect the upper and lower finger edges of the finger vein training image respectively, fitting the finger midline by linear regression, computing the angle between the midline and the horizontal direction, rotating the finger vein training image by an affine transformation to complete the tilt correction, and finally cropping the circumscribed rectangle along the finger edges to obtain the region of interest, where Mask_u and Mask_d denote the two Sobel operators extended to 3×9.
4. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 1, wherein mining hard negative samples to obtain triplets comprises the specific steps of:

selecting two local aggregation descriptors $f_a$ and $f_p$ of the same class to form a positive sample pair $(f_a, f_p)$; and, for each positive sample pair, selecting from the other classes the negative sample $f_n$ that makes $\|f_a-f_n\|_2^2$ smallest, forming the triplet $(f_a, f_p, f_n)$.
5. The method for extracting deep local aggregation descriptors from a finger vein image according to claim 4, wherein the loss function over the triplets of one batch is computed as:

$$L=\sum_{(f_a,f_p,f_n)}\max\left(\|f_a-f_p\|_2^2-\|f_a-f_n\|_2^2+margin,\ 0\right)$$

where m denotes the number of image classes in the batch and n the number of samples per class.
6. A depth local aggregation descriptor extraction system for a finger vein image, comprising: the system comprises a basic network module construction unit, a VLAD coding module construction unit, a clustering center vector construction unit, a training unit and an extraction unit;
the basic network module construction unit is used for constructing a basic network module, and the basic network module is used for extracting local features of the finger vein image;
the VLAD encoding module constructing unit is used for constructing a VLAD encoding module, and the VLAD encoding module is used for performing VLAD encoding on the feature map obtained by the basic network module;
the VLAD coding is completed by combining the multi-channel feature map with the cluster center vectors in the VLAD coding module, which specifically comprises:

converting the multi-channel feature map into $w_{out}\times h_{out}$ local descriptors of dimension $C_{out}$ that describe the original image, $\{x_i, i=1,2,\dots,w_{out}\times h_{out}\}$, and inputting them to the VLAD coding module for coding, where the element at position $(k,j)$ of the $K$-row, $C_{out}$-column matrix $V$ is computed as:

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\left(x_i^{(j)}-c_k^{(j)}\right),\qquad a_k(x_i)=\frac{e^{-\|x_i-c_k\|^2}}{\sum_{k'} e^{-\|x_i-c_{k'}\|^2}}$$

where $x_i^{(j)}$ and $c_k^{(j)}$ respectively denote the $j$-th component of the $i$-th descriptor $x_i$ and of the $k$-th cluster center $c_k$, $a_k(x_i)$ denotes the probability that descriptor $x_i$ belongs to the $k$-th cluster, and $c_{k'}$ ranges over the cluster center vectors other than the $k$-th;

flattening the matrix $V$ into a one-dimensional vector and applying $L_2$ normalization to obtain the local aggregation descriptor of length $K\times C_{out}$;
the cluster center vector construction unit is used for setting K cluster center vectors as trainable parameters of the network;
the training unit is used for inputting finger vein images to train the network in batches, and comprises: the system comprises an image preprocessing module, a multi-channel feature map acquisition module, a combination coding module, a triplet construction module and an iteration updating module;
the image preprocessing module is used for preprocessing the finger vein image;
the multi-channel characteristic map acquisition module is used for acquiring a multi-channel characteristic map from a finger vein image sample through the basic network module;
the combined coding module is used for combining the multi-channel characteristic diagram with the clustering center vector in the VLAD coding module to finish VLAD coding;
the triplet construction module is used for mining hard negative samples to obtain triplets;
the iteration updating module is used for calculating a loss function and back-propagating and updating a network weight coefficient until the iteration training is finished;
the extraction unit is used for extracting the local aggregation descriptor of the finger vein image to be detected by adopting a trained network.
7. The system of claim 6, wherein the basic network module adopts 6 serially connected convolution modules, denoted conv_i, i={1,2,3,4,5,6}; each convolution module comprises a 3×3 Conv2d layer, a BN layer and a ReLU activation layer; the numbers of convolution kernels of the modules are 32, 64 and 128 respectively; the padding of all convolution layers is set to 1; the convolution strides of conv_3 and conv_5 are set to 2, and those of conv_1, conv_2, conv_4 and conv_6 to 1.
8. The deep local aggregation descriptor extraction system for a finger vein image according to claim 7, wherein all convolution layers are initialized with orthogonal matrices and their biases fixed to 0, and the weights and biases of the BN layers are fixed to 1 and 0 respectively.
9. The deep local aggregation descriptor extraction system for a finger vein image according to claim 6 or 7, wherein the VLAD coding module is configured with a 1×1 convolution layer in the network structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010050908.9A CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010050908.9A CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274915A CN111274915A (en) | 2020-06-12 |
CN111274915B true CN111274915B (en) | 2023-04-28 |
Family
ID=71001095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010050908.9A Active CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274915B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112200156B (en) * | 2020-11-30 | 2021-04-30 | 四川圣点世纪科技有限公司 | Vein recognition model training method and device based on clustering assistance |
CN112733627B (en) * | 2020-12-28 | 2024-02-09 | 杭州电子科技大学 | Finger vein recognition method based on fusion local and global feature network |
CN112580590B (en) * | 2020-12-29 | 2024-04-05 | 杭州电子科技大学 | Finger vein recognition method based on multi-semantic feature fusion network |
CN112926516B (en) * | 2021-03-26 | 2022-06-14 | 长春工业大学 | Robust finger vein image region-of-interest extraction method |
CN113312989B (en) * | 2021-05-11 | 2023-06-20 | 华南理工大学 | Finger vein feature extraction network based on aggregated descriptors and attention |
CN115018056B (en) * | 2022-06-17 | 2024-09-06 | 华中科技大学 | Training method for local description subnetwork for natural scene image matching |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169415A (en) * | 2017-04-13 | 2017-09-15 | 西安电子科技大学 | Human motion recognition method based on convolutional neural networks feature coding |
CN107977609A (en) * | 2017-11-20 | 2018-05-01 | 华南理工大学 | A kind of finger vein identity verification method based on CNN |
CN109598311A (en) * | 2019-01-23 | 2019-04-09 | 中山大学 | A kind of sub- partial polymerization vector approach of description that space sub-space learning is cut based on symmetric positive definite matrix manifold |
CN110263659A (en) * | 2019-05-27 | 2019-09-20 | 南京航空航天大学 | A kind of finger vein identification method and system based on triple loss and lightweight network |
CN110427832A (en) * | 2019-07-09 | 2019-11-08 | 华南理工大学 | A kind of small data set finger vein identification method neural network based |
- 2020-01-17: Application filed as CN202010050908.9A; patent CN111274915B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN111274915A (en) | 2020-06-12 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | PB01 | Publication |  |
|  | SE01 | Entry into force of request for substantive examination |  |
|  | GR01 | Patent grant |  |