CN111274915B - Deep local aggregation descriptor extraction method and system for finger vein image - Google Patents
Deep local aggregation descriptor extraction method and system for finger vein image
- Publication number
- CN111274915B CN111274915B CN202010050908.9A CN202010050908A CN111274915B CN 111274915 B CN111274915 B CN 111274915B CN 202010050908 A CN202010050908 A CN 202010050908A CN 111274915 B CN111274915 B CN 111274915B
- Authority
- CN
- China
- Prior art keywords
- finger vein
- module
- vlad
- image
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
- 210000003462 vein Anatomy 0.000 title claims abstract description 129
- 230000002776 aggregation Effects 0.000 title claims abstract description 37
- 238000004220 aggregation Methods 0.000 title claims abstract description 37
- 238000000605 extraction Methods 0.000 title claims abstract description 24
- 238000012549 training Methods 0.000 claims abstract description 72
- 239000013598 vector Substances 0.000 claims abstract description 49
- 238000000034 method Methods 0.000 claims abstract description 19
- 238000007781 pre-processing Methods 0.000 claims abstract description 14
- 238000005065 mining Methods 0.000 claims abstract description 7
- 238000010276 construction Methods 0.000 claims description 22
- 239000011159 matrix material Substances 0.000 claims description 16
- 238000004364 calculation method Methods 0.000 claims description 8
- 238000012937 correction Methods 0.000 claims description 5
- 239000000284 extract Substances 0.000 claims description 5
- 230000009466 transformation Effects 0.000 claims description 5
- 238000010606 normalization Methods 0.000 claims description 4
- 230000004913 activation Effects 0.000 claims description 3
- 238000012417 linear regression Methods 0.000 claims description 3
- 238000010586 diagram Methods 0.000 abstract description 7
- 238000009826 distribution Methods 0.000 abstract description 3
- 238000012512 characterization method Methods 0.000 abstract 1
- 238000012360 testing method Methods 0.000 description 17
- 238000012795 verification Methods 0.000 description 16
- 230000006870 function Effects 0.000 description 9
- 238000005516 engineering process Methods 0.000 description 5
- 238000002474 experimental method Methods 0.000 description 3
- 230000003044 adaptive effect Effects 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000013145 classification model Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 238000003709 image segmentation Methods 0.000 description 1
- 238000000691 measurement method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 230000036544 posture Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000009827 uniform distribution Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/1347—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/14—Vascular patterns
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
Description
Technical Field

The present invention relates to the technical field of finger vein feature extraction, and in particular to a method and system for extracting deep local aggregation descriptors from finger vein images.
Background Art

Finger vein recognition is a new generation of biometric technology. Compared with traditional biometrics, it offers non-contact acquisition, inherent liveness detection, and low equipment cost. Finger vein recognition captures finger images with an infrared CCD camera and extracts vein-related features for identity authentication and recognition. The captured finger vein images often suffer from noise, so extracting robust features from them is a key research topic in vein recognition. The representation power of traditional feature descriptors such as LBP (Local Binary Pattern) and LDC (Local Directional Code) is strongly affected by image quality, while feature maps that preserve spatial information require complex template matching for recognition.

In recent years, a variety of deep-learning-based solutions have been proposed for finger vein recognition. For the finger vein verification problem, pixel-level classification of finger vein images based on the idea of image segmentation has been used, but this approach classifies slowly and is difficult to apply in practical scenarios, and convolutional approaches that reuse existing image classification models yield large models. In addition, the extracted features are sensitive to finger pose, so finger vein images captured under different finger poses require complex preprocessing or template matching schemes for recognition. It is therefore more advantageous in practical applications to study lightweight convolutional network models that extract feature descriptors more robust to finger pose variation.
Summary of the Invention

To overcome the defects and deficiencies of the prior art, the present invention provides a method and system for extracting deep local aggregation descriptors from finger vein images. Concatenating the cluster center vectors yields a descriptor of fixed dimension that is independent of the spatial order of the original image patches, which solves the matching problem between finger vein images of different sizes as well as the matching failures between same-class finger vein images caused by finger pose differences. On this basis, the descriptors are further VLAD-encoded to obtain a more expressive deep local aggregation descriptor that performs well in both finger vein identification and verification tasks. The network model of the present invention is only 1.1 MB, better meeting the demand for lightweight models in engineering applications.
To achieve the above object, the present invention adopts the following technical solution:

The present invention provides a method for extracting deep local aggregation descriptors from finger vein images, comprising the following steps:

constructing a base network module for extracting local features of finger vein images;

constructing a VLAD encoding module for VLAD-encoding the feature maps produced by the base network module;

setting K cluster center vectors as trainable parameters of the network;

feeding finger vein images in batches to train the network, the training steps comprising:

preprocessing the finger vein images;

passing the finger vein image samples through the base network module to obtain multi-channel feature maps;

combining the multi-channel feature maps with the cluster center vectors in the VLAD encoding module to complete VLAD encoding;

mining hard negative samples to form triplets, computing the loss function, and backpropagating to update the network weights until iterative training ends;

using the trained network to extract the local aggregation descriptor of a finger vein image under test.
As a preferred technical solution, preprocessing the finger vein images comprises the following steps:

region-of-interest extraction: extracting the region of interest from the finger vein training images and correcting finger tilt by affine transformation;

normalizing the region of interest to obtain the final finger vein training sample images;

resizing the finger vein training sample images according to the receptive field and the original aspect ratio of the images.
As a preferred technical solution, the region-of-interest extraction comprises the following steps:

detecting the upper and lower edges of the finger vein training image with two Sobel operators Mask_u and Mask_d respectively; fitting the midline of the finger by linear regression and computing the angle between the midline and the horizontal direction; rotating the finger vein training image by affine transformation to complete tilt correction; finally, cropping the circumscribed rectangle according to the finger edges to obtain the region of interest;

where Mask_u and Mask_d denote two Sobel operators extended to 3×9, as sketched below.
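A minimal Python sketch of this extraction step follows. The exact 3×9 kernel values are not reproduced in this text, so tiling the standard vertical Sobel kernel horizontally (and negating it for the lower edge) is an assumption, as are the function names.

```python
import cv2
import numpy as np

# Assumed 3x9 extension of the vertical Sobel kernel; the patent's exact
# kernel values are not reproduced here, so this tiling is an assumption.
MASK_U = np.tile(np.array([[-1, -2, -1],
                           [ 0,  0,  0],
                           [ 1,  2,  1]], np.float32), (1, 3))
MASK_D = -MASK_U  # lower edge: opposite gradient direction (assumed)

def extract_roi(img):
    """Tilt-correct a finger vein image and crop the finger's bounding box."""
    f = img.astype(np.float32)
    top = cv2.filter2D(f, -1, MASK_U).argmax(axis=0)     # upper edge row per column
    bottom = cv2.filter2D(f, -1, MASK_D).argmax(axis=0)  # lower edge row per column
    # Fit the finger midline by linear regression over the columns.
    cols = np.arange(img.shape[1])
    slope, _ = np.polyfit(cols, (top + bottom) / 2.0, 1)
    angle = np.degrees(np.arctan(slope))
    # Rotate by the midline angle (affine transform) to correct tilt.
    center = (img.shape[1] / 2, img.shape[0] / 2)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(img, rot, (img.shape[1], img.shape[0]))
    # Crop the circumscribed rectangle of the finger edges; for brevity,
    # the edge rows found before rotation are reused here.
    y0, y1 = int(top.min()), int(bottom.max()) + 1
    return rotated[max(y0, 0):min(y1, img.shape[0]), :]
```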
As a preferred technical solution, combining the multi-channel feature maps with the cluster center vectors in the VLAD encoding module to complete VLAD encoding comprises the following steps:

the multi-channel feature map is converted into w_out × h_out local descriptors of dimension C_out describing the original image, {x_i, i = 1, 2, …, w_out × h_out}, which are fed into the VLAD encoding module for encoding; a matrix V with K rows and C_out columns is computed, whose element at position (k, j) is

$$V(k,j)=\sum_{i=1}^{w_{out}\times h_{out}} a_k(x_i)\bigl(x_i(j)-c_k(j)\bigr),\qquad a_k(x_i)=\frac{e^{-\lVert x_i-c_k\rVert^2}}{\sum_{k'=1}^{K} e^{-\lVert x_i-c_{k'}\rVert^2}}$$

where x_i(j) and c_k(j) denote the j-th component of the i-th descriptor x_i and the j-th component of the k-th cluster center c_k respectively, a_k(x_i) denotes the probability that descriptor x_i belongs to the k-th cluster, and the denominator sums over the k-th cluster center together with all other cluster center vectors c_{k'};

the matrix V is flattened into a one-dimensional vector and L2-normalized, yielding a local aggregation descriptor of length K × C_out.
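The following PyTorch sketch shows one way such a VLAD encoding layer can be implemented, following the NetVLAD-style soft assignment above and the initialization w_k = 2c_k, b_k = −‖c_k‖² given later in the embodiment; the class and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADEncoder(nn.Module):
    """Soft-assignment VLAD encoding of a C_out-channel feature map."""

    def __init__(self, num_clusters=10, dim=128):
        super().__init__()
        self.num_clusters = num_clusters
        # K trainable cluster centers, uniformly initialized.
        self.centers = nn.Parameter(torch.rand(num_clusters, dim))
        # 1x1 convolution producing the soft-assignment logits.
        self.conv = nn.Conv2d(dim, num_clusters, kernel_size=1)
        # Initialize from the centers: w_k = 2*c_k, b_k = -||c_k||^2.
        self.conv.weight.data = 2.0 * self.centers.data.view(num_clusters, dim, 1, 1)
        self.conv.bias.data = -self.centers.data.norm(dim=1) ** 2

    def forward(self, feat):                    # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        assign = F.softmax(self.conv(feat).view(B, self.num_clusters, -1), dim=1)
        x = feat.view(B, C, -1)                 # (B, C, H*W) local descriptors
        # V(k, :) = sum_i a_k(x_i) * (x_i - c_k)
        weighted = torch.einsum('bkn,bcn->bkc', assign, x)
        v = weighted - assign.sum(dim=2, keepdim=True) * self.centers.unsqueeze(0)
        v = v.flatten(1)                        # (B, K*C)
        return F.normalize(v, p=2, dim=1)       # L2-normalized descriptor
```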
As a preferred technical solution, mining hard negative samples to form triplets comprises the following steps:

two local aggregation descriptors f_a and f_p whose samples belong to the same class are selected, forming a positive pair (f_a, f_p);

for each positive pair (f_a, f_p), one negative sample f_n from another class is selected to form a triplet (f_a, f_p, f_n), the negative being chosen so that ‖f_a − f_n‖₂ is minimal, where marg denotes the preset threshold parameter of the triplet loss.
As a preferred technical solution, in computing the loss function and backpropagating to update the network weights, the loss over the triplets of a batch is computed as

$$L=\sum_{(f_a,f_p)}\max\bigl(\lVert f_a-f_p\rVert_2-\lVert f_a-f_n\rVert_2+marg,\ 0\bigr)$$

where the sum runs over all positive pairs in the batch, f_n is the hard negative mined for the pair, m denotes the number of classes in a batch of images, and n denotes the number of samples per class, so a batch contains m × n samples.
The present invention also provides a system for extracting deep local aggregation descriptors from finger vein images, comprising: a base network module construction unit, a VLAD encoding module construction unit, a cluster center vector construction unit, a training unit, and an extraction unit;

the base network module construction unit is configured to construct the base network module, which extracts local features of finger vein images;

the VLAD encoding module construction unit is configured to construct the VLAD encoding module, which VLAD-encodes the feature maps produced by the base network module;

the cluster center vector construction unit is configured to set K cluster center vectors as trainable parameters of the network;

the training unit is configured to feed finger vein images in batches to train the network, and comprises: an image preprocessing module, a multi-channel feature map acquisition module, a combined encoding module, a triplet construction module, and an iterative update module;

the image preprocessing module is configured to preprocess the finger vein images;

the multi-channel feature map acquisition module is configured to pass finger vein image samples through the base network module to obtain multi-channel feature maps;

the combined encoding module is configured to combine the multi-channel feature maps with the cluster center vectors in the VLAD encoding module to complete VLAD encoding;

the triplet construction module is configured to mine hard negative samples to form triplets;

the iterative update module is configured to compute the loss function and backpropagate to update the network weights until iterative training ends;

the extraction unit is configured to use the trained network to extract the local aggregation descriptor of a finger vein image under test.
As a preferred technical solution, the base network module consists of six convolution modules in series, denoted Conv_i, i = {1, 2, 3, 4, 5, 6}. Each convolution module contains a 3×3 Conv2d layer, a BN layer, and a ReLU activation layer; the numbers of convolution kernels of the modules are 32, 32, 64, 64, 128, and 128 respectively; the padding of all convolutional layers is set to 1; the convolution stride of Conv_3 and Conv_5 is set to 2, and that of Conv_1, Conv_2, Conv_4, and Conv_6 is set to 1.

As a preferred technical solution, all convolutional layers are initialized with orthogonal matrices with biases fixed at 0, and the weights and biases of the BN layers are fixed at 1 and 0 respectively.
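A minimal PyTorch sketch of this base network follows; the layer hyperparameters mirror the description above, while the module names and the single-channel (grayscale) input are assumptions.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, stride):
    """3x3 Conv2d + BN + ReLU with padding 1, as described above."""
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)
    nn.init.orthogonal_(conv.weight)            # orthogonal initialization
    nn.init.zeros_(conv.bias)                   # bias fixed at 0
    bn = nn.BatchNorm2d(out_ch, affine=False)   # BN scale/shift fixed to 1/0
    return nn.Sequential(conv, bn, nn.ReLU(inplace=True))

class BaseNetwork(nn.Module):
    """Six serial conv modules; Conv_3 and Conv_5 use stride 2."""

    def __init__(self, in_channels=1):          # grayscale input is assumed
        super().__init__()
        channels = [in_channels, 32, 32, 64, 64, 128, 128]
        strides = [1, 1, 2, 1, 2, 1]
        self.features = nn.Sequential(
            *[conv_block(channels[i], channels[i + 1], strides[i])
              for i in range(6)])

    def forward(self, x):           # x: (B, 1, 64, 128)
        return self.features(x)     # -> (B, 128, 16, 32)
```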
As a preferred technical solution, the VLAD encoding module adds a single 1×1 convolutional layer to the network structure.
Compared with the prior art, the present invention has the following advantages and beneficial effects:

(1) The present invention learns the descriptor end to end with a CNN; the network model is only 1.1 MB, and the extracted descriptor can further be used for tasks such as finger vein verification and identification, giving flexible and broad applicability.

(2) For a finger vein image of any size, the present invention learns K cluster centers automatically through the network and concatenates the cluster center vectors into a descriptor vector representing the finger vein image features, thereby solving the matching problem between finger vein images of different sizes. Because the descriptor vector is composed of cluster center vectors rather than feature vectors of local image patches, two finger vein images can still be matched correctly when they differ in spatial position, solving the matching failures between same-class finger vein images caused by finger pose differences.

(3) The present invention applies VLAD encoding to the finger vein image features; with only the parameters of a single 1×1 convolution added, it makes full use of the information in the feature map and obtains a more expressive finger vein image descriptor.

(4) The present invention constructs triplet samples during network training, reducing the required number of finger vein training images while keeping the numbers of positive and negative training samples equal; it adopts a hard-negative mining strategy when constructing samples, accelerating network convergence; and it trains the network with a triplet loss, driving the network to focus on learning the differences between finger vein images rather than label information, improving the generalization performance of the method.
Description of the Drawings

Fig. 1 shows example images from the finger vein datasets of this embodiment;

Fig. 2 is a schematic diagram of how the vein datasets of this embodiment are partitioned;

Fig. 3 is a schematic flowchart of the batch training of the network model of this embodiment;

Fig. 4 shows example training and test images obtained by region-of-interest extraction in this embodiment;

Fig. 5 is a schematic flowchart of testing the network model of this embodiment;

Fig. 6 is a schematic structural diagram of the deep local aggregation descriptor extraction system for finger vein images of this embodiment;

Fig. 7 is a schematic structural diagram of the base network module of this embodiment;

Fig. 8 is a schematic structural diagram of the VLAD encoding module of this embodiment.
Detailed Description

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.

Embodiment

This embodiment trains and tests on three finger vein datasets: SDUMLA, FV-USM, and MMCBNU_6000. The SDUMLA dataset, from Shandong University, contains finger vein images from 636 fingers of 106 subjects; six grayscale BMP images were collected for each of the index, middle, and ring fingers of both hands, at a resolution of 320×240. The FV-USM dataset, from Universiti Sains Malaysia, consists of vein images of the left and right index and middle fingers of 123 subjects; the images come from two separate acquisition sessions, with 12 images per finger. The MMCBNU_6000 dataset, from Chonbuk National University in Korea, consists of finger vein images from 100 volunteers; each finger was imaged 10 times, for a total of 6000 images.

Fig. 1 shows example images from the public finger vein datasets above. As shown in Fig. 2, this embodiment divides the finger images of the 100 identities of MMCBNU_6000 into 600 classes by finger, each class containing 10 sample images; images of 300 classes are randomly selected as the training set, giving 3000 training samples in total, and the remainder serves as the test set, which this embodiment further divides into a registered template library and a probe set.

This embodiment is implemented mainly on the deep learning framework PyTorch; the experiments use a GTX 1080 Ti graphics card, and the finger vein descriptors extracted from the test images are used for identification and verification tasks.
This embodiment provides a method for extracting deep local aggregation descriptors from finger vein images, comprising the following steps:

constructing a base network module for extracting local features of finger vein images;

constructing a VLAD encoding module for VLAD-encoding the feature maps produced by the base network module;

setting K cluster center vectors as trainable parameters of the network: this embodiment sets K cluster center vectors {c_k, k = 1, 2, …, K} of dimension C_out and initializes them randomly from a uniform distribution, with each c_k determined through network learning. For a finger vein image of any size, this embodiment learns K cluster centers automatically through the network and concatenates the cluster center vectors into a descriptor vector representing the finger vein image features, whose length is fixed at 128 × K, thereby solving the matching problem between finger vein images of different sizes. In particular, because the descriptor vector is composed of cluster center vectors rather than feature vectors of local image patches, two finger vein images can still be matched correctly when they differ in spatial position, solving the matching failures between same-class finger vein images caused by finger pose differences. In this embodiment, to keep the extracted descriptor both expressive and compact, the number of cluster centers K is set between 8 and 15, preferably K = 10; a sketch of registering such centers as trainable parameters follows.
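As sketched below, the K cluster centers can be registered as a trainable tensor; the unit range [0, 1) is an assumption, since the text states only "uniform distribution" without bounds.

```python
import torch
import torch.nn as nn

K, C_OUT = 10, 128  # number of cluster centers and descriptor dimension

# Trainable cluster centers {c_k}, uniformly initialized; the [0, 1)
# range is an assumption.
centers = nn.Parameter(torch.rand(K, C_OUT))
```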
As shown in Fig. 3, finger vein images are fed in batches to train the network; the training steps are as follows:

The finger vein images are preprocessed. The region of interest used for training is obtained by segmenting the finger region from the background and cropping the circumscribed rectangular region of the finger edges, which removes background noise while preserving as much of the original information as possible.

In this embodiment, the specific preprocessing steps are:

Region-of-interest extraction: the upper and lower edges of the finger vein training image are detected with two Sobel operators Mask_u and Mask_d respectively; the midline of the finger is fitted by linear regression and the angle between the midline and the horizontal direction is computed; the finger vein training image is rotated by affine transformation to complete tilt correction; finally, the circumscribed rectangle of the whole finger is cropped according to the outermost finger edge points to obtain the region of interest. As shown in Fig. 4, the training and test images are obtained from the regions of interest.

In this embodiment, Mask_u and Mask_d denote the two Sobel operators extended to 3×9 described above.

The finger vein training images are resized according to the receptive field and the original aspect ratio of the images. The average aspect ratio of all training images is computed to be 2. From the structural parameters of the base network module, the receptive field of each point of the output feature map can be computed to be a 23×23 local region. The input images are resized to height h = 64 and width w = 128, so that the receptive field of each feature point in the feature map is about 1/3 of the input image height, capturing relatively rich finger vein information.

Each training image is standardized by subtracting its mean and dividing by its standard deviation, reducing the influence of uneven illumination, to obtain the final finger vein training sample images.
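A minimal sketch of this resizing and standardization step, assuming OpenCV for resizing; the function name and the epsilon guard are added assumptions.

```python
import cv2
import numpy as np

def preprocess(roi, height=64, width=128):
    """Resize an ROI to 64x128 and standardize it per image."""
    img = cv2.resize(roi, (width, height)).astype(np.float32)
    # Subtract the mean and divide by the standard deviation to reduce
    # the influence of uneven illumination.
    return (img - img.mean()) / (img.std() + 1e-8)
```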
A training batch sampler is constructed: training images are loaded in batches, each batch randomly selecting m classes with n samples per class, i.e., m × n samples in total; in this embodiment m = 16 and n = 6.

The trainable parameters in the network structure are initialized. The parameters of the base network module are initialized with the convolutional weights set by orthogonal matrices and the biases fixed at 0, and the weights and biases of the BN layers fixed at 1 and 0. The class center vectors are initialized: the number of descriptor clusters K is set to 10, and the descriptor cluster centers c_k, k = 1, 2, …, K are randomly initialized from a uniform distribution. The trainable parameters of the VLAD encoding module are initialized: the parameters of the 1×1 convolution are initialized from the cluster centers, with the kernel weights initialized to w_k = 2c_k and the biases to b_k = −‖c_k‖².

The hyperparameter marg of the triplet loss is set to 1.

The number of model iterations is set to 200, the learning rate is fixed at 0.01, and the optimization method is Adam (adaptive moment estimation). The minimum validation loss and the model save path are initialized so that the model with the smallest validation loss can be saved later; the initial iteration count and the initial sample batch are set to 0. The batch sampler settings determine a training batch size of 16 × 6 = 96.
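A sketch of a batch sampler with this m-classes-by-n-samples behavior is shown below; the class name and implementation details are assumptions.

```python
import random
from collections import defaultdict

class BalancedBatchSampler:
    """Yields index batches of m classes x n samples (16 x 6 = 96 here)."""

    def __init__(self, labels, m_classes=16, n_samples=6):
        self.by_class = defaultdict(list)
        for idx, lab in enumerate(labels):
            self.by_class[lab].append(idx)
        self.m, self.n = m_classes, n_samples

    def __iter__(self):
        classes = list(self.by_class)
        random.shuffle(classes)
        # Each batch draws m random classes and n random samples per class.
        for i in range(0, len(classes) - self.m + 1, self.m):
            batch = []
            for lab in classes[i:i + self.m]:
                batch.extend(random.sample(self.by_class[lab], self.n))
            yield batch
```

Training would then pair such a sampler with `torch.optim.Adam(model.parameters(), lr=0.01)` and run for the 200 iterations stated above.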
The model iteration counter is incremented by 1 and training of the model continues; the training sample batch counter is incremented by 1, and a batch of samples is loaded (or loading continues with the next batch).

According to the batch sampler settings, 96 preprocessed training images are loaded from the training set.

The finger vein training sample images pass through the six 3×3 convolution modules of the base network module; because two of the convolutional layers have stride 2, a pooling effect is achieved, and 128-channel feature maps of size 16×32 pixels are finally obtained.

The multi-channel feature maps are combined with the cluster center vectors in the VLAD encoding module to complete VLAD encoding: K cluster center vectors {c_k, k = 1, 2, …, K} of dimension C_out are set and randomly initialized from a uniform distribution, with each c_k determined through network learning.

The specific VLAD encoding steps are:

The C_out-channel feature map of resolution w_out × h_out is converted into w_out × h_out local descriptors of dimension C_out describing the original image, {x_i, i = 1, 2, …, w_out × h_out}; in this embodiment these are 16×32 = 512 local descriptors of dimension 128, denoted {x_i, i = 1, 2, …, 512}, and a 1280-dimensional local aggregation descriptor is obtained after the VLAD encoding module.

The descriptors are fed into the VLAD encoding module for encoding, i.e., the matrix V with K rows and C_out columns is computed, with its element at position (k, j) given by the formula for V(k, j) above, where x_i(j) and c_k(j) denote the j-th components of descriptor x_i and cluster center c_k, and a_k(x_i) denotes the probability that descriptor x_i belongs to the k-th cluster.

The matrix V is flattened into a one-dimensional vector and L2-normalized, giving a local aggregation descriptor of length K × C_out.

After the same batch of training images is processed by the base network module and the VLAD encoding module, m × n local aggregation descriptors are obtained, each being the local aggregation descriptor f_i of the i-th training sample in the batch. In this embodiment, one batch of 96 training samples yields 96 descriptors through the network, which form a matrix whose rows are the descriptors of the individual training images; the Euclidean distance matrix between all pairs of descriptors is then obtained by matrix computation, with the element at position (i, j) equal to D(i, j) = ‖f_i − f_j‖₂; the matrix is of size 96×96 with zeros on the diagonal.
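Such a distance matrix can be computed in one call, as in this sketch (the function name is an assumption):

```python
import torch

def pairwise_distances(f):
    """Euclidean distance matrix with entries D[i, j] = ||f_i - f_j||_2.

    f: (B, D) descriptor matrix; here B = 96 and D = 1280, giving a
    96 x 96 matrix with a zero diagonal.
    """
    return torch.cdist(f, f, p=2)
```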
Hard negatives are mined to form triplets: two local aggregation descriptors f_a and f_p whose samples belong to the same class are selected, forming a positive pair (f_a, f_p).

For each positive pair (f_a, f_p), a negative sample f_n from another class is selected to form a triplet (f_a, f_p, f_n), choosing the negative that makes ‖f_a − f_n‖₂ minimal; marg denotes the preset threshold parameter, and in this embodiment marg is set to 1 when computing the difference between intra-class and inter-class descriptor distances, which distinguishes same-class from different-class descriptors well.

The loss function is computed and backpropagated to update the network weights. It is then checked whether one pass over all samples has been completed: if so, the process enters the validation loss step; otherwise it returns to continue training.

It is checked whether the current validation loss is smaller than the minimum loss: if so, the model is saved (or the saved model is updated) and the minimum loss value is updated; otherwise the process proceeds to check whether the set number of iterations has been completed.

It is checked whether 200 iterations have been completed: if so, training ends; otherwise the process returns to the model training iteration step, i.e., the iteration counter is incremented, training continues, the sample batch counter is incremented, and the next batch of samples is loaded.

In this embodiment, the loss function over the triplets of a batch is computed by the formula for L above, where m denotes the number of classes in the batch and n the number of samples per class. This embodiment constructs triplet samples during network training, reducing the required number of finger vein training images while keeping the numbers of positive and negative training samples equal; a hard-negative mining strategy is used when constructing samples, accelerating network convergence; and the network is trained with a triplet loss, driving the network to focus on learning the differences between finger vein images rather than label information, improving the generalization performance of the method.
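A minimal PyTorch sketch of the batch-hard mining and triplet loss described above follows; it reconstructs the standard batch-hard formulation, and the reduction by summation (rather than averaging) is an assumption.

```python
import torch

def batch_hard_triplet_loss(f, labels, margin=1.0):
    """Triplet loss with hard-negative mining over one batch.

    f: (B, D) descriptors; labels: (B,) integer class labels.
    For each positive pair (a, p), the hardest negative for the anchor
    (the smallest distance to a sample of another class) is mined, and
    max(d(a, p) - d(a, n) + margin, 0) is accumulated.
    """
    dist = torch.cdist(f, f, p=2)                        # (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)    # (B, B) bool
    # Hardest negative per anchor: minimal distance to another class.
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    # All ordered positive pairs (a != p, same class).
    pos_mask = same & ~torch.eye(len(f), dtype=torch.bool, device=f.device)
    anchors, positives = pos_mask.nonzero(as_tuple=True)
    hinge = dist[anchors, positives] - hardest_neg[anchors] + margin
    return torch.relu(hinge).sum()   # summation as the reduction (assumed)
```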
As shown in Fig. 5, the trained network is used to extract the local aggregation descriptors of the finger vein images under test; the network structure in the test phase is identical to that in the training phase. The specific steps are as follows:

The finger vein images under test undergo the same image preprocessing steps as in network training, including region-of-interest extraction, standardization, and resizing; after preprocessing, all images in the test set are resized to 64×128. Feeding a preprocessed finger vein image into the trained network yields its deep local aggregation descriptor, which can further be used for finger vein identification or verification.

In this embodiment, the finger vein identification and verification tasks are tested separately.

For the finger vein identification task:

All images of the registered template library are fed into the network to obtain their feature descriptors.

For each image in the probe set, its descriptor is obtained through the network, and the Euclidean distances between this descriptor and the feature descriptors of all registered templates are computed.

The Euclidean distances are sorted, and the current probe sample is identified as the registered template with the smallest Euclidean distance.
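A sketch of this nearest-template identification rule (names are assumptions):

```python
import torch

def identify(probe, gallery, gallery_labels):
    """1:N identification: return the label of the nearest registered template.

    probe: (D,) descriptor; gallery: (N, D) template descriptors;
    gallery_labels: sequence of N class labels.
    """
    d = torch.cdist(probe.unsqueeze(0), gallery, p=2).squeeze(0)  # (N,)
    return gallery_labels[int(d.argmin())]
```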
For the finger vein verification task:

By testing with the registered template library, a suitable binary classification threshold of 1 is selected.

Each probe sample is paired with the other samples of the same class from the test set to form positive pairs, and an equal number of samples from other classes are randomly selected to form negative pairs, giving 300 × 5 × 5 × 2 = 15000 positive and negative pairs in total.

The sample pairs are passed through the network one by one to obtain descriptors, and the Euclidean distance between the two descriptors of each pair is computed.

If the Euclidean distance between the two descriptors is below 1, the two samples are judged to belong to the same class and verification succeeds; otherwise verification fails.
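The corresponding verification decision is a single threshold test, as in this sketch:

```python
import torch

def verify(f1, f2, threshold=1.0):
    """1:1 verification: accept iff the descriptor distance is below the threshold."""
    return torch.dist(f1, f2, p=2).item() < threshold
```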
Finally, the test results are saved. Because this embodiment uses deep local aggregation descriptors, no special processing of finger vein images with different finger poses is required, and matching can be performed directly with simple metrics such as Euclidean distance or cosine similarity.

In this embodiment, the test procedure on the other two public databases is essentially the same as the steps above. As shown in Table 1 below, the evaluation metric for the 1:1 finger vein verification experiments is the EER (Equal Error Rate), i.e., the FAR (False Accept Rate) at the operating point where FAR equals the FRR (False Reject Rate); in the experiments, the FAR value at which |FAR − FRR| < 0.0001 is taken as the EER.

Table 1. EER results of 1:1 finger vein verification

As shown in Table 2 below, for the experimental results of 1:N finger vein identification in this embodiment, the evaluation metric is IR(k):

$$IR(k)=\frac{\bigl|\{\,b\in B:\operatorname{rank}(b)\le k\,\}\bigr|}{U_B}$$

where B denotes the set of all probe samples, b is a probe sample, rank(b) is the rank of the similarity between the probe sample and the same-class samples in the registered template library, and U_B denotes the number of probe samples. IR(1) denotes the proportion of test samples for which the same-class sample in the template library ranks first in similarity.
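A minimal sketch of computing IR(k) from precomputed ranks (the function name is an assumption):

```python
def identification_rate(ranks, k):
    """IR(k): fraction of probes whose same-class template ranks in the top k.

    ranks: iterable of rank(b) values (1-based), one per probe sample b.
    """
    ranks = list(ranks)
    return sum(r <= k for r in ranks) / len(ranks)
```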
Table 2. IR(k) results of 1:N finger vein identification

As shown in Table 3 below, the parameter count of the model of this embodiment and the timings of finger vein recognition measured on a CPU are reported.

Table 3. Model size and time cost

As can be seen from Tables 1-3 above, the network proposed in this embodiment is effective in both the finger vein identification and verification tasks; the network model is only 1.1 MB and feature extraction takes little time, and with the descriptor dimension of 1280 used in this embodiment the similarity computation is also fast. This embodiment applies VLAD encoding to the finger vein image features and, with only the parameters of a single 1×1 convolution added, makes full use of the information in the feature map to obtain a more expressive finger vein image descriptor.
As shown in Fig. 6, this embodiment also provides a system for extracting deep local aggregation descriptors from finger vein images, comprising: a base network module construction unit, a VLAD encoding module construction unit, a cluster center vector construction unit, a training unit, and an extraction unit.

In this embodiment, the base network module construction unit constructs the base network module, which extracts local features of finger vein images; the VLAD encoding module construction unit constructs the VLAD encoding module, which VLAD-encodes the feature maps produced by the base network module; the cluster center vector construction unit sets K cluster center vectors as trainable parameters of the network; the training unit feeds finger vein images in batches to train the network; and the extraction unit uses the trained network to extract the local aggregation descriptor of a finger vein image under test.

As shown in Fig. 7, the base network module consists of six convolution modules in series, denoted Conv_i, i = {1, 2, 3, 4, 5, 6}; each convolution module contains a 3×3 Conv2d layer, a BN layer, and a ReLU activation layer; the numbers of convolution kernels of the modules are 32, 32, 64, 64, 128, and 128 respectively; the padding of all convolutional layers is set to 1; the convolution stride of Conv_3 and Conv_5 is set to 2 and that of Conv_1, Conv_2, Conv_4, and Conv_6 to 1. All convolutional layers are initialized with orthogonal matrices with biases fixed at 0, and the weights and biases of the BN layers are not updated but fixed at 1 and 0 respectively, which reduces the number of trainable parameters while having little effect on the results.

As shown in Fig. 8, the VLAD encoding module adds a single 1×1 convolutional layer to the network structure, with weights w_k = 2c_k and biases b_k = −‖c_k‖², which represents the simplified soft assignment

$$a_k(x_i)=\frac{e^{\,w_k^{T}x_i+b_k}}{\sum_{k'=1}^{K} e^{\,w_{k'}^{T}x_i+b_{k'}}}$$

In this embodiment, the training unit comprises: an image preprocessing module, a multi-channel feature map acquisition module, a combined encoding module, a triplet construction module, and an iterative update module.

In this embodiment, the image preprocessing module preprocesses the finger vein images; the multi-channel feature map acquisition module passes finger vein image samples through the base network module to obtain multi-channel feature maps; the combined encoding module combines the multi-channel feature maps with the cluster center vectors in the VLAD encoding module to complete VLAD encoding; the triplet construction module mines hard negative samples to form triplets; and the iterative update module computes the loss function and backpropagates to update the network weights until iterative training ends.
The above embodiment is a preferred implementation of the present invention, but the implementation of the present invention is not limited to the above embodiment; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010050908.9A CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111274915A CN111274915A (en) | 2020-06-12 |
CN111274915B true CN111274915B (en) | 2023-04-28 |
Family
ID=71001095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010050908.9A Expired - Fee Related CN111274915B (en) | 2020-01-17 | 2020-01-17 | Deep local aggregation descriptor extraction method and system for finger vein image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111274915B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112200156B (en) * | 2020-11-30 | 2021-04-30 | 四川圣点世纪科技有限公司 | Vein recognition model training method and device based on clustering assistance |
CN112733627B (en) * | 2020-12-28 | 2024-02-09 | 杭州电子科技大学 | Finger vein recognition method based on fusion local and global feature network |
CN112580590B (en) * | 2020-12-29 | 2024-04-05 | 杭州电子科技大学 | Finger vein recognition method based on multi-semantic feature fusion network |
CN112926516B (en) * | 2021-03-26 | 2022-06-14 | 长春工业大学 | Robust finger vein image region-of-interest extraction method |
CN113312989B (en) * | 2021-05-11 | 2023-06-20 | 华南理工大学 | A Finger Vein Feature Extraction Network Based on Aggregated Descriptors and Attention |
CN115018056B (en) * | 2022-06-17 | 2024-09-06 | 华中科技大学 | Training method for local description subnetwork for natural scene image matching |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169415A (en) * | 2017-04-13 | 2017-09-15 | 西安电子科技大学 | Human motion recognition method based on convolutional neural networks feature coding |
CN107977609A (en) * | 2017-11-20 | 2018-05-01 | 华南理工大学 | A kind of finger vein identity verification method based on CNN |
CN109598311A (en) * | 2019-01-23 | 2019-04-09 | 中山大学 | A kind of sub- partial polymerization vector approach of description that space sub-space learning is cut based on symmetric positive definite matrix manifold |
CN110263659A (en) * | 2019-05-27 | 2019-09-20 | 南京航空航天大学 | A kind of finger vein identification method and system based on triple loss and lightweight network |
CN110427832A (en) * | 2019-07-09 | 2019-11-08 | 华南理工大学 | A kind of small data set finger vein identification method neural network based |
-
2020
- 2020-01-17 CN CN202010050908.9A patent/CN111274915B/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN111274915A (en) | 2020-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111274915B (en) | Deep local aggregation descriptor extraction method and system for finger vein image | |
CN111178432B (en) | Weakly supervised fine-grained image classification method based on multi-branch neural network model | |
CN103605972B (en) | Non-restricted environment face verification method based on block depth neural network | |
CN103761531B (en) | The sparse coding license plate character recognition method of Shape-based interpolation contour feature | |
CN111027464B (en) | Iris Recognition Method Jointly Optimized for Convolutional Neural Network and Sequential Feature Coding | |
CN111368683B (en) | Face Image Feature Extraction Method and Face Recognition Method Based on Modular Constraint CenterFace | |
CN110543822A (en) | A Finger Vein Recognition Method Based on Convolutional Neural Network and Supervised Discrete Hash Algorithm | |
CN105718889B (en) | Face ID Recognition Method Based on GB(2D)2PCANet Deep Convolution Model | |
CN113205026B (en) | Improved vehicle type recognition method based on fast RCNN deep learning network | |
CN108460356A (en) | A kind of facial image automated processing system based on monitoring system | |
CN107038416B (en) | A Pedestrian Detection Method Based on Improved HOG Feature of Binary Image | |
CN111611924A (en) | A Mushroom Recognition Method Based on Deep Transfer Learning Model | |
CN109446922B (en) | Real-time robust face detection method | |
CN104809481A (en) | Natural scene text detection method based on adaptive color clustering | |
CN113158955B (en) | Pedestrian re-recognition method based on clustering guidance and paired measurement triplet loss | |
CN109344856B (en) | Offline signature identification method based on multilayer discriminant feature learning | |
CN111027570B (en) | Image multi-scale feature extraction method based on cellular neural network | |
CN110443257A (en) | A kind of conspicuousness detection method based on Active Learning | |
CN110555386A (en) | Face recognition identity authentication method based on dynamic Bayes | |
CN114495170A (en) | A method and system for pedestrian re-identification based on local suppression of self-attention | |
CN114998995A (en) | Cross-view-angle gait recognition method based on metric learning and space-time double-flow network | |
CN115527269B (en) | Intelligent human body posture image recognition method and system | |
CN107133579A (en) | Based on CSGF (2D)2The face identification method of PCANet convolutional networks | |
CN108520539A (en) | A Method of Image Object Detection Based on Sparse Learning Variable Model | |
CN107967481A (en) | A kind of image classification method based on locality constraint and conspicuousness |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20230428