Fingerprint singular point detection method based on RCNN
Technical Field
The invention relates to an image singular point detection method, in particular to a fingerprint singular point detection method, and belongs to the fields of computer vision and deep learning.
Background
Because of their uniqueness, fingerprint images are now widely used as identification markers in access control, criminal investigation and other fields; the owner of a fingerprint image can be determined by checking the consistency of that image against the images in a database. Singular points serve as essential global features and salient landmarks on a fingerprint image; they are invariant to rotation, deformation and the like, and are therefore suitable for various fingerprint identification scenarios such as fingerprint retrieval and fingerprint classification.
The Poincaré index is widely applied to fingerprint singular point detection, but methods based on it are generally susceptible to image noise, perform poorly on low-quality fingerprint images, and tend to incur a large computational cost. Most existing singular point detection methods are refinements of the Poincaré index. For example, combining the Poincaré index with a multi-scale detection algorithm restricts the singular point computation to candidate regions, which effectively improves detection speed, but the detection accuracy remains unsatisfactory; likewise, the performance of the zero-pole model combined with the Hough transform is limited by the accuracy of the Poincaré index.
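For background, the Poincaré index referred to above can be illustrated with a minimal sketch (illustrative only, not part of the claimed method; the orientation field theta is assumed to be given, e.g. estimated from image gradients):

```python
import numpy as np

def poincare_index(theta, i, j):
    """Poincare index at pixel (i, j) of an orientation field theta
    (angles in radians, defined modulo pi), accumulated along the
    closed curve formed by the 8 neighbours.  A core point yields
    about +1/2 and a delta (triangle) point about -1/2."""
    ring = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    angles = [theta[i + di, j + dj] for di, dj in ring]
    total = 0.0
    for k in range(8):
        d = angles[(k + 1) % 8] - angles[k]
        # wrap each orientation difference into (-pi/2, pi/2]
        if d > np.pi / 2:
            d -= np.pi
        elif d <= -np.pi / 2:
            d += np.pi
        total += d
    return total / (2 * np.pi)
```

On a synthetic core-like orientation field (theta equal to half the polar angle, taken modulo pi), the index at the centre evaluates to 1/2, matching the classical characterisation of a core point.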
Deep convolutional neural networks have driven progress in many advanced computer vision directions and are widely used in fields such as biometric pattern recognition and video recognition, with good results. The RCNN network is highly effective for object detection: it applies a high-capacity convolutional neural network to bottom-up region proposals in order to localize and segment objects. When labelled training data are scarce, RCNN can fine-tune from pre-trained parameters, which markedly improves recognition; by adopting supervised pre-training on a large sample set followed by fine-tuning on a small one, RCNN effectively mitigates the difficulty of training on small samples and the resulting over-fitting.
Summary of the Invention:
In view of the defects of the conventional methods, the present invention provides a fingerprint singular point detection method based on RCNN, whose implementation process is shown in fig. 1. The purpose of the invention is to extract the fingerprint singular points shown in fig. 2b from the fingerprint image shown in fig. 2a more efficiently and accurately, while reducing the requirement on the quality of the sample fingerprint images.
To achieve this purpose, after the computer reads the original fingerprint image, the invention carries out the following steps:
step one, constructing a data set: acquiring 256 × 320 original fingerprint grey-scale images containing noise, manually enhancing the images, labelling the ground truth, normalizing the images, and dividing them into a training set and a test set at a ratio of 8:2;
step two, image enhancement: constructing an encoder-decoder convolutional neural network for image enhancement, which consists of an encoding network module and a decoding network module. The image enhancement network is trained with the original data set, and the 256 × 320 fingerprint images predicted by the network are stored as the input of step three;
step three, image segmentation: dividing the enhanced fingerprint image into several regions of size 41 × 41 according to a grid, manually labelling the category of each region, representing the categories with a matrix as the ground truth, and then setting a probability threshold for screening the classification results. A Res-net classifier is trained with the enhanced image data set. For the output of each region, the regions above the probability threshold are retained for singular point coordinate detection;
step four, singular point detection: taking the region images containing singular points from step three as input and the normalized singular point coordinates as output, an FCN is trained; in essence this performs regression on the proposed regions of interest;
step five, accuracy calculation: extracting the prediction results of the FCN in step four, comparing them with the true values, and calculating the prediction accuracy of the method. The Euclidean distance between a predicted point and the real point is taken as the basis: a point whose distance is below the threshold is regarded as a successful detection.
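By way of illustration only (not part of the claims), the grid segmentation of step three can be sketched in Python as follows; since 256 × 320 is not a multiple of 41, the handling of the image edges is not specified by the method, and zero-padding is assumed here:

```python
import numpy as np

def split_into_patches(img, patch=41):
    """Cut an enhanced fingerprint image into patch x patch regions on a
    grid.  The image is zero-padded on the right/bottom so that its size
    becomes a multiple of the patch size (an assumption; the text does
    not say how edges are handled).  Returns (row, col, patch) tuples."""
    h, w = img.shape
    H = -(-h // patch) * patch   # ceil to a multiple of the patch size
    W = -(-w // patch) * patch
    padded = np.zeros((H, W), dtype=img.dtype)
    padded[:h, :w] = img
    return [
        (r, c, padded[r:r + patch, c:c + patch])
        for r in range(0, H, patch)
        for c in range(0, W, patch)]
```

For a 256 × 320 image this yields a 7 × 8 grid of 56 regions, each of which is then assigned a class matrix and fed to the classifier.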
For step one, artificial image enhancement means performing filtering, noise reduction and similar operations with image processing techniques; labelling the ground truth means manually marking the positions of the singular points, reading their coordinates and storing them in a csv file. Image normalization means dividing the grey value of every pixel by 255 so that the grey values lie in the range [0, 1].
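The normalization and ground-truth storage of step one can be sketched as follows (illustrative; the csv column names are assumptions, not prescribed by the method):

```python
import csv
import numpy as np

def normalize_image(img_u8):
    """Scale 8-bit grey values into [0, 1], as described in step one."""
    return img_u8.astype(np.float32) / 255.0

def save_ground_truth(points, path):
    """Store manually labelled singular-point coordinates as a csv file.
    `points` holds (image_id, x, y, point_type) rows; the column names
    below are illustrative assumptions."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image_id", "x", "y", "type"])
        writer.writerows(points)
```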
For step two, the image enhancement network consists of an encoder network and a decoder network. The encoder network is composed of two modules, each containing two identical convolution layers (kernel 3 × 3, number of channels 16 and 64 in turn, stride 1) and one max pooling layer (window size 2 × 2). The decoder network consists of two modules, each containing one upsampling layer (window size 2 × 2) and two convolution layers (kernel 3 × 3, number of channels 64 and 16 in turn, stride 1), followed by one final convolution layer with kernel 1 × 1. During training, the mean square error is used as the loss function, and stochastic gradient descent is used for parameter optimization.
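A possible PyTorch sketch of this encoder-decoder enhancement network is given below (illustrative; the framework, the padding, and the exact channel arrangement inside each module are assumptions where the text leaves them open):

```python
import torch
import torch.nn as nn

class EnhancementNet(nn.Module):
    """Sketch of the step-two encoder-decoder enhancement network.
    Padding of 1 on the 3x3 convolutions is assumed so that a
    256 x 320 input is reconstructed at the same size."""
    def __init__(self):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())
        self.enc1, self.enc2 = block(1, 16), block(16, 64)   # encoder
        self.dec1, self.dec2 = block(64, 64), block(64, 16)  # decoder
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2)
        self.out = nn.Conv2d(16, 1, 1)                       # final 1x1 conv

    def forward(self, x):
        x = self.pool(self.enc1(x))   # 256x320 -> 128x160
        x = self.pool(self.enc2(x))   # 128x160 -> 64x80
        x = self.dec1(self.up(x))     # 64x80  -> 128x160
        x = self.dec2(self.up(x))     # 128x160 -> 256x320
        return self.out(x)
```

Training would then minimise the mean square error between the network output and the enhanced target image using stochastic gradient descent, as stated above.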
For step three, the matrix labelling the category of each region is as follows:

C = (c1, c2, c3),  ci ∈ {0, 1},  i = 1, 2, 3

where C is the class matrix, c1 indicates whether the region contains singular points, c2 whether it contains a core point, and c3 whether it contains a delta (triangle) point. The probability threshold depends on the particular data set and is generally slightly less than the maximum of the prediction probabilities. The specific structure of the Res-net in this step is: a convolution layer with kernel 5 × 5 and 16 channels; a 2 × 2 downsampling layer; a convolution layer with kernel 5 × 5 and 32 channels, with a residual connection extended to the next convolution layer; a 2 × 2 downsampling layer; a convolution layer with kernel 5 × 5 and 64 channels; and a fully-connected layer. The training parameters of this network are set as for the network in step two.
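This region classifier can be sketched as follows (illustrative; the placement of the residual connection, the 1 × 1 projection used to match channel counts, and the sigmoid multi-label head are assumptions where the text leaves details open):

```python
import torch
import torch.nn as nn

class RegionClassifier(nn.Module):
    """Sketch of the step-three Res-net region classifier for 41 x 41
    patches.  The three sigmoid outputs correspond to (c1, c2, c3)
    of the class matrix."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 5, padding=2)   # 5x5 conv, 16 ch
        self.conv2 = nn.Conv2d(16, 32, 5, padding=2)  # 5x5 conv, 32 ch
        self.conv3 = nn.Conv2d(32, 64, 5, padding=2)  # 5x5 conv, 64 ch
        self.skip = nn.Conv2d(32, 64, 1)              # residual projection
        self.pool = nn.MaxPool2d(2)
        self.fc = nn.Linear(64 * 10 * 10, 3)          # fully-connected head

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))      # 41 -> 20
        r = self.pool(torch.relu(self.conv2(x)))      # 20 -> 10
        x = torch.relu(self.conv3(r)) + self.skip(r)  # residual addition
        return torch.sigmoid(self.fc(x.flatten(1)))   # (c1, c2, c3)
```

Regions whose predicted probability exceeds the screening threshold are then passed on to the step-four coordinate regression.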
For step four, the training set consists of the 41 × 41 region grey-scale images from step three that exceed the probability threshold; the coordinates of the singular point inside each region are computed from the coordinates labelled on the original image, and the singular point coordinates are normalized as follows:

xi' = xi − oi,   x̄i = xi' / 41,   i = 1, 2, …, n

where xi is the original coordinate, oi is the offset of the region within the original image, xi' is the coordinate of the singular point in the region grey-scale image, x̄i is the normalized coordinate value, and n is the number of fingerprint images in the data set. The FCN of this step consists of four similar modules, each composed of two convolution layers (kernel 3 × 3, number of channels 16, 64, 128 and 256 in turn) and one max pooling layer (window size 2 × 2); there are 2 fully-connected layers, with 256 and 2 nodes respectively. In this network, regression is performed with stochastic gradient descent, and the network learns by back-propagating the mean square error. Because the input images are small, the CNN of this step can be trained effectively, and the predicted outputs are highly accurate.
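The step-four coordinate mapping and normalization can be sketched as follows (illustrative; the convention that a region is identified by its top-left offset in the original image is an assumption):

```python
def to_patch_coords(x, y, row0, col0, patch=41):
    """Map a singular point labelled at (x, y) on the original image to
    its coordinates inside the patch whose top-left corner is at
    (row0, col0), then normalise into [0, 1] by the patch size.
    This follows the normalization formula of step four under the
    assumed offset convention."""
    xp, yp = x - col0, y - row0        # patch-local coordinates
    return xp / patch, yp / patch      # normalised to [0, 1]
```

The two normalized values are exactly the targets for the 2-node output layer of the regression network.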
For step five, the Euclidean distance criterion used is as follows:

sqrt((px − gx)² + (py − gy)²) < threshold

where px, py and gx, gy are respectively the horizontal and vertical coordinates of the predicted point and of the real singular point, and threshold is the threshold value.
Also for step five, the threshold is determined by the image size, generally about one tenth of it; for the image size used here, the threshold is 20 pixels.
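The step-five accuracy criterion can be sketched as follows (illustrative; a one-to-one pairing of predicted and real points is assumed for simplicity):

```python
import math

def detection_accuracy(preds, truths, threshold=20.0):
    """Fraction of ground-truth singular points that are successfully
    detected: a prediction counts as a hit when its Euclidean distance
    to the paired real point is below the threshold (20 pixels for the
    256 x 320 images used here)."""
    hits = sum(
        1 for (px, py), (gx, gy) in zip(preds, truths)
        if math.hypot(px - gx, py - gy) < threshold)
    return hits / len(truths) if truths else 0.0
```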
Description of the drawings:
FIG. 1 is a flow chart of one embodiment of the present invention
FIGS. 2a and 2b show the original image and the detection result of the embodiment of FIG. 1
The specific implementation process comprises the following steps:
the RCNN-based fingerprint singular point detection method is further described below with reference to a flowchart and an embodiment.
The whole method mainly comprises the following five steps: constructing a data set, enhancing a fingerprint image, segmenting the fingerprint image, detecting singular point coordinates and detecting accuracy.
Step one, acquiring 256 × 320 original fingerprint grey-scale images containing noise, manually enhancing the images, labelling the ground truth, normalizing the images, and dividing them into a training set and a test set at a ratio of 8:2;
Step two, constructing an encoder-decoder convolutional neural network for image enhancement, which consists of an encoding network module and a decoding network module. The image enhancement network is trained with the original data set, and the 256 × 320 fingerprint images predicted by the network are stored as the input of step three;
Step three, dividing the enhanced fingerprint image into regions of size 41 × 41 according to a grid, manually labelling the category of each region, representing the categories with a matrix, and then setting a probability threshold for screening the classification results. A Res-net classifier is trained with the enhanced image data set, and the regions above the probability threshold are retained for singular point coordinate detection;
Step four, taking the region images containing singular points from step three as input and the normalized singular point coordinates as output, training the FCN;
Step five, extracting the prediction results of the FCN in step four, comparing them with the true values, and calculating the prediction accuracy of the method. The Euclidean distance between a predicted point and the real point is taken as the basis: a point whose distance is below the threshold is regarded as a successful detection.
The RCNN-based fingerprint singular point detection method provided by the invention builds on the RCNN framework to achieve high detection speed, high accuracy and high efficiency; the image enhancement stage reduces the requirement on fingerprint image quality, and the block-wise network simplifies training because no data augmentation is required in the process.
The method provided by the invention has been described in detail above; specific examples have been used to explain its principle and implementation, and the description of the embodiments is intended only to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention; the content of this specification should not be construed as limiting the present invention.