
Identity card information identification method and device, computer equipment and storage medium

Info

Publication number
CN111340022A
Authority
CN
China
Prior art keywords
data
text
training
neural network
result
Legal status
Pending
Application number
CN202010111192.9A
Other languages
Chinese (zh)
Inventor
管水城
温凯雯
吕仲琪
顾正
Current Assignee
Shenzhen Huayun Zhongsheng Technology Co ltd
Original Assignee
Shenzhen Huayun Zhongsheng Technology Co ltd
Application filed by Shenzhen Huayun Zhongsheng Technology Co ltd
Priority to CN202010111192.9A
Publication of CN111340022A
Legal status: Pending

Classifications

    • G06V 20/63: Scene text, e.g. street names
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 30/1478: Inclination or skew detection or correction of characters or of character lines
    • G06V 40/161: Human faces; detection; localisation; normalisation


Abstract

The invention relates to an identity card information identification method and device, computer equipment and a storage medium. The method comprises: acquiring identity card picture data to be identified so as to form data to be identified; inputting the data to be identified into a text direction recognition model for direction recognition to obtain a recognition result; adjusting the direction of the data to be identified according to the recognition result to obtain intermediate data; inputting the intermediate data into a text region detection model for text region detection to obtain a detection result; cutting the intermediate data according to the detection result to obtain processed data; inputting the processed data into a text information recognition model for text information recognition to obtain an information recognition result; carrying out face recognition on the data to be identified to obtain face coordinate information; and sending the information recognition result and the face coordinate information to a terminal so that the terminal displays them. The invention improves the efficiency and accuracy of identity card recognition.

Description

Identity card information identification method and device, computer equipment and storage medium
Technical Field
The invention relates to identity card recognition, and in particular to an identity card information identification method and device, computer equipment and a storage medium.
Background
The identity card is the legal certificate by which citizens conduct social activities, maintain social order, safeguard their legal rights and interests, and prove their identity. It is required in countless scenarios: household registration, military service registration, marriage registration, school admission, employment, notarization, hotel check-in, remittance collection, parcel mailing, professional qualification examinations, and so on. OCR (Optical Character Recognition) is mainly applied to document recognition and certificate recognition: document recognition digitizes printed documents so that effective information can be extracted quickly and accurately, while certificate recognition digitizes a scanned certificate, or a certificate photographed by a terminal device such as a camera phone, to improve work efficiency and reduce work intensity. As a branch of artificial intelligence, deep learning broadens the applicability of OCR, and deep-learning-based character region extraction greatly improves the accuracy of OCR character extraction.
In recent years, real-name requirements based on the resident identity card have become commonplace across industries, and real-name systems keep appearing in new fields. Traditional manual entry consumes a great deal of business staff's time and energy, and the accuracy of the captured identity card information is hard to guarantee. Using deep learning to identify resident identity cards quickly and intelligently therefore greatly reduces the staff's workload and improves business efficiency.
Existing identity card OCR technology covers character region detection, character segmentation and character recognition. Mature character region detection methods include template matching, OpenCV-based grayscaling, binarization and image erosion, and the deep learning method CTPN (scene text detection). Character recognition methods include training a character recognition model on hand-crafted features such as HOG (Histogram of Oriented Gradients), or with convolutional and recurrent neural networks. However, these approaches are cumbersome to implement, slow to recognize, and poorly transferable across templates, and scale differences in the captured target strongly affect the recognition result; the network models are extremely complex and require large amounts of labeled training data, so training cost is high; recognition accuracy is unstable and noise immunity poor, so the models cannot adapt to complex scenes; and deployment places high demands on the production environment, with difficult later operation and maintenance. Identifying identity card information in this way is inefficient and inaccurate.
Therefore, it is necessary to design a new method to improve the efficiency and accuracy of identity card recognition.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an identity card information identification method and device, computer equipment and a storage medium.
To achieve this purpose, the invention adopts the following technical scheme. The identity card information identification method comprises the following steps:
acquiring identification card picture data to be identified to form data to be identified;
inputting data to be recognized into a text direction recognition model for direction recognition to obtain a recognition result;
adjusting the direction of the data to be recognized according to the recognition result to obtain intermediate data;
inputting the intermediate data into a text region detection model to perform text region detection so as to obtain a detection result;
cutting the intermediate data according to the detection result to obtain processed data;
inputting the processed data into a text information recognition model for text information recognition to obtain an information recognition result;
carrying out face recognition on data to be recognized to obtain face coordinate information;
sending the information identification result and the face coordinate information to a terminal so that the terminal can display the information identification result and the face coordinate information;
the text direction recognition model is obtained by training a first neural network by using text data with text direction labels as first training data;
the text region detection model is obtained by training a migrated Yolov3 neural network by using text data with text region labels as second training data;
the text information recognition model is obtained by training a migrated CRNN neural network by taking a text box picture with a text information label as third training data.
The further technical scheme is as follows: the text direction recognition model is obtained by training a first neural network by using text data with text direction labels as first training data, and comprises the following steps:
acquiring text data with a text direction label to obtain first sample data;
dividing the first sample data into first training data and first test data;
constructing a first neural network and a first loss function;
inputting first training data into a first neural network for convolution training to obtain a first training result;
calculating a first training result and a loss value of the text direction label by using a first loss function to obtain a first loss value;
judging whether the first loss value remains unchanged;
if the first loss value has not stabilized, adjusting the parameters of the first neural network and returning to the step of inputting the first training data into the first neural network for convolution training to obtain a first training result;
if the first loss value remains unchanged, inputting the first test data into the first neural network for a convolution test to obtain a first test result;
judging whether the first test result meets the condition;
if the first test result meets the condition, taking the first neural network as a text direction recognition model;
and if the first test result does not meet the condition, returning to the step of adjusting the parameters of the first neural network.
The further technical scheme is as follows: the obtaining of the text data with the text direction label to obtain the first sample data includes:
acquiring picture data with a text to obtain initial data;
performing data enhancement processing on the initial data to obtain secondary data;
and labeling the text direction label on the secondary data to obtain first sample data.
The further technical scheme is as follows: the text region detection model is obtained by training a migrated YOLOv3 neural network by using text data with text region labels as second training data, and comprises the following steps:
acquiring text data with a text area label to obtain second sample data;
dividing the second sample data into second training data and second test data;
constructing a migrated YOLOv3 neural network and a second loss function;
inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result;
calculating the loss values of the second training result and the text region label by using a second loss function to obtain a second loss value;
judging whether the second loss value remains unchanged;
if the second loss value has not stabilized, adjusting the parameters of the migrated YOLOv3 neural network and returning to the step of inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result;
if the second loss value remains unchanged, inputting the second test data into the migrated YOLOv3 neural network for a convolution test to obtain a second test result;
judging whether the second test result meets the condition;
if the second test result meets the condition, taking the migrated YOLOv3 neural network as a text region detection model;
and if the second test result does not meet the condition, returning to the step of adjusting the parameters of the migrated YOLOv3 neural network.
The further technical scheme is as follows: the text information recognition model is obtained by training a migrated CRNN neural network by taking a text box picture with a text information label as third training data, and comprises the following steps:
acquiring a text box picture with a text information label to obtain third sample data;
dividing the third sample data into third training data and third test data;
constructing a migrated CRNN neural network and a third loss function;
inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result;
calculating a loss value of a third training result and the text information label by using a third loss function to obtain a third loss value;
judging whether the third loss value remains unchanged;
if the third loss value has not stabilized, adjusting the parameters of the migrated CRNN neural network and returning to the step of inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result;
if the third loss value remains unchanged, inputting the third test data into the migrated CRNN neural network for a convolution test to obtain a third test result;
judging whether the third test result meets the condition;
if the third test result meets the condition, taking the migrated CRNN neural network as a text information identification model;
and if the third test result does not meet the condition, returning to the step of adjusting the parameters of the migrated CRNN neural network.
The further technical scheme is as follows: the migrated CRNN neural network adopts a bidirectional long-short term memory network, and the last layer of the migrated CRNN neural network is a full connection layer.
The further technical scheme is as follows: the method for carrying out face recognition on the data to be recognized to obtain face coordinate information comprises the following steps:
and adopting a face-recognition library to perform face recognition on the data to be recognized so as to obtain face coordinate information.
The invention also provides an identification card information recognition device, which comprises:
the image data acquisition unit is used for acquiring the image data of the identity card to be identified so as to form data to be identified;
the direction recognition unit is used for inputting the data to be recognized into the text direction recognition model for direction recognition so as to obtain a recognition result;
the direction adjusting unit is used for adjusting the direction of the data to be recognized according to the recognition result to obtain intermediate data;
the text region detection unit is used for inputting the intermediate data into the text region detection model to perform text region detection so as to obtain a detection result;
the cutting unit is used for cutting the intermediate data according to the detection result to obtain processed data;
the information identification unit is used for inputting the processed data into the text information identification model to carry out text information identification so as to obtain an information identification result;
the face recognition unit is used for carrying out face recognition on data to be recognized so as to obtain face coordinate information;
and the sending unit is used for sending the recognition result and the face coordinate information to a terminal so as to enable the terminal to display the recognition result and the face coordinate information.
The invention also provides a computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the above method when executing the computer program.
The invention also provides a storage medium storing a computer program which, when executed by a processor, is operable to carry out the method as described above.
Compared with the prior art, the invention has the following beneficial effects: the text direction of the data to be identified is recognized by the text direction recognition model and the data are processed according to the recognition result; region detection is then performed by the text region detection model to obtain data containing only the text region; text information recognition is finally performed by the text information recognition model; and face detection of the identity card portrait is realized with the face_recognition library. Together these steps improve the efficiency and accuracy of identity card recognition.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of an identification card information identification method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of an identification card information identification method according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of an identification card information recognition apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of an identity card information identification method according to an embodiment of the present invention, and fig. 2 is a schematic flow chart of the method. The method is applied to a server: the server exchanges data with a terminal, the terminal submits the identity card picture data to be identified, the server identifies and analyzes the input data to obtain the identity card text information, face coordinates and other information, and the obtained information is sent back to the terminal for display.
As shown in fig. 2, the method includes the following steps S110 to S180.
S110, obtaining the image data of the identification card to be identified to form data to be identified.
In this embodiment, the data to be identified is identity card picture data shot directly with the terminal. The picture can be shot at any angle, with no special requirement on the shooting angle, which makes the method very convenient to use.
The user inputs the identity card picture data from the terminal; common picture formats are supported, such as PNG, JPG and JPEG.
And S120, inputting the data to be recognized into the text direction recognition model for direction recognition to obtain a recognition result.
In this embodiment, the recognition result is the inclination angle of the data to be identified, i.e. the angle by which the identity card picture must be rotated to be parallel to the horizontal line.
The text direction recognition model is obtained by training a first neural network with text data carrying text direction labels as the first training data; recognizing the text direction with a model built from a convolutional neural network achieves an efficient and highly accurate recognition mode. In this embodiment, the first neural network comprises a migrated VGG16 neural network or a deep neural network.
In one embodiment, the text direction recognition model is obtained by training a first neural network using text data with text direction labels as first training data, and may include the following steps S120 a-S120 j.
And S120, 120a, acquiring the text data with the text direction label to obtain first sample data.
In this embodiment, the first sample data is identification card picture data inclined at different angles from the horizontal line, and the identification card picture data is labeled with a text direction label, where the text direction label is an angle required for rotating the identification card picture to be parallel to the horizontal line.
In one embodiment, the step S120a includes steps S120a 1-S120 a 3.
And S120a1, acquiring picture data with texts to obtain initial data.
In this embodiment, the initial data refers to the data of the identification card picture obtained by means of web crawler and terminal shooting.
Part of the resident identity card picture data is obtained through web crawlers and terminal shooting. Data obtained by web crawlers is generally noisy: the picture resolution is low, some regions are occluded or blurred, the angle varies widely, and lens reflections can produce highlighted regions during shooting. Data shot directly with a terminal carries relatively little noise; to simulate the acquisition conditions of real users, pictures are taken at multiple angles, lens distances and backgrounds. The resulting data formats include JPG, PNG and so on.
And S120a2, performing data enhancement processing on the initial data to obtain secondary data.
In this embodiment, the secondary data refers to the resident identity card data formed after enhancement such as rotation at different angles. The data set is expanded through data enhancement; the specific means include translation, cropping, color change, noise perturbation and the like.
And S120a3, labeling text direction labels on the secondary data to obtain first sample data.
Specifically, the resident identity card picture data are enhanced by rotation to obtain data at different angles; the rotation is a random multiple of 90 degrees in any direction, and the corresponding text direction labels are attached: 0 degrees, 90 degrees, 180 degrees and 270 degrees.
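A minimal sketch of this augmentation, assuming images as numpy arrays and the convention that the label is the counter-clockwise angle needed to restore the picture (the patent does not fix the sign convention):

    import random
    import numpy as np

    def make_direction_sample(image):
        k = random.randint(0, 3)      # rotate by 0, 90, 180 or 270 degrees
        rotated = np.rot90(image, k)  # counter-clockwise rotation by k * 90
        label = (4 - k) % 4 * 90      # angle needed to rotate back to upright
        return rotated, label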
S120, 120b, dividing the first sample data into first training data and first test data.
After the first sample data are divided into first training data and first test data, the first training data are used for training a first neural network, the training is carried out until the first neural network can output text directions meeting requirements, and then the first test data are used for verifying the trained first neural network, so that the whole first neural network can output text directions with accuracy meeting the requirements when being used as a text direction recognition model.
S120c, constructing a first neural network and a first loss function.
In this embodiment, the first neural network is the convolutional neural network VGG16. VGG16 is a landmark model architecture in image classification, comprising 16 weight layers (13 convolutional layers and 3 fully connected layers); it offers high classification accuracy with a simple structure and is an excellent feature extractor whose pre-trained model has been validated on the large-scale ImageNet image data set. The VGG16 pre-trained model is used as the backbone for transfer learning of the scene-specific identity card text direction recognition model. Text direction recognition is a four-class problem with labels 0 degrees, 90 degrees, 180 degrees and 270 degrees, so the number of output neurons in the last layer of the network is set to 4. The first training data and first test data are divided at a ratio of 9:1, the model hyper-parameters are tuned, and the optimal text direction recognition model is obtained by training.
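A minimal transfer-learning sketch of this setup, assuming the torchvision VGG16 pre-trained on ImageNet; the layer freezing shown here is an illustrative assumption, not the patent's hyper-parameter choice:

    import torch.nn as nn
    from torchvision import models

    model = models.vgg16(pretrained=True)     # ImageNet pre-trained backbone
    for p in model.features.parameters():
        p.requires_grad = False               # keep the feature extractor fixed
    model.classifier[6] = nn.Linear(4096, 4)  # 4 outputs: 0/90/180/270 degrees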
S120d, the first training data are input into the first neural network for convolution training, and a first training result is obtained.
In the present embodiment, the first training result includes an angle of the text direction, i.e., an angle at which the text data is rotated to a direction parallel to the horizontal line.
S120e, calculating a loss value of the first training result and the text direction label by using a first loss function to obtain a first loss value.
In this embodiment, the first loss value refers to a degree of fitting of the first training result to the text direction label.
Specifically, the degree of fitting between the first training result and the text direction label is calculated by using a loss function, which can also be regarded as the degree of difference.
And S120f, judging whether the first loss value is kept unchanged.
In this embodiment, when the first loss value remains unchanged, the current first neural network has converged: the first loss value is essentially constant and very small, which indicates that the network can serve as the text direction recognition model. Generally, the first loss value is large at the start of training and decreases as training proceeds. If the first loss value has not stabilized, the current first neural network cannot yet serve as the text direction recognition model; the estimated text direction would be inaccurate and would in turn make the later text information recognition inaccurate.
S120g, if the first loss value is not maintained, adjusting parameters of the first neural network, and performing convolution training by inputting the first training data into the first neural network, so as to obtain a first training result.
In this embodiment, adjusting the parameter of the first neural network refers to adjusting the weight value of each layer in the first neural network. By continuous training, a first neural network meeting the requirements can be obtained.
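A minimal training-loop sketch for steps S120d to S120g, assuming a `model`, a `loader` of (image, label) batches, and cross-entropy as the first loss function; the stopping rule approximates "the loss value remains unchanged" by a small epoch-to-epoch change, an assumption rather than the patent's exact criterion:

    import torch

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    prev_loss = float("inf")
    for epoch in range(100):
        total = 0.0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # first loss value
            loss.backward()
            optimizer.step()                         # adjust network parameters
            total += loss.item()
        if abs(prev_loss - total) < 1e-4:            # loss no longer changing
            break
        prev_loss = total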
And S120h, if the first loss value is kept unchanged, inputting the first test data into the first neural network for convolution test to obtain a first test result.
In this embodiment, the first test result refers to a text direction corresponding to the first test data obtained after the text direction identification is performed on the first test data.
S120i, judging whether the first test result meets the condition;
s120j, if the first test result meets the condition, taking the first neural network as a text direction recognition model;
if the first test result does not meet the condition, the step S120g is executed.
When the precision and recall of the first test result meet the required conditions, the degree of fitting meets the requirements and the first test result can be considered qualified; otherwise it is considered unqualified. Training stops when the first neural network converges. The first neural network is tested after training, and if the first test result is poor, the training strategy is adjusted and the network is retrained. Of course, training and testing alternate during the training process, with testing used to check the training state in real time; the test performed after training evaluates the execution accuracy of the whole first neural network using the two indexes of precision and recall.
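For illustration, the two-index evaluation can be computed as below; the label lists and the 0.95 thresholds are made-up values, and in practice y_true and y_pred would come from running the model on the first test data:

    from sklearn.metrics import precision_score, recall_score

    y_true = [0, 1, 2, 3, 0, 1]  # ground-truth direction classes
    y_pred = [0, 1, 2, 3, 0, 2]  # model predictions on the test data
    precision = precision_score(y_true, y_pred, average="macro")
    recall = recall_score(y_true, y_pred, average="macro")
    meets_condition = precision >= 0.95 and recall >= 0.95  # assumed thresholds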
And S130, adjusting the direction of the data to be recognized according to the recognition result to obtain intermediate data.
In this embodiment, the intermediate data refers to identity card picture data made parallel to the horizontal line by rotating it according to the text direction output by the text direction recognition model.
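A minimal sketch of this adjustment with OpenCV, assuming the recognition result is one of the four training labels and gives a counter-clockwise correction angle (the sign convention is an assumption):

    import cv2

    ROTATIONS = {
        90: cv2.ROTATE_90_COUNTERCLOCKWISE,
        180: cv2.ROTATE_180,
        270: cv2.ROTATE_90_CLOCKWISE,
    }

    def adjust_direction(image, angle):
        return image if angle == 0 else cv2.rotate(image, ROTATIONS[angle])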
And S140, inputting the intermediate data into a text region detection model to perform text region detection so as to obtain a detection result.
In this embodiment, the detection result refers to the text box information where the text content is located, and includes the length and width of the text region, the center point coordinate, and the inclination angle.
Wherein the text region detection model is obtained by training the migrated YOLOv3 neural network by using text data with text region labels as second training data. The text region identification is carried out by utilizing the model formed by the convolutional neural network, and the identification mode with high efficiency and high accuracy can be achieved.
In one embodiment, the text region detection model is obtained by training the migrated YOLOv3 neural network with text data with text region labels as the second training data, and may include steps S140a to S140 j.
S140a, obtaining the text data with the text region label to obtain second sample data.
In this embodiment, the second sample data is text data parallel to the horizontal line, that is, identity card picture data, carrying text region labels in txt and xml label formats respectively.
Specifically, data formed by rotating according to the recognition result output by the text direction recognition model and then labeling with text region labels can be used as the second sample data. Text box regions are annotated on the collected resident identity card data, and text regions are defined for the relevant fields according to the requirements of the business scenario. The annotation tool is a modified labelImg, so that the length and width of a text region, the center point coordinates and the inclination angle can be computed automatically during annotation.
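For illustration, the length and width, center point and inclination angle of a box can be recovered from its four annotated corner points with OpenCV's minAreaRect; the corner coordinates below are made-up values:

    import cv2
    import numpy as np

    corners = np.array([[102, 57], [420, 60], [419, 98], [101, 95]],
                       dtype=np.float32)
    (cx, cy), (w, h), angle = cv2.minAreaRect(corners)  # center, size, tilt angle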
S140b, dividing the second sample data into second training data and second test data.
After the second sample data is divided into second training data and second test data, the second training data is used for training the migrated YOLOv3 neural network, the migrated YOLOv3 neural network is trained to be capable of outputting text regions meeting requirements, and then the second test data is used for verifying the trained migrated YOLOv3 neural network, so that the whole migrated YOLOv3 neural network can output text region information with accuracy meeting the requirements when being used as a text region detection model.
S140c, constructing a migrated YOLOv3 neural network and a second loss function.
In this embodiment, the migrated YOLOv3 neural network adopts the convolutional neural network YOLOv3, a milestone model in object detection with high detection accuracy and speed, which can meet scene-specific detection requirements with relatively little business data. The scene-specific identity card text region detection model is trained by transfer from a YOLOv3 pre-trained model, and the network is further improved by increasing the convolutional layers to 74 to raise detection precision. Text region detection is a two-class problem: target text regions form one class and all other regions the other. The second training data and second test data are divided at a ratio of 9:1, reasonable hyper-parameters are set, and the optimal text region detection model is obtained by training.
And S140d, inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result.
In this embodiment, the second training result includes the length and width of the text region, the center point coordinates, and the tilt angle.
S140e, calculating a loss value of the second training result and the text region label by using the second loss function to obtain a second loss value.
In this embodiment, the second loss value refers to a degree of fitting of the second training result to the text region label.
Specifically, the fitting degree between the second training result and the text direction label is calculated by using a second loss function, which can also be regarded as a difference degree.
S140f, whether the second loss value is kept unchanged is judged.
In this embodiment, when the second loss value remains unchanged, the current migrated YOLOv3 neural network has converged: the second loss value is essentially constant and very small, which indicates that the network can serve as the text region detection model. Generally, the second loss value is large at the beginning of training and decreases as training proceeds. If the second loss value has not stabilized, the current migrated YOLOv3 neural network cannot yet serve as the text region detection model; the estimated text region information would be inaccurate and would in turn make the later text information recognition inaccurate.
S140g, if the second loss value is not maintained, adjusting parameters of the migrated YOLOv3 neural network, and performing convolution training by inputting the second training data into the migrated YOLOv3 neural network, so as to obtain a second training result.
In this embodiment, adjusting the parameters of the migrated YOLOv3 neural network refers to adjusting the weight values of each layer in the migrated YOLOv3 neural network. By continuous training, a migrated YOLOv3 neural network meeting the requirements can be obtained.
And S140h, if the second loss value is kept unchanged, inputting the second test data into the migrated YOLOv3 neural network for convolution test to obtain a second test result.
In this embodiment, the second test result refers to the text region information corresponding to the second test data, obtained after text region detection is performed on the second test data.
S140, 140i, judging whether the second test result meets the condition;
s140j, if the second test result meets the condition, taking the migrated YOLOv3 neural network as a text region detection model;
and if the second test result does not meet the condition, executing the adjustment of the parameters of the migrated YOLOv3 neural network.
When the precision and recall of the second test result meet the required conditions, the degree of fitting meets the requirements and the second test result can be considered qualified; otherwise it is considered unqualified. The migrated YOLOv3 neural network stops training when it converges. It is tested after training, and if the second test result is poor, the training strategy is adjusted and the network is retrained. Of course, training and testing alternate during the training process, with testing used to check the training state in real time; the test performed after training evaluates the execution accuracy of the whole migrated YOLOv3 neural network using the two indexes of precision and recall.
And S150, cutting the intermediate data according to the detection result to obtain processed data.
In this embodiment, the processed data refers to the text region content that remains after the intermediate data is cut.
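A minimal sketch of this cutting step, assuming the detection result supplies (center, size, angle) for each text box and that the angle sign matches OpenCV's rotation convention:

    import cv2

    def crop_text_region(image, center, size, angle):
        h, w = image.shape[:2]
        M = cv2.getRotationMatrix2D(center, angle, 1.0)  # rotate about box center
        rotated = cv2.warpAffine(image, M, (w, h))       # box becomes axis-aligned
        bw, bh = int(size[0]), int(size[1])
        x, y = int(center[0] - bw / 2), int(center[1] - bh / 2)
        return rotated[y:y + bh, x:x + bw]               # slice out the text region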
And S160, inputting the processed data into a text information identification model for text information identification to obtain an information identification result.
In this embodiment, the information recognition result refers to the specific content of the text, including name, gender, ethnicity, date of birth, address and citizen identification number.
The text information recognition model is obtained by training a migrated CRNN neural network by taking a text box picture with a text information label as third training data, and the text information recognition is carried out by utilizing a model formed by a convolutional neural network, so that a recognition mode with high efficiency and high accuracy can be achieved.
In an embodiment, the text information recognition model is obtained by training the migrated CRNN neural network using the text box picture with the text information label as the third training data, and may include the following steps S160a to S160 j.
S160a, obtaining a text box picture with a text information label to obtain third sample data.
In this embodiment, the third sample data refers to a text box picture only including a text region, and specifically, the detection result output by the text region detection model may be cut to form the third sample data, so as to collect training data of the text recognition model.
S160b, dividing the third sample data into third training data and third test data.
A text word stock is constructed through text de-duplication. To improve the noise immunity of the text recognition model, different noise information is randomly added, in a specified proportion, to the text box data obtained by segmentation; the noise includes Gaussian noise, Poisson noise, salt-and-pepper noise and the like. The third training data and third test data are divided at a ratio of 9:1. Each text region picture corresponds to one txt file whose content is the text of that picture.
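A minimal sketch of the Gaussian and salt-and-pepper noise injection (Poisson noise is analogous); sigma and the salt-and-pepper ratio are illustrative assumptions:

    import numpy as np

    def add_gaussian_noise(img, sigma=10.0):
        noisy = img.astype(np.float32) + np.random.normal(0, sigma, img.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)

    def add_salt_pepper(img, ratio=0.01):
        out = img.copy()
        mask = np.random.rand(*img.shape[:2])
        out[mask < ratio / 2] = 0        # pepper: random black pixels
        out[mask > 1 - ratio / 2] = 255  # salt: random white pixels
        return out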
After the third sample data is divided into third training data and third test data, the third training data is used for training the migrated CRNN, the text information which meets the requirements can be output by the trained migrated CRNN, and then the third test data is used for verifying the trained migrated CRNN so as to ensure that the text information which meets the requirements in accuracy can be output when the whole migrated CRNN serves as a text information recognition model.
S160c, constructing a migrated CRNN neural network and a third loss function.
In this embodiment, the migrated CRNN neural network trains the identity card text recognition model with a convolutional recurrent neural network. The designed model has 6 convolutional layers, adopts a bidirectional LSTM (Long Short-Term Memory) network, and ends with a fully connected layer. Relevant training evaluation indexes are set; training stops when the accuracy on the test set reaches the expected value, finally yielding the identity card text information recognition model. Compared with traditional neural network models, the convolutional recurrent neural network has some unique advantages: first, like a DCNN, it learns directly from image data and needs neither hand-crafted features nor preprocessing steps such as binarization, segmentation and component localization; second, like an RNN (recurrent neural network), it can produce a sequence of labels; third, it places no restriction on the length of the sequence-like object, requiring only height normalization in the training and testing stages; and fourth, it achieves better or at least competitive performance on scene text tasks such as character recognition.
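A compact PyTorch sketch matching this description: six convolutional layers, a bidirectional LSTM and a final fully connected layer. Channel sizes, pooling placement and the vocabulary size are illustrative assumptions:

    import torch.nn as nn

    class CRNN(nn.Module):
        def __init__(self, num_classes=5000, hidden=256):
            super().__init__()
            chans = [1, 64, 128, 256, 256, 512, 512]
            layers = []
            for i in range(6):                     # six convolutional layers
                layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1),
                           nn.ReLU()]
                if i in (0, 1, 3, 5):              # later pools halve height only,
                    layers.append(                 # preserving sequence width
                        nn.MaxPool2d((2, 2) if i < 2 else (2, 1)))
            self.cnn = nn.Sequential(*layers)
            self.rnn = nn.LSTM(512, hidden, bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, num_classes)  # final fully connected layer

        def forward(self, x):          # x: (batch, 1, 32, width)
            f = self.cnn(x)            # (batch, 512, h', w')
            f = f.mean(dim=2)          # collapse the height dimension
            f = f.permute(0, 2, 1)     # (batch, w', 512): a feature sequence
            out, _ = self.rnn(f)       # bidirectional LSTM over the sequence
            return self.fc(out)        # per-timestep class scores (e.g. for CTC)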
S160d, inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result.
In this embodiment, the third training result includes the text content, precisely matched to field information such as name, gender, ethnicity, date of birth, address and citizen identification number.
S160e, calculating a loss value of the third training result and the text information label by using a third loss function to obtain a third loss value.
In this embodiment, the third loss value refers to a degree of fitting of the third training result to the text information label.
Specifically, the fitting degree between the third training result and the text information label is calculated by using a third loss function, which may also be regarded as a difference degree.
S160f, determining whether the third loss value remains unchanged.
In this embodiment, when the third loss value remains unchanged, the current migrated CRNN neural network has converged: the third loss value is essentially constant and very small, which indicates that the network can serve as the text information recognition model. Generally, the third loss value is large at the beginning of training and decreases as training proceeds. If the third loss value has not stabilized, the current migrated CRNN neural network cannot yet serve as the text information recognition model, and the estimated text information would be inaccurate.
S160g, if the third loss value has not stabilized, adjusting the parameters of the migrated CRNN neural network, and performing convolution training by inputting the third training data into the migrated CRNN neural network, so as to obtain a third training result.
In this embodiment, adjusting the parameters of the migrated CRNN neural network refers to adjusting the weight values of each layer in the migrated CRNN neural network. And through continuous training, a migrated CRNN neural network meeting the requirement can be obtained.
And S160h, if the third loss value is kept unchanged, inputting the third test data into the migrated CRNN neural network for convolution test to obtain a third test result.
In this embodiment, the third test result refers to the text information corresponding to the third test data, obtained after text information recognition is performed on the third test data.
S160i, judging whether the third test result meets the condition;
s160, 160j, if the third test result meets the condition, taking the migrated CRNN neural network as a text information recognition model;
if the third test result does not meet the condition, the step S160g is executed.
When the precision and recall of the third test result meet the required conditions, the degree of fitting meets the requirements and the third test result can be considered qualified; otherwise it is considered unqualified. The migrated CRNN neural network stops training when it converges. It is tested after training, and if the third test result is poor, the training strategy is adjusted and the network is retrained. Of course, training and testing alternate during the training process, with testing used to check the training state in real time; the test performed after training evaluates the execution accuracy of the whole migrated CRNN neural network using the two indexes of precision and recall.
And S170, carrying out face recognition on the data to be recognized to obtain face coordinate information.
In this embodiment, the face coordinate information refers to coordinate information of a face avatar located in the data to be recognized.
Specifically, the face_recognition library is adopted to perform face recognition on the data to be recognized so as to obtain face coordinate information.
The face_recognition library is used to detect the portrait face on the identity card and return the original face coordinates. face_recognition is a powerful and easily accessible open-source project for face detection and recognition. Built on the industry-leading C++ open-source deep learning library dlib and tested on the large-scale open-source face data set Labeled Faces in the Wild, it reaches an extremely high accuracy of 99.38%. Introducing it into the method of this embodiment enables fast, high-precision detection of the identity card portrait face.
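A minimal usage sketch of the face_recognition library's public API; the file name is a placeholder:

    import face_recognition

    image = face_recognition.load_image_file("id_card.jpg")  # placeholder path
    # Each detected face is returned as a (top, right, bottom, left) box.
    face_boxes = face_recognition.face_locations(image)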
And S180, sending the information recognition result and the face coordinate information to a terminal so that the terminal can display the information recognition result and the face coordinate information.
The user inputs the identity card picture data from the terminal, in a common picture format such as PNG, JPG or JPEG. After data processing, the text direction recognition model first recognizes the text direction of the picture and the corresponding angle is adjusted; the adjusted identity card picture enters the text region detection model, which outputs the detected text region information; the post-processed text regions enter the text information recognition model in parallel and the information recognition result is output; noise data is filtered out using rules and similar methods, and field information such as name, gender, ethnicity, date of birth, address and citizen identification number is matched precisely; finally, the face_recognition library is called to detect the identity card portrait face and return the original face coordinates. The method achieves automatic, end-to-end, high-precision and efficient recognition of resident identity cards together with portrait face detection.
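For illustration, the rule-based matching of the citizen identification number can rely on its fixed 18-character pattern (17 digits plus a digit or X check character); the helper below is a hypothetical sketch, and the other fields would use similar keyword rules:

    import re

    ID_NUMBER = re.compile(r"\d{17}[\dXx]")  # 18-character citizen ID pattern

    def extract_id_number(lines):
        for line in lines:
            match = ID_NUMBER.search(line)
            if match:
                return match.group()
        return None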
According to the identity card information identification method, the text direction of the data to be identified is recognized by the text direction recognition model and the data are processed according to the recognition result; region detection is performed by the text region detection model to obtain data containing only the text region; text information recognition is performed by the text information recognition model; and face detection of the identity card portrait is realized with the face_recognition library, thereby improving the efficiency and accuracy of identity card recognition.
Fig. 3 is a schematic block diagram of an identification card information identification apparatus 300 according to an embodiment of the present invention. As shown in fig. 3, the present invention also provides an identification card information recognition apparatus 300 corresponding to the above identification card information recognition method. The identification card information recognition apparatus 300 includes a unit for performing the above-described identification card information recognition method, and the apparatus may be configured in a server. Specifically, referring to fig. 3, the identification card information recognition apparatus 300 includes a picture data acquisition unit 301, a direction recognition unit 302, a direction adjustment unit 303, a text region detection unit 304, a segmentation unit 305, an information recognition unit 306, a face recognition unit 307, and a transmission unit 308.
A picture data acquiring unit 301, configured to acquire identity card picture data to be identified to form data to be identified; the direction recognition unit 302 is configured to input data to be recognized into the text direction recognition model for direction recognition, so as to obtain a recognition result; a direction adjusting unit 303, configured to perform direction adjustment on the data to be recognized according to the recognition result to obtain intermediate data; a text region detection unit 304, configured to input the intermediate data to a text region detection model for text region detection to obtain a detection result; a cutting unit 305, configured to cut the intermediate data according to the detection result to obtain processed data; the information identification unit 306 is used for inputting the processed data into a text information identification model for text information identification so as to obtain an information identification result; a face recognition unit 307, configured to perform face recognition on data to be recognized to obtain face coordinate information; a sending unit 308, configured to send the recognition result and the face coordinate information to the terminal, so that the terminal displays the recognition result and the face coordinate information.
Specifically, the face recognition unit 307 is configured to perform face recognition on the data to be recognized by using the face_recognition library to obtain face coordinate information.
In an embodiment, the apparatus further includes: the first building unit is used for training a first neural network by using the text data with the text direction labels as first training data to obtain a text direction recognition model.
In an embodiment, the apparatus further includes: and the second construction unit is used for training the migrated YOLOv3 neural network by using the text data with the text region labels as second training data to obtain a text region detection model.
In an embodiment, the apparatus further includes: and the third construction unit is used for training the migrated CRNN neural network by taking the text box picture with the text information label as third training data so as to obtain a text information recognition model.
In an embodiment, the first constructing unit includes a first sample obtaining subunit, a first dividing subunit, a first network constructing subunit, a first training subunit, a first calculating subunit, a first loss value judging subunit, a first adjusting subunit, a first testing subunit, and a first testing result judging subunit.
The first sample acquiring subunit is used for acquiring text data with a text direction label to obtain first sample data; the first dividing subunit is used for dividing the first sample data into first training data and first test data; a first network construction subunit, configured to construct a first neural network and a first loss function; the first training subunit is used for inputting the first training data into the first neural network for convolution training to obtain a first training result; the first calculating subunit is used for calculating the loss value of the first training result and the text direction label by using the first loss function to obtain a first loss value; a first loss value judging subunit, configured to judge whether the first loss value remains unchanged; a first adjusting subunit, configured to adjust the parameters of the first neural network if the first loss value has not stabilized, and perform convolution training by inputting the first training data into the first neural network, so as to obtain a first training result; the first test subunit is used for inputting the first test data into the first neural network for a convolution test to obtain a first test result if the first loss value remains unchanged; and a first test result judging subunit, configured to judge whether the first test result meets the condition; if the first test result meets the condition, take the first neural network as the text direction recognition model; and if the first test result does not meet the condition, return to adjusting the parameters of the first neural network.
In an embodiment, the first sample acquiring subunit includes an initial data acquiring module, an enhancement processing module, and a labeling module.
The initial data acquiring module is configured to acquire picture data containing text to obtain initial data; the enhancement processing module is configured to perform data enhancement processing on the initial data to obtain secondary data; and the labeling module is configured to label the secondary data with text direction labels to obtain the first sample data.
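One plausible form of this enhancement-plus-labeling step is sketched below: each source picture is rotated to the four canonical orientations and the rotation index is recorded as the text direction label. The use of PIL and the 0-3 label encoding are illustrative assumptions only.

    from PIL import Image

    def make_direction_samples(path):
        # Rotate one source picture to the four orientations; the index of the
        # rotation serves as the (assumed) text direction label.
        base = Image.open(path).convert("RGB")
        samples = []
        for label, angle in enumerate((0, 90, 180, 270)):
            samples.append((base.rotate(angle, expand=True), label))
        return samples  # list of (rotated picture, text direction label) pairs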
In an embodiment, the second constructing unit includes a second sample acquiring subunit, a second dividing subunit, a second network constructing subunit, a second training subunit, a second calculating subunit, a second loss value judging subunit, a second adjusting subunit, a second testing subunit, and a second test result judging subunit.
The second sample acquiring subunit is configured to acquire text data with text region labels to obtain second sample data;
the second dividing subunit is configured to divide the second sample data into second training data and second test data;
the second network constructing subunit is configured to construct a migrated YOLOv3 neural network and a second loss function;
the second training subunit is configured to input the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result;
the second calculating subunit is configured to calculate a loss value between the second training result and the text region labels by using the second loss function to obtain a second loss value;
the second loss value judging subunit is configured to judge whether the second loss value remains unchanged;
the second adjusting subunit is configured to, if the second loss value does not remain unchanged, adjust the parameters of the migrated YOLOv3 neural network and return to inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result;
the second testing subunit is configured to, if the second loss value remains unchanged, input the second test data into the migrated YOLOv3 neural network for a convolution test to obtain a second test result;
the second test result judging subunit is configured to judge whether the second test result meets a condition; if the second test result meets the condition, the migrated YOLOv3 neural network is taken as the text region detection model; if the second test result does not meet the condition, the adjustment of the parameters of the migrated YOLOv3 neural network is performed.
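The "migration" here is transfer learning: start from a detector pretrained on a general dataset and retrain it on text region data. A generic PyTorch sketch of that idea follows; the backbone attribute is a placeholder for this sketch and does not correspond to an actual YOLOv3 module name.

    import torch

    def migrate_detector(model, checkpoint_path):
        # Load pretrained weights; strict=False tolerates a replaced detection
        # head whose shapes differ from the checkpoint.
        state = torch.load(checkpoint_path, map_location="cpu")
        model.load_state_dict(state, strict=False)

        # Freeze the pretrained feature extractor ("backbone" is an assumed
        # attribute name).
        for param in model.backbone.parameters():
            param.requires_grad = False

        # Only the remaining trainable parameters are adjusted during the
        # convolution training on the second training data.
        trainable = (p for p in model.parameters() if p.requires_grad)
        return torch.optim.Adam(trainable, lr=1e-4)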
In an embodiment, the third constructing unit includes a third sample acquiring subunit, a third dividing subunit, a third network constructing subunit, a third training subunit, a third calculating subunit, a third loss value judging subunit, a third adjusting subunit, a third testing subunit, and a third test result judging subunit.
The third sample acquiring subunit is configured to acquire text box pictures with text information labels to obtain third sample data;
the third dividing subunit is configured to divide the third sample data into third training data and third test data;
the third network constructing subunit is configured to construct a migrated CRNN neural network and a third loss function;
the third training subunit is configured to input the third training data into the migrated CRNN neural network for convolution training to obtain a third training result;
the third calculating subunit is configured to calculate a loss value between the third training result and the text information labels by using the third loss function to obtain a third loss value;
the third loss value judging subunit is configured to judge whether the third loss value remains unchanged;
the third adjusting subunit is configured to, if the third loss value does not remain unchanged, adjust the parameters of the migrated CRNN neural network and return to inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result;
the third testing subunit is configured to, if the third loss value remains unchanged, input the third test data into the migrated CRNN neural network for a convolution test to obtain a third test result;
the third test result judging subunit is configured to judge whether the third test result meets a condition; if the third test result meets the condition, the migrated CRNN neural network is taken as the text information recognition model; if the third test result does not meet the condition, the adjustment of the parameters of the migrated CRNN neural network is performed.
It should be noted that, as will be clear to those skilled in the art, the specific implementation of the identity card information recognition apparatus 300 and of each of its units may refer to the corresponding description in the foregoing method embodiment; for convenience and brevity of description, details are not repeated here.
The identity card information recognition apparatus 300 may be implemented in the form of a computer program that can run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 4, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform the identity card information identification method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 is caused to perform the identity card information identification method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the structure shown in fig. 4 is a block diagram of only the part of the structure relevant to the present application and does not limit the computer device 500 to which the present application is applied; a particular computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
acquiring identity card picture data to be recognized to form data to be recognized; inputting the data to be recognized into a text direction recognition model for direction recognition to obtain a recognition result; adjusting the direction of the data to be recognized according to the recognition result to obtain intermediate data; inputting the intermediate data into a text region detection model for text region detection to obtain a detection result; cutting the intermediate data according to the detection result to obtain processed data; inputting the processed data into a text information recognition model for text information recognition to obtain an information recognition result; performing face recognition on the data to be recognized to obtain face coordinate information; and sending the information recognition result and the face coordinate information to a terminal so that the terminal displays the information recognition result and the face coordinate information.
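Read as a pipeline, these steps compose as in the sketch below; every callable is a placeholder standing in for the corresponding model or helper, and none of the names come from the source.

    def recognize_id_card(picture, direction_model, region_model, text_model,
                          rotate, crop, detect_face):
        orientation = direction_model(picture)         # direction recognition
        upright = rotate(picture, orientation)         # direction adjustment
        boxes = region_model(upright)                  # text region detection
        crops = [crop(upright, box) for box in boxes]  # cutting
        texts = [text_model(c) for c in crops]         # text information recognition
        face_coords = detect_face(picture)             # face coordinate information
        return texts, face_coords                      # sent to the terminal for display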
The text direction recognition model is obtained by training a first neural network by using text data with text direction labels as first training data; the text region detection model is obtained by training a migrated YOLOv3 neural network by using text data with text region labels as second training data; and the text information recognition model is obtained by training a migrated CRNN neural network by using text box pictures with text information labels as third training data.
In an embodiment, when implementing the step in which the text direction recognition model is obtained by training the first neural network by using text data with text direction labels as first training data, the processor 502 specifically implements the following steps:
acquiring text data with text direction labels to obtain first sample data; dividing the first sample data into first training data and first test data; constructing a first neural network and a first loss function; inputting the first training data into the first neural network for convolution training to obtain a first training result; calculating a loss value between the first training result and the text direction labels by using the first loss function to obtain a first loss value; judging whether the first loss value remains unchanged; if the first loss value does not remain unchanged, adjusting the parameters of the first neural network and returning to the step of inputting the first training data into the first neural network for convolution training to obtain a first training result; if the first loss value remains unchanged, inputting the first test data into the first neural network for a convolution test to obtain a first test result; judging whether the first test result meets a condition; if the first test result meets the condition, taking the first neural network as the text direction recognition model; and if the first test result does not meet the condition, performing the adjustment of the parameters of the first neural network.
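The "remains unchanged" judgment is a convergence test on the training loss. A small helper such as the one below makes it concrete; the window size and tolerance are assumed values, not taken from the embodiment.

    def loss_is_stable(history, window=5, tol=1e-4):
        # Treat the loss as unchanged when its last `window` values move by
        # less than `tol` in total (both parameters are assumptions).
        if len(history) < window:
            return False
        recent = history[-window:]
        return max(recent) - min(recent) < tol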
In an embodiment, when implementing the step of acquiring text data with text direction labels to obtain the first sample data, the processor 502 specifically implements the following steps:
acquiring picture data containing text to obtain initial data; performing data enhancement processing on the initial data to obtain secondary data; and labeling the secondary data with text direction labels to obtain the first sample data.
In an embodiment, when implementing the step in which the text region detection model is obtained by training the migrated YOLOv3 neural network by using text data with text region labels as second training data, the processor 502 specifically implements the following steps:
acquiring text data with text region labels to obtain second sample data; dividing the second sample data into second training data and second test data; constructing a migrated YOLOv3 neural network and a second loss function; inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result; calculating a loss value between the second training result and the text region labels by using the second loss function to obtain a second loss value; judging whether the second loss value remains unchanged; if the second loss value does not remain unchanged, adjusting the parameters of the migrated YOLOv3 neural network and returning to the step of inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result; if the second loss value remains unchanged, inputting the second test data into the migrated YOLOv3 neural network for a convolution test to obtain a second test result; judging whether the second test result meets a condition; if the second test result meets the condition, taking the migrated YOLOv3 neural network as the text region detection model; and if the second test result does not meet the condition, performing the adjustment of the parameters of the migrated YOLOv3 neural network.
In an embodiment, when implementing the step in which the text information recognition model is obtained by training the migrated CRNN neural network by using text box pictures with text information labels as third training data, the processor 502 specifically implements the following steps:
acquiring text box pictures with text information labels to obtain third sample data; dividing the third sample data into third training data and third test data; constructing a migrated CRNN neural network and a third loss function; inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result; calculating a loss value between the third training result and the text information labels by using the third loss function to obtain a third loss value; judging whether the third loss value remains unchanged; if the third loss value does not remain unchanged, adjusting the parameters of the migrated CRNN neural network and returning to the step of inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result; if the third loss value remains unchanged, inputting the third test data into the migrated CRNN neural network for a convolution test to obtain a third test result; judging whether the third test result meets a condition; if the third test result meets the condition, taking the migrated CRNN neural network as the text information recognition model; and if the third test result does not meet the condition, performing the adjustment of the parameters of the migrated CRNN neural network.
The migrated CRNN neural network adopts a bidirectional long short-term memory (BiLSTM) network, and its last layer is a fully connected layer.
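A minimal sketch of a CRNN of this shape follows: convolutional feature extraction, a bidirectional LSTM over the width dimension, and a fully connected last layer that scores one character class per time step. All layer sizes are illustrative, and the extra blank class assumes CTC-style training, which the embodiment does not name.

    import torch.nn as nn

    class CRNN(nn.Module):
        def __init__(self, num_chars, hidden=256):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, None)),   # collapse height, keep width
            )
            self.rnn = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)
            self.fc = nn.Linear(2 * hidden, num_chars + 1)  # +1 for an assumed CTC blank

        def forward(self, x):                      # x: (batch, 1, height, width)
            feats = self.cnn(x).squeeze(2)         # (batch, 128, width)
            seq, _ = self.rnn(feats.permute(0, 2, 1))  # (batch, width, 2*hidden)
            return self.fc(seq)                    # per-time-step character scores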
In an embodiment, when implementing the step of performing face recognition on the data to be recognized to obtain face coordinate information, the processor 502 specifically implements the following steps:
and adopting the face_recognition library to perform face recognition on the data to be recognized so as to obtain the face coordinate information.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program includes program instructions and may be stored in a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the method embodiments described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:
acquiring identity card picture data to be recognized to form data to be recognized; inputting the data to be recognized into a text direction recognition model for direction recognition to obtain a recognition result; adjusting the direction of the data to be recognized according to the recognition result to obtain intermediate data; inputting the intermediate data into a text region detection model for text region detection to obtain a detection result; cutting the intermediate data according to the detection result to obtain processed data; inputting the processed data into a text information recognition model for text information recognition to obtain an information recognition result; performing face recognition on the data to be recognized to obtain face coordinate information; and sending the information recognition result and the face coordinate information to a terminal so that the terminal displays the information recognition result and the face coordinate information.
The text direction recognition model is obtained by training a first neural network by using text data with text direction labels as first training data; the text region detection model is obtained by training a migrated YOLOv3 neural network by using text data with text region labels as second training data; and the text information recognition model is obtained by training a migrated CRNN neural network by using text box pictures with text information labels as third training data.
In an embodiment, when executing the computer program to implement the step in which the text direction recognition model is obtained by training the first neural network by using text data with text direction labels as first training data, the processor specifically implements the following steps:
acquiring text data with text direction labels to obtain first sample data; dividing the first sample data into first training data and first test data; constructing a first neural network and a first loss function; inputting the first training data into the first neural network for convolution training to obtain a first training result; calculating a loss value between the first training result and the text direction labels by using the first loss function to obtain a first loss value; judging whether the first loss value remains unchanged; if the first loss value does not remain unchanged, adjusting the parameters of the first neural network and returning to the step of inputting the first training data into the first neural network for convolution training to obtain a first training result; if the first loss value remains unchanged, inputting the first test data into the first neural network for a convolution test to obtain a first test result; judging whether the first test result meets a condition; if the first test result meets the condition, taking the first neural network as the text direction recognition model; and if the first test result does not meet the condition, performing the adjustment of the parameters of the first neural network.
In an embodiment, when executing the computer program to implement the step of acquiring text data with text direction labels to obtain the first sample data, the processor specifically implements the following steps:
acquiring picture data containing text to obtain initial data; performing data enhancement processing on the initial data to obtain secondary data; and labeling the secondary data with text direction labels to obtain the first sample data.
In an embodiment, when executing the computer program to implement the step in which the text region detection model is obtained by training the migrated YOLOv3 neural network by using text data with text region labels as second training data, the processor specifically implements the following steps:
acquiring text data with text region labels to obtain second sample data; dividing the second sample data into second training data and second test data; constructing a migrated YOLOv3 neural network and a second loss function; inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result; calculating a loss value between the second training result and the text region labels by using the second loss function to obtain a second loss value; judging whether the second loss value remains unchanged; if the second loss value does not remain unchanged, adjusting the parameters of the migrated YOLOv3 neural network and returning to the step of inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result; if the second loss value remains unchanged, inputting the second test data into the migrated YOLOv3 neural network for a convolution test to obtain a second test result; judging whether the second test result meets a condition; if the second test result meets the condition, taking the migrated YOLOv3 neural network as the text region detection model; and if the second test result does not meet the condition, performing the adjustment of the parameters of the migrated YOLOv3 neural network.
In an embodiment, when executing the computer program to implement the step in which the text information recognition model is obtained by training the migrated CRNN neural network by using text box pictures with text information labels as third training data, the processor specifically implements the following steps:
acquiring text box pictures with text information labels to obtain third sample data; dividing the third sample data into third training data and third test data; constructing a migrated CRNN neural network and a third loss function; inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result; calculating a loss value between the third training result and the text information labels by using the third loss function to obtain a third loss value; judging whether the third loss value remains unchanged; if the third loss value does not remain unchanged, adjusting the parameters of the migrated CRNN neural network and returning to the step of inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result; if the third loss value remains unchanged, inputting the third test data into the migrated CRNN neural network for a convolution test to obtain a third test result; judging whether the third test result meets a condition; if the third test result meets the condition, taking the migrated CRNN neural network as the text information recognition model; and if the third test result does not meet the condition, performing the adjustment of the parameters of the migrated CRNN neural network.
The migrated CRNN neural network adopts a bidirectional long short-term memory (BiLSTM) network, and its last layer is a fully connected layer.
In an embodiment, when the processor executes the computer program to implement the step of performing face recognition on the data to be recognized to obtain the face coordinate information, the following steps are specifically implemented:
and adopting the face_recognition library to perform face recognition on the data to be recognized so as to obtain the face coordinate information.
The storage medium may be any of various computer-readable storage media that can store a computer program, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only one kind of logical function division, and other divisions are possible in actual implementation; units or components may be combined or integrated into another system, and some features may be omitted or not implemented.
The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An identity card information identification method, characterized by comprising the following steps:
acquiring identity card picture data to be recognized to form data to be recognized;
inputting the data to be recognized into a text direction recognition model for direction recognition to obtain a recognition result;
adjusting the direction of the data to be recognized according to the recognition result to obtain intermediate data;
inputting the intermediate data into a text region detection model to perform text region detection so as to obtain a detection result;
cutting the intermediate data according to the detection result to obtain processed data;
inputting the processed data into a text information recognition model for text information recognition to obtain an information recognition result;
carrying out face recognition on data to be recognized to obtain face coordinate information;
sending the information recognition result and the face coordinate information to a terminal so that the terminal displays the information recognition result and the face coordinate information;
the text direction recognition model is obtained by training a first neural network by using text data with text direction labels as first training data;
the text region detection model is obtained by training a migrated YOLOv3 neural network by using text data with text region labels as second training data;
the text information recognition model is obtained by training a migrated CRNN neural network by taking a text box picture with a text information label as third training data.
2. The identity card information identification method according to claim 1, wherein training the first neural network by using text data with text direction labels as first training data to obtain the text direction recognition model comprises:
acquiring text data with a text direction label to obtain first sample data;
dividing the first sample data into first training data and first test data;
constructing a first neural network and a first loss function;
inputting first training data into a first neural network for convolution training to obtain a first training result;
calculating a first training result and a loss value of the text direction label by using a first loss function to obtain a first loss value;
judging whether the first loss value remains unchanged;
if the first loss value does not remain unchanged, adjusting parameters of the first neural network and returning to the step of inputting the first training data into the first neural network for convolution training to obtain a first training result;
if the first loss value remains unchanged, inputting the first test data into the first neural network for a convolution test to obtain a first test result;
judging whether the first test result meets the condition;
if the first test result meets the condition, taking the first neural network as a text direction recognition model;
and if the first test result does not meet the condition, returning to the step of adjusting the parameters of the first neural network.
3. The identity card information identification method according to claim 2, wherein the acquiring text data with text direction labels to obtain first sample data comprises:
acquiring picture data containing text to obtain initial data;
performing data enhancement processing on the initial data to obtain secondary data;
and labeling the secondary data with text direction labels to obtain the first sample data.
4. The identity card information identification method according to claim 1, wherein training the migrated YOLOv3 neural network by using text data with text region labels as second training data to obtain the text region detection model comprises:
acquiring text data with a text area label to obtain second sample data;
dividing the second sample data into second training data and second test data;
constructing a migrated YOLOv3 neural network and a second loss function;
inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result;
calculating the loss values of the second training result and the text region label by using a second loss function to obtain a second loss value;
judging whether the second loss value remains unchanged;
if the second loss value does not remain unchanged, adjusting parameters of the migrated YOLOv3 neural network and returning to the step of inputting the second training data into the migrated YOLOv3 neural network for convolution training to obtain a second training result;
if the second loss value remains unchanged, inputting the second test data into the migrated YOLOv3 neural network for a convolution test to obtain a second test result;
judging whether the second test result meets the condition;
if the second test result meets the condition, taking the migrated YOLOv3 neural network as a text region detection model;
and if the second test result does not meet the condition, returning to the step of adjusting the parameters of the migrated YOLOv3 neural network.
5. The identity card information identification method according to claim 1, wherein training the migrated CRNN neural network by using text box pictures with text information labels as third training data to obtain the text information recognition model comprises:
acquiring text box pictures with text information labels to obtain third sample data;
dividing the third sample data into third training data and third test data;
constructing a migrated CRNN neural network and a third loss function;
inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result;
calculating a loss value of a third training result and the text information label by using a third loss function to obtain a third loss value;
judging whether the third loss value remains unchanged;
if the third loss value does not remain unchanged, adjusting parameters of the migrated CRNN neural network and returning to the step of inputting the third training data into the migrated CRNN neural network for convolution training to obtain a third training result;
if the third loss value remains unchanged, inputting the third test data into the migrated CRNN neural network for a convolution test to obtain a third test result;
judging whether the third test result meets the condition;
if the third test result meets the condition, taking the migrated CRNN neural network as a text information identification model;
and if the third test result does not meet the condition, returning to the step of adjusting the parameters of the migrated CRNN neural network.
6. The identity card information identification method according to claim 5, wherein the migrated CRNN neural network adopts a bidirectional long short-term memory network, and the last layer of the migrated CRNN neural network is a fully connected layer.
7. The identity card information identification method according to claim 1, wherein the performing face recognition on the data to be recognized to obtain face coordinate information comprises:
adopting the face_recognition library to perform face recognition on the data to be recognized so as to obtain the face coordinate information.
8. An identity card information identification device, characterized by comprising:
the picture data acquiring unit is used for acquiring identity card picture data to be recognized so as to form data to be recognized;
the direction recognition unit is used for inputting the data to be recognized into the text direction recognition model for direction recognition so as to obtain a recognition result;
the direction adjusting unit is used for adjusting the direction of the data to be recognized according to the recognition result to obtain intermediate data;
the text region detection unit is used for inputting the intermediate data into the text region detection model to perform text region detection so as to obtain a detection result;
the cutting unit is used for cutting the intermediate data according to the detection result to obtain processed data;
the information recognition unit is used for inputting the processed data into the text information recognition model to carry out text information recognition so as to obtain an information recognition result;
the face recognition unit is used for carrying out face recognition on data to be recognized so as to obtain face coordinate information;
and the sending unit is used for sending the information recognition result and the face coordinate information to a terminal so as to enable the terminal to display the information recognition result and the face coordinate information.
9. A computer device, characterized in that the computer device comprises a memory, on which a computer program is stored, and a processor, which when executing the computer program implements the method according to any of claims 1 to 7.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202010111192.9A 2020-02-24 2020-02-24 Identity card information identification method and device, computer equipment and storage medium Pending CN111340022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010111192.9A 2020-02-24 2020-02-24 Identity card information identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111340022A 2020-06-26

Family

ID=71185385

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN109034050A (en) * 2018-07-23 2018-12-18 顺丰科技有限公司 ID Card Image text recognition method and device based on deep learning
CN110348441A (en) * 2019-07-10 2019-10-18 深圳市华云中盛科技有限公司 VAT invoice recognition methods, device, computer equipment and storage medium
CN110363199A (en) * 2019-07-16 2019-10-22 济南浪潮高新科技投资发展有限公司 Certificate image text recognition method and system based on deep learning

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860522A (en) * 2020-07-23 2020-10-30 中国平安人寿保险股份有限公司 Identity card picture processing method and device, terminal and storage medium
CN111860522B (en) * 2020-07-23 2024-02-02 中国平安人寿保险股份有限公司 Identity card picture processing method, device, terminal and storage medium
CN111626274A (en) * 2020-07-30 2020-09-04 四川骏逸富顿科技有限公司 Social security card identification method and identification system thereof
CN112241994A (en) * 2020-09-28 2021-01-19 北京迈格威科技有限公司 Model training method, rendering device, electronic equipment and storage medium
CN112241994B (en) * 2020-09-28 2024-05-31 爱芯元智半导体股份有限公司 Model training method, rendering method, device, electronic equipment and storage medium
CN112232354A (en) * 2020-11-23 2021-01-15 中国第一汽车股份有限公司 Character recognition method, device, equipment and storage medium
CN113051901A (en) * 2021-03-26 2021-06-29 重庆紫光华山智安科技有限公司 Identification card text recognition method, system, medium and electronic terminal
CN113051901B (en) * 2021-03-26 2023-03-24 重庆紫光华山智安科技有限公司 Identification card text recognition method, system, medium and electronic terminal
CN114445437A (en) * 2021-12-29 2022-05-06 福建慧政通信息科技有限公司 Image compression clipping method of license picture and storage medium


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200626)