WO2019071660A1 - Bill information identification method, electronic device, and readable storage medium - Google Patents
Bill information identification method, electronic device, and readable storage medium
- Publication number
- WO2019071660A1 (PCT/CN2017/108735)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- ticket
- identified
- picture
- model
- sample
- Prior art date
Classifications
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition; classification techniques
- G06V10/25 — Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
- G06V30/414 — Analysis of document content; extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
- G06V30/10 — Character recognition
Definitions
- the present application relates to the field of computer technologies, and in particular, to a ticket information identification method, an electronic device, and a readable storage medium.
- the purpose of the present application is to provide a ticket information identification method, an electronic device, and a readable storage medium, which are intended to improve the efficiency of ticket information identification and reduce the error rate of ticket information recognition.
- a first aspect of the present application provides an electronic device. The electronic device includes a memory and a processor, where the memory stores a ticket information recognition system operable on the processor. When executed by the processor, the ticket information recognition system implements the following steps:
- after receiving the picture of the bill to be processed, determining the area recognition model corresponding to each field to be identified in the ticket picture according to a predetermined mapping relationship between fields to be identified and area recognition models; for each field to be identified, invoking the corresponding area recognition model to perform area recognition on the line character areas of the ticket picture, identifying from the ticket picture target frames that contain character information and whose fixed width equals a preset value, and splicing the target frames whose character information lies in the same line together in the order of recognition to form a target line character region containing the character information;
- a second aspect of the present application provides a ticket information identification method, where the ticket information identification method includes:
- Step 1: after receiving the picture of the bill to be processed, determining the area recognition model corresponding to each field to be identified in the ticket picture according to the predetermined mapping relationship between fields to be identified and area recognition models; for each field to be identified, invoking the corresponding area recognition model to perform area recognition on the line character areas of the ticket picture, identifying from the ticket picture target frames that contain character information and whose fixed width equals a preset value, and stitching the target frames whose character information lies in the same line together in the order of recognition to form a target line character region containing the character information;
- Step 2: determining, according to a predetermined mapping relationship between fields to be identified and character recognition models, the character recognition model corresponding to each field to be identified, and, for the target line character region of each field to be identified, calling the corresponding character recognition model to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified.
- a third aspect of the present application provides a computer readable storage medium storing a ticket information identification system, the ticket information identification system being executable by at least one processor to cause the at least one processor to perform the following steps:
- after receiving the picture of the bill to be processed, determining the area recognition model corresponding to each field to be identified in the ticket picture according to a predetermined mapping relationship between fields to be identified and area recognition models; for each field to be identified, invoking the corresponding area recognition model to perform area recognition on the line character areas of the ticket picture, identifying from the ticket picture target frames that contain character information and whose fixed width equals a preset value, and splicing the target frames whose character information lies in the same line together in the order of recognition to form a target line character region containing the character information;
- the area recognition model corresponding to each field to be identified in the ticket picture performs area recognition on the line character areas of the ticket picture, identifying small frames that contain character information and whose fixed width equals a preset value; the small frames whose character information lies in the same line are stitched together in order to form a target line character region containing the character information, and the character recognition model corresponding to the field to be identified is then called to perform character recognition on that target line character region.
- because each identified line character region containing character information has a uniform fixed preset width, the character information can be localized to smaller sub-areas, and the sub-areas containing character information approximate the characters closely.
- as a result, when character recognition is performed on the target line character region by the character recognition model, there are far fewer interference factors besides the character information itself, thereby reducing the error rate of ticket information recognition.
- FIG. 1 is a schematic diagram of an operating environment of a preferred embodiment of the ticket information identification system 10 of the present application;
- FIG. 2 is a schematic flowchart of an embodiment of a bill information identification method according to the present application.
- FIG. 1 is a schematic diagram of an operating environment of a preferred embodiment of the ticket information identification system 10 of the present application.
- the ticket information identification system 10 is installed and operated in the electronic device 1.
- the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13.
- Figure 1 shows only the electronic device 1 with components 11-13, but it should be understood that not all illustrated components are required; more or fewer components may be implemented instead.
- the memory 11 comprises at least one type of readable storage medium, which in some embodiments may be an internal storage unit of the electronic device 1, such as a hard disk or memory of the electronic device 1.
- the memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in hard disk equipped on the electronic device 1, a smart memory card (SMC), a Secure Digital (SD) card, a flash card, etc.
- the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
- the memory 11 is used to store application software and various types of data installed in the electronic device 1, such as program codes of the ticket information recognition system 10, and the like.
- the memory 11 can also be used to temporarily store data that has been output or is about to be output.
- the processor 12, in some embodiments, may be a central processing unit (CPU), a microprocessor, or another data processing chip for running program code or processing data stored in the memory 11, for example, executing the ticket information recognition system 10.
- the display 13 in some embodiments may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like.
- the display 13 is for displaying information processed in the electronic device 1 and for displaying a visualized user interface, such as the bill picture to be processed, recognized character information, and the like.
- the components 11-13 of the electronic device 1 communicate with one another via a system bus.
- the ticket information identification system 10 includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement various embodiments of the present application.
- Step S1: after receiving the picture of the bill to be processed, determining, according to a predetermined mapping relationship between fields to be identified and area recognition models, the area recognition model corresponding to each field to be identified in the ticket picture; for each field to be identified, invoking the corresponding area recognition model to perform area recognition on the line character areas of the ticket picture, identifying from the ticket picture target frames that contain character information and whose fixed width equals a preset value, and stitching the target frames whose character information lies in the same line together in the order of recognition to form a target line character region containing the character information.
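The stitching in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the (x, y, width, height) box format, the line-grouping tolerance, and the function name are assumptions, with the preset frame width taken as 16 pixels as in the example given in the description.

```python
def splice_line_regions(boxes, line_tol=5):
    """Group fixed-width target frames (x, y, w, h) by line (similar y)
    and merge each line's frames, in recognition order, into one target
    line character region. line_tol is an assumed grouping tolerance."""
    lines = []  # each entry: [representative_y, [boxes...]]
    for box in boxes:  # boxes are assumed to arrive in recognition order
        x, y, w, h = box
        for line in lines:
            if abs(line[0] - y) <= line_tol:
                line[1].append(box)
                break
        else:
            lines.append([y, [box]])
    regions = []
    for _, group in lines:
        x1 = min(b[0] for b in group)
        y1 = min(b[1] for b in group)
        x2 = max(b[0] + b[2] for b in group)
        y2 = max(b[1] + b[3] for b in group)
        regions.append((x1, y1, x2 - x1, y2 - y1))
    return regions
```

For example, three adjacent 16-pixel-wide frames on one line merge into a single 48-pixel-wide line region, while a frame on another line stays separate.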
- the ticket information identification system 10 receives the to-be-processed bill picture sent by the user through the terminal device 2; the bill picture includes bill pictures related to insurance, medical, financial, and similar domains, such as outpatient or hospitalization bill pictures.
- the bill picture may be received from a client installed in a terminal device such as a mobile phone, a tablet computer, or a self-service terminal device,
- or received from a browser system in a terminal device such as a mobile phone, a tablet computer, or a self-service terminal device.
- a region recognition model corresponding to each type of field to be identified is pre-configured; for example, a first recognition model is pre-set for text class fields, a second recognition model for numeric class fields, a third recognition model for date/time class fields, a fourth recognition model for currency class fields, and so on.
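The predetermined mapping between field types and recognition models can be sketched as a simple lookup table; the model names below are placeholders for illustration, not names used in the application.

```python
# Hypothetical mapping from field type to its pre-set recognition model,
# mirroring the text/number/date-time/currency examples in the description.
FIELD_TYPE_TO_MODEL = {
    "text": "first_recognition_model",
    "number": "second_recognition_model",
    "date_time": "third_recognition_model",
    "currency": "fourth_recognition_model",
}

def model_for_field(field_type):
    """Look up the region recognition model pre-configured for a field type."""
    return FIELD_TYPE_TO_MODEL[field_type]
```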
- the method may include:
- A1. The pre-trained bill picture recognition model is used to identify the bill type in the received picture and output the identification result of the bill category (for example, the categories of medical bills include outpatient bills, hospitalization bills, and other types of bills).
- A2. Performing tilt correction on the received ticket picture by using a predetermined correction rule. In an optional implementation, the predetermined correction rule is: use the probabilistic Hough transform to find as many short line segments as possible in the ticket picture; determine all straight lines from the found segments; connect, in the order of their y coordinate values, the lines whose x coordinate values differ little, and group them into several classes according to the size of the x coordinate value, or connect, in the order of their x coordinate values, the lines whose y coordinate values differ little, and group them into several classes according to the size of the y coordinate value; treat all horizontal lines belonging to one class as a target class line, and find the long line closest to each target class line by the least squares method; calculate the slope of each long line, then calculate the median and the mean of those slopes, compare the two to determine the smaller one, and adjust the picture tilt according to the smaller one, so that the received bill picture is corrected to a normal, non-inclined picture.
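The final step of the correction rule, choosing between the median and mean slope, can be sketched as below. This is an illustrative sketch only: the Hough and least-squares line fitting are omitted, and "the smaller one" is interpreted here as the slope of smaller magnitude, which is an assumption.

```python
import math
from statistics import mean, median

def correction_angle(slopes):
    """Given the slopes of the fitted long lines, compare the median and
    the mean of the slopes, take the smaller one (by magnitude, as an
    assumed reading of the rule), and return the corresponding rotation
    angle in degrees used to de-skew the picture."""
    if not slopes:
        return 0.0
    chosen = min(median(slopes), mean(slopes), key=abs)
    return math.degrees(math.atan(chosen))
```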
- A3. Determine, according to a mapping relationship between the predetermined ticket category and the to-be-identified field, a field to be identified corresponding to the identified ticket category;
- A4. Determine, according to a predetermined mapping relationship between the to-be-identified field and the area recognition model, an area recognition model corresponding to each of the to-be-identified fields.
- the area recognition model is a convolutional neural network model
- the training process for the area recognition model corresponding to a field to be identified is as follows:
- the ticket picture sample containing the character information of the to-be-identified field is classified into the first training set, and the ticket picture sample that does not include the character information of the to-be-identified field is classified into the second training set;
- C6. Performing model training by using the extracted sample images to be trained to generate the region recognition model, and verifying the generated region recognition model by using each sample image to be verified;
- if the verification pass rate is greater than or equal to a preset threshold (for example, 98%), the training is complete; if the verification pass rate is less than the preset threshold, increase the number of ticket picture samples and repeat steps C2 through C6.
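The verify-or-grow-the-sample loop can be sketched as follows; `train_fn`, `verify_fn`, and `grow_samples_fn` are hypothetical callables standing in for steps C2 through C6, and the round cap is an added safeguard, not part of the described procedure.

```python
def train_until_pass(train_fn, verify_fn, grow_samples_fn,
                     threshold=0.98, max_rounds=10):
    """Train a region recognition model, verify it on the held-out sample
    images, and if the pass rate is below the preset threshold (e.g. 98%),
    enlarge the ticket picture sample set and repeat."""
    for _ in range(max_rounds):
        model = train_fn()
        if verify_fn(model) >= threshold:
            return model          # verification passed; training complete
        grow_samples_fn()         # increase the number of ticket picture samples
    raise RuntimeError("verification pass rate never reached the threshold")
```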
- Step S2: determining, according to a predetermined mapping relationship between fields to be identified and character recognition models, the character recognition model corresponding to each field to be identified, and, for the target line character region of each field to be identified, calling the corresponding character recognition model to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified.
- the character recognition model corresponding to each field to be identified may be determined according to a predetermined mapping relationship between fields to be identified and character recognition models; for the identified target line character region of each field to be identified, the corresponding character recognition model is called to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified, completing the character information identification of the entire ticket picture.
- the character recognition model is a long short-term memory (LSTM) network, and the training process for the character recognition model corresponding to a field to be identified is as follows:
- each ticket picture sample contains only one line of character information of the field to be identified, with black font on a white background, and each ticket picture sample is named after the character information of the field to be identified that it contains;
- the bill picture samples are divided into a first data set and a second data set according to a ratio of X:Y (for example, 8:2), where the number of bill picture samples in the first data set is larger than that in the second data set; the first data set serves as the training set and the second data set as the test set, with X greater than 0 and Y greater than 0;
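The X:Y split can be sketched in a few lines; the function name is an illustrative assumption, and no shuffling is shown since the description does not specify any.

```python
def split_samples(samples, x=8, y=2):
    """Split bill picture samples X:Y into a first data set (training set)
    and a second data set (test set), with X > 0 and Y > 0."""
    assert x > 0 and y > 0
    cut = len(samples) * x // (x + y)
    return samples[:cut], samples[cut:]
```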
- the bill picture samples in the first data set are sent to the time-recurrent neural network model for model training, and at fixed intervals of time or after a preset number of iterations (for example, every 1000 iterations) the trained model is tested on the second data set to evaluate the effect of the currently trained model.
- during testing, the trained model is used to identify the character information of the ticket picture samples in the second data set, and the result is compared with the name of each tested ticket picture sample to calculate the error between the recognition result and the labeled result, using the edit distance as the metric.
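The edit distance used as the error metric is the standard Levenshtein distance between the recognized text and the sample's labeled name; a minimal dynamic-programming implementation for illustration:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution/match
        prev = cur
    return prev[-1]
```

For example, a recognized string "kitten" against a label "sitting" yields an error of 3.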
- if the character information recognition error on the ticket picture samples diverges during testing, the training parameters are adjusted and the model is retrained so that the error can converge during training. After the error converges, model training ends, and the generated model is used as the final character recognition model corresponding to the field to be identified.
- the area recognition model corresponding to each field to be identified in the ticket picture performs area recognition on each line character area in the ticket picture, identifying small frames that contain character information and whose fixed width equals a preset value; the small frames whose character information lies in the same line are stitched together in order to form a target line character region containing the character information, and the character recognition model corresponding to the field to be identified is called to perform character recognition on the target line character region.
- when character recognition is performed on the target line character region by the character recognition model, there are far fewer interference factors besides the character information itself, thereby reducing the error rate of ticket information recognition.
- the ticket picture recognition model is a deep convolutional neural network model (for example, an SSD (Single Shot MultiBox Detector) algorithm model selected in a CaffeNet environment); the deep convolutional neural network model consists of one input layer, 13 convolutional layers, 5 pooling layers, 2 fully connected layers, and one classification layer.
- The detailed structure of the deep convolutional neural network model is shown in Table 1 below:
- Layer Name indicates the name of each layer
- Input indicates the input layer
- Conv indicates the convolution layer of the model
- Conv1 indicates the first convolution layer of the model
- MaxPool indicates the maximum pooling layer of the model
- MaxPool1 indicates the first maximum pooling layer of the model
- Fc represents the fully connected layer in the model
- Fc1 represents the first fully connected layer in the model
- Softmax represents the Softmax classifier
- Batch Size represents the number of input images of the current layer
- Kernel Size represents the scale of the current layer's convolution kernel (for example, Kernel Size equal to 3 indicates that the scale of the convolution kernel is 3x3)
- the Stride Size indicates the moving step size of the convolution kernel, that is, the distance moved to the next convolution position after one convolution is completed.
- Pad Size indicates the size of the image fill in the current network layer.
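The Kernel, Stride, and Pad sizes above determine each layer's output size by the standard convolution arithmetic; as a worked example (not taken from Table 1 itself):

```python
def conv_output_size(in_size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer:
    out = (in + 2 * pad - kernel) // stride + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1
```

So a 224-wide input through a 3x3 convolution with stride 1 and padding 1 keeps its size, while a 2x2 pooling with stride 2 halves it.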
- the pooling mode of the pooling layer in this embodiment includes, but is not limited to, Mean pooling, Max pooling, Overlapping, L2 pooling, Local Contrast Normalization, Stochastic pooling, Def-pooling, and more.
- the training process of the ticket picture recognition model is as follows:
- the orientation of the bill picture is determined and flip adjustment is made: when the aspect ratio is greater than 1, the height and width of the bill picture are reversed; if the stamp position is on the left side of the bill picture, the bill picture is rotated clockwise by ninety degrees, and if the stamp position is on the right side, the bill picture is rotated counterclockwise by ninety degrees. When the aspect ratio is less than 1, the height and width are not reversed; if the stamp position is on the lower side of the bill picture, the bill picture is rotated clockwise by one hundred and eighty degrees.
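The flip rules can be sketched as below. Two points are assumptions for illustration: the aspect ratio is taken as height divided by width (so a ratio above 1 means height and width are reversed), and the stamp side is assumed to be detected elsewhere and passed in.

```python
def orientation_rotation(width, height, stamp_side):
    """Return the rotation in degrees (positive = clockwise) that
    uprights the bill, following the stamp-position rules."""
    if height / width > 1:            # aspect ratio > 1: height/width reversed
        if stamp_side == "left":
            return 90                 # clockwise by ninety degrees
        if stamp_side == "right":
            return -90                # counterclockwise by ninety degrees
    elif stamp_side == "bottom":      # aspect ratio < 1, stamp on the lower side
        return 180                    # clockwise by one hundred and eighty degrees
    return 0                          # already upright
```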
- the annotation data of each object refers to the position information of the object's rectangular frame, expressed by four numbers: the coordinates of the upper left corner (xmin, ymin) and of the lower right corner (xmax, ymax). If xmax < xmin, swap the two values, and do the same for the y coordinates, to ensure max > min.
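The annotation sanity check described above amounts to a small coordinate swap; a minimal sketch, with the function name being an illustrative assumption:

```python
def normalize_box(xmin, ymin, xmax, ymax):
    """Ensure the rectangle's corners satisfy max > min on each axis by
    swapping reversed coordinates, as the annotation rule requires."""
    if xmax < xmin:
        xmin, xmax = xmax, xmin
    if ymax < ymin:
        ymin, ymax = ymax, ymin
    return xmin, ymin, xmax, ymax
```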
- the ticket sample pictures used for model training are pictures whose height and width are not reversed and that are accurately annotated, so that subsequent model training is more accurate and effective.
- FIG. 2 is a schematic flowchart of a bill information identification method according to an embodiment of the present application.
- the method for identifying a ticket information includes the following steps:
- Step S10: after receiving the picture of the bill to be processed, determining, according to the predetermined mapping relationship between fields to be identified and area recognition models, the area recognition model corresponding to each field to be identified in the ticket picture; for each field to be identified, invoking the corresponding area recognition model to perform area recognition on the line character areas of the ticket picture, identifying from the ticket picture target frames that contain character information and whose fixed width equals a preset value, and stitching the target frames whose character information lies in the same line together in the order of recognition to form a target line character region containing the character information.
- according to a predetermined mapping relationship between fields to be identified (such as text class fields, numeric class fields, date/time class fields, and currency class fields) and area recognition models, the area recognition model corresponding to each field to be identified is determined. For each field to be identified, the corresponding area recognition model is called to perform area recognition on the line character areas of the ticket picture; small frames that contain character information and whose fixed width equals a preset value (for example, 16 pixels wide) are recognized from the ticket picture as target frames, and the target frames whose character information lies in the same line are stitched together in order to form a target line character region containing the character information.
- Step S20: determining, according to a predetermined mapping relationship between fields to be identified and character recognition models, the character recognition model corresponding to each field to be identified, and calling the corresponding character recognition model for the target line character region of each field to be identified to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified.
- the character recognition corresponding to each of the to-be-identified fields may be determined according to a predetermined mapping relationship between the to-be-identified field and the character recognition model.
- a model in response to the identified target line character regions of each of the to-be-identified fields, calling a corresponding character recognition model for character recognition to respectively identify character information included in a target line character region of each of the to-be-identified fields, completing the entire The character information of the ticket picture is identified.
- the character recognition model is a Long-Short Term Memory (LSTM), and the training process for a character recognition model corresponding to a field to be identified is as follows:
- the ticket picture sample contains only one line of character information of the to-be-identified field, the font is black, the background is white, and each ticket is The name of the picture sample is named as the character information of the field to be identified contained therein;
- the bill picture samples are divided into a first data set and a second data set according to a ratio of X:Y (for example, 8:2), and the number of bill picture samples in the first data set is larger than the bill picture sample in the second data set.
- Quantity the first data set as a training set, and the second data set as a test set, where X is greater than 0 and Y is greater than 0;
- the sample of the bill image in the first data set is sent to the time recurrent neural network model for model training, and the second data is used for the trained model every certain period of time or a preset number of iterations (for example, every 1000 iterations).
- the set is tested to evaluate the effect of the currently trained model.
- the trained model is used to identify the character information of the ticket image sample in the second data set, and compares with the name of the tested ticket picture sample to calculate the error of the recognition result and the labeling result, and the error calculation uses the editing distance. As a calculation standard.
- the training model obtains divergence of the character information recognition error of the ticket picture sample during the test, the training parameters are adjusted and retrained, so that the error of the character information recognition of the ticket picture sample can be converged during the training. After the error converges, the model training is ended, and the generated model is used as the final character recognition model corresponding to the to-be-identified field.
- the region identification model corresponding to each to-be-identified field performs region identification on each line character region in the ticket picture, identifying small boxes that contain character information and have a fixed width of a preset value; the small boxes whose character information lies in the same row are stitched together in order to form a target line character region containing character information, and the character recognition model corresponding to the to-be-identified field is called on that target line character region; when character recognition is performed by the character recognition model, the target line character region contains far fewer interference factors besides the character information, thereby reducing the error rate of ticket information recognition;
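The stitching of fixed-width small boxes into a target line character region can be sketched as follows (the `(x, y, w, h)` box representation and the row tolerance are assumptions for illustration):

```python
def stitch_boxes(boxes, row_tol=8):
    """Group fixed-width boxes (x, y, w, h) into line regions.

    Boxes whose vertical centers lie within row_tol pixels of each other
    are treated as the same row; each row's boxes are sorted left to right
    and merged into one bounding rectangle (x, y, w, h).
    """
    rows = []  # each entry: [row center y, list of boxes]
    for box in sorted(boxes, key=lambda b: b[1]):
        cy = box[1] + box[3] / 2
        for row in rows:
            if abs(row[0] - cy) <= row_tol:
                row[1].append(box)
                break
        else:
            rows.append([cy, [box]])
    regions = []
    for _, row_boxes in rows:
        row_boxes.sort(key=lambda b: b[0])  # order of identification, left to right
        x0 = min(b[0] for b in row_boxes)
        y0 = min(b[1] for b in row_boxes)
        x1 = max(b[0] + b[2] for b in row_boxes)
        y1 = max(b[1] + b[3] for b in row_boxes)
        regions.append((x0, y0, x1 - x0, y1 - y0))
    return regions
```

Each returned rectangle corresponds to one target line character region that is then passed to the character recognition model.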
- the ticket picture recognition model is a deep convolutional neural network model, for example an SSD (Single Shot MultiBox Detector) algorithm model based on the CaffeNet environment; the deep convolutional neural network model consists of 1 input layer, 13 convolutional layers, 5 pooling layers, 2 fully connected layers, and 1 classification layer;
- the detailed structure of the deep convolutional neural network model is shown in Table 1 below:
- Layer Name indicates the name of each layer
- Input indicates the input layer
- Conv indicates the convolution layer of the model
- Conv1 indicates the first convolution layer of the model
- MaxPool indicates the maximum pooling layer of the model
- MaxPool1 indicates the first max pooling layer of the model
- Fc represents the fully connected layer in the model
- Fc1 represents the first fully connected layer in the model
- Softmax represents the Softmax classifier
- Batch Size represents the number of input images of the current layer
- Kernel Size represents the scale of the convolution kernel of the current layer (for example, Kernel Size equal to 3 indicates a 3x3 convolution kernel)
- the Stride Size indicates the moving step size of the convolution kernel, that is, the distance moved to the next convolution position after one convolution is completed.
- Pad Size indicates the size of the image fill in the current network layer.
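The interplay of Kernel Size, Stride Size, and Pad Size determines each layer's spatial output size; a minimal sketch of the standard formula:

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer:
    floor((in_size + 2*pad - kernel) / stride) + 1.
    """
    return (in_size + 2 * pad - kernel) // stride + 1
```

For example, a 3x3 kernel with stride 1 and pad 1 preserves the input size, while a 2x2 pooling with stride 2 halves it.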
- the pooling modes of the pooling layer in this embodiment include, but are not limited to, Mean pooling, Max pooling, Overlapping pooling, L2 pooling, Local Contrast Normalization, Stochastic pooling, Def-pooling, and more.
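As an illustration of two of the listed modes, a minimal sketch of non-overlapping 2x2 Max pooling and Mean pooling over a plain 2D array (the 2x2 window is an illustrative choice):

```python
def pool2x2(matrix, mode="max"):
    """Non-overlapping 2x2 pooling over a 2D list (Max or Mean pooling)."""
    h, w = len(matrix), len(matrix[0])
    out = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            window = [matrix[i][j], matrix[i][j + 1],
                      matrix[i + 1][j], matrix[i + 1][j + 1]]
            row.append(max(window) if mode == "max" else sum(window) / 4)
        out.append(row)
    return out
```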
- the training process of the ticket picture recognition model is as follows:
- the preset ticket categories may include two types: outpatient bills and hospitalization bills
- a preset number (for example, 1000) of ticket picture samples is prepared, and each ticket picture sample is processed as follows:
- the orientation of the ticket picture is determined, and flip adjustment is made: when the aspect ratio is greater than 1, the height and width of the ticket picture are reversed; if the stamp position is on the left side of the ticket picture, the ticket picture is rotated clockwise by ninety degrees, and if the stamp position is on the right side, it is rotated counterclockwise by ninety degrees. When the aspect ratio is less than 1, the height and width are not reversed; if the stamp position is on the lower side of the ticket picture, the ticket picture is rotated clockwise by one hundred and eighty degrees.
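The flip-adjustment decision above can be sketched as a small function (taking the aspect ratio as height divided by width and using `left`/`right`/`bottom`/`top` stamp-position labels are assumptions; detecting the stamp itself is outside this sketch):

```python
def rotation_degrees(height: int, width: int, stamp_position: str) -> int:
    """Clockwise rotation (in degrees) needed to upright a ticket picture.

    Negative values mean counterclockwise rotation. stamp_position is one
    of 'left', 'right', 'bottom', 'top', relative to the raw image.
    """
    if height / width > 1:            # height and width are reversed
        if stamp_position == "left":
            return 90                 # rotate clockwise ninety degrees
        if stamp_position == "right":
            return -90                # rotate counterclockwise ninety degrees
    else:                             # upright except possibly upside down
        if stamp_position == "bottom":
            return 180
    return 0
```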
- the annotation data of each object refers to the position information of the object's rectangular frame, indicated by four numbers: the coordinates of the upper-left corner (xmin, ymin) and the coordinates of the lower-right corner (xmax, ymax). If xmax < xmin, the two values are swapped, and the same is done for the y coordinates, to ensure that max > min.
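A minimal sketch of the coordinate check described above:

```python
def normalize_box(xmin: int, ymin: int, xmax: int, ymax: int):
    """Ensure xmax > xmin and ymax > ymin by swapping reversed coordinates."""
    if xmax < xmin:
        xmin, xmax = xmax, xmin
    if ymax < ymin:
        ymin, ymax = ymax, ymin
    return xmin, ymin, xmax, ymax
```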
- the ticket picture samples used for model training are ticket pictures whose height and width are not reversed and which are accurately annotated, so that the subsequent model training is more accurate and effective;
- the present application also provides a computer readable storage medium storing a ticket information identification system, the ticket information identification system being executable by at least one processor to cause the at least one processor to perform the steps of the ticket information identification method described above;
- the method of the foregoing embodiments can be implemented by means of software plus a necessary general hardware platform, and can also be implemented by hardware, but in many cases the former is the better implementation;
- the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc), which includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the various embodiments of the present application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Graphics (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
- Character Input (AREA)
Abstract
The present application relates to a bill information identification method, an electronic device, and a readable storage medium. The method comprises: determining, according to a predetermined mapping relationship between fields to be identified and region identification models, a corresponding region identification model for each field to be identified in a bill image; calling the corresponding region identification models to perform region identification on character line regions of the bill image; identifying, from the bill image, target boxes containing character information and having a fixed width of a predetermined value, and joining, according to an identification order, the target boxes containing character information in the same line to form a target character line region containing the character information; determining, according to a predetermined mapping relationship between the fields to be identified and character identification models, a corresponding character identification model for each field to be identified; and calling the corresponding character identification model to perform character identification for the target character line region of the field to be identified. The present application reduces the error rate when identifying bill information.
Description
The present application claims priority under the Paris Convention to the Chinese patent application No. CN201710930679.8, entitled "Bill Information Identification Method, Electronic Device and Readable Storage Medium", filed on October 9, 2017, the entire content of which is incorporated herein by reference.
The present application relates to the field of computer technologies, and in particular, to a bill information identification method, an electronic device, and a readable storage medium.
Nowadays, with economic development and the improvement of living standards, more and more people purchase medical, commercial, financial and other insurance. To improve the user experience and efficiency of insurance claims, some insurance companies have launched self-service claims services: during a medical insurance claim, a user only needs to photograph and upload outpatient or hospitalization bills to the insurance company's system, and a salesperson then enters the information from the uploaded bill pictures into the claims system for the next step. This self-service approach greatly simplifies the claims process for users, but it increases the workload of the insurance company's staff: a large amount of manpower is needed to process the bill images uploaded by users, efficiency is low, and the error rate of data entry remains high.
Summary of the invention
The purpose of the present application is to provide a bill information identification method, an electronic device, and a readable storage medium, which are intended to improve the efficiency of bill information identification and to reduce its error rate.
In order to achieve the above object, a first aspect of the present application provides an electronic device, the electronic device comprising a memory and a processor, the memory storing a bill information identification system operable on the processor, the bill information identification system implementing the following steps when executed by the processor:
After receiving the bill picture to be processed, determining, according to a predetermined mapping relationship between fields to be identified and region identification models, the region identification model corresponding to each field to be identified in the bill picture; for each field to be identified, calling the corresponding region identification model to perform region identification on the line character regions of the bill picture, identifying from the bill picture target boxes that contain character information and have a fixed width of a preset value, and stitching together, in the order of identification, the target boxes whose character information lies in the same line, to form a target line character region containing the character information;
Determining, according to a predetermined mapping relationship between the fields to be identified and character recognition models, the character recognition model corresponding to each field to be identified, and, for the target line character region of each field to be identified, calling the corresponding character recognition model to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified.
In addition, in order to achieve the above object, a second aspect of the present application provides a bill information identification method, the bill information identification method comprising:
Step 1: After receiving the bill picture to be processed, determining, according to a predetermined mapping relationship between fields to be identified and region identification models, the region identification model corresponding to each field to be identified in the bill picture; for each field to be identified, calling the corresponding region identification model to perform region identification on the line character regions of the bill picture, identifying from the bill picture target boxes that contain character information and have a fixed width of a preset value, and stitching together, in the order of identification, the target boxes whose character information lies in the same line, to form a target line character region containing the character information;
Step 2: Determining, according to a predetermined mapping relationship between the fields to be identified and character recognition models, the character recognition model corresponding to each field to be identified, and, for the target line character region of each field to be identified, calling the corresponding character recognition model to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified.
Further, in order to achieve the above object, a third aspect of the present application provides a computer readable storage medium storing a bill information identification system, the bill information identification system being executable by at least one processor to cause the at least one processor to perform the following steps:
After receiving the bill picture to be processed, determining, according to a predetermined mapping relationship between fields to be identified and region identification models, the region identification model corresponding to each field to be identified in the bill picture; for each field to be identified, calling the corresponding region identification model to perform region identification on the line character regions of the bill picture, identifying from the bill picture target boxes that contain character information and have a fixed width of a preset value, and stitching together, in the order of identification, the target boxes whose character information lies in the same line, to form a target line character region containing the character information;
Determining, according to a predetermined mapping relationship between the fields to be identified and character recognition models, the character recognition model corresponding to each field to be identified, and, for the target line character region of each field to be identified, calling the corresponding character recognition model to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified.
In the bill information identification method, electronic device, and readable storage medium proposed by the present application, the region identification model corresponding to each field to be identified performs region identification on the line character regions of that field in the bill picture, identifying small boxes that contain character information and have a fixed width of a preset value; the small boxes whose character information lies in the same line are stitched together in order to form a target line character region containing character information, and the character recognition model corresponding to the field to be identified is called to perform character recognition on that target line character region. Since the identified line character regions containing character information all have the uniform fixed preset width, the character information can be localized to smaller sub-regions, and each sub-region containing character information is approximated well; when character recognition is performed by the character recognition model, the target line character region contains far fewer interference factors besides the character information, thereby reducing the error rate of bill information identification.
FIG. 1 is a schematic diagram of the operating environment of a preferred embodiment of the bill information identification system 10 of the present application;
FIG. 2 is a schematic flowchart of an embodiment of the bill information identification method of the present application.
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
It should be noted that the descriptions involving "first", "second" and the like in the present application are for descriptive purposes only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only on the basis that they can be realized by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, such a combination should be considered not to exist and not to fall within the scope of protection claimed by the present application.
The present application provides a bill information identification system. Please refer to FIG. 1, which is a schematic diagram of the operating environment of a preferred embodiment of the bill information identification system 10 of the present application.
In the present embodiment, the bill information identification system 10 is installed and runs in the electronic device 1. The electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a display 13. FIG. 1 shows only the electronic device 1 with components 11-13, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
The memory 11 includes at least one type of readable storage medium. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or memory of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 is used to store application software installed in the electronic device 1 and various types of data, such as the program code of the bill information identification system 10. The memory 11 may also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a microprocessor, or another data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example to execute the bill information identification system 10.
In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display 13 is used to display the information processed in the electronic device 1 and to display a visual user interface, for example the bill picture to be processed, the recognized character information, and so on. The components 11-13 of the electronic device 1 communicate with one another via a system bus.
The bill information identification system 10 includes at least one computer readable instruction stored in the memory 11, which may be executed by the processor 12 to implement the various embodiments of the present application.
When the bill information identification system 10 is executed by the processor 12, the following steps are implemented:
Step S1: After receiving the bill picture to be processed, determining, according to a predetermined mapping relationship between fields to be identified and region identification models, the region identification model corresponding to each field to be identified in the bill picture; for each field to be identified, calling the corresponding region identification model to perform region identification on the line character regions of the bill picture, identifying from the bill picture target boxes that contain character information and have a fixed width of a preset value, and stitching together, in the order of identification, the target boxes whose character information lies in the same line, to form a target line character region containing the character information.
In this embodiment, the bill information identification system 10 receives the bill picture to be processed sent by the user through the terminal device 2; the bill picture includes bill pictures related to medical, commercial, financial and other insurance, such as outpatient or hospitalization bill pictures. For example, the system receives a bill picture sent by the user from a client pre-installed on a terminal device such as a mobile phone, tablet computer, or self-service terminal, or from a browser on such a terminal.
Region identification models are preset according to the different types of fields to be identified: for example, a first recognition model is preset for text fields, a second recognition model for numeric fields, a third recognition model for date/time fields, a fourth recognition model for currency fields, and so on. In this way, after the bill picture to be processed is received, the region identification model corresponding to each field to be identified is determined according to the predetermined mapping relationship between fields to be identified (such as text fields, numeric fields, date/time fields, currency fields, and so on) and region identification models; for each field to be identified, the corresponding region identification model is called to perform region identification on the line character regions of the bill picture, identifying from the bill picture small boxes (i.e., target boxes) that contain character information and have a fixed width of a preset value (for example, 16 pixels), and the small boxes whose character information lies in the same line are stitched together in order to form a target line character region containing character information. Determining the region identification model corresponding to each field to be identified may include:
A1. After receiving the bill picture to be processed, using a pre-trained bill picture recognition model to identify the bill category in the received picture and output the identification result (for example, the categories of medical bills include outpatient bills, hospitalization bills, and other bills).
A2. Performing skew correction on the received bill picture using a predetermined correction rule. In an optional implementation, the predetermined correction rule is: using the probabilistic Hough transform to find as many short line segments as possible in the bill image; determining all near-horizontal lines from the found segments; connecting the determined lines whose x coordinates differ little in the order of their corresponding y coordinates and dividing them into several classes by x coordinate, or connecting the determined lines whose y coordinates differ little in the order of their corresponding x coordinates and dividing them into several classes by y coordinate; taking all horizontal lines belonging to one class as one target-class line and finding, by the least squares method, the long line closest to each target-class line; calculating the slope of each long line, computing the median and the mean of these slopes, comparing the two to determine the smaller, and adjusting the image inclination according to the smaller value, so as to correct the received bill picture into a normal picture without inclination.
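The final step of the correction rule, choosing between the median and the mean of the long-line slopes, can be sketched as follows (the Hough and least-squares steps are omitted, and interpreting "smaller" as smaller in magnitude is an assumption of this sketch):

```python
import math
import statistics

def skew_angle_degrees(slopes):
    """Estimate image skew from the slopes of near-horizontal long lines.

    Both the median and the mean of the slopes are computed, and the one
    with the smaller magnitude is used, so that a few badly fitted lines
    cannot dominate the estimate.
    """
    median = statistics.median(slopes)
    mean = statistics.mean(slopes)
    slope = median if abs(median) <= abs(mean) else mean
    return math.degrees(math.atan(slope))
```

The resulting angle would then be used to rotate the bill picture back to horizontal.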
A3. Determining, according to a predetermined mapping relationship between bill categories and fields to be identified, the fields to be identified corresponding to the identified bill category;
A4. Determining, according to a predetermined mapping relationship between fields to be identified and region identification models, the region identification model corresponding to each field to be identified.
In an optional implementation, the region identification model is a convolutional neural network model, and the training process for the region identification model corresponding to one field to be identified is as follows:
C1. For the field to be identified, obtaining a preset number (for example, 100,000) of bill picture samples;
C2. On each bill picture sample, at every first preset number (for example, 16) of pixels, setting a second preset number (for example, 10) of small boxes with different aspect ratios and a fixed width of a preset value (for example, 16 pixels);
C3. On each bill picture sample, marking the small boxes that contain part or all of the character information of the field to be identified;
C4. Classifying the bill picture samples that contain the character information of the field to be identified into a first training set, and classifying the bill picture samples that do not contain it into a second training set;
C5. Extracting a first preset ratio (for example, 80%) of the bill picture samples from each of the first training set and the second training set as the sample pictures to be trained, and taking the remaining samples of the two sets as the sample pictures to be verified;
C6. Performing model training using the extracted sample pictures to be trained, to generate the region identification model, and verifying the generated region identification model using the sample pictures to be verified;
C7. If the verification pass rate is greater than or equal to a preset threshold (for example, 98%), the training is complete; otherwise, if the verification pass rate is less than the preset threshold, the number of bill picture samples is increased and steps C2, C3, C4, C5, and C6 are repeated.
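Step C5's stratified split (taking the first preset ratio from each of the two training sets) can be sketched as follows (the fixed seed and function names are illustrative assumptions):

```python
import random

def split_per_set(positive, negative, ratio=0.8, seed=42):
    """Take `ratio` of the samples from each of the two training sets for
    training and keep the remainder for verification (stratified split).
    """
    rng = random.Random(seed)
    train, verify = [], []
    for subset in (positive, negative):
        shuffled = subset[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * ratio)
        train += shuffled[:cut]
        verify += shuffled[cut:]
    return train, verify
```

Splitting each set separately keeps the proportion of positive (field present) and negative (field absent) samples roughly equal in the training and verification sets.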
Step S2: Determining, according to a predetermined mapping relationship between the fields to be identified and character recognition models, the character recognition model corresponding to each field to be identified, and, for the target line character region of each field to be identified, calling the corresponding character recognition model to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified.
In this embodiment, after the target line character region of each field to be identified has been identified using the region identification models, the character recognition model corresponding to each field to be identified may be determined according to the predetermined mapping relationship between fields to be identified and character recognition models; for each identified target line character region, the corresponding character recognition model is called to perform character recognition, so as to respectively identify the character information contained in the target line character region of each field to be identified, completing the character information recognition of the whole bill picture.
In an optional implementation, the character recognition model is a long short-term memory (LSTM) recurrent neural network model, and the training process of the character recognition model corresponding to one field to be identified is as follows:
D1. For the field to be identified, obtain a preset number (for example, 100,000) of ticket picture samples. Each sample contains only one line of character information of the field, with black characters on a white background, and each sample file is named after the character information it contains;
D2. Divide the ticket picture samples into a first data set and a second data set at a ratio of X:Y (for example, 8:2), where X and Y are both greater than 0 and the first data set contains more samples than the second; the first data set serves as the training set and the second as the test set;
D3. Feed the ticket picture samples of the first data set into the LSTM model for training, and at fixed intervals or every preset number of iterations (for example, every 1,000 iterations) test the trained model on the second data set to evaluate its current performance. During testing, the trained model recognizes the character information of the samples in the second data set, and the recognition results are compared with the sample names (that is, the labels) to compute the error between the recognized and labeled results, with edit distance as the error metric. If the recognition error on the ticket picture samples diverges during testing, adjust the training parameters and retrain until the error converges. Once the error converges, training ends and the resulting model is taken as the final character recognition model for the field.
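Step D3 evaluates recognition quality by edit distance between the recognized string and the label encoded in the sample name. As a minimal sketch of that metric (the function names are illustrative, not from the patent), the standard single-row Levenshtein algorithm can serve:

```python
def edit_distance(pred: str, truth: str) -> int:
    """Levenshtein distance between a recognized string and its label."""
    m, n = len(pred), len(truth)
    dp = list(range(n + 1))  # dp[j] = distance between prefixes
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (pred[i - 1] != truth[j - 1]))   # substitution
            prev = cur
    return dp[n]

def batch_error(predictions, labels):
    """Mean edit distance over a test batch, used as the convergence metric."""
    return sum(edit_distance(p, t) for p, t in zip(predictions, labels)) / len(labels)
```

Training would stop once `batch_error` stabilizes (converges) across successive evaluations on the second data set.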
Compared with the prior art, this embodiment uses the region recognition model corresponding to each field to be identified to perform region recognition on the line character regions of the ticket picture, identifies small boxes that contain character information and have a fixed preset width, and stitches the boxes whose character information lies on the same line, in order, into a target line character region; the character recognition model corresponding to the field is then called to recognize the characters in that region. Because the recognized line character regions are built from boxes of a uniform fixed width, the character information can be localized to smaller sub-regions and each sub-region containing characters is approximated closely, so far fewer interfering factors besides the character information remain in the target line character region during character recognition, which lowers the error rate of ticket information recognition.
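The stitching described above — collecting fixed-width boxes whose characters lie on the same line and joining them left to right — can be sketched as follows. This is an illustrative reading of the text, with an assumed `(x, y, w, h)` box format and a hypothetical `row_tolerance` parameter for deciding when two boxes share a line:

```python
def stitch_boxes(boxes, row_tolerance=8):
    """Group fixed-width detection boxes into target line character regions.

    Each box is (x, y, w, h). Boxes whose vertical centers lie within
    row_tolerance pixels are treated as the same line; each line's boxes
    are ordered left-to-right and merged into one bounding region.
    """
    lines = []  # each entry: [line center y, [boxes on that line]]
    for box in sorted(boxes, key=lambda b: b[0]):  # left-to-right order
        cy = box[1] + box[3] / 2
        for line in lines:
            if abs(line[0] - cy) <= row_tolerance:
                line[1].append(box)
                break
        else:
            lines.append([cy, [box]])
    regions = []
    for _, row in lines:
        x0 = min(b[0] for b in row)
        y0 = min(b[1] for b in row)
        x1 = max(b[0] + b[2] for b in row)
        y1 = max(b[1] + b[3] for b in row)
        regions.append((x0, y0, x1 - x0, y1 - y0))
    return regions
```

Each returned region is the bounding box of one line of character information, ready to be passed to the field's character recognition model.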
In an optional embodiment, on the basis of the embodiment of FIG. 1 above, the ticket picture recognition model is a deep convolutional neural network model (for example, a model based on the Single Shot MultiBox Detector (SSD) deep convolutional network algorithm, selected in the CaffeNet environment). The network consists of 1 input layer, 13 convolutional layers, 5 pooling layers, 2 fully connected layers, and 1 classification layer. The detailed structure of the deep convolutional neural network model is shown in Table 1 below:
Table 1
Here, Layer Name denotes the name of each layer; Input denotes the input layer; Conv denotes a convolutional layer of the model, Conv1 being its first convolutional layer; MaxPool denotes a max-pooling layer of the model, MaxPool1 being the first; Fc denotes a fully connected layer, Fc1 being the first; and Softmax denotes the Softmax classifier. Batch Size is the number of input images of the current layer; Kernel Size is the size of the current layer's convolution kernel (for example, Kernel Size equal to 3 means a 3x3 kernel); Stride Size is the step by which the convolution kernel moves, that is, the distance moved to the next convolution position after each convolution; Pad Size is the amount of padding applied to the image in the current network layer. It should be noted that the pooling methods of the pooling layers in this embodiment include, but are not limited to, mean pooling, max pooling, overlapping pooling, L2 pooling, local contrast normalization, stochastic pooling, def-pooling (deformation-constrained pooling), and so on.
The training process of the ticket picture recognition model is as follows:
B1. For each preset ticket category (for example, the preset categories may include two kinds: outpatient tickets and inpatient tickets), prepare a preset number (for example, 1,000) of ticket picture samples labeled with the corresponding category. In this embodiment, before training, the ticket picture samples are further processed as follows:
Determine whether a ticket picture is transposed from its aspect ratio and the position of the seal, and flip it accordingly: when the height-to-width ratio is greater than 1, the picture is turned sideways; if the seal is on the left side of the picture, rotate the image ninety degrees clockwise, and if the seal is on the right side, rotate it ninety degrees counter-clockwise. When the height-to-width ratio is less than 1, the picture is not sideways; if the seal is on the lower side of the picture, rotate the image one hundred and eighty degrees.
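The orientation rule above can be sketched directly. This assumes the seal position has already been detected upstream and is passed in as a side label (a simplification; the patent does not specify how the seal is located):

```python
import numpy as np

def correct_orientation(img, seal_side):
    """Fix a transposed ticket image using aspect ratio and seal position.

    img is an H x W (optionally x C) array; seal_side is one of
    "left", "right", "bottom", "top". Rotations follow the rule in the
    text: height > width means the picture is turned sideways.
    """
    h, w = img.shape[:2]
    if h / w > 1:                        # height and width are swapped
        if seal_side == "left":
            return np.rot90(img, k=-1)   # rotate 90 degrees clockwise
        return np.rot90(img, k=1)        # seal on the right: 90 degrees CCW
    if seal_side == "bottom":            # upright size but upside down
        return np.rot90(img, k=2)        # rotate 180 degrees
    return img                           # already upright
```

After this step all training samples share the same upright orientation, which is the precondition the text states for accurate annotation and training.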
Find annotation data with serious problems, such as key position information that is missing or lies outside the whole picture, or obviously wrong annotations such as a seal position marked at the center of the ticket, and clean this data to ensure that the annotations are accurate.
Correct the annotation data after flipping. The annotation data of each object is the position of the rectangular box that frames the object, expressed by four numbers: the top-left corner coordinates (xmin, ymin) and the bottom-right corner coordinates (xmax, ymax). If xmax < xmin, swap the two x values, and do the same for the y coordinates, so that each max is greater than its min.
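The coordinate correction just described is a simple swap per axis; a minimal sketch (function name is illustrative):

```python
def normalize_box(xmin, ymin, xmax, ymax):
    """Ensure an annotation box satisfies xmax > xmin and ymax > ymin.

    After an image flip, the recorded top-left / bottom-right corners may
    end up swapped; swap back whichever axis is inverted.
    """
    if xmax < xmin:
        xmin, xmax = xmax, xmin
    if ymax < ymin:
        ymin, ymax = ymax, ymin
    return xmin, ymin, xmax, ymax
```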
In this way, all ticket picture samples used for model training are guaranteed to be upright and accurately annotated, which makes the subsequent model training more accurate and effective.
B2. Divide the ticket picture samples of each preset ticket category into a training subset at a first ratio (for example, 80%) and a verification subset at a second ratio (for example, 20%); mix the training subsets of all categories to obtain the training set, and mix the verification subsets of all categories to obtain the verification set;
B3. Train the ticket picture recognition model with the training set;
B4. Verify the accuracy of the trained ticket picture recognition model with the verification set. If the accuracy is greater than or equal to a preset accuracy, training ends; otherwise, if the accuracy is less than the preset accuracy, increase the number of ticket picture samples of each preset category and repeat steps B2, B3 and B4.
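Steps B2 to B4 amount to a stratified split followed by a train-evaluate-grow loop. The sketch below assumes callable `train_fn`, `eval_fn` and `get_data` hooks standing in for the actual classifier training, which the patent does not spell out at this level:

```python
import random

def split_per_class(samples_by_class, train_ratio=0.8, seed=0):
    """Step B2: each ticket category contributes train_ratio of its samples
    to the training set and the rest to the verification set, then both
    sets are mixed across categories."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls, samples in samples_by_class.items():
        pool = list(samples)
        rng.shuffle(pool)
        cut = int(len(pool) * train_ratio)
        train += [(s, cls) for s in pool[:cut]]
        valid += [(s, cls) for s in pool[cut:]]
    rng.shuffle(train)
    rng.shuffle(valid)
    return train, valid

def train_until_accurate(train_fn, eval_fn, get_data, target=0.98, max_rounds=5):
    """Steps B2-B4: retrain on (possibly enlarged) data until the
    verification accuracy reaches the preset target."""
    for _ in range(max_rounds):
        train_set, valid_set = split_per_class(get_data())
        model = train_fn(train_set)
        if eval_fn(model, valid_set) >= target:
            return model
        # in the patent's loop, get_data() would return more samples here
    raise RuntimeError("accuracy target not reached")
```

The `max_rounds` cap is an added safeguard, not part of the described procedure.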
As shown in FIG. 2, FIG. 2 is a schematic flowchart of an embodiment of the ticket information identification method of the present application. The ticket information identification method includes the following steps:
Step S10: after receiving a ticket picture to be processed, determine, according to the predetermined mapping between fields to be identified and region recognition models, the region recognition model corresponding to each field to be identified in the ticket picture; for each field to be identified, call the corresponding region recognition model to perform region recognition on the line character regions of the ticket picture, identify from the picture the target boxes that contain character information and have a fixed preset width, and stitch the target boxes whose character information lies on the same line together, in the order they were identified, into a target line character region containing the character information.
In this embodiment, the ticket information identification system 10 receives the ticket picture to be identified sent by a user through the terminal device 2. The ticket picture includes pictures of tickets related to medical, commercial, financial and other insurance, such as outpatient or inpatient tickets. For example, the system receives ticket pictures sent from a client pre-installed on a terminal device such as a mobile phone, tablet computer or self-service terminal, or sent from a browser on such a terminal.
Region recognition models are preset in advance for the different types of fields to be identified; for example, a first recognition model is preset for text fields, a second for numeric fields, a third for date/time fields, a fourth for currency fields, and so on. In this way, after receiving the ticket picture to be processed, the region recognition model corresponding to each field to be identified (such as a text field, numeric field, date/time field or currency field) is determined according to the predetermined mapping between fields to be identified and region recognition models. For each field to be identified, the corresponding region recognition model is called to perform region recognition on the line character regions of the ticket picture; the small boxes, that is, the target boxes, that contain character information and have a fixed preset width (for example, 16 pixels) are identified from the picture, and the boxes whose character information lies on the same line are stitched together, in order, into a target line character region containing the character information. Determining the region recognition model corresponding to each field to be identified may include:
A1. After receiving the ticket picture to be processed, use the pre-trained ticket picture recognition model to identify the ticket category in the received picture and output the identification result (for example, the categories of medical tickets include outpatient tickets, inpatient tickets, and other tickets);
A2. Correct the tilt of the received ticket picture with a predetermined correction rule. In an optional implementation, the predetermined correction rule is: use the probabilistic Hough transform algorithm to find as many short line segments as possible in the ticket image; from these segments, determine all near-horizontal straight lines; connect the determined lines whose x coordinates differ little, in order of their corresponding y coordinates, and group them into several classes by x coordinate, or connect the lines whose y coordinates differ little, in order of their corresponding x coordinates, and group them into several classes by y coordinate; take all horizontal lines belonging to one class as one target-class line, and find by least squares the long straight line closest to each target-class line; compute the slope of each long line, then the median and the mean of these slopes; compare the two and take the smaller, and adjust the image tilt accordingly so that the received ticket picture is corrected to a normal, untilted picture.
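The slope-aggregation part of rule A2 can be sketched in isolation. This assumes the short segments have already been produced upstream (for example by a probabilistic Hough transform such as OpenCV's `HoughLinesP`); the `horiz_tol` threshold for "near-horizontal" is an illustrative choice:

```python
import math
import statistics

def skew_angle(segments, horiz_tol=0.3):
    """Aggregate detected line segments into one skew angle in degrees.

    segments: iterable of (x1, y1, x2, y2). Near-horizontal segments are
    kept, and whichever of the median and mean slope has the smaller
    magnitude is taken, as the correction rule specifies.
    """
    slopes = []
    for x1, y1, x2, y2 in segments:
        if x2 == x1:
            continue  # vertical segment, unusable for horizontal skew
        s = (y2 - y1) / (x2 - x1)
        if abs(s) < horiz_tol:
            slopes.append(s)
    if not slopes:
        return 0.0
    med = statistics.median(slopes)
    mean = statistics.fmean(slopes)
    slope = med if abs(med) <= abs(mean) else mean
    return math.degrees(math.atan(slope))
```

The image would then be rotated by the negative of this angle to remove the tilt.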
A3. Determine the fields to be identified corresponding to the identified ticket category according to the predetermined mapping between ticket categories and fields to be identified;
A4. Determine the region recognition model corresponding to each field to be identified according to the predetermined mapping between fields to be identified and region recognition models.
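Steps A3 and A4 are two successive table lookups. The sketch below uses hypothetical registry contents (the field type names and model identifiers are illustrative; the patent only states that such mappings are predetermined):

```python
# Hypothetical mapping from field type to region recognition model (step A4).
REGION_MODELS = {
    "text": "region_model_text",
    "number": "region_model_number",
    "datetime": "region_model_datetime",
    "currency": "region_model_currency",
}

# Hypothetical mapping from ticket category to its fields (step A3).
TICKET_FIELDS = {
    "outpatient": ["text", "number", "datetime", "currency"],
    "inpatient": ["text", "number", "datetime", "currency"],
}

def models_for_ticket(ticket_category):
    """Resolve the fields of a ticket category, then look up each field's
    region recognition model."""
    fields = TICKET_FIELDS[ticket_category]
    return {field: REGION_MODELS[field] for field in fields}
```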
In an optional implementation, the region recognition model is a convolutional neural network model, and the training process of the region recognition model corresponding to one field to be identified is as follows:
C1. For the field to be identified, obtain a preset number (for example, 100,000) of ticket picture samples;
C2. On each ticket picture sample, every first preset number (for example, 16) of pixels, place a second preset number (for example, 10) of small boxes with different aspect ratios and a fixed width of a preset value (for example, 16 pixels);
C3. On each ticket picture sample, mark the small boxes that contain part or all of the character information of the field to be identified;
C4. Put the ticket picture samples that contain character information of the field to be identified into a first training set, and put those that do not into a second training set;
C5. Extract a first preset proportion (for example, 80%) of the ticket picture samples from each of the first and second training sets as the sample pictures to be trained, and take the remaining samples of both training sets as the sample pictures to be verified;
C6. Perform model training with the extracted sample pictures to be trained so as to generate the region recognition model, and verify the generated region recognition model with the sample pictures to be verified;
C7. If the verification pass rate is greater than or equal to a preset threshold (for example, 98%), training is complete; otherwise, if the verification pass rate is less than the preset threshold, increase the number of ticket picture samples and repeat steps C2, C3, C4, C5 and C6.
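The box-placement scheme of step C2 — candidate boxes at a fixed stride, all of the same fixed width but with several heights — can be sketched as an anchor generator. The stride, width and ratio set below follow the worked examples in the text where given; the specific ratio values are illustrative:

```python
def generate_anchor_boxes(img_w, img_h, stride=16, box_w=16,
                          aspect_ratios=(0.5, 1.0, 2.0)):
    """Step C2 as a sketch: every `stride` pixels, place candidate boxes of
    fixed width `box_w` whose heights follow the given aspect ratios
    (height = width / ratio). Returns (x, y, w, h) tuples."""
    boxes = []
    for x in range(0, img_w, stride):
        for y in range(0, img_h, stride):
            for r in aspect_ratios:
                h = int(round(box_w / r))
                boxes.append((x, y, box_w, h))
    return boxes
```

Boxes overlapping the field's characters would then be marked as positives (step C3) before the train/verify split of steps C4 and C5.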
Step S20: according to the predetermined mapping between fields to be identified and character recognition models, determine the character recognition model corresponding to each field to be identified, and, for the target line character region of each field to be identified, call the corresponding character recognition model to perform character recognition, so as to identify the character information contained in the target line character region of each field.
In this embodiment, after the target line character region of each field to be identified has been located by the region recognition model, the character recognition model corresponding to each field can be determined according to the predetermined mapping between fields to be identified and character recognition models. For each located target line character region, the corresponding character recognition model is called to perform character recognition, so that the character information contained in the target line character region of each field is recognized and the character information of the whole ticket picture is obtained.
In an optional implementation, the character recognition model is a long short-term memory (LSTM) recurrent neural network model, and the training process of the character recognition model corresponding to one field to be identified is as follows:
D1. For the field to be identified, obtain a preset number (for example, 100,000) of ticket picture samples. Each sample contains only one line of character information of the field, with black characters on a white background, and each sample file is named after the character information it contains;
D2. Divide the ticket picture samples into a first data set and a second data set at a ratio of X:Y (for example, 8:2), where X and Y are both greater than 0 and the first data set contains more samples than the second; the first data set serves as the training set and the second as the test set;
D3. Feed the ticket picture samples of the first data set into the LSTM model for training, and at fixed intervals or every preset number of iterations (for example, every 1,000 iterations) test the trained model on the second data set to evaluate its current performance. During testing, the trained model recognizes the character information of the samples in the second data set, and the recognition results are compared with the sample names (that is, the labels) to compute the error between the recognized and labeled results, with edit distance as the error metric. If the recognition error on the ticket picture samples diverges during testing, adjust the training parameters and retrain until the error converges. Once the error converges, training ends and the resulting model is taken as the final character recognition model for the field.
Compared with the prior art, this embodiment uses the region recognition model corresponding to each field to be identified to perform region recognition on the line character regions of the ticket picture, identifies small boxes that contain character information and have a fixed preset width, and stitches the boxes whose character information lies on the same line, in order, into a target line character region; the character recognition model corresponding to the field is then called to recognize the characters in that region. Because the recognized line character regions are built from boxes of a uniform fixed width, the character information can be localized to smaller sub-regions and each sub-region containing characters is approximated closely, so far fewer interfering factors besides the character information remain in the target line character region during character recognition, which lowers the error rate of ticket information recognition.
In an optional embodiment, on the basis of the above embodiment, the ticket picture recognition model is a deep convolutional neural network model (for example, a model based on the Single Shot MultiBox Detector (SSD) deep convolutional network algorithm, selected in the CaffeNet environment). The network consists of 1 input layer, 13 convolutional layers, 5 pooling layers, 2 fully connected layers, and 1 classification layer. The detailed structure of the deep convolutional neural network model is shown in Table 1 below:
Table 1
Here, Layer Name denotes the name of each layer; Input denotes the input layer; Conv denotes a convolutional layer of the model, Conv1 being its first convolutional layer; MaxPool denotes a max-pooling layer of the model, MaxPool1 being the first; Fc denotes a fully connected layer, Fc1 being the first; and Softmax denotes the Softmax classifier. Batch Size is the number of input images of the current layer; Kernel Size is the size of the current layer's convolution kernel (for example, Kernel Size equal to 3 means a 3x3 kernel); Stride Size is the step by which the convolution kernel moves, that is, the distance moved to the next convolution position after each convolution; Pad Size is the amount of padding applied to the image in the current network layer. It should be noted that the pooling methods of the pooling layers in this embodiment include, but are not limited to, mean pooling, max pooling, overlapping pooling, L2 pooling, local contrast normalization, stochastic pooling, def-pooling (deformation-constrained pooling), and so on.
The training process of the ticket picture recognition model is as follows:
B1. For each preset ticket category (for example, the preset categories may include two kinds: outpatient tickets and inpatient tickets), prepare a preset number (for example, 1,000) of ticket picture samples labeled with the corresponding category. In this embodiment, before training, the ticket picture samples are further processed as follows:
Determine whether a ticket picture is transposed from its aspect ratio and the position of the seal, and flip it accordingly: when the height-to-width ratio is greater than 1, the picture is turned sideways; if the seal is on the left side of the picture, rotate the image ninety degrees clockwise, and if the seal is on the right side, rotate it ninety degrees counter-clockwise. When the height-to-width ratio is less than 1, the picture is not sideways; if the seal is on the lower side of the picture, rotate the image one hundred and eighty degrees.
Find annotation data with serious problems, such as key position information that is missing or lies outside the whole picture, or obviously wrong annotations such as a seal position marked at the center of the ticket, and clean this data to ensure that the annotations are accurate.
Correct the annotation data after flipping. The annotation data of each object is the position of the rectangular box that frames the object, expressed by four numbers: the top-left corner coordinates (xmin, ymin) and the bottom-right corner coordinates (xmax, ymax). If xmax < xmin, swap the two x values, and do the same for the y coordinates, so that each max is greater than its min.
In this way, all ticket picture samples used for model training are guaranteed to be upright and accurately annotated, which makes the subsequent model training more accurate and effective.
B2. Divide the ticket picture samples of each preset ticket category into a training subset at a first ratio (for example, 80%) and a verification subset at a second ratio (for example, 20%); mix the training subsets of all categories to obtain the training set, and mix the verification subsets of all categories to obtain the verification set;
B3. Train the ticket picture recognition model with the training set;
B4. Verify the accuracy of the trained ticket picture recognition model with the verification set. If the accuracy is greater than or equal to a preset accuracy, training ends; otherwise, if the accuracy is less than the preset accuracy, increase the number of ticket picture samples of each preset category and repeat steps B2, B3 and B4.
In addition, the present application further provides a computer-readable storage medium storing a ticket information identification system. The ticket information identification system can be executed by at least one processor, so that the at least one processor performs the steps of the ticket information identification method of the above embodiments. The specific implementation of steps S10, S20, S30 and so on of the ticket information identification method is as described above and is not repeated here.
It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises that element.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware alone, but in many cases the former is the better implementation. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored on a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and comprising a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to perform the methods described in the embodiments of the present application.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, which does not thereby limit the scope of the claims of the present application. The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments. In addition, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that herein.
Those skilled in the art can implement the present application in many variants without departing from the scope and spirit of the present application; for example, a feature of one embodiment can be used in another embodiment to obtain a further embodiment. Any modification, equivalent substitution and improvement made within the technical concept of the present application shall fall within the scope of the claims of the present application.
Claims (20)
- An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory storing a bill information recognition system executable on the processor, wherein the bill information recognition system, when executed by the processor, implements the following steps:
after a bill image to be processed is received, determining, according to a predetermined mapping relationship between fields to be recognized and region recognition models, the region recognition model corresponding to each field to be recognized in the bill image; for each field to be recognized, invoking the corresponding region recognition model to perform region recognition on the line character regions of the bill image, recognizing from the bill image target boxes that contain character information and whose fixed width is a preset value, and splicing together, in the order in which they were recognized, the target boxes whose character information lies in the same line, to form a target line character region containing character information;
determining, according to a predetermined mapping relationship between fields to be recognized and character recognition models, the character recognition model corresponding to each of the fields to be recognized, and, for the target line character region of each of the fields to be recognized, invoking the corresponding character recognition model to perform character recognition, so as to respectively recognize the character information contained in the target line character region of each of the fields to be recognized.
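The splicing step recited above can be illustrated with plain coordinates: fixed-width target boxes whose vertical centres fall within a tolerance are grouped into the same line, and each line's character fragments are then joined. The box representation, the tolerance value, and the use of left-to-right position as a stand-in for "the order in which they were recognized" are all illustrative assumptions, not part of the claim:

```python
def splice_rows(boxes, row_tolerance=5):
    """Group fixed-width target boxes into lines by vertical centre, then
    join each line's character fragments (left-to-right order stands in
    for the order of recognition here)."""
    rows = []  # each entry: (centre_y, [boxes in that line])
    for box in sorted(boxes, key=lambda b: b["y"]):
        centre = box["y"] + box["h"] / 2
        for row in rows:
            if abs(row[0] - centre) <= row_tolerance:
                row[1].append(box)
                break
        else:
            rows.append((centre, [box]))
    # splice each line's fragments into one target line character region
    return ["".join(b["text"] for b in sorted(line, key=lambda b: b["x"]))
            for _, line in rows]
```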
- The electronic device according to claim 1, wherein determining, according to the predetermined mapping relationship between fields to be recognized and region recognition models, the region recognition model corresponding to each field to be recognized in the bill image comprises:
A1. recognizing the bill category of the received bill image by using a pre-trained bill image recognition model, and outputting the recognition result of the bill category;
A2. performing tilt correction on the received bill image by using a predetermined correction rule;
A3. determining, according to a predetermined mapping relationship between bill categories and fields to be recognized, the fields to be recognized corresponding to the recognized bill category;
A4. determining, according to the predetermined mapping relationship between fields to be recognized and region recognition models, the region recognition model corresponding to each of the fields to be recognized.
- The electronic device according to claim 1, wherein the region recognition model is a convolutional neural network model, and the training process of the region recognition model corresponding to one field to be recognized is as follows:
C1. acquiring a preset number of bill image samples for the field to be recognized;
C2. on each bill image sample, setting, at every first preset number of pixels, a second preset number of small boxes with different height-to-width ratios and a fixed width equal to a preset value;
C3. marking, on each bill image sample, the small boxes that contain the character information of the field to be recognized;
C4. classifying the bill image samples that contain the character information of the field to be recognized into a first training set, and classifying the bill image samples that do not contain the character information of the field to be recognized into a second training set;
C5. extracting a first preset proportion of the bill image samples from each of the first training set and the second training set as sample images to be trained, and taking the remaining bill image samples of the first training set and the second training set as sample images to be verified;
C6. performing model training with the extracted sample images to be trained to generate the region recognition model, and verifying the generated region recognition model with the sample images to be verified;
C7. if the verification pass rate is greater than or equal to a preset threshold, the training is complete; otherwise, if the verification pass rate is less than the preset threshold, increasing the number of bill image samples and re-executing steps C2, C3, C4, C5 and C6.
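Step C2 above is essentially an anchor-box grid: at every stride of "first preset number" pixels, a set of boxes with a fixed width and several height-to-width ratios is laid down. A minimal sketch of that geometry follows; the stride, ratio and width defaults are illustrative assumptions, since the claim leaves them as preset parameters:

```python
def anchor_boxes(img_w, img_h, stride=16, ratios=(0.5, 1.0, 2.0), width=32):
    """C2: at every `stride` pixels place one box per height-to-width
    ratio; the width is fixed and the height is ratio * width.
    Each box is (left, top, width, height), centred on the grid point."""
    boxes = []
    for cy in range(0, img_h, stride):
        for cx in range(0, img_w, stride):
            for r in ratios:
                h = r * width
                boxes.append((cx - width / 2, cy - h / 2, width, h))
    return boxes
```

For a 64x32 sample with these defaults, the grid has 4 x 2 points and 3 ratios per point, i.e. 24 candidate boxes per image.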
- The electronic device according to claim 2, wherein the region recognition model is a convolutional neural network model, and the training process of the region recognition model corresponding to one field to be recognized is as follows:
C1. acquiring a preset number of bill image samples for the field to be recognized;
C2. on each bill image sample, setting, at every first preset number of pixels, a second preset number of small boxes with different height-to-width ratios and a fixed width equal to a preset value;
C3. marking, on each bill image sample, the small boxes that contain the character information of the field to be recognized;
C4. classifying the bill image samples that contain the character information of the field to be recognized into a first training set, and classifying the bill image samples that do not contain the character information of the field to be recognized into a second training set;
C5. extracting a first preset proportion of the bill image samples from each of the first training set and the second training set as sample images to be trained, and taking the remaining bill image samples of the first training set and the second training set as sample images to be verified;
C6. performing model training with the extracted sample images to be trained to generate the region recognition model, and verifying the generated region recognition model with the sample images to be verified;
C7. if the verification pass rate is greater than or equal to a preset threshold, the training is complete; otherwise, if the verification pass rate is less than the preset threshold, increasing the number of bill image samples and re-executing steps C2, C3, C4, C5 and C6.
- The electronic device according to claim 1, wherein the character recognition model is an LSTM (long short-term memory) recurrent neural network model, and the training process of the character recognition model corresponding to one field to be recognized is as follows:
D1. acquiring a preset number of bill image samples for the field to be recognized, each bill image sample containing only one line of character information of the field to be recognized, and naming each bill image sample after the character information of the field to be recognized that it contains;
D2. dividing the bill image samples into a first data set and a second data set at a ratio of X:Y, where the number of bill image samples in the first data set is greater than that in the second data set, the first data set serves as the training set and the second data set serves as the test set, and X is greater than 0 and Y is greater than 0;
D3. feeding the bill image samples of the first data set into a preset recurrent neural network model for model training; at every preset time interval or preset number of iterations, performing character information recognition on the bill image samples of the second data set with the trained model, and comparing the character information recognition results with the names of the tested bill image samples to calculate the error of the character information recognition results; if the error of the trained model in recognizing the character information of the bill image samples diverges, adjusting the preset training parameters and retraining the model until the error converges, whereupon the model training ends and the generated model serves as the final character recognition model corresponding to the field to be recognized.
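The divergence check in step D3 can be sketched as a monitoring loop: evaluate the test-set error every K iterations, restart with adjusted parameters if the error grows, and stop once it has converged. The callables below (`train_step`, `eval_error`, `adjust_params`) are placeholders for the actual LSTM training, which the claim does not specify at this level, and the tolerance and intervals are illustrative assumptions:

```python
def train_with_monitoring(train_step, eval_error, adjust_params,
                          check_every=100, max_checks=50, tol=1e-4):
    """D3: train; every `check_every` iterations compare recognition
    results against the sample names (via eval_error); on divergence,
    adjust the training parameters and restart monitoring; on
    convergence, end training and return the final error."""
    prev = float("inf")
    checks = 0
    while checks < max_checks:
        for _ in range(check_every):
            train_step()
        err = eval_error()
        if err > prev:            # error diverging: adjust and retrain
            adjust_params()
            prev = float("inf")
        elif prev - err < tol:    # error converged: training ends
            return err
        else:
            prev = err
        checks += 1
    return prev
```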
- The electronic device according to claim 2, wherein the character recognition model is an LSTM (long short-term memory) recurrent neural network model, and the training process of the character recognition model corresponding to one field to be recognized is as follows:
D1. acquiring a preset number of bill image samples for the field to be recognized, each bill image sample containing only one line of character information of the field to be recognized, and naming each bill image sample after the character information of the field to be recognized that it contains;
D2. dividing the bill image samples into a first data set and a second data set at a ratio of X:Y, where the number of bill image samples in the first data set is greater than that in the second data set, the first data set serves as the training set and the second data set serves as the test set, and X is greater than 0 and Y is greater than 0;
D3. feeding the bill image samples of the first data set into a preset recurrent neural network model for model training; at every preset time interval or preset number of iterations, performing character information recognition on the bill image samples of the second data set with the trained model, and comparing the character information recognition results with the names of the tested bill image samples to calculate the error of the character information recognition results; if the error of the trained model in recognizing the character information of the bill image samples diverges, adjusting the preset training parameters and retraining the model until the error converges, whereupon the model training ends and the generated model serves as the final character recognition model corresponding to the field to be recognized.
- The electronic device according to claim 2, wherein the bill image recognition model is a deep convolutional neural network model consisting of 1 input layer, 13 convolutional layers, 5 pooling layers, 2 fully connected layers and 1 classification layer; the training process of the bill image recognition model is as follows:
S1. preparing, for each preset bill category, a preset number of bill image samples labeled with the corresponding bill category;
S2. dividing the bill image samples corresponding to each preset bill category into a training subset of a first proportion and a verification subset of a second proportion, mixing the bill image samples of the training subsets to obtain a training set, and mixing the bill image samples of the verification subsets to obtain a verification set;
S3. training the bill image recognition model with the training set;
S4. verifying the accuracy of the trained bill image recognition model with the verification set; if the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, increasing the number of bill image samples corresponding to each preset bill category and re-executing steps S2, S3 and S4.
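The layer counts recited above (13 convolutional, 5 pooling, 2 fully connected, 1 classification layer) match the well-known VGG-16 topology; a configuration list in that style makes the arithmetic explicit. The filter sizes shown are the standard VGG-16 values and are an assumption here, since the claim fixes only the layer counts:

```python
# 'Cn' = 3x3 convolution with n filters, 'P' = 2x2 max pooling,
# 'FC' = fully connected layer, 'SM' = softmax classification layer.
VGG16_STYLE = [
    "C64", "C64", "P",
    "C128", "C128", "P",
    "C256", "C256", "C256", "P",
    "C512", "C512", "C512", "P",
    "C512", "C512", "C512", "P",
    "FC", "FC", "SM",
]

def layer_counts(cfg):
    """Tally each layer kind in a configuration list."""
    counts = {"conv": 0, "pool": 0, "fc": 0, "classify": 0}
    for layer in cfg:
        if layer == "P":
            counts["pool"] += 1
        elif layer == "FC":
            counts["fc"] += 1
        elif layer == "SM":
            counts["classify"] += 1
        else:
            counts["conv"] += 1
    return counts
```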
- A bill information recognition method, characterized in that the bill information recognition method comprises:
Step 1: after a bill image to be processed is received, determining, according to a predetermined mapping relationship between fields to be recognized and region recognition models, the region recognition model corresponding to each field to be recognized in the bill image; for each field to be recognized, invoking the corresponding region recognition model to perform region recognition on the line character regions of the bill image, recognizing from the bill image target boxes that contain character information and whose fixed width is a preset value, and splicing together, in the order in which they were recognized, the target boxes whose character information lies in the same line, to form a target line character region containing character information;
Step 2: determining, according to a predetermined mapping relationship between fields to be recognized and character recognition models, the character recognition model corresponding to each of the fields to be recognized, and, for the target line character region of each of the fields to be recognized, invoking the corresponding character recognition model to perform character recognition, so as to respectively recognize the character information contained in the target line character region of each of the fields to be recognized.
- The bill information recognition method according to claim 8, wherein determining, according to the predetermined mapping relationship between fields to be recognized and region recognition models, the region recognition model corresponding to each field to be recognized in the bill image comprises:
A1. recognizing the bill category of the received bill image by using a pre-trained bill image recognition model, and outputting the recognition result of the bill category;
A2. performing tilt correction on the received bill image by using a predetermined correction rule;
A3. determining, according to a predetermined mapping relationship between bill categories and fields to be recognized, the fields to be recognized corresponding to the recognized bill category;
A4. determining, according to the predetermined mapping relationship between fields to be recognized and region recognition models, the region recognition model corresponding to each of the fields to be recognized.
- The bill information recognition method according to claim 8, wherein the region recognition model is a convolutional neural network model, and the training process of the region recognition model corresponding to one field to be recognized is as follows:
C1. acquiring a preset number of bill image samples for the field to be recognized;
C2. on each bill image sample, setting, at every first preset number of pixels, a second preset number of small boxes with different height-to-width ratios and a fixed width equal to a preset value;
C3. marking, on each bill image sample, the small boxes that contain the character information of the field to be recognized;
C4. classifying the bill image samples that contain the character information of the field to be recognized into a first training set, and classifying the bill image samples that do not contain the character information of the field to be recognized into a second training set;
C5. extracting a first preset proportion of the bill image samples from each of the first training set and the second training set as sample images to be trained, and taking the remaining bill image samples of the first training set and the second training set as sample images to be verified;
C6. performing model training with the extracted sample images to be trained to generate the region recognition model, and verifying the generated region recognition model with the sample images to be verified;
C7. if the verification pass rate is greater than or equal to a preset threshold, the training is complete; otherwise, if the verification pass rate is less than the preset threshold, increasing the number of bill image samples and re-executing steps C2, C3, C4, C5 and C6.
- The bill information recognition method according to claim 9, wherein the region recognition model is a convolutional neural network model, and the training process of the region recognition model corresponding to one field to be recognized is as follows:
C1. acquiring a preset number of bill image samples for the field to be recognized;
C2. on each bill image sample, setting, at every first preset number of pixels, a second preset number of small boxes with different height-to-width ratios and a fixed width equal to a preset value;
C3. marking, on each bill image sample, the small boxes that contain the character information of the field to be recognized;
C4. classifying the bill image samples that contain the character information of the field to be recognized into a first training set, and classifying the bill image samples that do not contain the character information of the field to be recognized into a second training set;
C5. extracting a first preset proportion of the bill image samples from each of the first training set and the second training set as sample images to be trained, and taking the remaining bill image samples of the first training set and the second training set as sample images to be verified;
C6. performing model training with the extracted sample images to be trained to generate the region recognition model, and verifying the generated region recognition model with the sample images to be verified;
C7. if the verification pass rate is greater than or equal to a preset threshold, the training is complete; otherwise, if the verification pass rate is less than the preset threshold, increasing the number of bill image samples and re-executing steps C2, C3, C4, C5 and C6.
- The bill information recognition method according to claim 8, wherein the character recognition model is an LSTM (long short-term memory) recurrent neural network model, and the training process of the character recognition model corresponding to one field to be recognized is as follows:
D1. acquiring a preset number of bill image samples for the field to be recognized, each bill image sample containing only one line of character information of the field to be recognized, and naming each bill image sample after the character information of the field to be recognized that it contains;
D2. dividing the bill image samples into a first data set and a second data set at a ratio of X:Y, where the number of bill image samples in the first data set is greater than that in the second data set, the first data set serves as the training set and the second data set serves as the test set, and X is greater than 0 and Y is greater than 0;
D3. feeding the bill image samples of the first data set into a preset recurrent neural network model for model training; at every preset time interval or preset number of iterations, performing character information recognition on the bill image samples of the second data set with the trained model, and comparing the character information recognition results with the names of the tested bill image samples to calculate the error of the character information recognition results; if the error of the trained model in recognizing the character information of the bill image samples diverges, adjusting the preset training parameters and retraining the model until the error converges, whereupon the model training ends and the generated model serves as the final character recognition model corresponding to the field to be recognized.
- The bill information recognition method according to claim 9, wherein the character recognition model is an LSTM (long short-term memory) recurrent neural network model, and the training process of the character recognition model corresponding to one field to be recognized is as follows:
D1. acquiring a preset number of bill image samples for the field to be recognized, each bill image sample containing only one line of character information of the field to be recognized, and naming each bill image sample after the character information of the field to be recognized that it contains;
D2. dividing the bill image samples into a first data set and a second data set at a ratio of X:Y, where the number of bill image samples in the first data set is greater than that in the second data set, the first data set serves as the training set and the second data set serves as the test set, and X is greater than 0 and Y is greater than 0;
D3. feeding the bill image samples of the first data set into a preset recurrent neural network model for model training; at every preset time interval or preset number of iterations, performing character information recognition on the bill image samples of the second data set with the trained model, and comparing the character information recognition results with the names of the tested bill image samples to calculate the error of the character information recognition results; if the error of the trained model in recognizing the character information of the bill image samples diverges, adjusting the preset training parameters and retraining the model until the error converges, whereupon the model training ends and the generated model serves as the final character recognition model corresponding to the field to be recognized.
- The bill information recognition method according to claim 9, wherein the bill image recognition model is a deep convolutional neural network model consisting of 1 input layer, 13 convolutional layers, 5 pooling layers, 2 fully connected layers and 1 classification layer; the training process of the bill image recognition model is as follows:
S1. preparing, for each preset bill category, a preset number of bill image samples labeled with the corresponding bill category;
S2. dividing the bill image samples corresponding to each preset bill category into a training subset of a first proportion and a verification subset of a second proportion, mixing the bill image samples of the training subsets to obtain a training set, and mixing the bill image samples of the verification subsets to obtain a verification set;
S3. training the bill image recognition model with the training set;
S4. verifying the accuracy of the trained bill image recognition model with the verification set; if the accuracy is greater than or equal to a preset accuracy, the training ends; otherwise, if the accuracy is less than the preset accuracy, increasing the number of bill image samples corresponding to each preset bill category and re-executing steps S2, S3 and S4.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores a bill information recognition system, the bill information recognition system being executable by at least one processor to cause the at least one processor to perform the following steps:
after a bill image to be processed is received, determining, according to a predetermined mapping relationship between fields to be recognized and region recognition models, the region recognition model corresponding to each field to be recognized in the bill image; for each field to be recognized, invoking the corresponding region recognition model to perform region recognition on the line character regions of the bill image, recognizing from the bill image target boxes that contain character information and whose fixed width is a preset value, and splicing together, in the order in which they were recognized, the target boxes whose character information lies in the same line, to form a target line character region containing character information;
determining, according to a predetermined mapping relationship between fields to be recognized and character recognition models, the character recognition model corresponding to each of the fields to be recognized, and, for the target line character region of each of the fields to be recognized, invoking the corresponding character recognition model to perform character recognition, so as to respectively recognize the character information contained in the target line character region of each of the fields to be recognized.
- The computer-readable storage medium according to claim 15, wherein determining, according to the predetermined mapping between fields to be identified and region recognition models, the region recognition model corresponding to each field to be identified in the bill image comprises:
A1. Identify the bill category of the received bill image using a pre-trained bill image recognition model, and output the recognition result.
A2. Perform tilt correction on the received bill image using a predetermined correction rule.
A3. Determine the fields to be identified for the recognized bill category according to a predetermined mapping between bill categories and fields to be identified.
A4. Determine the region recognition model corresponding to each field to be identified according to the predetermined mapping between fields to be identified and region recognition models.
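Steps A3 and A4 are two table lookups chained together. A minimal sketch follows; the mapping tables and field names here are hypothetical placeholders — in practice they would be configured per deployment:

```python
# Hypothetical A3 mapping: bill category -> fields to identify
CATEGORY_TO_FIELDS = {
    "vat_invoice": ["invoice_no", "amount", "date"],
    "taxi_receipt": ["amount", "date"],
}

# Hypothetical A4 mapping: field to identify -> region recognition model id
FIELD_TO_REGION_MODEL = {
    "invoice_no": "region_model_invoice_no",
    "amount": "region_model_amount",
    "date": "region_model_date",
}

def region_models_for(category):
    """A3: look up the fields to identify for the recognized category;
    A4: map each field to its region recognition model."""
    fields = CATEGORY_TO_FIELDS[category]
    return {f: FIELD_TO_REGION_MODEL[f] for f in fields}
```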
- The computer-readable storage medium according to claim 15, wherein the region recognition model is a convolutional neural network model, and the region recognition model for a field to be identified is trained as follows:
C1. Obtain a preset number of bill image samples for the field to be identified.
C2. On each bill image sample, every first preset number of pixels, place a second preset number of small boxes with different aspect ratios and a fixed width equal to a preset value.
C3. On each bill image sample, mark the small boxes that contain the character information of the field to be identified.
C4. Assign the bill image samples that contain the character information of the field to a first training set, and the bill image samples that do not to a second training set.
C5. Extract a first preset proportion of the bill image samples from each of the first and second training sets as sample images to be trained, and use the remaining bill image samples of both sets as sample images to be verified.
C6. Perform model training on the extracted sample images to be trained to generate the region recognition model, and verify the generated region recognition model with the sample images to be verified.
C7. If the verification pass rate is greater than or equal to a preset threshold, the training is complete; otherwise, increase the number of bill image samples and repeat steps C2, C3, C4, C5, and C6.
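Step C2 describes an anchor-box grid: fixed-width boxes of several aspect ratios placed at a regular pixel stride. A sketch of that generation step, with the stride, width, and ratio values chosen only for illustration:

```python
def make_anchor_boxes(img_w, img_h, stride=16, width=16,
                      ratios=(0.5, 1.0, 2.0)):
    """C2 sketch: every `stride` pixels, place boxes of fixed `width`
    and several height/width `ratios`. Returns (x, y, w, h) tuples."""
    boxes = []
    for y in range(0, img_h, stride):
        for x in range(0, img_w, stride):
            for r in ratios:
                boxes.append((x, y, width, int(width * r)))
    return boxes
```

Step C3 would then label each generated box positive or negative depending on whether it overlaps the field's characters.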
- The computer-readable storage medium according to claim 16, wherein the region recognition model is a convolutional neural network model, and the region recognition model for a field to be identified is trained as follows:
C1. Obtain a preset number of bill image samples for the field to be identified.
C2. On each bill image sample, every first preset number of pixels, place a second preset number of small boxes with different aspect ratios and a fixed width equal to a preset value.
C3. On each bill image sample, mark the small boxes that contain the character information of the field to be identified.
C4. Assign the bill image samples that contain the character information of the field to a first training set, and the bill image samples that do not to a second training set.
C5. Extract a first preset proportion of the bill image samples from each of the first and second training sets as sample images to be trained, and use the remaining bill image samples of both sets as sample images to be verified.
C6. Perform model training on the extracted sample images to be trained to generate the region recognition model, and verify the generated region recognition model with the sample images to be verified.
C7. If the verification pass rate is greater than or equal to a preset threshold, the training is complete; otherwise, increase the number of bill image samples and repeat steps C2, C3, C4, C5, and C6.
- The computer-readable storage medium according to claim 15, wherein the character recognition model is an LSTM (long short-term memory) recurrent neural network model, and the character recognition model for a field to be identified is trained as follows:
D1. Obtain a preset number of bill image samples for the field to be identified, each containing only one line of character information of the field, and name each bill image sample after the character information it contains.
D2. Divide the bill image samples into a first data set and a second data set in a ratio of X:Y, where the first data set contains more bill image samples than the second, and X and Y are both greater than 0; use the first data set as the training set and the second data set as the test set.
D3. Feed the bill image samples of the first data set into a preset recurrent neural network model for training; at every preset interval of time or number of iterations, use the trained model to recognize the character information of the bill image samples in the second data set, and compare the recognition results with the names of the tested samples to compute the recognition error. If the error diverges, adjust the preset training parameters and retrain the model; once the error converges, end the training and use the resulting model as the final character recognition model for the field to be identified.
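The train/evaluate/restart control flow of step D3 can be sketched independently of any particular network. The callbacks, patience counter, and tolerance below are assumptions introduced for illustration; the claim only requires detecting divergence and convergence of the error:

```python
def train_until_converged(train_step, eval_error, adjust_params,
                          check_every=100, patience=3, tol=1e-3,
                          max_steps=10_000):
    """D3 control-flow sketch: train, evaluate the test-set error
    periodically, restart with adjusted parameters on divergence,
    and stop once the error converges."""
    while True:
        prev, rises = float("inf"), 0
        for step in range(1, max_steps + 1):
            train_step()
            if step % check_every == 0:
                err = eval_error()
                if err > prev:                     # error is rising
                    rises += 1
                    if rises >= patience:          # treat as divergence
                        adjust_params()            # e.g. lower learning rate
                        break                      # retrain from scratch
                else:
                    rises = 0
                    if prev - err < tol:           # error has converged
                        return err
                prev = err
        else:                                      # step budget exhausted
            return prev
```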
- The computer-readable storage medium according to claim 16, wherein the character recognition model is an LSTM (long short-term memory) recurrent neural network model, and the character recognition model for a field to be identified is trained as follows:
D1. Obtain a preset number of bill image samples for the field to be identified, each containing only one line of character information of the field, and name each bill image sample after the character information it contains.
D2. Divide the bill image samples into a first data set and a second data set in a ratio of X:Y, where the first data set contains more bill image samples than the second, and X and Y are both greater than 0; use the first data set as the training set and the second data set as the test set.
D3. Feed the bill image samples of the first data set into a preset recurrent neural network model for training; at every preset interval of time or number of iterations, use the trained model to recognize the character information of the bill image samples in the second data set, and compare the recognition results with the names of the tested samples to compute the recognition error. If the error diverges, adjust the preset training parameters and retrain the model; once the error converges, end the training and use the resulting model as the final character recognition model for the field to be identified.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710930679.8A CN107798299B (en) | 2017-10-09 | 2017-10-09 | Bill information identification method, electronic device and readable storage medium |
CN201710930679.8 | 2017-10-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019071660A1 (en) | 2019-04-18 |
Family
ID=61533966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/108735 WO2019071660A1 (en) | 2017-10-09 | 2017-10-31 | Bill information identification method, electronic device, and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107798299B (en) |
WO (1) | WO2019071660A1 (en) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
CN108595544A (en) * | 2018-04-09 | 2018-09-28 | 深源恒际科技有限公司 | A kind of document picture classification method |
CN108564035B (en) * | 2018-04-13 | 2020-09-25 | 杭州睿琪软件有限公司 | Method and system for identifying information recorded on document |
CN108629560A (en) * | 2018-04-18 | 2018-10-09 | 平安科技(深圳)有限公司 | Task distributing method, electronic equipment and storage medium |
CN108664897A (en) * | 2018-04-18 | 2018-10-16 | 平安科技(深圳)有限公司 | Bank slip recognition method, apparatus and storage medium |
CN108717543B (en) * | 2018-05-14 | 2022-01-14 | 北京市商汤科技开发有限公司 | Invoice identification method and device and computer storage medium |
CN110674831B (en) * | 2018-06-14 | 2023-01-06 | 佛山市顺德区美的电热电器制造有限公司 | Data processing method and device and computer readable storage medium |
CN110619252B (en) * | 2018-06-19 | 2022-11-04 | 百度在线网络技术(北京)有限公司 | Method, device and equipment for identifying form data in picture and storage medium |
CN108846379A (en) * | 2018-07-03 | 2018-11-20 | 南京览笛信息科技有限公司 | Face list recognition methods, system, terminal device and storage medium |
CN108960245B (en) * | 2018-07-13 | 2022-04-19 | 广东工业大学 | Tire mold character detection and recognition method, device, equipment and storage medium |
CN109214382A (en) * | 2018-07-16 | 2019-01-15 | 顺丰科技有限公司 | A kind of billing information recognizer, equipment and storage medium based on CRNN |
CN109255300B (en) * | 2018-08-14 | 2023-12-01 | 中国平安财产保险股份有限公司 | Bill information extraction method, bill information extraction device, computer equipment and storage medium |
CN109214385B (en) * | 2018-08-15 | 2021-06-08 | 腾讯科技(深圳)有限公司 | Data acquisition method, data acquisition device and storage medium |
CN109271980A (en) * | 2018-08-28 | 2019-01-25 | 上海萃舟智能科技有限公司 | A kind of vehicle nameplate full information recognition methods, system, terminal and medium |
CN109492143A (en) * | 2018-09-21 | 2019-03-19 | 平安科技(深圳)有限公司 | Image processing method, device, computer equipment and storage medium |
CN109784339A (en) * | 2018-12-13 | 2019-05-21 | 平安普惠企业管理有限公司 | Picture recognition test method, device, computer equipment and storage medium |
CN109815949A (en) * | 2018-12-20 | 2019-05-28 | 航天信息股份有限公司 | Invoice publicity method and system neural network based |
CN109858275A (en) * | 2018-12-20 | 2019-06-07 | 航天信息股份有限公司 | Invoice publicity method and system neural network based |
CN109598272B (en) * | 2019-01-11 | 2021-08-06 | 北京字节跳动网络技术有限公司 | Character line image recognition method, device, equipment and medium |
CN109858420A (en) * | 2019-01-24 | 2019-06-07 | 国信电子票据平台信息服务有限公司 | A kind of bill processing system and processing method |
CN109902737A (en) * | 2019-02-25 | 2019-06-18 | 厦门商集网络科技有限责任公司 | A kind of bill classification method and terminal |
CN110119741B (en) * | 2019-04-08 | 2022-09-27 | 浙江大学宁波理工学院 | Card image information identification method with background |
CN110956739A (en) * | 2019-05-09 | 2020-04-03 | 杭州睿琪软件有限公司 | Bill identification method and device |
CN110288755B (en) * | 2019-05-21 | 2023-05-23 | 平安银行股份有限公司 | Invoice checking method based on text recognition, server and storage medium |
CN110334596B (en) * | 2019-05-30 | 2024-02-02 | 平安科技(深圳)有限公司 | Invoice picture summarizing method, electronic device and readable storage medium |
CN110490193B (en) * | 2019-07-24 | 2022-11-08 | 西安网算数据科技有限公司 | Single character area detection method and bill content identification method |
CN110503054B (en) * | 2019-08-27 | 2022-09-23 | 广东工业大学 | Text image processing method and device |
CN110598686B (en) * | 2019-09-17 | 2023-08-04 | 携程计算机技术(上海)有限公司 | Invoice identification method, system, electronic equipment and medium |
CN110866495B (en) * | 2019-11-14 | 2022-06-28 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111104481B (en) * | 2019-12-17 | 2023-10-10 | 东软集团股份有限公司 | Method, device and equipment for identifying matching field |
CN111242790B (en) * | 2020-01-02 | 2020-11-17 | 平安科技(深圳)有限公司 | Risk identification method, electronic device and storage medium |
CN111461099A (en) * | 2020-03-27 | 2020-07-28 | 重庆农村商业银行股份有限公司 | Bill identification method, system, equipment and readable storage medium |
CN111695559B (en) * | 2020-04-28 | 2023-07-18 | 深圳市跨越新科技有限公司 | YoloV3 model-based waybill picture information coding method and system |
CN111563502B (en) * | 2020-05-09 | 2023-12-15 | 腾讯科技(深圳)有限公司 | Image text recognition method and device, electronic equipment and computer storage medium |
CN111695439B (en) * | 2020-05-20 | 2024-05-10 | 平安科技(深圳)有限公司 | Image structured data extraction method, electronic device and storage medium |
CN111931664B (en) * | 2020-08-12 | 2024-01-12 | 腾讯科技(深圳)有限公司 | Mixed-pasting bill image processing method and device, computer equipment and storage medium |
CN112115932B (en) * | 2020-08-19 | 2023-11-14 | 泰康保险集团股份有限公司 | Text extraction method and device, electronic equipment and storage medium |
CN112308036A (en) * | 2020-11-25 | 2021-02-02 | 杭州睿胜软件有限公司 | Bill identification method and device and readable storage medium |
CN112434689A (en) * | 2020-12-01 | 2021-03-02 | 天冕信息技术(深圳)有限公司 | Method, device and equipment for identifying information in picture and storage medium |
CN114627456A (en) * | 2020-12-10 | 2022-06-14 | 航天信息股份有限公司 | Bill text information detection method, device and system |
CN113205049A (en) * | 2021-05-07 | 2021-08-03 | 开放智能机器(上海)有限公司 | Document identification method and identification system |
CN113762152A (en) * | 2021-09-07 | 2021-12-07 | 上海盈策信息技术有限公司 | Bill verification method, system, equipment and medium |
CN116702024B (en) * | 2023-05-16 | 2024-05-28 | 见知数据科技(上海)有限公司 | Method, device, computer equipment and storage medium for identifying type of stream data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104112128A (en) * | 2014-06-19 | 2014-10-22 | 中国工商银行股份有限公司 | Digital image processing system applied to bill image character recognition and method |
CN105260733A (en) * | 2015-09-11 | 2016-01-20 | 北京百度网讯科技有限公司 | Method and device for processing image information |
CN105654127A (en) * | 2015-12-30 | 2016-06-08 | 成都数联铭品科技有限公司 | End-to-end-based picture character sequence continuous recognition method |
CN107220648A (en) * | 2017-04-11 | 2017-09-29 | 平安科技(深圳)有限公司 | The character identifying method and server of Claims Resolution document |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120201472A1 (en) * | 2011-02-08 | 2012-08-09 | Autonomy Corporation Ltd | System for the tagging and augmentation of geographically-specific locations using a visual data stream |
US9398210B2 (en) * | 2011-02-24 | 2016-07-19 | Digimarc Corporation | Methods and systems for dealing with perspective distortion in connection with smartphone cameras |
US8582873B2 (en) * | 2011-06-16 | 2013-11-12 | Tandent Vision Science, Inc. | Use of an object database in an image process |
2017
- 2017-10-09 CN CN201710930679.8A patent/CN107798299B/en active Active
- 2017-10-31 WO PCT/CN2017/108735 patent/WO2019071660A1/en active Application Filing
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110147791A (en) * | 2019-05-20 | 2019-08-20 | 上海联影医疗科技有限公司 | Character recognition method, device, equipment and storage medium |
CN110298347B (en) * | 2019-05-30 | 2022-11-01 | 长安大学 | Method for identifying automobile exhaust analyzer screen based on GrayWorld and PCA-CNN |
CN110298347A (en) * | 2019-05-30 | 2019-10-01 | 长安大学 | A kind of recognition methods of the automobile exhaust analyzer screen based on GrayWorld and PCA-CNN |
CN110503105A (en) * | 2019-09-02 | 2019-11-26 | 苏州美能华智能科技有限公司 | Character identifying method, training data acquisition methods, device and medium |
CN110766050A (en) * | 2019-09-19 | 2020-02-07 | 北京捷通华声科技股份有限公司 | Model generation method, text recognition method, device, equipment and storage medium |
CN110766050B (en) * | 2019-09-19 | 2023-05-23 | 北京捷通华声科技股份有限公司 | Model generation method, text recognition method, device, equipment and storage medium |
CN111626279B (en) * | 2019-10-15 | 2023-06-02 | 西安网算数据科技有限公司 | Negative sample labeling training method and highly-automatic bill identification method |
CN110941717A (en) * | 2019-11-22 | 2020-03-31 | 深圳马可孛罗科技有限公司 | Passenger ticket rule analysis method and device, electronic equipment and computer readable medium |
CN110941717B (en) * | 2019-11-22 | 2023-08-11 | 深圳马可孛罗科技有限公司 | Passenger ticket rule analysis method and device, electronic equipment and computer readable medium |
CN110991456A (en) * | 2019-12-05 | 2020-04-10 | 北京百度网讯科技有限公司 | Bill identification method and device |
CN110991456B (en) * | 2019-12-05 | 2023-07-07 | 北京百度网讯科技有限公司 | Bill identification method and device |
CN111192031A (en) * | 2019-12-26 | 2020-05-22 | 腾讯科技(深圳)有限公司 | Electronic bill generation method and device, electronic equipment and readable storage medium |
CN111192031B (en) * | 2019-12-26 | 2023-06-23 | 腾讯科技(深圳)有限公司 | Electronic bill generation method and device, electronic equipment and readable storage medium |
CN111223481A (en) * | 2020-01-09 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Information extraction method and device, computer readable storage medium and electronic equipment |
CN111223481B (en) * | 2020-01-09 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Information extraction method, information extraction device, computer readable storage medium and electronic equipment |
CN111259889A (en) * | 2020-01-17 | 2020-06-09 | 平安医疗健康管理股份有限公司 | Image text recognition method and device, computer equipment and computer storage medium |
CN111325207A (en) * | 2020-03-05 | 2020-06-23 | 中国银行股份有限公司 | Bill identification method and device based on preprocessing |
CN111414908A (en) * | 2020-03-16 | 2020-07-14 | 湖南快乐阳光互动娱乐传媒有限公司 | Method and device for recognizing caption characters in video |
CN111414908B (en) * | 2020-03-16 | 2023-08-29 | 湖南快乐阳光互动娱乐传媒有限公司 | Method and device for recognizing caption characters in video |
CN113553883A (en) * | 2020-04-24 | 2021-10-26 | 上海高德威智能交通系统有限公司 | Bill image identification method and device and electronic equipment |
CN111695558B (en) * | 2020-04-28 | 2023-08-04 | 深圳市跨越新科技有限公司 | Logistics shipping list picture correction method and system based on YoloV3 model |
CN111695558A (en) * | 2020-04-28 | 2020-09-22 | 深圳市跨越新科技有限公司 | Logistics waybill picture rectification method and system based on YoloV3 model |
CN111666932B (en) * | 2020-05-27 | 2023-07-14 | 平安科技(深圳)有限公司 | Document auditing method, device, computer equipment and storage medium |
CN111666932A (en) * | 2020-05-27 | 2020-09-15 | 平安科技(深圳)有限公司 | Document auditing method and device, computer equipment and storage medium |
CN113762292A (en) * | 2020-06-03 | 2021-12-07 | 杭州海康威视数字技术股份有限公司 | Training data acquisition method and device and model training method and device |
CN113762292B (en) * | 2020-06-03 | 2024-02-02 | 杭州海康威视数字技术股份有限公司 | Training data acquisition method and device and model training method and device |
CN111814833B (en) * | 2020-06-11 | 2024-06-07 | 浙江大华技术股份有限公司 | Training method of bill processing model, image processing method and image processing equipment |
CN111814833A (en) * | 2020-06-11 | 2020-10-23 | 浙江大华技术股份有限公司 | Training method of bill processing model, image processing method and image processing equipment |
CN111738326A (en) * | 2020-06-16 | 2020-10-02 | 中国工商银行股份有限公司 | Sentence granularity marking training sample generation method and device |
CN112270224A (en) * | 2020-10-14 | 2021-01-26 | 招商银行股份有限公司 | Insurance responsibility analysis method and device and computer readable storage medium |
CN112633275B (en) * | 2020-12-22 | 2023-07-18 | 航天信息股份有限公司 | Multi-bill mixed shooting image correction method and system based on deep learning |
CN112633275A (en) * | 2020-12-22 | 2021-04-09 | 航天信息股份有限公司 | Multi-bill mixed-shooting image correction method and system based on deep learning |
CN112699871A (en) * | 2020-12-23 | 2021-04-23 | 平安银行股份有限公司 | Method, system, device and computer readable storage medium for field content identification |
CN112699871B (en) * | 2020-12-23 | 2023-11-14 | 平安银行股份有限公司 | Method, system, device and computer readable storage medium for identifying field content |
CN112686262A (en) * | 2020-12-28 | 2021-04-20 | 广州博士信息技术研究院有限公司 | Method for extracting structured data and rapidly archiving handbooks based on image recognition technology |
CN113205041A (en) * | 2021-04-29 | 2021-08-03 | 百度在线网络技术(北京)有限公司 | Structured information extraction method, device, equipment and storage medium |
CN113283421A (en) * | 2021-06-24 | 2021-08-20 | 中国平安人寿保险股份有限公司 | Information identification method, device, equipment and storage medium |
CN113283421B (en) * | 2021-06-24 | 2024-03-01 | 中国平安人寿保险股份有限公司 | Information identification method, device, equipment and storage medium |
CN113408516A (en) * | 2021-06-25 | 2021-09-17 | 京东数科海益信息科技有限公司 | Bill recognition device and method |
CN114328831A (en) * | 2021-12-24 | 2022-04-12 | 江苏银承网络科技股份有限公司 | Bill information identification and error correction method and device |
CN118134576A (en) * | 2024-05-08 | 2024-06-04 | 山东工程职业技术大学 | Digital electronic invoice management method and system based on artificial intelligence |
Also Published As
Publication number | Publication date |
---|---|
CN107798299A (en) | 2018-03-13 |
CN107798299B (en) | 2020-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019071660A1 (en) | Bill information identification method, electronic device, and readable storage medium | |
CN107766809B (en) | Electronic device, bill information identification method, and computer-readable storage medium | |
WO2019174130A1 (en) | Bill recognition method, server, and computer readable storage medium | |
WO2019104879A1 (en) | Information recognition method for form-type image, electronic device and readable storage medium | |
CN109829453B (en) | Method and device for recognizing characters in card and computing equipment | |
WO2019205376A1 (en) | Vehicle damage determination method, server, and storage medium | |
WO2019037259A1 (en) | Electronic device, method and system for categorizing invoices, and computer-readable storage medium | |
WO2018205467A1 (en) | Automobile damage part recognition method, system and electronic device and storage medium | |
US20200410074A1 (en) | Identity authentication method and apparatus, electronic device, and storage medium | |
CN112699775B (en) | Certificate identification method, device, equipment and storage medium based on deep learning | |
CN111814785B (en) | Invoice recognition method, training method of relevant model, relevant equipment and device | |
US11710210B1 (en) | Machine-learning for enhanced machine reading of non-ideal capture conditions | |
US20150379341A1 (en) | Robust method to find layout similarity between two documents | |
CN110288755A (en) | The invoice method of inspection, server and storage medium based on text identification | |
WO2021139494A1 (en) | Animal body online claim settlement method and apparatus based on monocular camera, and storage medium | |
CN111553251B (en) | Certificate four-corner defect detection method, device, equipment and storage medium | |
US20220092353A1 (en) | Method and device for training image recognition model, equipment and medium | |
WO2019056503A1 (en) | Store monitoring evaluation method, device and storage medium | |
CN111160395A (en) | Image recognition method and device, electronic equipment and storage medium | |
CN113673500A (en) | Certificate image recognition method and device, electronic equipment and storage medium | |
CN112396047B (en) | Training sample generation method and device, computer equipment and storage medium | |
US20210264583A1 (en) | Detecting identification tampering using ultra-violet imaging | |
CN110647931A (en) | Object detection method, electronic device, system, and medium | |
CN111241974B (en) | Bill information acquisition method, device, computer equipment and storage medium | |
US10896339B2 (en) | Detecting magnetic ink character recognition codes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17928326; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29/09/2020) |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 17928326; Country of ref document: EP; Kind code of ref document: A1 |