
CN114049686A - Signature recognition model training method and device and electronic equipment - Google Patents

Info

Publication number
CN114049686A
Authority
CN
China
Prior art keywords
sample
signature
utilized
picture
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111345986.2A
Other languages
Chinese (zh)
Inventor
王晓燕
黄聚
钦夏孟
范森
吕鹏原
章成全
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111345986.2A priority Critical patent/CN114049686A/en
Publication of CN114049686A publication Critical patent/CN114049686A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a signature recognition model training method and device and electronic equipment, and relates to the technical field of artificial intelligence, in particular to deep learning and computer vision. The specific scheme is as follows: a sample to be utilized and corresponding name labeling information are obtained from a predetermined sample library, where the sample library comprises a first type of sample and corresponding name labeling information, and the first type of sample is a signature picture whose recognition result, obtained with the signature recognition model, was fed back as incorrect; a text line picture corresponding to the sample to be utilized is acquired, where the text line picture is the signature area in the sample to be utilized; and the signature recognition model is updated and trained based on the text line picture and the name labeling information corresponding to the sample to be utilized. With this scheme, the signature recognition model can be iterated automatically, and the labor cost of iterating the signature recognition model is greatly reduced.

Description

Signature recognition model training method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical field of deep learning and computer vision, and specifically relates to a signature recognition model training method and device and electronic equipment.
Background
For digital enterprises, there is a high demand for intelligent recognition of handwritten signatures. The so-called intelligent recognition of the handwritten signature is to automatically recognize the signature content of the handwritten signature.
Recognizing a signature picture containing a handwritten signature with a signature recognition model is a common recognition means. To ensure the accuracy of the signature recognition model, the model needs to be iterated, that is, updated and trained. In the related art, the model is updated and trained manually, for example by manually selecting new signature pictures as samples and manually labeling the signature area and name information in each sample.
Disclosure of Invention
The disclosure provides a signature recognition model training method and device and electronic equipment.
According to an aspect of the present disclosure, there is provided a signature recognition model training method, the method including:
obtaining a sample to be utilized and corresponding name labeling information from a predetermined sample library; the sample library comprises a first type of sample and corresponding name labeling information; the first type of sample is a signature picture whose recognition result was fed back as incorrect when signature recognition was carried out based on the signature recognition model;
acquiring a text line picture corresponding to the sample to be utilized, wherein the text line picture is a signature area in the sample to be utilized;
and updating and training the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized.
According to another aspect of the present disclosure, there is provided an information processing system including a signature recognition subsystem and a model training subsystem;
the signature recognition subsystem is used for, upon receiving user feedback that the recognition result for a picture to be recognized is wrong, acquiring the name corresponding to the picture to be recognized, taking the picture to be recognized as a first type of sample, and storing the name as the corresponding name labeling information in a predetermined sample library; the recognition result is a result obtained by recognizing the picture to be recognized based on a signature recognition model trained in advance;
the model training subsystem is used for acquiring a sample to be utilized and corresponding name marking information from a preset sample library; acquiring a text line picture corresponding to the sample to be utilized; and updating and training the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized.
According to another aspect of the present disclosure, there is provided a signature recognition model training apparatus, the apparatus including:
the first obtaining module is used for obtaining a sample to be utilized and corresponding name labeling information from a predetermined sample library; the sample library at least comprises a first type of sample and corresponding name labeling information; the first type of sample is a signature picture whose recognition result was fed back as incorrect when signature recognition was carried out based on the signature recognition model;
the second obtaining module is used for obtaining a text line picture corresponding to the sample to be utilized, wherein the text line picture is a signature area in the sample to be utilized;
and the training module is used for updating and training the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the signature recognition model training method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the steps of the signature recognition model training method described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of the signature recognition model training method described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a signature recognition model training method provided by an embodiment of the present disclosure;
FIG. 2(a) is a sample to be utilized in an embodiment of the present disclosure, and FIG. 2(b) is a minimum bounding rectangle containing the signature text in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an information processing system provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a signature recognition model training apparatus provided in an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing a signature recognition model training method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, contracts and confirmation documents in fields such as finance, government affairs and office work require the handwritten signature of the transactor, and it must then be checked whether that handwritten signature is consistent with the transactor's real name.
In the related art, signature recognition, that is, recognizing whether the handwritten signature of a transactor is consistent with the transactor's real name, is carried out by manual checking. However, in scenarios that generate a large number of signatures, many staff are required to take part in the review, which is inefficient. Moreover, manual auditing requires frequent communication and waiting for review, which may degrade the experience of the signing user.
Against this background, signature recognition services have emerged. A signature recognition service can recognize the text content of a handwritten signature and take part in personal identity verification, and can save more than 90% of business processing time, such as contract signing time. Moreover, because manual guidance is reduced, staff workload is saved; interruptions are also reduced, which improves the user's signing experience.
For signature recognition services, the signature recognition model is the key means of implementing the service. After signature pictures containing handwritten signatures are collected, each picture can be labeled with a name, and the signature recognition model is then trained in a supervised manner so that the trained model can recognize the text content of a signature picture, namely the handwritten signature. It can be understood that every person's handwriting style is different, so the features of signature pictures are widely dispersed and the recognition precision is limited by the amount of labeled data. A large number of signature pictures is therefore needed for model training, covering many kinds of handwriting data, that is, many character contents and handwriting styles, and these pictures need to be labeled one by one.
However, a signature recognition model tends to be accurate only for samples that are covered by, or similar to, its training set. To ensure the accuracy of the signature recognition model, the model needs to be iterated, that is, updated and trained. In the related art, the model is updated and trained manually, for example by manually selecting a large number of new signature pictures as samples and manually labeling the signature areas and name information in them to form a new training set, with which the model is iterated continuously.
Updating and training the signature recognition model manually therefore consumes a large amount of labor.
In order to solve the above problems and achieve automatic update training of a signature recognition model, embodiments of the present disclosure provide a signature recognition model training method, system, apparatus, device, and storage medium.
First, a method for training a signature recognition model provided in the embodiments of the present disclosure is described below.
The signature recognition model training method provided by the embodiment of the disclosure is applied to electronic equipment. In practical applications, the electronic device may be a server or a terminal device; either is acceptable. In addition, the scenarios to which the signature recognition model training method is applied may be, but are not limited to, a counter signature scenario, an enrollment signature scenario, an OCR (Optical Character Recognition) scenario, and the like.
The signature recognition model training method provided by the embodiment of the disclosure can comprise the following steps:
obtaining a sample to be utilized and corresponding name labeling information from a predetermined sample library; the sample library at least comprises a first type of sample and corresponding name labeling information; the first type of sample is a signature picture whose recognition result was fed back as incorrect when signature recognition was carried out based on the signature recognition model;
acquiring a text line picture corresponding to the sample to be utilized, wherein the text line picture is a signature area in the sample to be utilized;
and updating and training the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized.
In this embodiment, the predetermined sample library includes a first type of sample, that is, a signature picture that the signature recognition model recognized incorrectly, and corresponding name labeling information. A sample to be utilized and its name labeling information are obtained from the sample library; a text line picture corresponding to the sample to be utilized, namely the signature area in that sample, is then obtained; and finally the signature recognition model is updated and trained based on the text line picture and the name labeling information corresponding to the sample to be utilized. The scheme therefore requires neither manual selection of a new training set nor manual labeling of signature regions, so the signature recognition model can be updated and trained automatically, and the labor cost of manually iterating the model is greatly reduced.
The following describes a signature recognition model training method provided by the present disclosure with reference to the accompanying drawings.
As shown in fig. 1, a signature recognition model training method provided by the present disclosure may include the following steps:
S101, obtaining a sample to be utilized and corresponding name labeling information from a predetermined sample library; the sample library at least comprises a first type of sample and corresponding name labeling information; the first type of sample is a signature picture whose recognition result was fed back as incorrect when signature recognition was carried out based on the signature recognition model;
The signature recognition model is an artificial intelligence model pre-trained on signature picture samples of a certain order of magnitude (such as one million) and their name labeling information. The signature recognition model can be obtained by training any deep learning model, and the structure of the deep learning model can be set according to actual conditions.
After the training of the signature recognition model is completed, the model can be iterated, that is, updated and trained, to improve the precision of signature recognition. To this end, a model update condition for the signature recognition model may be set in advance, so that update training starts when the condition is satisfied. For example, the model update condition may be that a predetermined time interval has elapsed, or that a predetermined number of samples has been collected.
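The two example update conditions above can be sketched as a small check. This is only an illustration: the class name `UpdatePolicy`, its fields, and the default values (one week, one thousand samples) are assumptions and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class UpdatePolicy:
    """Hypothetical policy: retrain on a timer or once enough new samples exist."""
    interval_seconds: float = 7 * 24 * 3600  # assumed "predetermined time"
    min_new_samples: int = 1000              # assumed "predetermined number of samples"

    def should_update(self, seconds_since_last: float, new_samples: int) -> bool:
        # Either condition alone is enough to start update training.
        return (seconds_since_last >= self.interval_seconds
                or new_samples >= self.min_new_samples)


policy = UpdatePolicy()
print(policy.should_update(seconds_since_last=3600, new_samples=1200))  # True
print(policy.should_update(seconds_since_last=3600, new_samples=10))    # False
```

Combining the two conditions with a logical OR is a design choice of this sketch; the text only lists them as alternative examples.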
The sample library is a database, that is, an organized and uniformly managed collection of mass data stored in a computer over the long term. It stores the signature pictures used as samples together with the corresponding name labeling information, where a signature picture is a picture that includes a handwritten signature. In this embodiment, the predetermined sample library includes a plurality of first-type samples and corresponding name labeling information, so the samples to be utilized obtained from the library may comprise first-type samples. It is to be understood that there may be multiple samples to be utilized, and the processing procedure is the same for each of them.
It can be understood that the initially trained signature recognition model is deployed in the signature recognition subsystem that provides the signature recognition service, so that the subsystem can recognize signature information with the trained model. To build the predetermined sample library, the signature recognition subsystem also has a confirmation feedback function, which records the signature pictures that users report as correctly or incorrectly recognized by the signature recognition model, together with their name labeling information. A signature picture whose recognition result was fed back as incorrect can then be taken as a first type of sample and automatically stored in the sample library with its name labeling information. In addition, in one implementation, the signature recognition subsystem can also record a tag indicating whether each signature picture was recognized correctly or incorrectly, so that the incorrectly recognized signature pictures fed back by users and their name labeling information can be looked up by tag.
It should be noted that the name labeling information, that is, the real name of the signer corresponding to the signature picture, may be determined in various ways. For example, when the user feeds back that the recognition result for a signature picture is wrong, the user may be prompted to enter the real name manually; alternatively, the user's name may have been recorded in the database in advance.
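The feedback path described above, in which a picture reported as misrecognized becomes a first-type sample labeled with the signer's real name, could look roughly like the following. The names `Sample`, `SampleLibrary` and `record_feedback` are hypothetical, and an in-memory list stands in for the database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    picture_id: str   # reference to the stored signature picture
    name_label: str   # real name of the signer (name labeling information)
    sample_type: int  # 1 = fed back as misrecognized, 2 = collected from a scenario


@dataclass
class SampleLibrary:
    samples: List[Sample] = field(default_factory=list)

    def record_feedback(self, picture_id: str, is_correct: bool,
                        real_name: Optional[str] = None) -> Optional[Sample]:
        """Store a first-type sample when the user reports a wrong result."""
        if is_correct:
            return None  # nothing to store in this sketch
        if not real_name:
            raise ValueError("a real name is needed to label a misrecognized picture")
        sample = Sample(picture_id, real_name, sample_type=1)
        self.samples.append(sample)
        return sample


library = SampleLibrary()
library.record_feedback("pic-001", is_correct=True)
library.record_feedback("pic-002", is_correct=False, real_name="张三")
print(len(library.samples))  # 1
```

Whether the real name is typed in by the user or looked up from a pre-recorded record is left to the caller here, matching the two options mentioned in the text.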
Optionally, the sample library further includes a second type of sample and corresponding name labeling information; wherein the second type of sample is a signature picture generated in a predetermined time period and related to a predetermined signature scene.
The predetermined signature scenario may include, but is not limited to, enrollment signatures, over-the-counter signatures, and the like. It can be understood that these scenarios generate signature pictures on a very large scale and that new pictures keep arriving, so the signature pictures generated in a predetermined signature scenario within a predetermined time period, together with the corresponding labeling information, can be collected and added to the predetermined sample library.
Thus, in addition to the first type of sample identified through user feedback, the predetermined sample library may include a second type of sample, that is, new samples collected continuously from the predetermined signature scenario.
Because both types of samples keep arriving and are continuously stored in the sample library with their name labeling information, the sample library is continuously updated and enriched. Signature pictures generated in the predetermined signature scenario are fully utilized, data waste is avoided, and the range of handwriting data types covered by the sample library is enlarged.
S102, acquiring a text line picture corresponding to the sample to be utilized, wherein the text line picture is a signature area in the sample to be utilized;
Since the sample to be utilized may contain non-signature regions, for example blank areas, while training the signature recognition model only requires the text content in the signature area, the text line picture of the signature area is extracted once the sample to be utilized has been obtained.
In one implementation, obtaining the text line picture corresponding to the sample to be utilized may include steps A1-A2:
A1, carrying out signature region detection on the sample to be utilized to obtain a detection result;
The detection result may be coordinate information of the signature region. For example, FIG. 2(a) is a sample to be utilized, and FIG. 2(b) is the minimum bounding rectangle containing the signature characters. The coordinate information can be generated from the four corner points of the minimum bounding rectangle of the signature characters, in which case it uniquely represents the signature area. There are various ways of performing signature region detection on the sample to be utilized.
For example, the performing signature region detection on the sample to be utilized to obtain a detection result may include:
carrying out binarization processing on the sample to be utilized, and determining the minimum bounding rectangle of the character areas in the binarized sample to obtain the detection result of the signature area;
or,
detecting the signature area in the sample to be utilized based on a pre-trained character detection model for detecting signature areas, to obtain the detection result.
To save manual labeling cost, the detection result can be obtained either by binarizing the sample to be utilized or by detecting the signature area with a pre-trained character detection model.
It can be understood that binarization first converts the color image of the sample to be utilized into a grayscale image, and then divides the pixels of the grayscale image into two levels. For example, a suitable threshold may be set for the grayscale image to decide whether a pixel belongs to the target or the background: pixels above the threshold are set to 1 and pixels below it are set to 0, which yields the binarized image. The minimum bounding rectangle of the character areas in the binarized sample is then determined to obtain the detection result of the signature area.
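As a minimal sketch of the binarization route, the following NumPy code grays an RGB image, thresholds it, and returns a bounding box of the ink pixels. Several points are assumptions made for illustration: the threshold of 128, the standard luminance weights, dark ink on a light background (so ink pixels become 0), and an axis-aligned box used as a simplification of the minimum bounding rectangle.

```python
import numpy as np


def binarize(rgb: np.ndarray, threshold: float = 128.0) -> np.ndarray:
    """Gray the color image, then set pixels above the threshold to 1 and the rest to 0."""
    gray = rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    return (gray > threshold).astype(np.uint8)


def ink_bounding_box(binary: np.ndarray):
    """Return (top, left, bottom, right), inclusive, of the 0-valued (ink) pixels."""
    rows, cols = np.nonzero(binary == 0)
    if rows.size == 0:
        return None  # no ink found
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())


# White 6x8 image with a dark stroke covering rows 2..3 and columns 1..5.
image = np.full((6, 8, 3), 255, dtype=np.uint8)
image[2:4, 1:6] = 20
box = ink_bounding_box(binarize(image))
print(box)  # (2, 1, 3, 5)
```

A tilted signature would need a rotated rectangle rather than this axis-aligned box; that case is not handled in the sketch.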
The character detection model is a model for locating character regions, pre-trained on samples and their character labeling information; the present disclosure does not limit its specific training process. The character labeling information of each sample can be obtained by manual labeling or by binarizing the sample; either is acceptable. For example, in one implementation, a certain number (e.g., five thousand) of signature pictures may be selected and labeled with the coordinates of the 4 corners of the bounding rectangle of the characters in each picture, and the character detection model is then trained on these signature pictures and their 4-corner coordinate labels.
A2, extracting a signature region from the sample to be utilized as a text line picture corresponding to the sample to be utilized based on the detection result.
If the extracted signature region is the minimum bounding rectangle containing the character areas, the rectangle is often tilted. After the signature region is extracted, an affine transformation can therefore be applied to it to correct the tilt, which makes the region easier for the signature recognition model to recognize and allows the model to be trained better.
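The tilt correction mentioned above can be illustrated by solving for the affine matrix that maps three corners of the tilted rectangle onto an upright rectangle. This is a sketch under assumptions: the corner ordering, the 100 x 40 target size and the 15-degree tilt are made up for the example, and only point coordinates are transformed. Resampling the actual pixels would additionally need an image library.

```python
import numpy as np


def affine_from_corners(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve the 2x3 affine matrix mapping three source points onto three target points."""
    src_h = np.hstack([src, np.ones((3, 1))])  # rows of [x, y, 1]
    return np.linalg.solve(src_h, dst).T       # shape (2, 3)


def apply_affine(matrix: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (n, 2) array of (x, y) points."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    return pts_h @ matrix.T


# Upright target rectangle: top-left, top-right, bottom-left, bottom-right.
w, h = 100.0, 40.0
upright = np.array([[0.0, 0.0], [w, 0.0], [0.0, h], [w, h]])

# Simulated detection result: the same rectangle tilted by 15 degrees and shifted.
angle = np.deg2rad(15)
rot = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle), np.cos(angle)]])
tilted = upright @ rot.T + np.array([30.0, 50.0])

matrix = affine_from_corners(tilted[:3], upright[:3])
corrected = apply_affine(matrix, tilted)
print(np.allclose(corrected, upright))  # True
```

Three point pairs determine an affine transform completely, which is why the fourth corner lands in the right place without being used in the solve.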
It can be understood that the text line picture may be generated through steps A1-A2 after the sample to be utilized is obtained, once the model update condition for the signature recognition model is satisfied. Alternatively, to make update training more efficient, the corresponding text line picture can be generated automatically and stored in the sample library at the time the sample to be utilized is stored there; when the model update condition is satisfied, the previously generated text line picture can then be read directly from the sample library.
S103, updating and training the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized.
In one implementation, the update training of the signature recognition model may include: inputting the text line picture corresponding to the sample to be utilized into the signature recognition model to obtain the model's output for that sample; determining a loss value based on the output and the name labeling information of the sample to be utilized; and adjusting the model parameters of the signature recognition model and continuing training, so that the loss value keeps decreasing and the model is iterated.
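The loop of forward pass, loss computation and parameter adjustment can be illustrated with a deliberately tiny stand-in model. Everything specific here is an assumption: the disclosure does not fix the model structure or loss, so a linear softmax classifier with cross-entropy loss, random features in place of text line pictures, and random class indices in place of name labels are used purely to show the shape of an update step.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray):
    """Return the mean cross-entropy loss and the softmax probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return loss, np.exp(log_probs)


def update_step(weights: np.ndarray, features: np.ndarray,
                labels: np.ndarray, lr: float = 0.1):
    """One gradient-descent step: forward pass, loss, parameter adjustment."""
    loss, probs = cross_entropy(features @ weights, labels)
    probs[np.arange(len(labels)), labels] -= 1.0  # gradient w.r.t. the logits
    grad = features.T @ probs / len(labels)
    return weights - lr * grad, loss


rng = np.random.default_rng(0)
features = rng.normal(size=(32, 16))    # stand-in for encoded text line pictures
labels = rng.integers(0, 4, size=32)    # stand-in for name labels
weights = np.zeros((16, 4))             # stand-in for pre-trained parameters

losses = []
for _ in range(50):
    weights, loss = update_step(weights, features, labels)
    losses.append(loss)
print(losses[-1] < losses[0])  # True
```

In practice update training would start from the already trained parameters rather than zeros, and would use whatever loss suits the chosen recognition model.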
In addition, when the initially constructed basic signature recognition model is trained, signature region detection is performed on the collected signature picture samples, the signature regions are extracted as basic text line pictures, and the basic text line pictures and their name labeling information are then used as the training set of the initially constructed signature recognition model.
In a specific implementation, to train the basic signature recognition model, a certain number of signature pictures that already carry the signer's name information can be extracted from a database, and the signature areas in these pictures are extracted as basic text line pictures. Selecting signature pictures that already carry the signer's name avoids having to label the name information manually. If the database does not contain enough such pictures, a certain number of signature pictures without name information can be selected and labeled manually, that is, each picture is given a label recording the characters in the signature, namely the signer's actual name.
As described above, in this embodiment the samples to be utilized and their name labeling information come from a sample library that includes signature pictures the model recognized incorrectly, the signature area of each sample is obtained as a text line picture, and the signature recognition model is updated and trained on the text line pictures and name labeling information. No new training set has to be selected manually and no signature region has to be labeled manually, so the signature recognition model can be updated and trained automatically and the labor cost of manually iterating the model is greatly reduced.
Optionally, in another embodiment, there are multiple samples to be utilized; before the signature recognition model is updated and trained based on the text line picture and the name labeling information corresponding to the sample to be utilized, the method further comprises steps B1-B4:
B1, counting the frequency of occurrence of each character in the name labeling information corresponding to each sample to be utilized;
Characters appear in signatures with very different frequencies: common characters appear often and rare characters seldom, so character frequencies follow a long-tail distribution in which a large number of characters occur only rarely. The signature recognition model can reach high recognition precision only when there are enough samples of every character. Therefore, the occurrence frequency of each character in the name labeling information of the samples to be utilized can be counted and the low-frequency characters handled accordingly, which improves the recognition precision of the signature recognition model on low-frequency characters.
B2, detecting the target characters with the occurrence frequency lower than a preset frequency threshold;
and taking the characters with the frequency lower than the threshold value in the statistical result as target characters.
B3, determining a to-be-utilized sample containing the target characters in the corresponding name labeling information as a to-be-processed sample;
That is, a sample to be utilized that contains a target character, namely a sample in which a character with an occurrence frequency lower than the predetermined frequency threshold appears, can be taken as a sample to be processed.
B4, performing data enhancement processing on the text line picture corresponding to the sample to be processed to obtain a target sample;
For example, the data enhancement processing may include picture resizing (resize), picture rotation (rotate), adding pixels along each edge (padding), and the like, applied to the text line picture corresponding to the sample to be processed. In addition, samples of low-frequency characters may be collected manually, or low-frequency characters may be handwritten in different writing styles, to serve as target samples, and the text line pictures of these manually collected or written samples are then extracted.
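As a non-limiting illustration, steps B1-B3 can be sketched as follows. The function name `select_samples_to_process`, the representation of a sample as a `(text_line_picture, name_label)` pair, and the use of an absolute occurrence count as the "occurrence frequency" are assumptions made for this sketch and are not specified by the disclosure; a relative frequency could be used in the same way.

```python
from collections import Counter


def select_samples_to_process(samples, freq_threshold):
    """Pick the samples whose name label contains a low-frequency character.

    samples: list of (text_line_picture, name_label) pairs (assumed layout).
    freq_threshold: characters occurring fewer times than this are targets.
    Returns (target_chars, samples_to_process).
    """
    # B1: count how often each character occurs across all name labels.
    char_counts = Counter(ch for _, name in samples for ch in name)
    # B2: target characters are those whose count is below the threshold.
    target_chars = {ch for ch, n in char_counts.items() if n < freq_threshold}
    # B3: a sample is "to be processed" if its label contains a target character.
    to_process = [(pic, name) for pic, name in samples
                  if any(ch in target_chars for ch in name)]
    return target_chars, to_process


# Hypothetical data: picture placeholders paired with name labels.
samples = [("pic_0", "王伟"), ("pic_1", "王芳"), ("pic_2", "李伟"), ("pic_3", "王龘")]
target_chars, to_process = select_samples_to_process(samples, freq_threshold=2)
```

In this example only the samples whose labels contain a character seen fewer than two times are selected for the data enhancement of step B4.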
On the basis of B1-B4, correspondingly, the updating and training of the signature recognition model based on the text line picture and name marking information corresponding to the sample to be utilized may include:
updating and training the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized, together with the target sample and its corresponding name labeling information, where the name labeling information corresponding to the target sample is the name labeling information corresponding to the sample to be processed.
In this case, the samples used to update and train the signature recognition model include the text line picture and name labeling information corresponding to the sample to be utilized, as well as the target sample obtained by performing data enhancement processing on the sample to be processed that corresponds to a low-frequency character, and the name labeling information of that target sample, which is the name labeling information corresponding to the sample to be processed.
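The data enhancement of step B4 and the assembly of the resulting training samples can be sketched as below, purely by way of example. The sketch assumes a text line picture is a 2D list of grayscale values; the helper names (`pad`, `resize_nearest`, `rotate90`, `augment`, `build_training_set`) are hypothetical. A 90-degree rotation is used only to keep the sketch free of dependencies; in practice a small-angle rotation from an image library would more likely be applied to a text line.

```python
def pad(img, n, fill=255):
    """Add n pixels of value `fill` along every edge of a 2D grayscale image."""
    width = len(img[0]) + 2 * n
    border = [[fill] * width for _ in range(n)]
    body = [[fill] * n + list(row) + [fill] * n for row in img]
    return border + body + [list(row) for row in border]


def resize_nearest(img, new_h, new_w):
    """Resize with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def augment(img):
    """Produce several target samples from one text line picture."""
    return [pad(img, 1),
            resize_nearest(img, len(img) * 2, len(img[0]) * 2),
            rotate90(img)]


def build_training_set(samples, samples_to_process):
    """Combine the original samples with the augmented target samples.

    Each target sample inherits the name label of the sample to be
    processed from which it was derived.
    """
    training = list(samples)
    for pic, name in samples_to_process:
        training.extend((aug, name) for aug in augment(pic))
    return training
```

Each target sample keeps the name label of its source sample, which mirrors the statement above that the name labeling information of the target sample is that of the sample to be processed.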
In addition, the occurrence frequency of characters in the first type of samples, that is, the misrecognized samples fed back by users, can be counted, and the above steps can be performed on the low-frequency characters in those samples, so that the signature recognition model is trained more precisely on the low-frequency characters among the characters it misrecognizes.
Similarly, when the initially constructed base signature recognition model is trained, the low-frequency characters in the training set can be counted, data enhancement processing can be performed on the samples to be processed that correspond to those low-frequency characters to obtain base target samples, and the base target samples together with their name labeling information can be included in the training set of the initially constructed signature recognition model, thereby mitigating the long-tail distribution of the occurrence frequencies of different characters in the training set.
In this embodiment, a sample to be utilized that contains a target character, that is, a character whose occurrence frequency is lower than the predetermined frequency threshold, is taken as a sample to be processed; data enhancement processing is performed on the sample to be processed to obtain a target sample; and the signature recognition model is updated and trained based on the text line picture and name labeling information corresponding to the sample to be utilized, together with the target sample and its corresponding name labeling information. Because low-frequency characters are thereby better represented during training, the scheme of this embodiment can improve the recognition precision of the signature recognition model.
In summary, the signature recognition model training method provided by the present disclosure can continuously supplement and update the first type of samples that the signature recognition model recognized incorrectly, introduce new second-type samples, and automatically update and train the signature recognition model based on the text line pictures and name labeling information corresponding to the first and second types of samples. This reduces the cost of manually collecting a training set, manually labeling, and manually iterating, while ensuring that real data is fully utilized during training. As the data comes to cover more signature handwriting and more character samples, the generalization ability and recognition precision of the signature recognition model can improve continuously, so that the model becomes suitable for online use.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.
According to an embodiment of the present disclosure, there is also provided an information processing system, as shown in fig. 3, including: a signature recognition subsystem 310 and a model training subsystem 320;
the signature recognition subsystem 310 is configured to, upon receiving user feedback that the recognition result for a picture to be recognized is incorrect, obtain the name for the picture to be recognized, take the picture to be recognized as a first type of sample, and store it, with the name as the corresponding name labeling information, in a predetermined sample library; the recognition result is the result obtained by recognizing the picture to be recognized based on a pre-trained signature recognition model;
the model training subsystem 320 is configured to obtain a sample to be utilized and corresponding name labeling information from the predetermined sample library; acquire a text line picture corresponding to the sample to be utilized; and update and train the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized.
For specific implementation of each function of the model training subsystem, reference may be made to corresponding contents in the foregoing method embodiments, which are not described herein again.
In this embodiment, upon receiving user feedback that the recognition result for a picture to be recognized is incorrect, the signature recognition subsystem obtains the name for the picture to be recognized, takes the picture to be recognized as a first type of sample, and stores it, with the name as the corresponding name labeling information, in the predetermined sample library. The model training subsystem obtains a sample to be utilized and corresponding name labeling information from the predetermined sample library, acquires the corresponding text line picture, and updates and trains the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized. In this scheme, the erroneous samples fed back by users are thus stored in the sample library automatically, and the signature recognition model is updated and trained automatically on that basis. The signature recognition model can therefore be iterated automatically, which greatly reduces the labor cost of iterating the model.
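The cooperation of the two subsystems can be illustrated with the following minimal sketch. All class, method, and parameter names are hypothetical, the sample library is modeled as an in-memory list, and the text line extraction and model update steps are passed in as callables because the disclosure does not fix their implementation.

```python
class SampleLibrary:
    """In-memory stand-in for the predetermined sample library (assumption)."""

    def __init__(self):
        self.records = []

    def store(self, picture, name, sample_type="first"):
        self.records.append({"picture": picture, "name": name, "type": sample_type})


class SignatureRecognitionSubsystem:
    def __init__(self, library):
        self.library = library

    def on_feedback(self, picture, result_is_correct, correct_name=None):
        # Only results reported as wrong become first-type samples;
        # the user-supplied name is stored as the name labeling information.
        if not result_is_correct:
            self.library.store(picture, correct_name, sample_type="first")


class ModelTrainingSubsystem:
    def __init__(self, library, extract_text_line, update_model):
        self.library = library
        self.extract_text_line = extract_text_line  # hypothetical callable
        self.update_model = update_model            # hypothetical callable

    def run(self):
        # Pair each sample's text line picture with its name label,
        # then hand the pairs to the model update step.
        pairs = [(self.extract_text_line(r["picture"]), r["name"])
                 for r in self.library.records]
        if pairs:
            self.update_model(pairs)
        return pairs
```

A caller would route user feedback to `on_feedback` and invoke `run` whenever an update of the model is desired; how often that happens is a deployment choice not addressed here.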
According to an embodiment of the present disclosure, there is also provided a signature recognition model training apparatus, as shown in fig. 4, the apparatus including:
a first obtaining module 410, configured to obtain a sample to be utilized and corresponding name labeling information from a predetermined sample library; the sample library at least comprises a first type of sample and corresponding name labeling information; the first type of sample is a signature picture whose recognition result, obtained when signature recognition is performed based on the signature recognition model, has been fed back as incorrect;
a second obtaining module 420, configured to obtain a text line picture corresponding to the sample to be utilized, where the text line picture is a signature area in the sample to be utilized;
and a training module 430, configured to update and train the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized.
Optionally, the sample library further includes a second type of sample and corresponding name labeling information; wherein the second type of sample is a signature picture about a predetermined signature scene acquired within a predetermined time period.
Optionally, the second obtaining module 420 includes:
a detection submodule, configured to perform signature region detection on the sample to be utilized to obtain a detection result;
and an extraction submodule, configured to extract, based on the detection result, a signature region from the sample to be utilized as the text line picture corresponding to the sample to be utilized.
Optionally, the detection submodule performing signature region detection on the sample to be utilized to obtain a detection result includes:
performing binarization processing on the sample to be utilized, and determining the minimum circumscribed rectangle of each character area in the binarized sample, to obtain the detection result of the signature region;
or,
detecting the signature region in the sample to be utilized based on a pre-trained character detection model for detecting signature regions, to obtain the detection result.
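The binarization-based alternative can be sketched as follows, by way of example only. The sketch assumes a 2D list of grayscale values with dark ink on a light background, treats each 4-connected ink region as a character area, and uses axis-aligned bounding rectangles; the disclosure does not specify the threshold, the connectivity, or whether the minimum circumscribed rectangle may be rotated. Taking the union of the character rectangles as the signature region is likewise an assumption, and all function names are hypothetical.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Mark pixels darker than `threshold` as ink (1), all others as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def character_boxes(binary):
    """Return one (top, left, bottom, right) box per 4-connected ink region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first search over the region, tracking its extent.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def signature_region(boxes):
    """Union of the character boxes, used here as the signature region."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def crop(img, box):
    """Extract the text line picture delimited by an inclusive box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in img[top:bottom + 1]]
```

A library routine for thresholding and contour analysis could replace these helpers in practice; the pure-Python form is chosen only so the sketch is self-contained.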
Optionally, there are multiple samples to be utilized; the apparatus further comprises:
a statistical module, configured to count the occurrence frequency of each character in the name labeling information corresponding to each sample to be utilized;
a detection module, configured to detect target characters whose occurrence frequency is lower than a predetermined frequency threshold;
a determining module, configured to determine a sample to be utilized whose corresponding name labeling information contains a target character as a specified sample;
an enhancement module, configured to perform data enhancement processing on the text line picture corresponding to the specified sample to obtain a target picture;
the training module is specifically configured to:
update and train the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized, and the target picture and its corresponding name labeling information;
the name labeling information corresponding to the target picture is the name labeling information corresponding to the specified sample.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
An electronic device provided by the present disclosure may include:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the signature recognition model training method described above.
The present disclosure provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above-mentioned signature recognition model training method.
In yet another embodiment provided by the present disclosure, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the signature recognition model training method described in the above embodiments.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the respective methods and processes described above. For example, in some embodiments, the signature recognition model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the signature recognition model training method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the signature recognition model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (14)

1. A method of signature recognition model training, the method comprising:
obtaining a sample to be utilized and corresponding name labeling information from a preset sample library; the sample library comprises a first type of sample and corresponding name labeling information; the first type of sample is a signature picture whose recognition result, obtained when signature recognition is performed based on the signature recognition model, has been fed back as incorrect;
acquiring a text line picture corresponding to the sample to be utilized, wherein the text line picture is a signature area in the sample to be utilized;
and updating and training the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized.
2. The method of claim 1, wherein the sample library further comprises a second type of sample and corresponding name tagging information; wherein the second type of sample is a signature picture generated in a predetermined time period and related to a predetermined signature scene.
3. The method according to claim 1 or 2, wherein obtaining the text line picture corresponding to the sample to be utilized comprises:
carrying out signature area detection on the sample to be utilized to obtain a detection result;
and extracting a signature region from the sample to be utilized as a text line picture corresponding to the sample to be utilized based on the detection result.
4. The method of claim 3, wherein the performing signature region detection on the sample to be utilized to obtain a detection result comprises:
carrying out binarization processing on the sample to be utilized, and determining the minimum circumscribed rectangle of each character area in the sample after binarization processing to obtain the detection result of the signature area;
or,
detecting the signature area in the sample to be utilized based on a pre-trained character detection model for detecting signature areas, to obtain the detection result.
5. The method according to claim 1 or 2, wherein there are a plurality of samples to be utilized; before updating and training the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized, the method further comprises:
counting the occurrence frequency of each character in the name marking information corresponding to each sample to be utilized;
detecting target characters with the occurrence frequency lower than a preset frequency threshold;
determining a to-be-utilized sample containing the target characters in the corresponding name labeling information as a to-be-processed sample;
performing data enhancement processing on the text line picture corresponding to the sample to be processed to obtain a target sample;
the updating and training of the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized comprises the following steps:
updating and training the signature recognition model based on the text line picture and the name marking information corresponding to the sample to be utilized, the target sample and the corresponding name marking information;
the name labeling information corresponding to the target sample is the name labeling information corresponding to the sample to be processed.
6. An information processing system, comprising a signature recognition subsystem and a model training subsystem;
the signature recognition subsystem is configured to, upon receiving user feedback that the recognition result for a picture to be recognized is incorrect, obtain the name for the picture to be recognized, take the picture to be recognized as a first type of sample, and store it, with the name as the corresponding name labeling information, in a preset sample library; the recognition result is the result obtained by recognizing the picture to be recognized based on a pre-trained signature recognition model;
the model training subsystem is configured to obtain a sample to be utilized and corresponding name labeling information from the preset sample library; acquire a text line picture corresponding to the sample to be utilized; and update and train the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized.
7. A signature recognition model training apparatus, the apparatus comprising:
a first obtaining module, configured to obtain a sample to be utilized and corresponding name labeling information from a preset sample library; the sample library at least comprises a first type of sample and corresponding name labeling information; the first type of sample is a signature picture whose recognition result, obtained when signature recognition is performed based on the signature recognition model, has been fed back as incorrect;
a second obtaining module, configured to obtain a text line picture corresponding to the sample to be utilized, wherein the text line picture is a signature area in the sample to be utilized;
and a training module, configured to update and train the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized.
8. The apparatus of claim 7, wherein the sample library further comprises a second type of sample and corresponding name tagging information; wherein the second type of sample is a signature picture generated in a predetermined time period and related to a predetermined signature scene.
9. The apparatus of claim 7 or 8, wherein the second obtaining module comprises:
a detection submodule, configured to perform signature region detection on the sample to be utilized to obtain a detection result;
and an extraction submodule, configured to extract, based on the detection result, a signature area from the sample to be utilized as the text line picture corresponding to the sample to be utilized.
10. The apparatus of claim 9, wherein the detection submodule performing signature region detection on the sample to be utilized to obtain a detection result comprises:
performing binarization processing on the sample to be utilized, and determining the minimum circumscribed rectangle of each character area in the binarized sample, to obtain the detection result of the signature area;
or,
detecting the signature area in the sample to be utilized based on a pre-trained character detection model for detecting signature areas, to obtain the detection result.
11. The apparatus according to claim 7 or 8, wherein there are a plurality of samples to be utilized; the apparatus further comprises:
a statistical module, configured to count the occurrence frequency of each character in the name labeling information corresponding to each sample to be utilized;
a detection module, configured to detect target characters whose occurrence frequency is lower than a preset frequency threshold;
a determining module, configured to determine a sample to be utilized whose corresponding name labeling information contains a target character as a specified sample;
an enhancement module, configured to perform data enhancement processing on the text line picture corresponding to the specified sample to obtain a target picture;
the training module is specifically configured to:
update and train the signature recognition model based on the text line picture and the name labeling information corresponding to the sample to be utilized, and the target picture and its corresponding name labeling information;
the name labeling information corresponding to the target picture is the name labeling information corresponding to the specified sample.
12. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
13. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
14. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111345986.2A CN114049686A (en) 2021-11-15 2021-11-15 Signature recognition model training method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114049686A true CN114049686A (en) 2022-02-15

Family

ID=80208948

Country Status (1)

Country Link
CN (1) CN114049686A (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
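贾建忠: "Handwritten signature verification based on wavelet transform and CPN network", Computer and Modernization (计算机与现代化), no. 07, 15 July 2020 (2020-07-15) *
贾真; 冶忠林; 尹红风; 何大可: "Weakly supervised relation extraction based on Tri-training and noise filtering", Journal of Chinese Information Processing (中文信息学报), no. 04, 15 July 2016 (2016-07-15) *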

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116305076A (en) * 2023-03-30 2023-06-23 重庆傲雄在线信息技术有限公司 Signature-based identity information registration sample online updating method, system and storage medium
CN116305076B (en) * 2023-03-30 2024-03-08 重庆亲笔签数字科技有限公司 Signature-based identity information registration sample online updating method, system and storage medium

Similar Documents

Publication Publication Date Title
KR102171220B1 (en) Character recognition method, device, server and storage medium of claim documents
CN107330471B (en) Method and device for problem location of feedback content, computer equipment and storage medium
CN106980856B (en) Formula identification method and system and symbolic reasoning calculation method and system
CN112699775B (en) Certificate identification method, device, equipment and storage medium based on deep learning
CN113705554A (en) Training method, device and equipment of image recognition model and storage medium
CN113382279B (en) Live broadcast recommendation method, device, equipment, storage medium and computer program product
CN107491536B (en) Test question checking method, test question checking device and electronic equipment
US11341319B2 (en) Visual data mapping
CN110826494A (en) Method and device for evaluating quality of labeled data, computer equipment and storage medium
CN113780098A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN110909123A (en) Data extraction method and device, terminal equipment and storage medium
CN113313114B (en) Certificate information acquisition method, device, equipment and storage medium
CN113221918A (en) Target detection method, and training method and device of target detection model
CN114090601A (en) Data screening method, device, equipment and storage medium
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN113963364A (en) Target laboratory test report generation method and device, electronic equipment and storage medium
CN115934928A (en) Information extraction method, device, equipment and storage medium
CN114092948B (en) Bill identification method, device, equipment and storage medium
CN111625567A (en) Data model matching method, device, computer system and readable storage medium
CN115909376A (en) Text recognition method, text recognition model training device and storage medium
CN112801016B (en) Ballot data statistics method, device, equipment and medium
CN114049686A (en) Signature recognition model training method and device and electronic equipment
CN114638501A (en) Business data processing method and device, computer equipment and storage medium
CN114386013A (en) Automatic student status authentication method and device, computer equipment and storage medium
CN114626457A (en) Target detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination