
CN114663886A - Text recognition method, model training method and device - Google Patents


Info

Publication number
CN114663886A
Authority
CN
China
Prior art keywords
initial
character
characters
text
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210198346.1A
Other languages
Chinese (zh)
Inventor
王昆
余威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210198346.1A priority Critical patent/CN114663886A/en
Publication of CN114663886A publication Critical patent/CN114663886A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

The disclosure provides a text recognition method, a model training method, and corresponding devices, and relates to image processing, deep learning and natural language understanding in artificial intelligence technology. The specific implementation scheme is as follows: optical character recognition is performed on an acquired image to be recognized to obtain initial characters of the image; the initial characters are analyzed to obtain their category attributes; if the category attribute of an initial character indicates that it is a wrong character, error correction is performed on the wrong character to obtain a correct character for correcting it; and the text content of the image to be recognized is generated according to the initial characters and the correct characters. This avoids the character errors introduced by OCR and improves the accuracy and reliability of text recognition.

Description

Text recognition method, model training method and device
Technical Field
The present disclosure relates to image processing, deep learning, and natural language understanding in an artificial intelligence technology, and in particular, to a text recognition method, a model training method, and a device.
Background
Optical character recognition (OCR) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using character recognition methods.
In the prior art, the text content of an image to be recognized is usually acquired using OCR.
However, OCR may produce character recognition errors, which lowers the accuracy of the acquired text content.
Disclosure of Invention
The disclosure provides a text recognition method for improving accuracy of text recognition, and a training method and device of a model.
According to a first aspect of the present disclosure, there is provided a text recognition method including:
carrying out optical character recognition on the acquired image to be recognized to obtain initial characters of the image to be recognized;
analyzing the initial characters to obtain the category attributes of the initial characters, and if the category attributes of the initial characters represent that the initial characters are wrong characters, performing error correction on the wrong characters to obtain correct characters for correcting the wrong characters;
and generating the text content of the image to be recognized according to the initial characters and the correct characters.
According to a second aspect of the present disclosure, there is provided a training method of a model, comprising:
obtaining a first sample data set, wherein the first sample data set comprises an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text comprises at least one wrong character;
and training on the first sample data set to obtain initialization model parameters, and training from the initialization model parameters to obtain a character position discriminator, wherein the character position discriminator is used for analyzing initial characters of an image to be recognized to obtain the category attributes of the initial characters.
According to a third aspect of the present disclosure, there is provided a text recognition apparatus including:
the identification unit is used for carrying out optical character identification on the acquired image to be identified to obtain initial characters of the image to be identified;
the analysis unit is used for analyzing and processing the initial characters to obtain the category attributes of the initial characters;
the error correction unit is used for performing error correction processing on the wrong characters to obtain correct characters for correcting the wrong characters if the category attributes of the initial characters represent that the initial characters are wrong characters;
and the generating unit is used for generating the text content of the image to be recognized according to the initial characters and the correct characters.
According to a fourth aspect of the present disclosure, there is provided a training apparatus for a model, comprising:
an obtaining unit, configured to obtain a first sample data set, where the first sample data set includes an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text includes at least one wrong word;
the first training unit is used for training according to the first sample data set to obtain an initialization model parameter;
and the second training unit is used for training according to the initialization model parameters to obtain a character position discriminator, wherein the character position discriminator is used for analyzing and processing initial characters of an image to be recognized to obtain the category attributes of the initial characters.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first or second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first or second aspect.
The embodiments provide a text recognition method, a model training method, and devices. The category attribute of each initial character is determined, and when the category attribute indicates that the initial character is a wrong character, a correct character for correcting it is determined, so that the text content of the image to be recognized is determined in combination with the correct character. This avoids the character errors introduced by OCR (optical character recognition) and improves the accuracy and reliability of text recognition.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a text recognition method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a seventh embodiment of the present disclosure;
FIG. 9 is a schematic diagram according to an eighth embodiment of the present disclosure;
FIG. 10 is a schematic diagram according to a ninth embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device for implementing a text recognition method and a training method of a model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
With the development of artificial intelligence (AI) technology, manual identification of text content in images has been replaced by AI-based approaches; for example, OCR can be used to recognize an image and obtain the text content in it.
It is understood that images may be classified along different dimensions. Based on how they are formed, images may be divided into pictures, photographs, and so on. Based on their content, images may be divided into ticket images (e.g., images of checks or tax stamps), sign images (e.g., images of restaurant signs or warning signs), and so forth.
Because OCR is limited both by the deep learning model itself and by the quality of the image, text content obtained by recognizing an image with OCR may have missing or redundant characters, and may also contain wrong characters.
In order to avoid at least one of the above technical problems, the inventors of the present disclosure have made creative efforts to obtain the inventive concept of the present disclosure: determining initial characters in the image based on an OCR recognition technology, analyzing the initial characters to correct the wrong characters when the wrong characters exist in the initial characters so as to obtain correct characters, and determining the text content of the image according to the initial characters and the correct characters.
Based on the inventive concept, the present disclosure provides a text recognition method, a model training method, and a device thereof, which are applied to image processing, deep learning, and natural language understanding in an artificial intelligence technology, so as to improve the accuracy of text recognition.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure, and as shown in fig. 1, a text recognition method according to an embodiment of the present disclosure includes:
S101: Perform optical character recognition on the acquired image to be recognized to obtain initial characters of the image to be recognized.
For example, the execution subject of this embodiment may be a text recognition device, which may be a server (such as a local server, a cloud server, or a server cluster), a terminal device, a processor, a chip, or the like; this embodiment imposes no limitation.
The phrase "to be recognized" in "image to be recognized" serves only to distinguish this image from other images, such as the sample images mentioned later, and should not be construed as a limitation on the image. The image to be recognized can be understood as the image on which recognition needs to be performed.
In conjunction with the above analysis, there are many types of images, such as bill images, signboard images, and the like. Accordingly, in the present embodiment, the image to be recognized may be a bill image or a signboard image.
Similarly, the image may be a picture or a photograph. Correspondingly, in this embodiment, the image to be recognized may be a picture or a photograph.
For example, the image to be recognized may be a picture of a ticket, a photograph of a signboard, a picture of a signboard, and the like, which are not listed herein.
This step can be understood as: after the image to be recognized is acquired, OCR recognition may be performed on the image to be recognized to obtain characters in the image to be recognized, and in order to distinguish the characters obtained by OCR recognition from the characters in the following text (such as correct characters), the characters obtained by OCR recognition are referred to as initial characters.
It should be noted that, the embodiment is not limited as to the manner of acquiring the image to be recognized, for example, the following example may be adopted to acquire the image to be recognized:
in one example, the text recognition device may be connected with the image acquisition device and receive the image to be recognized transmitted by the image acquisition device.
The image acquisition device may be a device for acquiring an image to be recognized, for example, the image acquisition device may be a camera.
In another example, the text recognition device may provide an image-loading tool by which a user may transmit an image to be recognized to the text recognition device.
The image loading tool may be an interface for connecting to external equipment, such as an interface for connecting to other storage devices, through which the image to be recognized transmitted by the external equipment is acquired. The image loading tool may also be a display device; for example, the text recognition device may present an interface with an image loading function on the display device, through which a user imports the image to be recognized into the text recognition device.
S102: Analyze the initial characters to obtain the category attributes of the initial characters, and if the category attribute of an initial character indicates that the initial character is a wrong character, perform error correction on the wrong character to obtain a correct character for correcting the wrong character.
For example, the category attribute of an initial character may be used to characterize whether the initial character is a wrong character; that is, the category attribute indicates either that the initial character is a wrong character or that it is not.
This embodiment does not limit the manner of analysis. For example, a network model, such as a classification network model, may be used to analyze the initial characters and obtain their category attributes. That is, the analysis may be a classification: the initial characters are classified by the classification network model to determine whether each is a wrong character (i.e., its category attribute is obtained).
As another example, the initial characters may be analyzed in a context-dependent manner to obtain their category attributes. For example, the analysis may be a semantic analysis, in which each initial character is analyzed based on the semantics of its surrounding context to determine whether it is a wrong character (i.e., its category attribute is obtained).
Similarly, the error correction processing manner is not limited in this embodiment.
S103: Generate the text content of the image to be recognized according to the initial characters and the correct characters.
In combination with the above analysis, the initial characters contain wrong characters and the correct characters are used to correct them, so text content generated from both the initial characters and the correct characters has higher accuracy and reliability.
Based on the above analysis, the present disclosure provides a text recognition method that includes: performing optical character recognition on an acquired image to be recognized to obtain initial characters of the image; analyzing the initial characters to obtain their category attributes; if the category attribute of an initial character indicates that it is a wrong character, performing error correction on the wrong character to obtain a correct character for correcting it; and generating the text content of the image to be recognized according to the initial characters and the correct characters. In this embodiment, determining the category attributes, determining a correct character whenever a category attribute indicates a wrong character, and determining the text content in combination with the correct characters together avoid the character errors introduced by OCR (optical character recognition) and improve the accuracy and reliability of text recognition.
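The flow of S101 to S103 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the function name `recognize_text` and the two callables `is_wrong` and `correct` are hypothetical stand-ins for the analysis and error correction steps, whose manner this embodiment leaves open, and the OCR step (S101) is assumed to have already produced `initial_chars`.

```python
from typing import Callable, List


def recognize_text(
    initial_chars: List[str],
    is_wrong: Callable[[List[str], int], bool],
    correct: Callable[[List[str], int], str],
) -> str:
    """Generate text content from initial characters (output of S101)."""
    result = list(initial_chars)
    for i in range(len(initial_chars)):
        # S102: analyze the i-th initial character to get its category attribute
        # (here reduced to a boolean: wrong character or not).
        if is_wrong(initial_chars, i):
            # S102: error correction yields a correct character.
            result[i] = correct(initial_chars, i)
    # S103: text content = untouched initial characters + correct characters.
    return "".join(result)


# Toy stand-ins (assumptions): digits count as wrong and are corrected to "t".
text = recognize_text(
    list("ca7"),
    is_wrong=lambda chars, i: chars[i].isdigit(),
    correct=lambda chars, i: "t",
)
```

Passing the analysis and correction steps in as callables mirrors the text, which allows either a classification network model or a context-based semantic analysis at S102.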
Fig. 2 is a schematic diagram of a second embodiment of the present disclosure, and as shown in fig. 2, a text recognition method of the embodiment of the present disclosure includes:
S201: Perform optical character recognition on the acquired image to be recognized to obtain initial characters of the image to be recognized.
There are a plurality of initial characters.
It should be understood that, in order to avoid redundant statements, the present embodiment will not be described again with respect to the same technical features of the present embodiment as those of the above embodiments.
S202: Analyze the initial characters one by one to obtain the category attribute corresponding to each initial character.
For example, as shown in fig. 3, if the image to be recognized is a signboard image, OCR recognition of the signboard image yields the initial characters "Hai Lu Kuai Le Le Huo Guo" (literally "sea road happy hot pot", with the "Le" character appearing twice). That is, a plurality of initial characters, specifically seven, are obtained.
Correspondingly, the seven initial characters are analyzed one by one: the "Hai" (sea) character is analyzed to obtain its category attribute; then the "Lu" (road) character is analyzed to obtain its category attribute; and so on, until the "Guo" (pot) character is analyzed to obtain its category attribute.
The category attribute of the "Hai" character indicates whether "Hai" is a correct character; the category attribute of the "Lu" character indicates whether "Lu" is a correct character; by analogy, the category attribute of the "Guo" character indicates whether "Guo" is a correct character.
In this embodiment, when there are a plurality of initial characters, each one is analyzed individually to determine its category attribute, that is, to determine one by one whether each initial character is a correct character. All initial characters are thereby analyzed, making the analysis comprehensive and complete.
In some embodiments, each initial character has a position attribute, and accordingly S202 may include: determining the category attribute corresponding to each initial character in sequence according to the position attribute corresponding to each initial character.
The position attribute may be coordinate information, such as pixel coordinates. That is, each initial character has pixel coordinates, from which the order of the initial characters on the signboard image, in the image coordinate system, can be determined. The initial characters are then analyzed in that order to obtain the category attribute corresponding to each.
For example, as can be seen from the above analysis and fig. 3, the "Hai" (sea) character and the "Lu" (road) character each have pixel coordinates, from which their order in the signboard image can be determined: "Hai" comes before "Lu". The "Hai" character is therefore analyzed first to obtain its category attribute, then the "Lu" character, and so on; the remaining characters are not listed one by one.
It should be noted that the category attribute of an initial character is determined in combination with its position attribute, and because the position attribute is unique, the category attribute determined in this way has higher accuracy and reliability.
For example, "Hai Lu Kuai Le Le Huo Guo" includes two "Le" characters. If the two "Le" characters have the same position attribute, e.g., the same pixel coordinates, the OCR result contains a repeated recognition. It may then be determined that the category attribute of one "Le" character marks it as a redundant character, and that "Le" character may be removed to improve the accuracy of text recognition.
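As a rough illustration of how position attributes can drive both the analysis order and the detection of repeated recognition, consider the following Python sketch. The `OcrChar` type, the single-line left-to-right reading order, and the rule "same text at the same pixel coordinates means redundant" are assumptions made for the example; the disclosure itself assigns the redundant attribute through the character position discriminator.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OcrChar:
    text: str
    x: int  # pixel column of the character (assumed reference point)
    y: int  # pixel row of the character


def order_and_flag_duplicates(chars: List[OcrChar]) -> List[Tuple[OcrChar, bool]]:
    """Sort characters by position and flag repeated recognitions."""
    # Assumption: a horizontal signboard read top to bottom, left to right.
    ordered = sorted(chars, key=lambda c: (c.y, c.x))
    seen = set()
    flagged = []
    for c in ordered:
        key = (c.text, c.x, c.y)
        # True marks a character already seen at exactly this position.
        flagged.append((c, key in seen))
        seen.add(key)
    return flagged


sample = [OcrChar("Le", 30, 0), OcrChar("Hai", 0, 0), OcrChar("Le", 30, 0)]
flags = [(c.text, dup) for c, dup in order_and_flag_duplicates(sample)]
```

In this toy input the second "Le" shares its coordinates with the first, so it is the one flagged as redundant.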
In some embodiments, a network model may be used to determine the category attributes of the initial characters. For example, a character position discriminator may be trained in advance, and the initial characters are then discriminated by the character position discriminator to obtain their category attributes.
The character position discriminator is obtained by training from initialization model parameters, the initialization model parameters are obtained by training on a first sample data set, the first sample data set comprises an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text comprises at least one wrong character.
Similarly, "first" in "first sample data set" serves only to distinguish the first sample data set from other sample data sets, such as the second sample data set mentioned later, and should not be construed as a limitation on the first sample data set.
For example, an initial point of interest (POI) name text may be obtained and modified, for example by changing one or more characters in it into wrong characters, to obtain a variant interest point name text; the first sample data set is then constructed from the initial interest point name text and the variant interest point name text.
Both the initial interest point name text and the variant interest point name text in the first sample data set may be referred to as sample data; that is, an initial interest point name text is one piece of sample data, and a variant interest point name text is also one piece of sample data.
The number of pieces of sample data in the first sample data set may be determined based on requirements, historical records, tests, and the like, and is not limited by this embodiment. Likewise, this embodiment limits neither how many variant interest point name texts are derived from one initial interest point name text, nor how many characters of an initial interest point name text are modified to obtain a variant interest point name text.
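The construction of the first sample data set can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `make_variant` and `build_first_sample_set`, the `confusables` pool of replacement characters, and the choice of exactly one variant with exactly one modified character per name are all hypothetical, since the embodiment leaves these quantities open.

```python
import random
from typing import List, Tuple


def make_variant(name: str, confusables: str, rng: random.Random) -> Tuple[str, int]:
    """Replace one character of `name` with a different character.

    Returns the variant text and the position of the wrong character.
    `confusables` must contain at least two distinct characters.
    """
    pos = rng.randrange(len(name))
    choices = [ch for ch in confusables if ch != name[pos]]
    wrong = rng.choice(choices)
    return name[:pos] + wrong + name[pos + 1:], pos


def build_first_sample_set(
    names: List[str], confusables: str, rng: random.Random
) -> List[str]:
    """Each initial name text and its variant are both pieces of sample data."""
    samples = []
    for name in names:
        samples.append(name)  # initial interest point name text
        variant, _ = make_variant(name, confusables, rng)
        samples.append(variant)  # variant interest point name text
    return samples


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dataset = build_first_sample_set(["abcd", "efgh"], "xyz", rng)
```

Returning the modified position alongside the variant keeps the information a per-character discriminator would need as a training label.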
An interest point name text can be understood as the text of the name corresponding to an object of interest. In this embodiment, the character position discriminator is trained on interest point name texts. Compared with training on everyday dialogue corpora or internet text corpora, this makes the training more targeted, so that the character position discriminator has higher accuracy and reliability.
Especially, when the image to be recognized is a signboard image, the text in the signboard image to be recognized is the text corresponding to the name of the signboard image, and determining the text content of the signboard image based on the method of the present embodiment may have the technical effects of strong pertinence and reliability.
In this embodiment, the character position discriminator is used to determine the category attribute of the initial character, so that the efficiency of determining the category attribute of the initial character can be improved, and the character position discriminator is trained based on the initial interest point name text and the variant interest point name text, so that the character position discriminator has higher accuracy, that is, when the character position discriminator is used to determine the category attribute of the initial character, the technical effect of improving the accuracy of determining the category attribute of the initial character can be achieved.
For example, in combination with the above analysis and fig. 3, when OCR recognition of the signboard image yields "Hai Lu Kuai Le Le Huo Guo", this character sequence may be input to the character position discriminator, which outputs the category attribute corresponding to each of its characters.
In some embodiments, different category attributes may be represented by different flag bits: the flag bit W stands for "wrong" and marks a wrong character; the flag bit D stands for "duplicate" and marks a redundant character; the flag bit R stands for "right" and marks a correct character.
As shown in fig. 3, the category attributes output for "Hai Lu Kuai Le Le Huo Guo" are given as RWRRD. That is, the flag bits of the "Hai", "Kuai", first "Le", "Huo", and "Guo" characters are R, so they are all correct characters; the flag bit of the "Lu" character is W, so it is a wrong character; and the flag bit of the second "Le" character is D, so it is a redundant character.
In combination with the above analysis, each initial character has a position attribute, and accordingly, the flag bit may be a flag bit of the initial character representing the position attribute.
S203: If the category attribute of an initial character indicates that the initial character is a redundant character, remove the redundant character.
Illustratively, in combination with the above analysis, the second "Le" character is a redundant character, so it is removed, yielding "Hai Lu Kuai Le Huo Guo".
In this embodiment, when the category attribute of an initial character indicates that it is a redundant character, the redundant character is removed to avoid character duplication, thereby improving the accuracy and reliability of text recognition.
S204: If the category attribute of an initial character indicates that the initial character is a wrong character, perform mask processing on the wrong character among the initial characters.
Illustratively, in combination with the above analysis, the "Lu" (road) character is a wrong character and is accordingly masked, as shown in fig. 3.
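Steps S203 and S204 can be sketched together in Python: given one flag bit per initial character, redundant characters are dropped and wrong characters are replaced with a mask token. The function name `apply_flags`, the `[mask]` token string, and the seven-flag sequence used in the example (one flag per character, consistent with the per-character description of fig. 3) are assumptions made for illustration.

```python
from typing import List

MASK = "[mask]"  # assumed mask token


def apply_flags(chars: List[str], flags: str) -> List[str]:
    """Remove redundant (D) characters and mask wrong (W) characters."""
    if len(chars) != len(flags):
        raise ValueError("one flag bit is required per initial character")
    out = []
    for ch, flag in zip(chars, flags):
        if flag == "D":
            continue  # S203: remove the redundant character
        if flag == "W":
            out.append(MASK)  # S204: mask the wrong character
        elif flag == "R":
            out.append(ch)  # correct character is kept unchanged
        else:
            raise ValueError(f"unknown flag bit: {flag!r}")
    return out


initial = ["Hai", "Lu", "Kuai", "Le", "Le", "Huo", "Guo"]
masked = apply_flags(initial, "RWRRDRR")
# masked == ["Hai", "[mask]", "Kuai", "Le", "Huo", "Guo"]
```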
S205: and predicting the initial characters after mask processing to obtain a candidate set, and acquiring correct characters from the candidate set.
Wherein the candidate set includes error correction words for replacing erroneous words.
Illustratively, in combination with the above analysis and fig. 3, the "way" word is masked, and the initial word after masking is "sea (mask)" happy hot pot ", and the" way "word is predicted according to the" sea happy hot pot ", that is, the" way "word is predicted, so as to obtain a candidate set including error correction words for replacing the" way "word.
That is, the candidate set includes one or more error correction words, and if there is one error correction word, the error correction word may be determined to be a correct word, and if there are a plurality of error correction words, one error correction word may be obtained from the plurality of error correction words, and the obtained error correction word may be determined to be a correct word, so as to replace the incorrect word "way" based on the correct word.
In this embodiment, by combining "masking processing + prediction" to obtain a candidate set, and acquiring a correct word for correcting an incorrect word based on the candidate set, the technical effects of accuracy and reliability of text recognition can be improved.
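The mask processing of S204 can be sketched as follows. The helper name `mask_errors`, the label "W" for an erroneous character, and the literal mask token "[mask]" are assumptions for illustration; the prediction step itself requires a trained model and is not shown.

```python
MASK = "[mask]"  # assumed placeholder token for a masked position


def mask_errors(chars, labels, error_label="W"):
    """Replace every character labeled as erroneous with the mask token."""
    return [MASK if label == error_label else c for c, label in zip(chars, labels)]


# Initial characters after redundancy removal; "road" is labeled erroneous.
chars = ["sea", "road", "happy", "hot", "pot"]
labels = ["R", "W", "R", "R", "R"]
masked = mask_errors(chars, labels)
print(" ".join(masked))
```

The masked sequence is what a masked language model would receive as input in order to propose candidates for the masked position.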
In some embodiments, the candidate set may be obtained with a network model, such as a pre-trained mask language recall model (MLM): the masked initial characters are input into the mask language recall model, which outputs the candidate set.
Illustratively, with reference to the above embodiment and Fig. 3, the masked initial characters "sea [mask] happy hot pot" are input into the mask language recall model to obtain the candidate set {"fresh", "foreign", "steal", "south"}. That is, the candidate set includes four error correction characters: "fresh", "foreign", "steal", and "south".
The mask language recall model is generated based on training of a second sample data set, and the second sample data set comprises sample interest point name texts.
Similarly, in this embodiment, obtaining the candidate set with a network model improves the efficiency and accuracy with which the candidate set is obtained.
The sample data in the second sample data set may be the same as or different from the sample data in the first sample data set, and this embodiment is not limited.
In some embodiments, training the mask language recall model comprises the steps of:
The first step: acquire a second sample data set, where the second sample data set includes sample interest point name texts.
Similarly, in this embodiment, the number of sample interest point name texts may be determined based on requirements, historical records, experiments, and the like, which is not limited in this embodiment.
The second step: perform mask processing on any character in a sample interest point name text to obtain a masked sample interest point name text.
The third step: based on a preset basic network model, predict the masked character in the masked sample interest point name text to obtain a predicted character.
The fourth step: calculate the loss value $MLM_{loss}$ between the predicted character and the labeled character (i.e., the pre-labeled true character), adjust the parameters of the basic network model according to the loss value, and thereby train the mask language recall model.
If the masked character is $y$, the loss value $MLM_{loss}$ can be calculated by Formula 1:
$$MLM_{loss} = -\log\left(P_{mask\_y_i}\right) \quad \text{(Formula 1)}$$
where $P_{mask\_y_i}$ is the predicted (softmax) probability of the true character for the masked character $y$.
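As a worked illustration of Formula 1, the loss for one masked position is the negative log of the softmax probability that the model assigns to the true character. The sketch below uses plain Python; the function names, the four-entry vocabulary, and the logit values are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlm_loss(logits, true_index):
    """Formula 1: negative log of the softmax probability of the true character."""
    return -math.log(softmax(logits)[true_index])


# Hypothetical scores over a four-character vocabulary; index 0 is the true character.
logits = [2.0, 0.5, 0.1, -1.0]
loss = mlm_loss(logits, 0)
print(round(loss, 4))
```

The loss approaches zero as the probability of the true character approaches one, and grows without bound as that probability approaches zero.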
It should be noted that, in this embodiment, the mask language recall model is trained on sample interest point name texts, which avoids cumbersome loss calculation and thereby improves training efficiency and reliability.
In some embodiments, the basic network model may be a Transformer that includes an encoder. Because the mask language recall model is trained on sample interest point names, the amount of computation for the loss value is relatively small, so the encoder in the Transformer may have a six-layer structure to balance the number of model parameters against inference performance.
In some embodiments, there are multiple error correction characters, and acquiring the correct character from the candidate set includes the following steps:
The first step: acquire the font structure attribute of the erroneous character and the font structure attribute of each error correction character.
The font structure attribute represents the stroke content and/or the stroke order of a character.
The second step: determine the correct character from the error correction characters according to the font structure attribute of the erroneous character and the font structure attribute of each error correction character.
Illustratively, continuing the example above, if the erroneous character is "road" and the error correction characters are "fresh", "foreign", "steal", and "south", then the font structure attribute of "road" and the font structure attributes of "fresh", "foreign", "steal", and "south" are acquired, so as to determine the correct character from among the four candidates.
For example, the stroke content of "road" and the stroke content of each of "fresh", "foreign", "steal", and "south" may be acquired, and the correct character determined from the four candidates according to these stroke contents.
For another example, the stroke order of "road" and the stroke order of each of "fresh", "foreign", "steal", and "south" may be acquired, and the correct character determined from the four candidates according to these stroke orders.
For another example, both the stroke content and the stroke order of "road" and of each of "fresh", "foreign", "steal", and "south" may be acquired, and the correct character determined from the four candidates according to the stroke content and stroke order of each character.
It should be noted that different characters have different font structure attributes. In this embodiment, determining the correct character with reference to the font structure attributes improves the accuracy and reliability of the determined correct character.
In some embodiments, the second step may include the following sub-steps:
The first sub-step: for each error correction character, calculate the similarity between its font structure attribute and the font structure attribute of the erroneous character.
The second sub-step: determine the correct character from the error correction characters according to the similarities.
Illustratively, continuing the example above, four similarities are calculated between the font structure attribute of the "road" character and that of each candidate: the first similarity (with "fresh"), the second similarity (with "foreign"), the third similarity (with "steal"), and the fourth similarity (with "south"). The correct character is then determined according to the first, second, third, and fourth similarities.
The similarity may be calculated using a minimum edit distance algorithm.
In this embodiment, the similarity is calculated from the font structure attributes and the correct character is determined from the similarity. Because the similarity closely reflects how alike two characters are, the correct character determined in this way has higher reliability and accuracy.
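A minimal sketch of an edit-distance-based similarity over stroke sequences follows. The source names only a shortest edit distance algorithm; the normalization of the distance into a similarity in [0, 1], the function names, and the letter codes used to stand for strokes are assumptions for illustration and do not describe real characters.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def stroke_similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Made-up stroke codes (one letter per stroke), not real stroke data.
print(stroke_similarity("hspn", "hspz"))
```

Identical stroke sequences score 1.0 and sequences with nothing in common score 0.0, so a larger value means a closer match in form.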
In some embodiments, the second sub-step may include the following refinement steps:
The first refinement step: determine the maximum similarity from among the similarities.
The second refinement step: extract the error correction character corresponding to the maximum similarity from the candidate set, and determine it to be the correct character.
For example, continuing the example above, after the first, second, third, and fourth similarities are calculated, the maximum among them is determined, so as to determine the correct character.
For example, if the third similarity is the largest of the four, the "steal" character is the candidate most similar to the "road" character, and the "steal" character is determined to be the correct character.
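The two refinement steps amount to taking the candidate with the largest similarity. In the sketch below the function name `pick_correct` and the similarity values are invented for illustration; only their ordering (the third being largest) follows the example in the text.

```python
def pick_correct(similarities):
    """Return the error correction character with the maximum similarity."""
    if not similarities:
        raise ValueError("candidate set is empty")
    return max(similarities, key=similarities.get)


# Hypothetical similarities of each candidate to the erroneous "road" character.
similarities = {"fresh": 0.20, "foreign": 0.35, "steal": 0.60, "south": 0.25}
correct = pick_correct(similarities)
print(correct)
```

When two candidates tie, `max` returns the one that appears first in the mapping; the source does not say how ties should be broken.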
S206: Generate the text content of the image to be recognized according to the initial characters from which the redundant characters have been removed and the correct character.
Illustratively, with reference to Fig. 3, the initial characters after removal of the redundant character are "sea road happy hot pot", the erroneous character is "road", and the correct character is "steal", so the text content of the image to be recognized is "sea steal happy hot pot", i.e., "pirate happy hot pot".
That is, the correct character replaces the erroneous character in the initial characters to obtain the text content of the image to be recognized, which avoids defects such as character errors or repetitions introduced by OCR (optical character recognition) and thereby improves the accuracy and reliability of text recognition.
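Putting removal and replacement together, the generation step can be sketched as a single pass over the labeled characters. The function name `generate_text`, the labels "R", "W" and "D", and the `corrections` mapping from position to correct character are assumptions made for this example.

```python
def generate_text(chars, labels, corrections):
    """Drop redundant characters and substitute corrected ones.

    `corrections` maps the index of an erroneous character to its
    correct replacement; an uncorrected error is kept unchanged.
    """
    result = []
    for index, (char, label) in enumerate(zip(chars, labels)):
        if label == "D":    # redundant: skip
            continue
        if label == "W":    # erroneous: replace when a correction exists
            result.append(corrections.get(index, char))
        else:               # correct: keep
            result.append(char)
    return result


chars = ["sea", "road", "happy", "happy", "hot", "pot"]
labels = ["R", "W", "R", "D", "R", "R"]
text = generate_text(chars, labels, {1: "steal"})
print(" ".join(text))
```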
In summary, the OCR recognition model, the character position discriminator, and the mask language recall model may be trained in advance and combined with a calculation module for determining similarity to recognize the image to be recognized and obtain its text content. These modules (i.e., the OCR recognition model, the character position discriminator, the mask language recall model, and the calculation module) are decoupled from one another, so that recognition failures (e.g., recognition errors) can be traced, which improves the accuracy and reliability of text recognition.
The character position discriminator, the mask language recall model, and the calculation module may be regarded as a whole, for example as an error correction module, which corrects the recognition result of the OCR recognition model to obtain accurate text content of the image to be recognized.
Fig. 4 is a schematic diagram of a third embodiment of the present disclosure, and as shown in fig. 4, a training method of a model of the embodiment of the present disclosure includes:
s401: a first sample dataset is acquired.
The first sample data set comprises an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text at least comprises an error word.
For example, the execution subject of this embodiment may be a training device of the model, the training device of the model may be a server (e.g., a local server, or a cloud server, or a server cluster), or may be a terminal device, or may be a processor, or may be a chip, and the like, which is not limited in this embodiment.
In combination with the above analysis, the training device of the model may be the same as the text recognition device, or may be a different device, and this embodiment is not limited thereto.
If the training device of the model and the text recognition device are different devices, after the training device of the model trains to obtain the character position discriminator, the character position discriminator can be transmitted to the text recognition device, so that the text recognition device deploys the character position discriminator and recognizes the image to be recognized to obtain the text content of the image to be recognized.
Or after the training device of the model trains to obtain the character position discriminator, if the text recognition device needs to recognize the image to be recognized, the character position discriminator trained by the training device of the model can be called, so as to obtain the text content of the image to be recognized.
Similarly, in order to avoid redundant descriptions, the technical features of the present embodiment that are the same as those of the above embodiments are not described again in this embodiment.
S402: Train on the first sample data set to obtain initialization model parameters, and train based on the initialization model parameters to obtain a character position discriminator.
The character position discriminator is used for analyzing and processing initial characters of the image to be recognized to obtain the category attributes of the initial characters.
Fig. 5 is a schematic diagram of a fourth embodiment of the present disclosure, and as shown in fig. 5, the method for training a model of the embodiment of the present disclosure includes:
S501: Acquire an initial interest point name text, and modify the initial interest point name text to obtain a variant interest point name text.
In one example, the character components in the initial point of interest name text may be modified to obtain a variant point of interest name text corresponding thereto.
For example, if the original point of interest name text includes the word "young", the radical of the word "young" may be modified to obtain the word "yang", and the word "yang" may be included in the variant point of interest name text.
In another example, characters in the initial point of interest name text may be duplicated to obtain the corresponding variant point of interest name text.
For example, if the initial point of interest name text includes the character "nation", the "nation" character may be duplicated, so that the variant point of interest name text includes two repeated "nation" characters.
It should be understood that the above examples merely illustrate possible ways of deriving the variant interest point name text from the initial interest point name text, and are not to be construed as limiting the manner in which the variant interest point name text is obtained.
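The two ways of producing a variant text described above can be sketched as small helpers that also emit per-character labels. The function names, the labels "R", "W" and "D", the choice to label the second copy of a duplicated character as redundant, and the sample texts are assumptions for illustration; how a look-alike character is chosen (for example by changing a radical) is left to the caller.

```python
def duplicate_at(chars, index):
    """Repeat the character at `index`; the inserted copy is labeled redundant."""
    variant = chars[:index + 1] + [chars[index]] + chars[index + 1:]
    labels = ["R"] * (index + 1) + ["D"] + ["R"] * (len(chars) - index - 1)
    return variant, labels


def substitute_at(chars, index, lookalike):
    """Replace the character at `index` with a look-alike, labeled erroneous."""
    variant = list(chars)
    variant[index] = lookalike
    labels = ["R"] * len(chars)
    labels[index] = "W"
    return variant, labels


# English glosses stand in for characters; the names are hypothetical.
dup_variant, dup_labels = duplicate_at(["nation", "trade", "shop"], 0)
sub_variant, sub_labels = substitute_at(["young", "A", "shop"], 0, "yang")
print(dup_variant, dup_labels)
print(sub_variant, sub_labels)
```

Generating the labels alongside the variant text is one way to obtain labeled category attributes for training without manual annotation.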
S502: a first sample dataset is constructed that includes an initial point of interest name text, and a variant point of interest name text.
S503: and inputting the first sample data set into an initial language discriminator model to perform classification training on the initial language discriminator model to obtain an initialization model parameter.
In this embodiment, the first sample data set is used to perform classification training on the initial language discriminator model to obtain an initialization model parameter, so that an OCR recognition model is subsequently trained based on the initialization model parameter, and a character position discriminator is obtained through two stages of training, so that the character position discriminator has a higher discrimination capability, thereby improving the technical effect of determining the accuracy and reliability of the initial character category attribute based on the character position discriminator.
Similarly, in some embodiments, the initial language discriminator model also includes an encoder, which is a six-layer structure encoder.
In some embodiments, S503 may include the following steps:
The first step: input the first sample data set into the initial language discriminator model to obtain the predicted category attribute corresponding to each character in the first sample data set.
The second step: determine the initialization model parameters according to the predicted category attribute and the labeled category attribute corresponding to each character.
The predicted category attribute is the result of predicting, based on the parameters of the initial language discriminator model, the category attribute of each character in the first sample data set; for example, the category attribute of a certain character is predicted to be an erroneous character.
The labeled category attribute may be understood as the pre-labeled true category attribute of a character, which may be labeled manually or in other ways, and this embodiment is not limited in this respect. For example, the category attribute of a certain character is manually labeled as an erroneous character.
The initial language discriminator model is trained using the predicted and labeled category attributes: the parameters of the initial language discriminator model are adjusted until the difference between the predicted category attribute and the labeled category attribute of the same character is smaller than a preset threshold, which yields an optimized language discriminator model. The parameters of the optimized language discriminator model are determined as the initialization model parameters, so that the character position discriminator trained from the initialization model parameters has more accurate and reliable prediction capability, which improves the accuracy and reliability of the initial character category attributes determined by the character position discriminator.
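One way to make the "difference smaller than a preset threshold" criterion concrete is to measure the difference with a per-character cross-entropy and stop once it falls below the threshold. The source does not specify the measure, so the cross-entropy, the category order ("R", "W", "D"), the function names, and the numeric values below are all assumptions.

```python
import math

CATEGORIES = ("R", "W", "D")  # assumed order: correct, erroneous, redundant


def classification_loss(predicted_probs, labeled):
    """Mean cross-entropy between predicted distributions and labeled categories."""
    total = 0.0
    for probs, label in zip(predicted_probs, labeled):
        total += -math.log(probs[CATEGORIES.index(label)])
    return total / len(labeled)


def has_converged(loss, threshold):
    """Stopping rule: the difference is below the preset threshold."""
    return loss < threshold


# Hypothetical predictions for two characters labeled "R" and "W".
predicted = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
loss = classification_loss(predicted, ["R", "W"])
print(round(loss, 4), has_converged(loss, 0.2))
```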
In some embodiments, the first step may include the following sub-steps:
The first sub-step: for each piece of sample data, where the sample data is an initial interest point name text or a variant interest point name text, determine the position attribute corresponding to each character in the sample data.
The second sub-step: determine the predicted category attribute of each character in the sample data one by one according to the position attribute corresponding to each character.
For example, if the sample data is "yang A A B shop", where "A" and "B" each stand for an arbitrary character, "yang A A B shop" is input into the initial language discriminator model. Each character in "yang A A B shop" has a corresponding position attribute, and the language discriminator model predicts each character in turn according to these position attributes.
For example, the "yang" character is predicted to have the predicted category attribute W, i.e., the "yang" character is an erroneous character; the first "A" character is predicted to have the predicted category attribute R, i.e., it is a correct character; the second "A" character is predicted to have the predicted category attribute D, i.e., it is a redundant character; the "B" character is predicted to have the predicted category attribute R, i.e., it is a correct character; and the "shop" character is predicted to have the predicted category attribute R, i.e., it is a correct character.
In this embodiment, determining the predicted category attribute with reference to the position attribute trains the initial language discriminator model to recognize redundant characters accurately, which improves the effectiveness and reliability of recognizing redundant characters.
S504: Determine the initialization model parameters as the model parameters of a preset optical character recognition model.
S505: Train the optical character recognition model according to an acquired third sample data set to obtain the character position discriminator.
Wherein the third sample data set comprises sample image text.
That is, in this embodiment, after an initial language discriminator model is trained to obtain an optimized language discriminator model, parameters of the optimized language discriminator model are determined as initialization model parameters, and the initialization model parameters are determined as model parameters of an OCR recognition model, so as to train the OCR recognition model, thereby obtaining a character position discriminator.
Sample image text may be understood as text in the acquired sample image. Similarly, the number of the sample image texts may be determined based on the needs, history, experiments, and the like, which is not limited in this embodiment.
In this embodiment, the OCR recognition model is trained by combining the initialization model parameters, so that the OCR recognition model training can be converged conveniently, and the efficiency of obtaining the character position discriminator by training can be improved. The process of training to obtain the character position discriminator can be understood as two stages of training, wherein the training in the first stage is the classification training of an initial language discriminator model, so that the character position discriminator has higher classification performance; the second stage of training is recognition training of the OCR recognition model, so that the character position discriminator has stronger recognition performance, and the character position discriminator has the technical effects of higher accuracy and reliability.
In some embodiments, S505 includes: adjusting the model parameters of the OCR recognition model according to the third sample data set to obtain the character position discriminator.
Similarly, the OCR recognition model, which has model parameters, is used to recognize the sample image text to obtain predicted text content of the sample image text; the text content of the sample image text is labeled in advance to obtain labeled text content; and the model parameters of the OCR recognition model are adjusted based on the loss value between the predicted text content and the labeled text content to obtain the character position discriminator.
In some embodiments, the sample image text is image text for which recognition of the sample image by the OCR recognition model produced an erroneous recognition result.
In this embodiment, using image text with erroneous recognition results as the sample image text improves the recognition capability of the OCR recognition model, so that the character position discriminator has higher accuracy and reliability.
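Selecting misrecognized samples for the third sample data set can be sketched as a filter that keeps only the images whose OCR output differs from the labeled text. The function name `collect_misrecognized`, the `recognize` callable, and the lookup table standing in for an OCR model are hypothetical placeholders, not part of the disclosed method.

```python
def collect_misrecognized(samples, recognize):
    """Keep the (image, labeled_text) pairs that the recognizer gets wrong."""
    return [(image, text) for image, text in samples if recognize(image) != text]


# Stand-in recognizer: a lookup table playing the role of an OCR model.
fake_ocr_output = {
    "img_1": "sea road happy hot pot",
    "img_2": "noodle shop",
}
samples = [
    ("img_1", "sea steal happy hot pot"),  # recognized incorrectly
    ("img_2", "noodle shop"),              # recognized correctly
]
hard_samples = collect_misrecognized(samples, fake_ocr_output.get)
print(hard_samples)
```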
Fig. 6 is a schematic diagram of a fifth embodiment of the present disclosure, and as shown in fig. 6, a text recognition apparatus 600 of the embodiment of the present disclosure includes:
The recognition unit 601 is configured to perform optical character recognition on the acquired image to be recognized to obtain initial characters of the image to be recognized.
The analyzing unit 602 is configured to analyze the initial characters to obtain the category attributes of the initial characters.
An error correction unit 603, configured to, if the type attribute of the initial character indicates that the initial character is an erroneous character, perform error correction processing on the erroneous character to obtain a correct character for correcting the erroneous character.
The generating unit 604 is configured to generate text content of the image to be recognized according to the initial characters and the correct characters.
Fig. 7 is a schematic diagram of a sixth embodiment of the present disclosure, and as shown in fig. 7, a text recognition apparatus 700 of the embodiment of the present disclosure includes:
the recognition unit 701 is configured to perform optical character recognition on the acquired image to be recognized to obtain initial characters of the image to be recognized.
The analyzing unit 702 is configured to analyze the initial characters to obtain the category attributes of the initial characters.
In some embodiments, the number of initial words is multiple; the analyzing unit 702 is configured to analyze and process each initial character one by one to obtain a category attribute corresponding to each initial character.
In some embodiments, each initial word has a location attribute; the analyzing unit 702 is configured to sequentially determine the category attribute corresponding to each initial character according to the position attribute corresponding to each initial character.
In some embodiments, the analyzing unit 702 is configured to input the initial characters into a pre-trained character position discriminator and output the category attributes of the initial characters.
The character position discriminator is obtained by training based on an initialization model parameter, the initialization model parameter is obtained by training based on a first sample data set, the first sample data set comprises an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text at least comprises one wrong character.
The error correction unit 703 is configured to, if the type attribute of the initial character indicates that the initial character is an erroneous character, perform error correction processing on the erroneous character to obtain a correct character for correcting the erroneous character.
As can be seen in fig. 7, in some embodiments, the error correction unit 703 includes:
a mask subunit 7031, configured to perform mask processing on the erroneous text in the initial text.
The predictor 7032 is configured to predict the initial characters after the mask processing, so as to obtain a candidate set.
In some embodiments, the predictor 7032 is configured to input the masked initial text to a pre-trained mask language recall model, and output a candidate set.
The mask language recall model is generated based on training of a second sample data set, and the second sample data set comprises sample interest point name texts.
A first obtaining subunit 7033, configured to obtain correct characters from the candidate set; wherein the candidate set includes error correction words for replacing erroneous words.
In some embodiments, the number of error correction words is plural; the first acquiring subunit 7033 includes:
An acquisition module, configured to acquire the font structure attribute of the erroneous character and the font structure attribute of each error correction character, where the font structure attribute represents the stroke content and/or the stroke order of a character.
And the first determining module is used for determining correct characters from the error correction characters according to the font structure attributes of the wrong characters and the font structure attributes corresponding to the error correction characters.
In some embodiments, the first determining module comprises:
A calculation submodule, configured to calculate, for each error correction character, the similarity between the font structure attribute of that error correction character and the font structure attribute of the erroneous character.
And the determining submodule is used for determining correct characters from the error correction characters according to the similarity.
In some embodiments, the determining submodule is configured to determine the maximum similarity from the similarities, extract the error-corrected text corresponding to the maximum similarity from the candidate set, and determine the error-corrected text corresponding to the maximum similarity as the correct text.
The eliminating unit 704 is configured to, if the category attribute of an initial character indicates that the initial character is a redundant character, remove the redundant character to obtain the text content of the image to be recognized.
The generating unit 705 is configured to generate text content of the image to be recognized according to the initial characters and the correct characters.
As can be seen from fig. 7, in some embodiments, the generating unit 705 is configured to replace the incorrect word in the initial word with the correct word, so as to obtain the text content of the image to be recognized.
Fig. 8 is a schematic diagram of a seventh embodiment of the present disclosure, and as shown in fig. 8, the training apparatus 800 of the model of the embodiment of the present disclosure includes:
an obtaining unit 801, configured to obtain a first sample data set, where the first sample data set includes an initial point of interest name text and a variant point of interest name text obtained by modifying the initial point of interest name text, and the variant point of interest name text includes at least one wrong word.
A first training unit 802, configured to train to obtain initialization model parameters according to the first sample data set.
The second training unit 803 is configured to train based on the initialization model parameters to obtain a character position discriminator, where the character position discriminator is configured to analyze the initial characters of the image to be recognized to obtain the category attributes of the initial characters.
Fig. 9 is a schematic diagram of an eighth embodiment of the present disclosure, and as shown in fig. 9, the training apparatus 900 of the model of the embodiment of the present disclosure includes:
an obtaining unit 901, configured to obtain a first sample data set, where the first sample data set includes an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text includes at least one wrong word.
As can be seen in fig. 9, in some embodiments, the obtaining unit 901 includes:
a second obtaining subunit 9011, configured to obtain an initial point of interest name text.
A modifying subunit 9012, configured to modify the initial interest point name text to obtain the variant interest point name text.
A first training unit 902, configured to train to obtain initialization model parameters according to the first sample data set.
In some embodiments, the first training unit 902 is configured to input the first sample data set to the initial language discriminator model to perform classification training on the initial language discriminator model, so as to obtain the initialization model parameters.
As can be seen in fig. 9, in some embodiments, the first training unit 902 includes:
an input subunit 9021, configured to input the first sample data set to the initial language discriminator model, to obtain a prediction category attribute corresponding to each character in the first sample data set.
In some embodiments, the input subunit 9021 includes:
A second determining module, configured to determine, for each piece of sample data, the position attribute corresponding to each character in the sample data, where the sample data is an initial interest point name text or a variant interest point name text.
And the third determining module is used for determining the prediction category attribute corresponding to each character in the sample data one by one according to the position attribute corresponding to each character in the sample data.
A first determining subunit 9022, configured to determine an initialization model parameter according to the prediction category attribute corresponding to each character and the label category attribute corresponding to each character.
And a second training unit 903, configured to obtain a character position discriminator through training according to the initialization model parameter, where the character position discriminator is configured to analyze and process an initial character of the image to be recognized, so as to obtain a category attribute of the initial character.
As can be seen in fig. 9, in some embodiments, the second training unit 903 includes:
a second determining subunit 9031, configured to determine the initialization model parameters as model parameters of a preset optical character recognition model.
And the training subunit 9032 is configured to train the optical character recognition model according to the acquired third sample data set to obtain a character position discriminator, where the third sample data set includes a sample image text.
In some embodiments, the training subunit 9032 is configured to adjust the model parameters of the optical character recognition model according to the third sample data set, so as to obtain the character position discriminator.
In some embodiments, the sample image text is image text whose recognition result, obtained by recognizing a sample image with the optical character recognition model, is erroneous.
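As an illustration of how the obtaining unit 901 might construct the first sample data set, the following minimal Python sketch derives a variant interest point name text from an initial one by substituting a single character, and records a label category attribute for each character position. The confusion table `CONFUSABLE`, the label names `correct` and `wrong`, and the functions `make_variant` and `build_first_sample_set` are assumptions introduced for illustration only; the disclosure does not specify how the modification is performed.

```python
import random

# Hypothetical table of visually similar characters (an assumption for
# illustration; the disclosure does not prescribe how variants are produced).
CONFUSABLE = {"大": ["太", "犬"], "店": ["占"], "日": ["目", "曰"]}


def make_variant(initial_text, rng):
    """Return (variant_text, labels) with exactly one substituted character.

    labels[i] is the label category attribute of the character at position i:
    "correct" or "wrong". Returns None if no character can be substituted.
    """
    positions = [i for i, ch in enumerate(initial_text) if ch in CONFUSABLE]
    if not positions:
        return None
    pos = rng.choice(positions)
    replacement = rng.choice(CONFUSABLE[initial_text[pos]])
    variant = initial_text[:pos] + replacement + initial_text[pos + 1:]
    labels = ["wrong" if i == pos else "correct" for i in range(len(variant))]
    return variant, labels


def build_first_sample_set(initial_texts, seed=0):
    """Pair every initial text (all characters correct) with one variant."""
    rng = random.Random(seed)
    samples = []
    for text in initial_texts:
        samples.append((text, ["correct"] * len(text)))
        variant = make_variant(text, rng)
        if variant is not None:
            samples.append(variant)
    return samples
```

Each sample pairs a text with one label per character position, which is the form a per-character classification training step, such as the one performed by the first training unit 902, could consume.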
Fig. 10 is a schematic diagram according to a ninth embodiment of the present disclosure, and as shown in fig. 10, an electronic device 1000 in the present disclosure may include: a processor 1001 and a memory 1002.
A memory 1002 for storing programs; the memory 1002 may include a volatile memory, for example a random access memory (RAM) such as a static random access memory (SRAM) or a double data rate synchronous dynamic random access memory (DDR SDRAM); the memory may also include a non-volatile memory, such as a flash memory. The memory 1002 is used to store computer programs (e.g., applications or functional modules that implement the above-described methods), computer instructions, and the like, which may be stored in one or more memories 1002 in a partitioned manner. The computer programs, computer instructions, data, and the like may be called by the processor 1001.
A processor 1001 for executing the computer program stored in the memory 1002 to implement the steps of the method according to the above embodiments.
Reference may be made in particular to the description relating to the previous method embodiments.
The processor 1001 and the memory 1002 may be separate structures or may be an integrated structure integrated together. When the processor 1001 and the memory 1002 are separate structures, the memory 1002 and the processor 1001 may be coupled through a bus 1003.
The electronic device of this embodiment may execute the technical solution in the method, and the specific implementation process and the technical principle are the same, which are not described herein again.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 can be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as the text recognition method and the training method of the model. For example, in some embodiments, the text recognition method and the training method of the model may be implemented as computer software programs tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the text recognition method and the training method of the model described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured by any other suitable means (e.g., by means of firmware) to perform the text recognition method and the training method of the model.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (41)

1. A text recognition method, comprising:
carrying out optical character recognition on the acquired image to be recognized to obtain initial characters of the image to be recognized;
analyzing the initial characters to obtain the category attributes of the initial characters, and if the category attributes of the initial characters represent that the initial characters are wrong characters, performing error correction on the wrong characters to obtain correct characters for correcting the wrong characters;
and generating the text content of the image to be recognized according to the initial characters and the correct characters.
2. The method of claim 1, wherein the number of initial characters is plural; and analyzing the initial characters to obtain the category attributes of the initial characters comprises:
and analyzing and processing each initial character one by one to obtain the category attribute corresponding to each initial character.
3. The method of claim 2, wherein each initial character has a position attribute; and analyzing each initial character one by one to obtain the category attribute corresponding to each initial character comprises:
and sequentially determining the category attribute corresponding to each initial character according to the position attribute corresponding to each initial character.
4. The method according to any one of claims 2-3, wherein analyzing the initial text to obtain a category attribute of the initial text comprises:
inputting the initial characters to a pre-trained character position discriminator, and outputting the category attributes of the initial characters;
the character position discriminator is obtained by training based on an initialization model parameter, the initialization model parameter is obtained by training based on a first sample data set, the first sample data set comprises an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text at least comprises an error character.
5. The method of any one of claims 1-4, wherein error correcting the erroneous text to obtain correct text for correcting the erroneous text comprises:
performing mask processing on the wrong characters in the initial characters;
predicting the initial characters after mask processing to obtain a candidate set, and acquiring the correct characters from the candidate set; wherein the candidate set includes error correction words for replacing the erroneous words.
6. The method of claim 5, wherein the number of error correction characters is plural; and obtaining the correct characters from the candidate set comprises:
acquiring the font structure attribute of the wrong character and acquiring the font structure attribute of each error correction character, wherein the font structure attribute is used for representing the stroke content and/or the stroke sequence of the character;
and determining the correct characters from the error-corrected characters according to the font structure attribute of the error characters and the font structure attribute corresponding to each error-corrected character.
7. The method according to claim 6, wherein determining the correct characters from the error correction characters according to the font structure attribute of the wrong characters and the font structure attribute corresponding to each error correction character comprises:
calculating the similarity between the font structure attribute of each error correction character and the font structure attribute of the error character according to the font structure attribute of each error correction character;
and determining the correct characters from the error correction characters according to the similarity.
8. The method of claim 7, wherein determining the correct word from each error corrected word based on each similarity comprises:
determining the maximum similarity from the similarities;
and extracting the error correction characters corresponding to the maximum similarity from the candidate set, and determining the error correction characters corresponding to the maximum similarity as the correct characters.
9. The method of any of claims 5-8, wherein predicting the masked initial text to obtain a candidate set comprises:
inputting the initial characters after mask processing to a mask language recall model trained in advance, and outputting the candidate set;
the mask language recall model is generated based on training of a second sample data set, and the second sample data set comprises sample interest point name texts.
10. The method according to any one of claims 1-9, wherein generating the text content of the image to be recognized from the initial words and the correct words comprises:
and replacing the wrong characters in the initial characters with the correct characters to obtain the text content of the image to be recognized.
11. The method of any of claims 1-10, wherein, after analyzing the initial characters to obtain the category attributes of the initial characters, the method further comprises:
and if the category attribute of the initial character represents that the initial character is a redundant character, removing the redundant character to obtain the text content of the image to be recognized.
12. A method of training a model, comprising:
obtaining a first sample data set, wherein the first sample data set comprises an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text at least comprises an error word;
and training according to the first sample data set to obtain an initialization model parameter, and training according to the initialization model parameter to obtain a character position discriminator, wherein the character position discriminator is used for analyzing and processing initial characters of an image to be recognized to obtain the category attribute of the initial characters.
13. The method of claim 12, wherein acquiring a first sample dataset comprises:
and acquiring the initial interest point name text, and modifying the initial interest point name text to obtain the variant interest point name text.
14. The method of claim 12 or 13, wherein training from the first sample data set to derive initialization model parameters comprises:
and inputting the first sample data set into an initial language discriminator model to perform classification training on the initial language discriminator model to obtain the initialized model parameters.
15. The method of claim 14, wherein inputting the first sample data set to an initial language discriminator model for classification training of the initial language discriminator model to obtain the initialization model parameters comprises:
inputting the first sample data set to an initial language discriminator model to obtain a prediction category attribute corresponding to each character in the first sample data set;
and determining the initialization model parameters according to the prediction category attribute corresponding to each character and the marking category attribute corresponding to each character.
16. The method of claim 15, wherein inputting the first sample dataset into an initial language discriminator model to obtain a prediction category attribute corresponding to each text in the first sample dataset comprises:
for each sample data, the sample data is an initial interest point name text or a variant interest point name text, and the position attribute corresponding to each character in the sample data is determined;
and determining the prediction type attribute corresponding to each character in the sample data one by one according to the position attribute corresponding to each character in the sample data.
17. The method of any one of claims 12-16, wherein training according to the initialization model parameters to obtain the character position discriminator comprises:
determining the initialization model parameters as model parameters of a preset optical character recognition model;
and training the optical character recognition model according to an acquired third sample data set to obtain the character position discriminator, wherein the third sample data set comprises a sample image text.
18. The method of claim 17, wherein training the optical character recognition model according to the acquired third sample data set to obtain the character position discriminator comprises:
and adjusting the model parameters of the optical character recognition model according to the third sample data set to obtain the character position discriminator.
19. The method of claim 17, wherein the sample image text is image text whose recognition result, obtained by recognizing a sample image with the optical character recognition model, is erroneous.
20. A text recognition apparatus comprising:
the identification unit is used for carrying out optical character identification on the acquired image to be identified to obtain initial characters of the image to be identified;
the analysis unit is used for analyzing and processing the initial characters to obtain the category attributes of the initial characters;
the error correction unit is used for performing error correction processing on the wrong characters to obtain correct characters for correcting the wrong characters if the category attributes of the initial characters represent that the initial characters are wrong characters;
and the generating unit is used for generating the text content of the image to be recognized according to the initial characters and the correct characters.
21. The apparatus of claim 20, wherein the number of initial characters is plural; the analysis unit is used for analyzing and processing the initial characters one by one to obtain the category attribute corresponding to each initial character.
22. The apparatus of claim 21, wherein each initial character has a position attribute; and the analysis unit is used for sequentially determining the category attribute corresponding to each initial character according to the position attribute corresponding to each initial character.
23. The apparatus according to any one of claims 21-22, wherein the analysis unit is configured to input the initial characters to a pre-trained character position discriminator, and output the category attributes of the initial characters;
the character position discriminator is obtained by training based on an initialization model parameter, the initialization model parameter is obtained by training based on a first sample data set, the first sample data set comprises an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text at least comprises an error character.
24. The apparatus according to any one of claims 20-23, wherein the error correction unit comprises:
a mask subunit, configured to perform mask processing on an incorrect word in the initial word;
the prediction subunit is used for predicting the initial characters after the mask processing to obtain a candidate set;
a first obtaining subunit, configured to obtain the correct word from the candidate set; wherein the candidate set includes error correcting words for replacing the erroneous words.
25. The apparatus of claim 24, wherein the number of error correction words is plural; the first acquisition subunit includes:
the acquisition module is used for acquiring the font structure attribute of the wrong character and acquiring the font structure attribute of each error correction character, wherein the font structure attribute is used for representing the stroke content and/or the stroke sequence of the character;
and the first determining module is used for determining the correct character from each error correction character according to the font structure attribute of the wrong character and the font structure attribute corresponding to each error correction character.
26. The apparatus of claim 25, wherein the first determining module comprises:
the calculation submodule is used for calculating the similarity between the font structure attribute of each error correction character and the font structure attribute of the wrong character according to the font structure attribute of each error correction character;
and the determining submodule is used for determining the correct characters from the error correction characters according to the similarity.
27. The apparatus according to claim 26, wherein the determining sub-module is configured to determine a maximum similarity from the similarities, extract the error-correcting word corresponding to the maximum similarity from the candidate set, and determine the error-correcting word corresponding to the maximum similarity as the correct word.
28. The apparatus according to any one of claims 24-27, wherein the prediction subunit is configured to input the initial characters after mask processing to a pre-trained mask language recall model, and output the candidate set;
the mask language recall model is generated based on training of a second sample data set, and the second sample data set comprises sample interest point name texts.
29. The apparatus according to any of claims 20-28, wherein the generating unit is configured to replace a wrong text in the initial text with the correct text to obtain the text content of the image to be recognized.
30. The apparatus of any one of claims 20-29, further comprising:
and the eliminating unit is used for eliminating the redundant characters to obtain the text content of the image to be recognized if the category attribute of the initial characters represents that the initial characters are redundant characters.
31. An apparatus for training a model, comprising:
an obtaining unit, configured to obtain a first sample data set, where the first sample data set includes an initial interest point name text and a variant interest point name text obtained by modifying the initial interest point name text, and the variant interest point name text includes at least one wrong word;
the first training unit is used for training according to the first sample data set to obtain an initialization model parameter;
and the second training unit is used for training according to the initialization model parameters to obtain a character position discriminator, wherein the character position discriminator is used for analyzing and processing initial characters of an image to be recognized to obtain the category attributes of the initial characters.
32. The apparatus of claim 31, wherein the obtaining unit comprises:
the second obtaining subunit is used for obtaining the initial interest point name text;
and the modifying subunit is used for modifying the initial interest point name text to obtain the variant interest point name text.
33. The apparatus according to claim 31 or 32, wherein the first training unit is configured to input the first sample data set to an initial language discriminator model for performing classification training on the initial language discriminator model to obtain the initialization model parameters.
34. The apparatus of claim 33, wherein the first training unit comprises:
an input subunit, configured to input the first sample data set to an initial language discriminator model, so as to obtain a prediction category attribute corresponding to each character in the first sample data set;
and the first determining subunit is used for determining the initialization model parameters according to the prediction category attribute corresponding to each character and the marking category attribute corresponding to each character.
35. The apparatus of claim 34, wherein the input subunit comprises:
a second determining module, configured to determine, for each sample data, a position attribute corresponding to each character in the sample data, where the sample data is an initial interest point name text or a variant interest point name text;
and the third determining module is used for determining the prediction category attribute corresponding to each character in the sample data one by one according to the position attribute corresponding to each character in the sample data.
36. The apparatus of any one of claims 31-35, wherein the second training unit comprises:
the second determining subunit is used for determining the initialization model parameters as model parameters of a preset optical character recognition model;
and the training subunit is used for training the optical character recognition model according to an acquired third sample data set to obtain the character position discriminator, wherein the third sample data set comprises a sample image text.
37. The apparatus of claim 36, wherein the training subunit is configured to adjust model parameters of the optical character recognition model according to the third sample data set, so as to obtain the character position discriminator.
38. The apparatus of claim 37, wherein the sample image text is image text whose recognition result, obtained by recognizing a sample image with the optical character recognition model, is erroneous.
39. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11; or to enable the at least one processor to perform the method of any of claims 12-19.
40. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-11; alternatively, the computer instructions are for causing the computer to perform the method of any of claims 12-19.
41. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 11; alternatively, the computer program realizes the steps of the method of any one of claims 12-19 when executed by a processor.
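The error-correction steps recited in claims 5 to 8, 10 and 11 can be illustrated with a minimal Python sketch: a wrong character is masked, a candidate set is recalled, the candidate whose font structure attribute is most similar to that of the wrong character is selected, and redundant characters are removed. Everything named here is an assumption for illustration: the stroke table `STROKES` and its encoding, the category names `correct`, `wrong` and `redundant`, the `recall` callback standing in for the mask language recall model, and the use of `difflib.SequenceMatcher` as the similarity measure. The claims do not prescribe any of these.

```python
from difflib import SequenceMatcher

# Hypothetical stroke-sequence table (h = horizontal, p = left-falling,
# n = right-falling, d = dot). Illustrative values only.
STROKES = {"太": "hpnd", "大": "hpn", "天": "hhpn", "人": "pn"}


def glyph_similarity(a, b):
    """Similarity of two characters' stroke sequences, in [0, 1]."""
    sa, sb = STROKES.get(a), STROKES.get(b)
    if not sa or not sb:
        return 0.0  # unknown glyph: treat as dissimilar
    return SequenceMatcher(None, sa, sb).ratio()


def pick_correct(wrong_char, candidates):
    """Return the candidate with the maximum glyph similarity."""
    return max(candidates, key=lambda c: glyph_similarity(wrong_char, c))


def generate_text(initial_chars, categories, recall):
    """Build the text content from per-character category attributes.

    categories[i] is "correct", "wrong" or "redundant" (assumed names).
    recall(masked_chars, position) stands in for the mask language recall
    model and returns a non-empty list of candidate characters.
    """
    output = []
    for i, (ch, cat) in enumerate(zip(initial_chars, categories)):
        if cat == "redundant":
            continue  # remove redundant characters
        if cat == "wrong":
            masked = list(initial_chars)
            masked[i] = "[MASK]"  # mask only the wrong character
            ch = pick_correct(ch, recall(masked, i))
        output.append(ch)
    return "".join(output)
```

For example, with initial characters `北京太学学`, categories marking the third character as wrong and the fifth as redundant, and a `recall` stub returning `["人", "天", "大"]`, the sketch selects `大` and yields `北京大学`.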
CN202210198346.1A 2022-03-01 2022-03-01 Text recognition method, model training method and device Pending CN114663886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210198346.1A CN114663886A (en) 2022-03-01 2022-03-01 Text recognition method, model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210198346.1A CN114663886A (en) 2022-03-01 2022-03-01 Text recognition method, model training method and device

Publications (1)

Publication Number Publication Date
CN114663886A true CN114663886A (en) 2022-06-24

Family

ID=82027694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210198346.1A Pending CN114663886A (en) 2022-03-01 2022-03-01 Text recognition method, model training method and device

Country Status (1)

Country Link
CN (1) CN114663886A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116882383A (en) * 2023-07-26 2023-10-13 中信联合云科技有限责任公司 Digital intelligent proofreading system based on text analysis

Similar Documents

Publication Publication Date Title
CN110765996B (en) Text information processing method and device
CN114372477B (en) Training method of text recognition model, and text recognition method and device
CN115035538B (en) Training method of text recognition model, and text recognition method and device
CN114821622B (en) Text extraction method, text extraction model training method, device and equipment
CN113313022B (en) Training method of character recognition model and method for recognizing characters in image
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
US12118770B2 (en) Image recognition method and apparatus, electronic device and readable storage medium
US20240304015A1 (en) Method of training deep learning model for text detection and text detection method
CN113627439A (en) Text structuring method, processing device, electronic device and storage medium
CN114419035A (en) Product identification method, model training device and electronic equipment
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN115546488B (en) Information segmentation method, information extraction method and training method of information segmentation model
CN113361523A (en) Text determination method and device, electronic equipment and computer readable storage medium
EP4191544A1 (en) Method and apparatus for recognizing token, electronic device and storage medium
CN115909376A (en) Text recognition method, text recognition model training device and storage medium
US11610396B2 (en) Logo picture processing method, apparatus, device and medium
CN114663886A (en) Text recognition method, model training method and device
CN114998897B (en) Method for generating sample image and training method of character recognition model
CN114120305B (en) Training method of text classification model, and text content recognition method and device
CN113361522B (en) Method and device for determining character sequence and electronic equipment
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN115273103A (en) Text recognition method and device, electronic equipment and storage medium
CN115294581A (en) Method and device for identifying error characters, electronic equipment and storage medium
CN115631502A (en) Character recognition method, character recognition device, model training method, electronic device and medium
CN114662469A (en) Emotion analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination