CN116644737A - Proper noun error correction method based on automatic word stock updating and prefix tree structure

Info

Publication number
CN116644737A
Authority
CN
China
Prior art keywords
text
trigger
proper
word
proper noun
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310163847.0A
Other languages
Chinese (zh)
Inventor
王晶
李国定
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Consistent Zhifu Hangzhou Technology Co ltd
Original Assignee
Consistent Zhifu Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Consistent Zhifu Hangzhou Technology Co ltd
Priority to CN202310163847.0A
Publication of CN116644737A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a proper noun error correction method based on automatic word stock updating and a prefix tree structure, comprising the following steps: step 1, automatically updating a proper noun lexicon; step 2, obtaining trigger words for the proper nouns, and building a prefix tree of the trigger words and a trigger dictionary that maps trigger words to proper nouns; step 3, searching the text to be corrected for trigger words using the prefix tree, and obtaining proper noun candidate words from the trigger dictionary; step 4, extracting from the text to be corrected a plurality of text fragments associated with each trigger word, calculating the edit distance between each text fragment and the proper noun candidate words, and, with the minimum edit distance as the objective, selecting the longest text fragment on which to perform the corresponding editing operation.

Description

Proper noun error correction method based on automatic word stock updating and prefix tree structure
Technical Field
The invention belongs to the technical field of semantic recognition, and particularly relates to a proper noun error correction method based on automatic word stock updating and prefix tree structure.
Background
Proper nouns in news texts often contain spelling, grammatical and similar errors. Such errors degrade the reading experience and undermine the credibility of the news, while manual review is costly; an automatic error correction method is therefore important for improving the efficiency of news review and reducing its cost. Proper noun error correction is a practical application scenario within the field of text error correction, with good prospects for checking and correcting news texts. Current mainstream technical schemes fall into two types: model-based error correction and rule-based error correction. Model-based methods train a deep learning model with an error correction capability on a labeled training corpus. Rule-based methods construct a lexicon in advance, detect erroneous words through hand-written error correction rules, and look up related words in the lexicon as the correct words by consulting a near-pronunciation dictionary and a near-shape dictionary.
In the proper noun error correction scenario, model-based methods are often inferior to rule-based methods in accuracy and stability. On the other hand, the accuracy of rule-based methods depends mainly on the lexicon, so new words must be continually added to it by hand, which is costly.
Disclosure of Invention
The invention aims to provide a proper noun error correction method based on automatic word stock updating and a prefix tree structure, so as to solve the problems raised in the Background: model-based error correction methods have poor accuracy and stability, and maintaining the accuracy of rule-based error correction methods is costly.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a proper noun error correction method based on automated lexicon updating and prefix tree structure, the method comprising the steps of:
step 1, automatically updating a proper noun lexicon;
step 2, obtaining trigger words for the proper nouns, and building a prefix tree of the trigger words and a trigger dictionary that maps trigger words to proper nouns;
step 3, searching the text to be corrected for trigger words using the prefix tree, and obtaining proper noun candidate words from the trigger dictionary;
step 4, extracting from the text to be corrected a plurality of text fragments associated with each trigger word, calculating the edit distance between each text fragment and the proper noun candidate words, and, with the minimum edit distance as the objective, selecting the longest text fragment on which to perform the corresponding editing operation.
Preferably, in step 1, the proper noun lexicon is updated automatically by obtaining text data through a scheduled crawling task and recognizing proper nouns in that data with a trained entity recognition model.
Preferably, the step 1 includes the steps of:
step 1.1, crawling news texts containing proper nouns, and performing manual labeling, including labeling of the proper nouns;
step 1.2, training on the labeled text by utilizing NER entity recognition technology to obtain a trained entity recognition model;
step 1.3, regularly crawling news texts containing proper nouns and identifying through a trained entity identification model to obtain new proper nouns;
and step 1.4, adding the new proper nouns into a proper noun library to update the proper noun library.
Preferably, the step 2 includes the steps of:
step 2.1, slicing any proper noun in the proper noun library and taking the sliced proper noun as a trigger word of the proper noun;
step 2.2, constructing a prefix tree based on the trigger words, and constructing a trigger dictionary {key: [word1, word2, ...]} based on the correspondence between the trigger words and the proper nouns, where key represents a trigger word and word represents a proper noun.
Preferably, in step 2.1, for any proper noun with word length $L_i$, windows whose lengths lie in the interval $[L_i/2, L_i]$ are generated, and the proper noun is sliced into its trigger words by sliding these windows over it.
Preferably, the step 4 includes the steps of:
step 4.1, based on the trigger word length, intercepting window fragments at corresponding positions in the text to be corrected;
step 4.2, intercepting a plurality of text fragments in the window fragments based on the offset of the trigger word positions;
step 4.3, comparing each text segment with the candidate word to obtain the editing distance and the editing operation;
step 4.4, retaining the editing operation with the shortest edit distance and the longest text fragment as the optimal correction for the candidate word.
Preferably, in step 4.1, if the trigger word is at the start or the end of the text, characters of a preset length are taken after or before the trigger word, respectively, and form the window segment together with the trigger word; otherwise, characters of the preset length are taken both before and after the trigger word and form the window segment together with it.
Preferably, the step 4.2 comprises the steps of:
step 4.2.1, determining the interception length interval of the text fragments, wherein the minimum interception length is the trigger word length and the maximum interception length is the window segment length;
step 4.2.2, selecting different interception lengths within the interception length interval, and extracting the text fragments from the window segment by a sliding window method.
Preferably, in step 4.4, if two editing operations target the same place in the text to be corrected and the corrections are identical, only one is retained; if the text fragments targeted by two editing operations stand in a containment relation, the operation for the longer fragment is retained.
Compared with the prior art, the invention has the beneficial effects that:
The invention uses entity recognition technology to update the proper noun error correction lexicon continuously and automatically, adding new proper nouns to the lexicon without manual effort, which solves the problem of the high cost of updating the lexicon by hand; meanwhile, the error detection method based on the prefix tree and sliding windows works with the updated lexicon in real time, which solves the problem of poor reusability.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of an entity recognition model.
Fig. 3 is a schematic diagram of the structure of a prefix tree.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present invention, based on the embodiments of the present invention.
Referring to fig. 1, a proper noun error correction method based on automatic lexicon update and prefix tree structure is implemented in 4 steps.
Step 1, automatic update of the proper noun lexicon: text data is obtained through a scheduled crawling task, and proper nouns are recognized by the trained entity recognition model so as to update the proper noun lexicon.
Specifically, the step 1 includes the following procedures:
step 1.1, crawling news texts containing proper nouns, and performing manual labeling, including labeling of the proper nouns;
step 1.2, training on the labeled text by utilizing NER entity recognition technology to obtain a trained entity recognition model;
step 1.3, regularly crawling news texts containing proper nouns and identifying through a trained entity identification model to obtain new proper nouns;
and step 1.4, adding the new proper nouns into a proper noun library to update the proper noun library.
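Steps 1.3 and 1.4 amount to a periodic merge of recognizer output into the lexicon. The following is a minimal sketch under stated assumptions: `update_lexicon` and `toy_recognizer` are hypothetical names, the crawler is replaced by a plain list of strings, and the trained entity recognition model is replaced by a fixed-vocabulary stand-in.

```python
from typing import Callable, Iterable, Set


def update_lexicon(lexicon: Set[str],
                   texts: Iterable[str],
                   recognize: Callable[[str], Iterable[str]]) -> Set[str]:
    """Add every newly recognized proper noun to `lexicon` in place.

    Returns the set of nouns that were not in the lexicon before.
    """
    added = set()
    for text in texts:
        for noun in recognize(text):
            if noun not in lexicon:
                lexicon.add(noun)
                added.add(noun)
    return added


def toy_recognizer(text: str) -> list:
    """Stand-in for the trained NER model (assumption): a fixed vocabulary."""
    vocabulary = {"alpha", "beta", "gamma"}
    return [token for token in text.split() if token in vocabulary]
```

In a real deployment the `recognize` callable would wrap the trained model and the call would run on a schedule; neither detail is specified here.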
As one embodiment of the present invention, the specific operation of step 1.1 is as follows: 1000 news texts containing proper nouns are crawled from an official website, and each text is segmented with a maximum length of 512, finally yielding 4000 pieces of data, which are divided into a training set and a validation set in a ratio of 8:2; the proper nouns in the text of each piece of data are labeled together with their categories and their start and end positions; each labeled text is tokenized with a tokenizer to obtain the corresponding text sequence $(w_1, w_2, \ldots, w_m)$, 4000 sequences in total; each labeled text is preprocessed using the BIO labeling scheme, referring to FIG. 1, specifically as follows: the starting position of each proper noun is labeled "B-X", and the other positions within each proper noun are labeled "I-X", where "X" represents the category of the current noun, such as "political policy class"; in addition, the non-proper-noun parts of the text are labeled with the letter "O"; a label sequence $(t_1, t_2, \ldots, t_m)$ is thus obtained.
The operation of step 1.2 is as follows: the text sequence $(w_1, w_2, \ldots, w_m)$ is taken as the input of the pre-trained model Bert, giving the output $(v_1, v_2, \ldots, v_m)$, where each element $v_\varepsilon$ ($\varepsilon = 1, 2, \ldots, m$) is a 768-dimensional vector; the result learned by the pre-trained model Bert is taken as the input of a fully connected layer, which reduces the dimension of each element $v_\varepsilon$ to the number of "BIO" labels, giving $(\alpha_1, \alpha_2, \ldots, \alpha_m)$; $(\alpha_1, \alpha_2, \ldots, \alpha_m)$ is used as the current label scores of the CRF model, a label transition matrix is randomly initialized, and the label sequence $(t_1, t_2, \ldots, t_m)$ is taken as the target path to learn the association information between labels; assuming that the number of all possible label paths for the current text is $N$, the total score of all paths is calculated:
[formula not reproduced in this text]
wherein [symbol not reproduced] represents the score of the i-th label path up to the text tail position $m$, and $S_i$ represents the sum of the label scores and the label transition scores of the i-th label path;
the loss function is defined as:
[formula not reproduced in this text]
wherein [symbol not reproduced] represents the label score at the i-th position of the text under the correct path, and [symbol not reproduced] represents the score of transferring from the label $y_i$ at the i-th position of the text under the correct path to the label $y_{i+1}$ at the (i+1)-th position;
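The two formulas referred to above are not reproduced in this text. For orientation only, the standard linear-chain CRF formulation that fits the surrounding definitions can be written as follows. The symbols $P_{\text{total}}$, $S_{\text{real}}$, $E$ and $T$ are assumptions introduced here and may differ from the notation of the original filing.

```latex
% Total score over all N candidate label paths, with S_i the score of path i
P_{\text{total}} = \sum_{i=1}^{N} e^{S_i}

% Score of the correct path y = (y_1, ..., y_m):
% label (emission) scores E plus label transition scores T
S_{\text{real}} = \sum_{i=1}^{m} E_{i,\,y_i} + \sum_{i=1}^{m-1} T_{y_i,\,y_{i+1}}

% Negative log-likelihood of the correct path
\mathrm{Loss} = -\log \frac{e^{S_{\text{real}}}}{P_{\text{total}}}
              = \log P_{\text{total}} - S_{\text{real}}
```

Minimizing this loss raises the score of the labeled path relative to all other paths, which matches the training objective described in the text.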
The entity recognition model Bert-CRF is trained on the training set by gradient descent so as to minimize the loss function; after each training round, the model is evaluated on the validation set, and the model parameters of the round with the best evaluation result are saved, giving the trained entity recognition model.
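The BIO labeling used to prepare the training data can be illustrated with a short sketch. The function name `bio_tags`, the `(start, end, category)` span format with an exclusive end, and the category label `POLICY` are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple


def bio_tags(tokens: Sequence[str],
             entities: Sequence[Tuple[int, int, str]]) -> List[str]:
    """Return one BIO tag per token.

    `entities` holds (start, end, category) spans with `end` exclusive.
    The first token of a span gets "B-<category>", the remaining tokens of
    the span get "I-<category>", and every other token gets "O".
    """
    tags = ["O"] * len(tokens)
    for start, end, category in entities:
        tags[start] = f"B-{category}"
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"
    return tags


# Illustrative input: a 7-character text whose characters 2..6 form one entity.
example_tokens = list("缴纳个人所得税")
example_tags = bio_tags(example_tokens, [(2, 7, "POLICY")])
```

Each character is treated as one token here; a subword tokenizer would need the spans mapped onto its own token boundaries.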
Step 2, obtaining trigger words for the proper nouns, and building a prefix tree of the trigger words and a trigger dictionary that maps trigger words to proper nouns.
The step 2 specifically comprises the following steps:
step 2.1, slicing any proper noun in the proper noun library and taking the sliced proper noun as a trigger word of the proper noun;
step 2.2, constructing a prefix tree based on the trigger words, and constructing a trigger dictionary {key: [word1, word2, ...]} based on the correspondence between the trigger words and the proper nouns, where key represents a trigger word and word represents a proper noun.
As one embodiment of the present invention, in step 2.1, for any proper noun with word length $L_i$, windows whose lengths lie in the interval $[L_i/2, L_i]$ are generated, and the proper noun is sliced into its trigger words by sliding these windows over it. For example, for the proper noun rendered here as "new tax law", the window length interval is 2 to 4 characters, so there are 3 windows, of lengths 2, 3 and 4; the trigger words of the proper noun are the slices produced by these windows (rendered in this translation as "new", "tax", "new tax", "tax" and "new tax").
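The slicing in step 2.1 can be sketched as a nested loop over window lengths and start offsets. The function name `trigger_words` is hypothetical, and rounding $L_i/2$ up for odd lengths is an assumption, since the text only shows an even-length example.

```python
import math
from typing import List


def trigger_words(noun: str) -> List[str]:
    """Slice `noun` into all contiguous substrings whose length lies in
    [ceil(len/2), len], shortest windows first, without duplicates."""
    n = len(noun)
    min_len = max(1, math.ceil(n / 2))  # rounding up is an assumption
    result: List[str] = []
    for length in range(min_len, n + 1):
        for start in range(n - length + 1):
            piece = noun[start:start + length]
            if piece not in result:
                result.append(piece)
    return result
```

For a four-character noun this produces 3 + 2 + 1 = 6 trigger words, from windows of lengths 2, 3 and 4.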
In step 2.2 of the present invention, taking the proper noun "personal income tax" as an example, the resulting tree structure is shown in fig. 3, in which the marker "1" indicates a node at which a search terminates, that is, the end of a trigger word.
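One way to realize the two structures of step 2.2 is a nested-dictionary prefix tree plus a plain dictionary from trigger word to proper nouns. This is a sketch under assumptions: the names `build_trie`, `build_trigger_dict`, `slices` and `END` are hypothetical, and the terminal marker is a multi-character key rather than the "1" used in the figure, so that it cannot collide with a one-character edge.

```python
import math
from typing import Dict, Iterable, List

END = "__end__"  # terminal marker; plays the role of "1" in the figure


def slices(noun: str) -> List[str]:
    """All substrings of `noun` with length in [ceil(len/2), len]."""
    n = len(noun)
    return [noun[s:s + k]
            for k in range(max(1, math.ceil(n / 2)), n + 1)
            for s in range(n - k + 1)]


def build_trigger_dict(lexicon: Iterable[str]) -> Dict[str, List[str]]:
    """Map every trigger word to the proper nouns it was sliced from."""
    trigger_dict: Dict[str, List[str]] = {}
    for noun in lexicon:
        for key in slices(noun):
            nouns = trigger_dict.setdefault(key, [])
            if noun not in nouns:
                nouns.append(noun)
    return trigger_dict


def build_trie(words: Iterable[str]) -> dict:
    """Build a prefix tree with one nested dict per character."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # a trigger word ends at this node
    return root


demo_dict = build_trigger_dict(["abcd", "abce"])
demo_trie = build_trie(demo_dict)  # iterating a dict yields its keys
```

Because the dictionary is rebuilt from the lexicon, rerunning both builders after a lexicon update is enough to pick up new proper nouns.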
Step 3, searching the text to be corrected for trigger words using the prefix tree, and, based on the trigger words found, obtaining all corresponding proper nouns from the trigger dictionary as candidate words.
For example, if the text to be corrected is "the personal tax should be paid on time and the @ is performed", the trigger word "personal place" is found using the prefix tree, and all proper nouns [word1, word2, ...] associated with the trigger word "personal place" are recorded in the trigger dictionary.
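The search in step 3 can be sketched as a scan that walks the prefix tree from every start position in the text. The names below are hypothetical, and the Chinese strings are illustrative inputs assumed for this example (a sentence containing a misspelling of 个人所得税, "personal income tax"), not text taken from the filing.

```python
from typing import Dict, Iterable, List, Tuple

END = "__end__"  # terminal marker of the prefix tree


def build_trie(words: Iterable[str]) -> dict:
    """Build a nested-dict prefix tree."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_triggers(text: str, trie: dict) -> List[Tuple[int, str]]:
    """Return (start_index, trigger_word) for every trigger word in `text`."""
    hits = []
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break  # no trigger word continues with this character
            if END in node:
                hits.append((start, text[start:end + 1]))
    return hits


# Illustrative data (assumption): one trigger word and its proper noun.
trigger_dict: Dict[str, List[str]] = {"个人所": ["个人所得税"]}
trie = build_trie(trigger_dict)
hits = find_triggers("按时缴纳个人所的税", trie)
candidates = {word: trigger_dict[word] for _, word in hits}
```

The scan visits each start position once and follows at most one tree path from it, so its cost is bounded by the text length times the longest trigger word.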
Step 4, extracting from the text to be corrected a plurality of text fragments associated with each trigger word, calculating the edit distance between each text fragment and the proper noun candidate words, and, with the minimum edit distance as the objective, selecting the longest text fragment on which to perform the corresponding editing operation.
Edit distance is the minimum number of editing operations required to transform one string into another; the larger the distance, the more the two strings differ. The permitted editing operations are replacing one character with another, inserting a character, and deleting a character.
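The definition above is the classic Levenshtein distance. The following sketch computes both the distance and one minimal sequence of operations by dynamic programming with a backtrace; the function name `edit_script` and the tuple layout of the operations are assumptions made for illustration.

```python
from typing import List, Tuple

Op = Tuple[str, int, str, str]  # (kind, index in src, old char, new char)


def edit_script(src: str, dst: str) -> Tuple[int, List[Op]]:
    """Return the Levenshtein distance from `src` to `dst` together with one
    minimal list of replace / insert / delete operations."""
    n, m = len(src), len(dst)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i characters
    for j in range(m + 1):
        dp[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # keep or replace
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:  # walk back from the bottom-right cell
        if i > 0 and j > 0 and src[i - 1] == dst[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("replace", i - 1, src[i - 1], dst[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", i - 1, src[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", i, "", dst[j - 1]))
            j -= 1
    ops.reverse()
    return dp[n][m], ops
```

When several minimal scripts exist, the backtrace returns one of them; the order of the branches decides which.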
The step 4 specifically comprises the following steps:
step 4.1, determining the length of a cutting window based on the length of the trigger word and the position of the trigger word, and cutting out window fragments at corresponding positions in the text to be corrected;
step 4.2, intercepting a plurality of text fragments in the window fragments based on the offset of the trigger word positions;
step 4.3, comparing each text segment with the candidate word to obtain the editing distance and the editing operation;
step 4.4, retaining the editing operation with the shortest edit distance and the longest text fragment as the optimal correction for the corresponding candidate word.
In step 4.1 of the present invention, if the trigger word is at the start or the end of the text, characters of a preset length are taken after or before the trigger word, respectively, and form the window segment together with the trigger word; otherwise, characters of the preset length are taken both before and after the trigger word and form the window segment together with it. Here, the preset length is 2 characters, the length of the trigger word "personal" is 3, and the trigger word is not at the beginning or end of a sentence, so 2 characters are taken before and 2 characters after the trigger word "personal", giving the window segment "personal tax sum".
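The window extraction of step 4.1 can be sketched by padding the trigger word on both sides and clamping at the text boundaries. The name `window_segment` and the parameter `pad` are hypothetical; clamping also shortens the padding when the trigger word is near, but not exactly at, a boundary, which is an assumption beyond the two cases named in the text.

```python
from typing import Tuple


def window_segment(text: str, start: int, trigger: str, pad: int = 2) -> Tuple[int, str]:
    """Return (offset, segment): the trigger word at `text[start:]` plus up to
    `pad` characters on each side, clamped to the text boundaries."""
    left = max(0, start - pad)
    right = min(len(text), start + len(trigger) + pad)
    return left, text[left:right]
```

With a 3-character trigger word in the middle of the text and `pad=2`, the segment has 2 + 3 + 2 = 7 characters, as in the example above.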
Step 4.2 of the present invention comprises the following steps: step 4.2.1, determining the interception length interval of the text fragments based on the trigger word, wherein the minimum interception length is the trigger word length and the maximum interception length is the window segment length; step 4.2.2, selecting different interception lengths within the interception length interval, and extracting the text fragments from the window segment by a sliding window method. Here, the minimum and maximum interception lengths of the text fragments are 3 and 7 respectively, so there are 5 sliding-window lengths, namely 3, 4, 5, 6 and 7; the window segment is sliced by sliding each of these windows over it to obtain a plurality of text fragments, such as the text fragments "personal", "personal taxed", "tax and" and "tax" when the window length is 3.
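The fragment enumeration of step 4.2 can be sketched as follows. The function name `text_fragments` is hypothetical; each fragment is returned with its offset inside the window segment so that a later edit can be mapped back to the text.

```python
from typing import List, Tuple


def text_fragments(window: str, min_len: int) -> List[Tuple[int, str]]:
    """Enumerate (offset, fragment) for every contiguous slice of `window`
    whose length runs from `min_len` up to the full window length."""
    fragments = []
    for length in range(min_len, len(window) + 1):
        for offset in range(len(window) - length + 1):
            fragments.append((offset, window[offset:offset + length]))
    return fragments
```

For a 7-character window segment and a 3-character trigger word this yields 5 + 4 + 3 + 2 + 1 = 15 fragments, 5 of which have length 3.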
In step 4.3 of the invention, the edit distance between each text fragment and each candidate word is calculated. During this calculation, if the text fragment is longer than the candidate word, a deletion operation is performed first to delete the redundant characters at its start and end positions. If the text fragment has no redundant characters at its start and end positions, the text fragment is considered not to match the candidate word when a deletion is required and the deleted character is not a character of the proper noun, or when a replacement is required and the replacement character is not a homophone of the original character; in that case no further editing operations or edit distance calculation are performed, and the next text fragment is compared. Likewise, if the total number of characters modified by the other operations exceeds a preset threshold, the text fragment is considered not to match the candidate word, no further editing operations or edit distance calculation are performed, and the next text fragment is compared. Here the preset threshold is set to 2.
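Two of the rejection rules above, the change threshold and the homophone requirement for replacements, can be sketched as a filter over a list of edit operations. This is a deliberately partial sketch: the trimming of redundant characters and the rule on deleted characters are omitted, the names `accept` and `toy_is_homophone` are hypothetical, and the homophone table is a toy stand-in for a real near-pronunciation dictionary.

```python
from typing import Callable, Iterable, Tuple

Op = Tuple[str, str, str]  # (kind, old char, new char)


def accept(ops: Iterable[Op],
           is_homophone: Callable[[str, str], bool],
           max_changes: int = 2) -> bool:
    """Return False when the edits should be rejected as a non-match."""
    ops = list(ops)
    if len(ops) > max_changes:  # too many modified characters
        return False
    for kind, old, new in ops:
        if kind == "replace" and not is_homophone(old, new):
            return False  # replacement must sound like the original
    return True


# Toy homophone table (assumption); a real system would use a pinyin lookup.
TOY_HOMOPHONES = {frozenset(("c", "k"))}


def toy_is_homophone(a: str, b: str) -> bool:
    return frozenset((a, b)) in TOY_HOMOPHONES
```

Rejecting early, before a full edit distance is computed, is what lets the method skip most fragments cheaply.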
For example, the window segment "personal tax and" corresponds to a plurality of text fragments, one of which is "personal tax"; one of the proper nouns corresponding to the trigger word "personal tax" is "personal income tax", which serves as the candidate word compared against the text fragment "personal tax" to calculate the edit distance. Since the text fragment has length 7 and the candidate word has length 5, it must first be determined whether there are redundant characters at the start and end positions; here the redundant characters "personal" and "and" are deleted, at which point the edit distance is 3, giving "personal tax"; an insertion operation is then performed to obtain "personal income tax", and the edit distance changes from 3 to 4. However, since the deletion operation modifies 3 characters in total, exceeding the preset threshold of 2, the text fragment is considered not to match the candidate word; the insertion operation and the edit distance calculation are not performed, and the next text fragment is compared instead.
In step 4.4 of the present invention, for each window segment, all of the obtained text fragments are compared against all of the candidate words, and the editing operation with the shortest edit distance and the longest text fragment is selected as the optimal correction. Note that "an editing operation" here may comprise one or more individual edits rather than exactly one; it is the sequence of edits that transforms a text fragment into the corresponding candidate word.
Furthermore, if two editing operations edit the same window segment in the text to be corrected and the corrections are identical, only one is retained; if the text fragments targeted by two editing operations stand in a containment relation, the editing operation corresponding to the longer text fragment is retained.
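The selection and de-duplication rules of step 4.4 can be sketched with a small record type. The names `Correction`, `best_correction` and `deduplicate` are hypothetical, and spans are expressed as half-open character ranges in the text to be corrected, which is an assumption about representation.

```python
from typing import List, NamedTuple


class Correction(NamedTuple):
    start: int      # start of the text fragment in the text
    end: int        # end of the fragment, exclusive
    target: str     # candidate proper noun the fragment is edited into
    distance: int   # edit distance between fragment and candidate


def best_correction(candidates: List[Correction]) -> Correction:
    """Pick the smallest edit distance; break ties by the longest fragment."""
    return min(candidates, key=lambda c: (c.distance, -(c.end - c.start)))


def deduplicate(corrections: List[Correction]) -> List[Correction]:
    """Drop exact duplicates, then drop any correction whose fragment is
    strictly contained in a longer fragment of another correction."""
    unique = list(dict.fromkeys(corrections))  # keeps first occurrence order
    kept = []
    for c in unique:
        contained = any(
            o.start <= c.start and c.end <= o.end
            and (o.end - o.start) > (c.end - c.start)
            for o in unique
        )
        if not contained:
            kept.append(c)
    return kept
```

Sorting by the pair (distance, negative length) expresses "shortest distance first, longest fragment second" in a single key.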

Claims (9)

1. An automatic word stock updating and prefix tree structure-based proper noun error correction method, which is characterized by comprising the following steps:
step 1, automatically updating a proper noun lexicon;
step 2, acquiring trigger words of proper nouns, and acquiring prefix trees related to the trigger words and trigger dictionaries of the proper nouns;
step 3, searching trigger words in the text to be corrected based on the prefix tree, and acquiring proper noun candidate words based on the trigger dictionary;
step 4, extracting from the text to be corrected a plurality of text fragments associated with each trigger word, calculating the edit distance between each text fragment and the proper noun candidate words, and, with the minimum edit distance as the objective, selecting the longest text fragment on which to perform the corresponding editing operation.
2. The proper noun error correction method based on automatic word stock updating and prefix tree structure according to claim 1, wherein in step 1 the proper noun lexicon is updated automatically by obtaining text data through a scheduled crawling task and recognizing proper nouns in that data with a trained entity recognition model.
3. The method for proper noun error correction based on automatic lexicon update and prefix tree structure according to claim 2, wherein said step 1 comprises the steps of:
step 1.1, crawling news texts containing proper nouns, and performing manual labeling, including labeling of the proper nouns;
step 1.2, training on the labeled text by utilizing NER entity recognition technology to obtain a trained entity recognition model;
step 1.3, regularly crawling news texts containing proper nouns, and identifying through a trained entity identification model to obtain new proper nouns;
and step 1.4, adding the new proper nouns into a proper noun library to update the proper noun library.
4. The method for proper noun error correction based on automatic lexicon update and prefix tree structure according to claim 1, wherein said step 2 comprises the steps of:
step 2.1, slicing any proper noun in the proper noun library and taking the sliced proper noun as a trigger word of the proper noun;
step 2.2, constructing a prefix tree based on the trigger words, and constructing a trigger dictionary {key: [word1, word2, ...]} based on the correspondence between the trigger words and the proper nouns, where key represents a trigger word and word represents a proper noun.
5. The proper noun error correction method based on automatic word stock updating and prefix tree structure according to claim 4, wherein in step 2.1, for any proper noun with word length $L_i$, windows whose lengths lie in the interval $[L_i/2, L_i]$ are generated, and the proper noun is sliced into its trigger words by sliding these windows over it.
6. The method for proper noun error correction based on automatic lexicon update and prefix tree structure according to claim 1, wherein said step 4 comprises the steps of:
step 4.1, based on the trigger word length, intercepting window fragments at corresponding positions in the text to be corrected;
step 4.2, intercepting a plurality of text fragments in the window fragments based on the offset of the trigger word positions;
step 4.3, comparing each text segment with the candidate word to obtain the editing distance and the editing operation;
and 4.4, reserving an editing operation with the shortest editing distance and the longest text slice as the optimal error correction of the candidate noun.
7. The proper noun error correction method based on automatic updating and prefix tree structure according to claim 6, wherein in step 4.1, if the trigger word is at the start or the end of the text, characters of a preset length are taken after or before the trigger word, respectively, and form the window segment together with the trigger word; otherwise, characters of the preset length are taken both before and after the trigger word and form the window segment together with it.
8. The proper noun error correction method based on an automated update and prefix tree structure according to claim 7, wherein said step 4.2 includes the steps of:
step 4.2.1, determining the interception length interval of the text fragments, wherein the minimum interception length is the trigger word length and the maximum interception length is the window segment length;
step 4.2.2, selecting different interception lengths within the interception length interval, and extracting the text fragments from the window segment by a sliding window method.
9. The method for error correction of proper nouns based on automatic updating and prefix tree structure according to claim 7, wherein in the step 4.4, if two editing operations are performed on the same place in the error correction text and the error correction operations are identical, only one editing operation is reserved; if the text fragments for which the two editing operations are directed exist in the contained relation, a longer one is reserved.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310163847.0A CN116644737A (en) 2023-02-09 2023-02-09 Proper noun error correction method based on automatic word stock updating and prefix tree structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310163847.0A CN116644737A (en) 2023-02-09 2023-02-09 Proper noun error correction method based on automatic word stock updating and prefix tree structure

Publications (1)

Publication Number Publication Date
CN116644737A true CN116644737A (en) 2023-08-25

Family

ID=87619326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310163847.0A Pending CN116644737A (en) 2023-02-09 2023-02-09 Proper noun error correction method based on automatic word stock updating and prefix tree structure

Country Status (1)

Country Link
CN (1) CN116644737A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118194862A (en) * 2024-04-07 2024-06-14 人民网股份有限公司 Text error correction method and device based on fault-tolerant suffix automaton
CN118194862B (en) * 2024-04-07 2024-09-06 人民网股份有限公司 Text error correction method and device based on fault-tolerant suffix automaton

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination