WO2010018453A2 - System and method for processing electronically generated text
- Publication number
- WO2010018453A2 (PCT/IB2009/006552)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text string
- sequence
- text
- category
- initial
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the electronic text generating means to be a speech recognition engine or an optical character reading device; for there to be at least three categories, including nouns, verbs, and determinants into which all words in text strings processed are categorised; for each text string to be composed of a limited number of words, typically containing one determinant, one noun and one verb or two determinants, two nouns and one verb in each instance in a logical sequence; for the check of the category sequence of a text string against predetermined permissible sequences to include a check for an absence of a required word (often referred to as a deletion error); for a classification method to be used in respect of at least some or all words in the dictionary of the system, such that a single word or symbol (that itself need not be in the dictionary) can be used to represent a class or group of words; and for items of information in the knowledgebase to be rated according to the number of dictionary words they contain, such that a dictionary or specific word leads to a higher ranking, and a class
- the knowledgebase may be a custom knowledgebase for a particular application or it may be a more general knowledgebase.
- the knowledgebase may consist of two different knowledgebases, namely, a commonsense knowledgebase employed primarily for correcting errors detected, and a validation knowledgebase against which text strings are checked in order to achieve validation.
- each knowledge base is preferably organized into several files or tables, with each file or table using a different format of statements. Some statements may contain knowledge related to specific words, other statements may contain knowledge related to groups of words, or a combination of both, due to the classification method used.
- the classification may consist of using a word or symbol non-existent in the dictionary, to represent a group of words with similar meaning or semantically equivalent, such that fewer statements can be used to represent the knowledge related to a specific application. This allows an algorithm to rank the matches found in the common sense knowledge base, according to where the match was obtained.
- Figure 1 is a block diagram indicating the basic function of a system and method according to the invention as implemented in relation to a speech to text engine;
- Figure 2 is an extended part of the block diagram that deals with deletion errors
- Figure 3 is an extended part of the block diagram of Figure 1 showing the logic error fixing part of the system and method.
- a speech recognition system includes electronic text generating means in the form of a speech to text engine (1) for receiving signals from a microphone (2) responsive to vocal utterances and for generating an initial text string indicated by numeral (3).
- the speech to text engine could assume many different forms and could, in particular, be any of those that are commercially available such as those operating on the hidden Markov model basis.
- Each text string, which is typically a complete phrase or sentence, is then subjected, as indicated at numeral (4), to categorising of each word into, in this example, one of three categories, namely verbs, nouns and determinants (covering all other words), and each word is tagged accordingly.
- sequence of categories compiled for each text string is then, as provided by this invention, and as indicated at numeral (5), subjected to a comparison process to establish if the sequence of categories associated with a particular text string is a permissible sequence.
- all permissible sequences in any particular application of the speech recognition system are held in a data base for reference purposes.
- two different sequences were set up as being permissible, these being "determinant noun verb" and "determinant noun verb determinant noun".
- the tagging that is carried out at step (4) and the comparison of step (5) thus form the heart of the first processing means.
- the number of categories of words can be increased by adding such categories as adjectives and conjunctions, for example, with increasing complexity.
- the logic checking process may be any one that is currently in use although the one tested is more fully described below.
- a "process syntax error" is noted as indicated at numeral (7).
- the number of errors is firstly checked and if there is only one, the incorrect word is identified (as indicated at numeral (9)) and if the missing word is determined as indicated at numeral (10) as being a determinant, the determinant is simply replaced with a default determinant as indicated at numeral (11) and the text string returned to the logic checking process (6). If it is another word, the text string is submitted to a sentence correction procedure indicated by numeral (12) and that is more fully set out in, and is described with reference to, Figure 3.
- the text string is submitted for a check deletion error procedure at (13) and the text string is sent for further analysis that is outlined in, and is described with reference to, Figure 2.
- the category sequence of the text string is examined as indicated at numeral (14) to determine whether the deletion error at hand is on a predetermined list of incorrect sentences that potentially have a solution.
- an algorithm could be used to evaluate each case. If there is no possible solution, the flags are set to indicate that no solution has been found and the result made available to the controlling application that could, for example, request that the utterance be repeated, or alert the user to the situation optionally with a request for manual input.
- the category sequence of the sentence is checked (19) and thereafter a new text string is constructed using a default determinant (20) and the new text string is submitted to the logic check (6).
- a new text string is constructed, as indicated at (21), using a dummy word to substitute for the missing word and, as may be necessary, using one or more default determinants.
- This text string is subjected to a procedure for finding a matching word to replace the dummy word, as indicated at (22); a determination made as to whether or not a match was found, as indicated at (23); and if so, a new text string is constructed, as indicated at (24). If a match was not found the flags are set to indicate that no solution has been found and the result made available to the controlling application that could for example request that the utterance be repeated or alert the user to the situation or request manual input.
- a new text string is constructed, as indicated at (25), using one or more default determinants as may be necessary and a dummy word to substitute for the wrong word, as may be applicable.
- the new text string is sent to the logic check (6).
- it is sent to the sentence correction procedure (12) as set out in, and described with reference to, Figure 3.
- the word string is converted to all possible formats at a step indicated by numeral (27) and each is referred to the knowledgebase at a step indicated by numeral (28) to see if any immediate match is found. If a match is found, as indicated at step (29), the sentence is adopted. If a match is not found the word string is sent to a logic error process, indicated by numeral (30), as more fully described with reference to Figure 3.
- Treatment to find a matching word (22) and treatment to fix a logic error (30) are both initiated by parsing the sentence, as indicated at step (31) in Figure 3, and the same treatment is given to the correct sentence treatment (12) emanating from either of the instances mentioned above.
- the sentence is parsed to provide all possible converted sentences which will be in the same format as the knowledgebase entries, and each converted sentence is used to attempt to find a match in its corresponding knowledgebase file, and each match is ranked, as indicated at numeral (32), on the basis of the number of specific words in the matching knowledgebase entry.
- the total ranking value for each possibility is calculated and a comparison made at step (33).
- the highest ranking value is selected and a new word introduced into the text string to provide a corrected sentence as indicated at step (34).
- a no solution return (35) results; the flags are set to indicate that no solution has been found and the result made available to the main application that could for example request that the utterance be repeated or alert the user to the situation for other action or simply recording.
- the possible solutions could be used for further processing in order to determine the best alternative based on additional information.
- the first and second processing means are carried out by means of a computer that could also, for example, operate a speech recognition program from which the initial text strings are received by the first processing means.
- a syntax error. In this instance the sentence "the cat ate the rat" was outputted as the text string "the cat ate the that".
- Word tagging carried out at (4) resulted in the sequence "Determinant Noun Verb Determinant Determinant". This sequence is determined as not permissible at (5).
- Processing of this syntax error (7) results in only one error found at (8).
- the incorrect word is identified at (9) as being the fifth word "that". It is established at (10) that the wrong word is not replacing a determinant. The error thus moves to the correct sentence step (12). Knowing the word position, the system tries to find a matching word.
- the sentence is parsed at (31) and converted into the following strings: "ANIMALS EAT X", "CAT EAT X".
- the first string is created by retrieving the classification for "cat" if one exists. If one does not exist, then it will not be created. 'X' is put in as a dummy word in place of the incorrect word for which the system is trying to find a match.
- the system looks up, at (32), the statement "ANIMALS EAT X" in the corresponding common sense knowledge base file, which will contain statements of the type "GROUP_OF_WORDS VERB SPECIFIC_WORD", such as PEOPLE CATCH BUS and ANIMALS DRINK WATER.
- An algorithm is used to look up matching words in the common sense knowledge base, based on the given words.
- the sentence label for this phrase is determinant noun verb noun. This is processed as a deletion error with only a determinant missing as at (16) and the missing determinant is introduced by way of a default determinant "the" to produce the sentence "the woman waters the man". Having now generated a sentence that is syntactically correct, the system proceeds to check if the sentence has a semantic error.
- the system parses the sentence to produce all possible logic statements; namely, "WOMAN WATER MAN"; "WOMAN WATER PEOPLE"; "PEOPLE WATER MAN"; and, "PEOPLE WATER PEOPLE".
- Each statement is used to perform a lookup in the corresponding validation knowledge base that contains the logic statements "X WATER MAN"; "WOMAN X MAN"; "WOMAN WATER X"; "X WATER PEOPLE"; "WOMAN X PEOPLE"; "PEOPLE X MAN"; "PEOPLE WATER X"; "PEOPLE X PEOPLE".
- the sentence from the speech recognition engine that is passed to the system is "the mother the child".
- the sentence label is determinant noun determinant noun and a syntax error is detected.
- the sentence label is classified as a "word missing" situation as indicated at (17) and processed accordingly. Firstly, the system fills in the missing verb, to create the temporary sentence "the mother dummy_verb the child" which is now syntactically correct. Since the position of the incorrect word is already identified, the system performs a lookup on the common sense knowledge base in which the entries are "MOTHER X CHILD"; "MOTHER X PEOPLE"; "PEOPLE X CHILD"; and, "PEOPLE X PEOPLE".
- a further deletion error type of sentence to be analyzed may be "the doctor treated ate" which was actually produced by a speech recognition engine as were many of the other examples given.
- the system detects a syntax error from the sentence label, which is determinant noun verb verb. Further processing indicates a deletion error of the type indicated at (18).
- the sentence label is on the list and is thus categorized.
- the system detects and replaces the incorrect word "ate" and fills in the missing determinant, to produce the temporary sentence "the doctor treated the dummy_noun".
- the system parses the sentence and searches the common sense knowledge base for a solution. The only possibilities are
- Each word string is looked up in its corresponding validation knowledge base file and at (29) it is established that there is no match.
- the text string is therefore passed on for fixing a logic error at (30) as it makes no sense.
- the text string has already been parsed and the various possibilities are ranked at (32) in the manner indicated above. This is done by performing a lookup on the corresponding common sense knowledge base file for each of the above statements. Since it is desired to check for all possibilities, all word alignments are considered, i.e. a lookup is carried out for each of "WOMAN DANCE APPLE"; "X DANCE APPLE"; "WOMAN X APPLE"; and "WOMAN DANCE X". No match is found. A similar search is carried out with other statements.
- a final example illustrates the benefit of separating the common sense knowledge base from the validation knowledge base.
- the validation knowledge base there may be statements like "PEOPLE LOVE FRUIT" and "PEOPLE BURN FRUIT" so that sentences like "the girl loved the apple" or "the woman burned the apple" would not be rejected by the system as semantically incorrect.
- the common sense knowledge base file there would have been two matches with the same ranking and the system would have been unable to correct the sentence "the woman danced the apple" in the example above.
- one of the converted sentences could be "PEOPLE BURN FRUIT", assuming we used the word "PEOPLE" to classify and represent words such as "woman", "man" or "girl", and likewise, assuming that "FRUIT" was the word or symbol chosen to represent "apple" and other words like "banana".
- the invention will be particularly useful in applications in which the spectrum of language utilized is somewhat limited although it is also envisaged that the invention could be broadened considerably to apply to general purpose speech recognition systems.
- the invention will be particularly useful as applied to commanding robots, home automation, operating system and software commands, dialogue systems and videogames.
- Somewhat larger vocabulary applications include general and specialized dictation software. It will be understood that by using the procedure generally outlined above, an enhanced speech recognition or optical character reader system is achieved and a reduction made of nonsensical errors.
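The word-replacement lookup described in the snippets above (converting the sentence into statements with a dummy `X`, optionally substituting a class symbol such as ANIMALS, and ranking matches by the number of specific words they contain) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class table, the knowledge base entries other than "PEOPLE CATCH BUS" and "ANIMALS DRINK WATER", the function names, and the choice to pass the verb already in base form ("EAT" rather than "ate") are all hypothetical and not taken from the specification.

```python
from __future__ import annotations

# Hypothetical classification table: word -> class symbol.
CLASS_OF = {"cat": "ANIMALS", "lion": "ANIMALS", "tiger": "ANIMALS"}
CLASS_SYMBOLS = set(CLASS_OF.values()) | {"PEOPLE"}

# Hypothetical common sense knowledge base of (subject, verb, object) statements.
KNOWLEDGE_BASE = [
    ("CAT", "EAT", "RAT"),
    ("ANIMALS", "EAT", "FOOD"),
    ("ANIMALS", "DRINK", "WATER"),
    ("PEOPLE", "CATCH", "BUS"),
]


def convert(subject: str, verb: str) -> list[tuple[str, str, str]]:
    """Build lookup statements with 'X' standing in for the wrong word."""
    statements = []
    cls = CLASS_OF.get(subject.lower())
    if cls is not None:  # the class-based statement exists only if the word is classified
        statements.append((cls, verb, "X"))
    statements.append((subject.upper(), verb, "X"))
    return statements


def rank(entry: tuple[str, str, str]) -> int:
    """Score an entry by how many of its words are specific (not class symbols)."""
    return sum(1 for word in entry if word not in CLASS_SYMBOLS)


def find_replacement(subject: str, verb: str) -> str | None:
    """Return the best-ranked candidate for 'X', or None if absent or tied."""
    scored: dict[str, int] = {}
    for statement in convert(subject, verb):
        for entry in KNOWLEDGE_BASE:
            if entry[:2] == statement[:2]:
                scored[entry[2]] = max(scored.get(entry[2], 0), rank(entry))
    if not scored:
        return None
    top = max(scored.values())
    winners = [word for word, score in scored.items() if score == top]
    return winners[0] if len(winners) == 1 else None
```

With these toy entries, `find_replacement("cat", "EAT")` prefers the fully specific statement "CAT EAT RAT" over the class-level "ANIMALS EAT FOOD". Returning `None` on a tie mirrors the remark above that two matches with the same ranking leave the system unable to choose a correction.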
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
- Character Discrimination (AREA)
Abstract
A method and system are disclosed for processing electronically generated text emanating from electronic text generating means such as a speech recognition engine (1) or optical character reader that output an initial text string (3). First processing means produce an intermediate text string and second processing means check the intermediate text string optionally as a sequence with one or more other successive intermediate text strings against a knowledge base (28, 32) in order to compare the meaning thereof to items in the knowledge base in an attempt to correct errors in semantics and produce an optionally final processed text string. The first processing means is adapted to apply the steps of categorising each word as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category (4), creating a category sequence corresponding to the text string and comparing the category sequence to a plurality of predetermined permissible sequences to thereby check the syntax of the initial text string (5). An initial text string having a category sequence not corresponding to a predetermined permissible sequence is treated in an effort to remedy the cause of non-correspondence (7).
Description
SYSTEM AND METHOD FOR PROCESSING ELECTRONICALLY GENERATED TEXT
FIELD OF THE INVENTION
This invention relates to a system and method for processing electronically generated text such as in automatic speech recognition and optical character reading applications and, more particularly, to a system and method for processing or validating text outputted from a speech recognition engine or optical character reader with the aim of diminishing recognition errors.
It is to be understood that whilst the following description will be primarily aimed at automatic speech recognition applications, similar principles and procedures may be applied to text strings generated by an optical character reader, for example.
BACKGROUND TO THE INVENTION
Automatic speech recognition is becoming more and more part of everyday life and, at least in particular applications, there is a demand for improved speech recognition in order that increased reliance can be placed on the text generated in response to a spoken string of words, often referred to as an utterance. The problem is based on the fact that a computer that generates the textual string in response to a spoken string of words has no common sense to reject nonsensical results and the textual output can, in the absence of any additional processing, come across as absolute nonsense.
State of the art speech recognizers use acoustic models based on hidden Markov models trained with a large speech corpus, typically from speakers that are representative of the target user population. The acoustic model therefore captures the phonetic properties of the vocabulary to be recognized. In order to incorporate some knowledge of the language, a large text corpus is used to train a statistical language model and the text corpus will usually contain sentences that are related to the application for which the speech recognition engine is to be used.
Applicant understands that the most popular language models are n-gram language models which are based on establishing the probability of a sequence of n words. However, this technique has the disadvantage of not incorporating any real language knowledge.
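For reference, the usual n-gram approximation can be written as below. The notation is introduced here purely for illustration and does not appear in the specification: $w_1,\dots,w_m$ denotes the words of an utterance and $n$ the model order, so each word is conditioned only on its $n-1$ predecessors.

```latex
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-n+1}, \dots, w_{i-1}\right)
```

Because each factor depends only on local word co-occurrence counts, a model of this form can assign a high probability to a grammatical but nonsensical sentence, which is the shortcoming noted above.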
Considerable attention has been given to the problem and various proposals have been put forward to utilise a knowledgebase against which an initial textual string such as a phrase or sentence can be measured in order to check on the sense of the textual string proposed by the speech engine. One major thrust in this direction of which applicant is aware is the OpenMind Common Sense project conducted by Massachusetts Institute of Technology in which more than 700,000 facts were collected in order to be used as a common sense data base.
European patent EP 977175 also refers to use of a knowledgebase in order to enhance accuracy of interpretation of the spoken word as does United States patent US 7383172. The latter utilises a data base of sentences from which a choice can be made.
Apart from the above, it is always a problem in optical character reader applications that, for any one of a number of different reasons, a text string may be incorrectly read and could benefit from subsequent processing in an attempt to correct errors in the text strings developed.
OBJECT OF THE INVENTION
It is an object of this invention to provide a system and method for processing text generated electronically that is aimed at improving the accuracy of the text strings generated in at least particular applications.
SUMMARY OF THE INVENTION
In accordance with one aspect of this invention there is provided a system for processing electronically generated text including electronic text generating means for generating an initial text string; first processing means for producing an intermediate text string; and second processing means having access to a knowledge base for processing at least intermediate text strings optionally as a sequence with one or more other successive intermediate text strings in order to compare the meaning thereof to items in the knowledge base and replace any words that may render the intermediate text string nonsensical in an attempt to correct errors in semantics and produce an optionally final processed text string, the system being characterised in that the first processing means is adapted to apply the steps of categorising each word as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category, creating a category sequence corresponding to the text string and comparing the category sequence to a plurality of predetermined permissible sequences to thereby check the syntax of the initial text string, wherein a text string (herein termed an intermediate text string) having a category sequence corresponding to a predetermined permissible sequence is passed for further processing and an initial text string having a category sequence not corresponding to a predetermined permissible sequence is treated in an effort to remedy the cause of non-correspondence.
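A minimal sketch of the syntax check attributed to the first processing means, assuming a toy three-category scheme. The word lists and the function names (`categorise`, `category_sequence`, `check_syntax`) are illustrative assumptions rather than anything defined in the specification; the two permissible sequences are those quoted in this document's worked example.

```python
from __future__ import annotations

# Hypothetical lexicon. Anything that is neither a noun nor a verb falls into
# the catch-all "determinant" category, as in the three-category example.
NOUNS = {"cat", "rat", "woman", "man", "mother", "child", "doctor"}
VERBS = {"ate", "waters", "treated"}

# Permissible category sequences quoted in the example.
PERMISSIBLE = {
    ("determinant", "noun", "verb"),
    ("determinant", "noun", "verb", "determinant", "noun"),
}


def categorise(word: str) -> str:
    """Tag a single word with one of the three categories."""
    w = word.lower()
    if w in NOUNS:
        return "noun"
    if w in VERBS:
        return "verb"
    return "determinant"


def category_sequence(text: str) -> tuple[str, ...]:
    """Create the category sequence corresponding to a text string."""
    return tuple(categorise(word) for word in text.split())


def check_syntax(text: str) -> bool:
    """True if the category sequence is one of the permissible sequences."""
    return category_sequence(text) in PERMISSIBLE
```

Under these assumptions, "the cat ate the rat" passes as an intermediate text string, while "the cat ate the that" yields determinant, noun, verb, determinant, determinant and is flagged for remedial treatment.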
The invention also provides a component of a system as defined above comprising said first processing means and said second processing means, said component being particularly adapted to be combined with electronic text generating means in order to form a system as defined above.
In accordance with a second aspect of the invention there is provided a system for processing electronically generated text, the system including electronic text generating means and computer means programmed to receive electronic initial text strings generated by the electronic text generating means that are optionally embodied in the computer means and to produce intermediate text strings on the basis thereof; the computer means being further programmed to compare an intermediate text string optionally as a sequence with one or more other successive intermediate text strings with information contained in a knowledge base and replace any words in the intermediate text string that make it nonsensical in an attempt to correct errors in semantics and produce an optionally final processed text string; the system being characterised in that the computer means is still further programmed to categorise each word in the initial text string as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category and to create a category sequence corresponding to the initial text string; to compare the category sequence of the initial text string to a plurality of predetermined permissible category sequences to thereby check the syntax of the initial text string; to pass for further treatment a text string having a category sequence corresponding to one of a predetermined permissible sequence as an intermediate text string; and to treat an initial text string having a category sequence not corresponding to a predetermined permissible sequence, as may be possible, in an effort to remedy the cause of non-correspondence and in an attempt to correct errors.
In accordance with a third aspect of the invention there is provided a method of processing text strings generated by electronic text generating means, the method comprising processing initial text strings generated by the electronic text generating means to form first intermediate text strings and comparing a first intermediate text string optionally as a sequence with one or more other successive intermediate text strings with information contained in a knowledge base and replacing words in the intermediate text string that render it nonsensical in an attempt to correct errors in semantics and produce an optionally final processed text string, the method being characterized in that it includes the steps of categorising each word in an initial text string as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category to create a category sequence corresponding to the initial text string; comparing the category sequence of the initial text string with a plurality of predetermined permissible category sequences to thereby check the syntax of the initial text string; passing for further treatment as an intermediate text string a text string having a category sequence corresponding to one of a predetermined permissible sequence; and treating an initial text string having a category sequence not corresponding to one of the predetermined permissible sequences, as may be possible, in an effort to remedy the cause of non-correspondence.
Further features of the invention provide for the electronic text generating means to be a speech recognition engine or an optical character reading device; for there to be at least three categories, including nouns, verbs, and determinants into which all words in text strings processed are categorised; for each text string to be composed of a limited number of words, typically containing one determinant, one noun and one verb or two determinants, two nouns and one verb in each instance in a logical sequence; for the check of the category sequence of a text string against predetermined permissible sequences to include a check for an absence of a required word (often referred to as a deletion error); for a classification method to be used in respect of at least some or all words in the dictionary of the system, such that a single word or symbol (that itself need not be in the dictionary) can be used to represent a class or group of words; and for items of information in the knowledgebase to be rated according to the number of dictionary words they contain, such that a dictionary or specific word leads to a higher ranking, and a class word or word representing a group of words, leads to a lower ranking; and for an insoluble situation arising from two different solutions being equally rated, or in any other situation, to be optionally referred to a user for manual selection of an appropriate solution or even re-entry of the relevant text string (utterance).
It is to be understood that text strings having a category sequence not corresponding to a predetermined permissible sequence that are treated in an effort to remedy the cause of non-correspondence are, in the present implementation of the invention, automatically subjected to a logic check.
As regards the knowledgebase to be utilised in the system and method defined above, the knowledgebase may be a custom knowledgebase for a particular application or it may be a more general knowledgebase. In either event, the knowledgebase may consist of two different knowledgebases, namely, a commonsense knowledgebase employed primarily for correcting errors detected, and a validation knowledgebase against which text strings are checked in order to achieve validation. Besides dividing the knowledge base into validation and common sense knowledge, each knowledge base is preferably organized into several files or tables, with each file or table using a different format of statements. Some statements may contain knowledge related to specific words, other statements may contain knowledge related to groups of words, or a combination of both, due to the classification method used. The classification may consist of using a word or symbol non-existent in the dictionary, to represent a group of words with similar meaning or semantically equivalent, such that fewer statements can be used to represent the knowledge related to a specific application. This allows an algorithm to rank the matches found in the common sense knowledge base, according to where the match was obtained.
In order that the above and other features of the invention may be more fully understood an example thereof will now be described with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings:-
Figure 1 is a block diagram indicating the basic function of a system and method according to the invention as implemented in relation to a speech to text engine;
Figure 2 is an extended part of the block diagram that deals with deletion errors; and,
Figure 3 is an extended part of the block diagram of Figure 1 showing the logic error fixing part of the system and method.
DETAILED DESCRIPTION WITH REFERENCE TO THE DRAWINGS
In the embodiment of the invention depicted in the block diagrams of the accompanying drawings, and with initial reference particularly to Figure 1, a speech recognition system includes electronic text generating means in the form of a speech to text engine (1) for receiving signals from a microphone (2) responsive to vocal utterances and for generating an initial text string indicated by numeral (3).
The speech to text engine could assume many different forms and could, in particular, be any of those that are commercially available such as those operating on the hidden Markov model basis. Each text string, which is typically a complete phrase or sentence, is then subjected, as indicated at numeral (4), to categorising of each word into, in this example, one of three categories, namely verbs, nouns and determinants (covering all other words), and each word is tagged accordingly.
The sequence of categories compiled for each text string is then, as provided by this invention, and as indicated at numeral (5), subjected to a comparison process to establish if the sequence of categories associated with a particular text string is a permissible sequence. For this purpose all permissible sequences in any particular application of the speech recognition system are held in a data base for reference purposes. In tests that were conducted to date, two different sequences were set up as being permissible, these being "determinant noun verb" and "determinant noun verb determinant noun". The tagging that is carried out at step (4) and the comparison of step (5) thus form the heart of the first processing means. Of course, the number of categories of words can be increased by adding such categories as adjectives and conjunctions, for example, with increasing complexity.
In this manner the syntax of the initial text string is checked and if the text string passes this check then the text string is passed to a logic checking process that is indicated at numeral (6). The logic checking process may be any one that is currently in use although the one tested is more fully described below.
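The tagging of step (4) and the sequence comparison of step (5) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the tested implementation: the small `LEXICON` and the names `tag` and `syntax_ok` are hypothetical, while the two permissible sequences are those reported above.

```python
# Hypothetical mini-lexicon; the actual system dictionary is not reproduced here.
LEXICON = {
    "cat": "N", "rat": "N", "woman": "N", "man": "N",
    "ate": "V", "waters": "V",
}

# The two permissible category sequences used in the tests described above.
PERMISSIBLE = {
    ("D", "N", "V"),
    ("D", "N", "V", "D", "N"),
}

def tag(text):
    """Tag each word as N (noun), V (verb) or D (determinant, covering all other words)."""
    return tuple(LEXICON.get(word, "D") for word in text.lower().split())

def syntax_ok(text):
    """Return True when the category sequence of the text string is permissible."""
    return tag(text) in PERMISSIBLE
```

Under these assumptions a string such as "the cat ate the that" is tagged determinant, noun, verb, determinant, determinant, and is therefore rejected by `syntax_ok`.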
Reverting now to the sequence check conducted on each text string, in the event that the sequence of categories does not match a permissible sequence a "process syntax error" is noted as indicated at numeral (7). As indicated at numeral (8), the number of errors is first checked. If there is only one, the incorrect word is identified (as indicated at numeral (9)); if the word that it replaces is determined, as indicated at numeral (10), to be a determinant, the incorrect word is simply replaced with a default determinant, as indicated at numeral (11), and the text string is returned to the logic checking process (6). If it is another word, the text string is submitted to a sentence correction procedure that is indicated by numeral (12) and that is more fully set out in, and is described with reference to, Figure 3.
In the event that more than one error is detected at (8), the text string is submitted for a check deletion error procedure at (13) and the text string is sent for further analysis that is outlined in, and is described with reference to, Figure 2.
In the further processing indicated in Figure 2, the category sequence of the text string is examined, as indicated at numeral (14), to determine whether the deletion error at hand is on a predetermined list of incorrect sentences that potentially have a solution. As an alternative, an algorithm could be used to evaluate each case. If there is no possible solution, the flags are set to indicate that no solution has been found and the result is made available to the controlling application, which could, for example, request that the utterance be repeated, or alert the user to the situation, optionally with a request for manual input.
If a potential solution is detected, the next determination is made at (15) where the errors are classified according to their nature that could be only one or more determinants missing, as indicated by numeral (16); one wrong noun or one wrong verb, and a determinant missing, as indicated by numeral (17); and a wrong word position and also a determinant missing, as indicated by numeral (18).
In the event of only one or more determinants missing, the category sequence of the sentence is checked (19) and thereafter a new text string is constructed using a default determinant (20) and the new text string is submitted to the logic check (6).
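A minimal sketch of the construction at step (20) is given below. It assumes, as the two permissible sequences used in the tests suggest, that every noun must be immediately preceded by a determinant; this rule and the names `DEFAULT_DETERMINANT` and `insert_missing_determinants` are illustrative assumptions only.

```python
DEFAULT_DETERMINANT = "the"  # assumed default determinant

def insert_missing_determinants(words, tags):
    """Insert the default determinant before any noun that lacks one.

    `words` and `tags` are parallel lists; tags are "D", "N" or "V".
    """
    fixed = []
    for i, (word, category) in enumerate(zip(words, tags)):
        if category == "N" and (i == 0 or tags[i - 1] != "D"):
            fixed.append(DEFAULT_DETERMINANT)
        fixed.append(word)
    return fixed
```

For a string tagged determinant, noun, verb, noun, the sketch inserts one default determinant before the second noun, after which the new text string would be submitted to the logic check (6).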
In the event that a noun or verb is missing and possibly one or more determinants are missing as well, a new text string is constructed, as indicated at (21), using a dummy word to substitute for the missing word and, as may be necessary, using one or more default determinants. This text string is subjected to a procedure for finding a matching word to replace the dummy word, as indicated at (22); a determination is made as to whether or not a match was found, as indicated at (23); and if so, a new text string is constructed, as indicated at (24). If a match was not found, the flags are set to indicate that no solution has been found and the result is made available to the controlling application, which could, for example, request that the utterance be repeated, alert the user to the situation, or request manual input.
In the event that a wrong word position and a missing determinant need attention, a new text string is constructed, as indicated at (25), using one or more default determinants as may be necessary and a dummy word to substitute for the wrong word, as may be applicable. In the event that the wrong word was in fact a determinant, as determined at (26), the new text string is sent to the logic check (6). In the event that it was a different word, then it is sent to the sentence correction procedure (12) as set out in, and described with reference to, Figure 3.
Reverting now to the logic checking process (6) (see Figure 1) that forms part of the second processing means, initially the word string is converted to all possible formats at a step indicated by numeral (27) and each is referred to the knowledgebase at a step indicated by numeral (28) to see if any immediate match is found. If a match is found, as indicated at step (29), the sentence is adopted. If a match is not found the word string is sent to a logic error process, indicated by numeral (30), as more fully described with reference to Figure 3.
Treatment to find a matching word (22) and treatment to fix a logic error (30) are both initiated by parsing the sentence, as indicated at step (31) in Figure 3, and the same treatment is given to the sentence correction procedure (12) emanating from either of the instances mentioned above. In each case the sentence is parsed to provide all possible converted sentences, which will be in the same format as the knowledgebase entries; each converted sentence is used to attempt to find a match in its corresponding knowledgebase file; and each match is ranked, as indicated at numeral (32), on the basis of the number of specific words in the matching knowledgebase entry.
In this regard, it is to be noted that a word or symbol having a generic character, in that it represents any one of a plurality of different specific words (such as "animal", which could mean any specific species such as "lion", "tiger", "cow", "rat" etc), is termed a class word, whereas individual words that do not themselves represent a class are termed specific words (in this instance each of the individual animals "lion", "tiger", "cow", "rat" etc). For the purposes of scoring, a class word is given a score of 0 and a specific word that exists in the system dictionary is given a score of 1, and the ranking will be the sum of the scores, which is simply the number of dictionary words in the knowledgebase entry.
The total ranking value for each possibility is calculated and a comparison made at step (33). The highest ranking value is selected and a new word introduced into the text string to provide a corrected sentence as indicated at step (34). In the event that more than one possibility shares the highest ranking value, a no solution return (35) results; the flags are set to indicate that no solution has been found and the result is made available to the main application, which could, for example, request that the utterance be repeated, alert the user to the situation for other action, or simply record it. Optionally, the possible solutions could be used for further processing in order to determine the best alternative based on additional information.
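The scoring and selection of steps (32) to (35) can be sketched as follows. This is an illustration only: the names `CLASS_WORDS`, `rank` and `select_best` are hypothetical, and entries are assumed to be three-word "subject verb object" statements. The ranks quoted in the worked examples of this description (for instance 2 for CAT EAT RAT and 0 for PEOPLE LOVE PEOPLE) are reproduced if only the subject and object positions are scored, so the sketch assumes that the verb does not contribute to the sum.

```python
# Hypothetical set of class words; every other noun is treated as a specific word.
CLASS_WORDS = {"ANIMALS", "PEOPLE", "FRUIT"}

def rank(entry):
    """Score 0 for a class word and 1 for a specific word (subject and object only)."""
    subject, _verb, obj = entry.split()
    return sum(1 for word in (subject, obj) if word not in CLASS_WORDS)

def select_best(matches):
    """Return the single highest-ranked entry, or None when there is no unique winner."""
    if not matches:
        return None
    best = max(rank(entry) for entry in matches)
    winners = [entry for entry in matches if rank(entry) == best]
    # A tie at the highest ranking value corresponds to the no solution return (35).
    return winners[0] if len(winners) == 1 else None
```

Returning `None` for a tie leaves the decision to the controlling application, which is consistent with the option of referring an insoluble situation to the user.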
In the instant implementation of the system, the first and second processing means are carried out by means of a computer that could also, for example, operate a speech recognition program from which the initial text strings are received by the first processing means.
By way of further explanation, the following is an actual example of the correction of a syntax error. In this instance the sentence "the cat ate the rat" was outputted as the text string "the cat ate the that". Word tagging carried out at (4) resulted in the sequence "Determinant Noun Verb Determinant Determinant". This sequence is determined as not permissible at (5). Processing of this syntax error (7) results in only one error found at (8). The incorrect word is identified at (9) as being the fifth word "that". It is established at (10) that the wrong word is not replacing a determinant. The error thus moves to the correct sentence step (12). Knowing the word position the system tries to find a matching word.
The sentence is parsed at (31) and converted into the following strings: "ANIMALS EAT X", "CAT EAT X". The first string is created by retrieving the classification for "cat" if one exists. If one does not exist, then it will not be created. "X" is put in as a dummy word in place of the incorrect word for which the system is trying to find a match. The system looks up, at (32), the statement "ANIMALS EAT X" in the corresponding common sense knowledge base file, which will contain statements of the type "GROUP_OF_WORDS VERB SPECIFIC_WORD", such as PEOPLE CATCH BUS and ANIMALS DRINK WATER. An algorithm is used to search the common sense knowledge base for matching words, based on the given words.
Each statement in the knowledge base is aligned with the error position, that is, it will first be converted to a string like "PEOPLE CATCH X" or "ANIMALS DRINK X" before a comparison is made. Two matches are found, namely ANIMALS EAT GRASS and ANIMALS EAT PLANT. "Plant" and "grass" are therefore the matches; however, they are discarded because they both have the same ranking, namely 1. This ranking is derived from allocating a score of "0" to class words and a score of "1" to specific words and adding the scores together for the entire string. The system also looks up the statement "CAT EAT X" in the common sense knowledge base file that contains statements of the type "SPECIFIC_WORD VERB SPECIFIC_WORD". A single match is found, namely "CAT EAT RAT", which has a ranking of 2. "Rat" is therefore returned as a match. Knowing where the word is supposed to go in the sentence, the system then builds the sentence as "the cat ate the rat" at (34).
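The alignment of knowledge base statements with the error position can be sketched as follows. This is a minimal illustration; the function name `find_matches` and the two lists standing in for the knowledge base files are hypothetical, and their contents are limited to the statements mentioned in this example.

```python
# Stand-ins for two common sense knowledge base files (contents taken from the example).
GROUP_VERB_SPECIFIC = [
    "PEOPLE CATCH BUS", "ANIMALS DRINK WATER",
    "ANIMALS EAT GRASS", "ANIMALS EAT PLANT",
]
SPECIFIC_VERB_SPECIFIC = ["CAT EAT RAT"]

def find_matches(query, statements):
    """Return (statement, filler) pairs that equal the query everywhere except at X."""
    query_words = query.split()
    position = query_words.index("X")  # the known error position
    results = []
    for statement in statements:
        words = statement.split()
        if len(words) != len(query_words):
            continue
        # Align the statement with the error position before comparing.
        aligned = words[:position] + ["X"] + words[position + 1:]
        if aligned == query_words:
            results.append((statement, words[position]))
    return results
```

With these stand-in files, the query "ANIMALS EAT X" yields the two equally ranked candidates GRASS and PLANT, and the query "CAT EAT X" yields the single candidate RAT.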
It is to be noted that the foregoing is given for illustration purposes only, and that other methods for looking up a text string could be used. Simple text files have been employed in the current prototype. Larger knowledge bases could be stored in a database such as MS Access, and an SQL (Structured Query Language) statement used to retrieve the match. In such instances it would not be necessary to include an "X" in the string as a dummy, as it would be sufficient to simply know the relevant position in order to create a proper query to the database.
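A database lookup of this kind might be sketched as follows. The sketch uses an in-memory SQLite database through Python's standard `sqlite3` module as a stand-in for a database such as MS Access; the table name `common_sense` and the column names `subj`, `verb` and `obj` are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database standing in for a larger knowledge base store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE common_sense (subj TEXT, verb TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO common_sense VALUES (?, ?, ?)",
    [("CAT", "EAT", "RAT"), ("ANIMALS", "EAT", "GRASS"), ("ANIMALS", "EAT", "PLANT")],
)

def lookup_object(subject, verb):
    """Retrieve candidate objects; the error position selects the column that is returned."""
    rows = conn.execute(
        "SELECT obj FROM common_sense WHERE subj = ? AND verb = ? ORDER BY obj",
        (subject, verb),
    ).fetchall()
    return [row[0] for row in rows]
```

No dummy "X" is needed here: knowing that the error lies in the object position is enough to decide which column is selected and which columns are constrained.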
The following are actual examples of the correction of a deletion error.
The following sentence was passed from the speech recognition engine to the system for processing: "the woman waters man". The sentence label for this phrase is determinant noun verb noun. This is processed as a deletion error with only a determinant missing as at (16) and the missing determinant is introduced by way of a default determinant "the" to produce the sentence "the woman waters the man". Having now generated a sentence that is syntactically correct, the system proceeds to check if the sentence has a semantic error.
To check if the sentence has a valid meaning, the system parses the sentence to produce all possible logic statements; namely, "WOMAN WATER MAN"; "WOMAN WATER PEOPLE"; "PEOPLE WATER MAN"; and, "PEOPLE WATER PEOPLE".
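The generation of all possible logic statements can be sketched as follows. This is a minimal illustration assuming that the sentence has already been reduced to base-form subject, verb and object words; the mapping `CLASS_OF` and the function name `logic_statements` are hypothetical.

```python
from itertools import product

# Hypothetical classification table mapping specific words to their class word.
CLASS_OF = {"WOMAN": "PEOPLE", "MAN": "PEOPLE"}

def logic_statements(subject, verb, obj):
    """Combine each noun, and its class word where one exists, into logic statements."""
    subject_options = [subject] + ([CLASS_OF[subject]] if subject in CLASS_OF else [])
    object_options = [obj] + ([CLASS_OF[obj]] if obj in CLASS_OF else [])
    return [f"{s} {verb} {o}" for s, o in product(subject_options, object_options)]
```

A noun without a classification contributes only itself, so the number of statements ranges from one to four for a sentence with two nouns.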
Each statement is used to perform a lookup in the corresponding validation knowledge base that contains the logic statements "X WATER MAN"; "WOMAN X MAN"; "WOMAN WATER X"; "X WATER PEOPLE"; "WOMAN X PEOPLE"; "PEOPLE X MAN"; "PEOPLE WATER X"; "PEOPLE X PEOPLE".
None of the logic statements exist in the validation knowledge base and as a result, the system concludes that the sentence has a semantic error. Since the incorrect word is not known a priori, the agent tries all valid combinations on each parsed sentence, considering that a search is carried out for a match where there is a specific word. This search would be equivalent to a human using common sense to ask themselves "what would water a man?" as in "X WATER MAN", or what "action is most common between people?" as in "PEOPLE X PEOPLE", taking into consideration the limited vocabulary and syntax.
The knowledgebase options are: "PEOPLE X PEOPLE", which matches with "PEOPLE LOVE PEOPLE" (rank 0), and "PEOPLE WATER X", which matches with "PEOPLE WATER PLANT" (rank 1).
The search thus yields two matches, "LOVE" and "PLANT", in their respective positions, and these are thus potential solutions. However, since "PLANT" is found on a common sense knowledge base with a higher rank, it is used as the final solution to build the suggested sentence, which becomes "the woman waters the plant".
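The search over all word alignments when the error position is unknown can be sketched as follows. This is an illustration under stated assumptions: the list `COMMON_SENSE` contains only the two entries mentioned in this example, the names `rank` and `candidates` are hypothetical, and the rank is assumed to count specific words in the subject and object positions only, which reproduces the ranks quoted above.

```python
CLASS_WORDS = {"PEOPLE"}  # hypothetical class words
COMMON_SENSE = ["PEOPLE LOVE PEOPLE", "PEOPLE WATER PLANT"]

def rank(entry):
    """Count specific (non-class) words in the subject and object positions."""
    subject, _verb, obj = entry.split()
    return sum(word not in CLASS_WORDS for word in (subject, obj))

def candidates(statements):
    """Return (rank, position, word, entry) tuples, highest rank first.

    An entry that differs from a statement in exactly one position is what a
    lookup with X placed at that position would return.
    """
    found = set()
    for statement in statements:
        words = statement.split()
        for entry in COMMON_SENSE:
            kb_words = entry.split()
            differing = [i for i in range(3) if kb_words[i] != words[i]]
            if len(differing) == 1:
                position = differing[0]
                found.add((rank(entry), position, kb_words[position], entry))
    return sorted(found, reverse=True)
```

Applied to the four logic statements of this example, the sketch returns PLANT in the object position with rank 1 ahead of LOVE in the verb position with rank 0.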
In a further deletion error example, the sentence from the speech recognition engine that is passed to the system is "the mother the child". The sentence label is determinant noun determinant noun and a syntax error is detected. The sentence label is classified as a "word missing" situation as indicated at (17) and processed accordingly.
Firstly, the system fills in the missing verb, to create the temporary sentence "the mother dummy_verb the child", which is now syntactically correct. Since the position of the incorrect word is already identified, the system performs a lookup on the common sense knowledge base using the statements "MOTHER X CHILD"; "MOTHER X PEOPLE"; "PEOPLE X CHILD"; and "PEOPLE X PEOPLE". There are two matches, with the verbs 'FEED' and 'LOVE', that would give the two sentences "MOTHER FEED CHILD" having a rank of 2 and "PEOPLE LOVE PEOPLE" having a rank of 0. The match with the highest rank is 'FEED'. Thus, the system proceeds to look up the first verb form entry in the verb database and finds the word 'feeds'. Replacing the 'dummy_verb' with the correct word, the suggested sentence becomes "the mother feeds the child".
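The replacement of the dummy word by the default verb form can be sketched as follows. The table `VERB_FORMS` and the function name `fill_dummy` are hypothetical; only the first entry for each verb ('feeds', 'eats') is taken from the examples in this description, and the remaining forms are placeholders.

```python
# Hypothetical verb database; the first entry of each list is the default form.
VERB_FORMS = {
    "FEED": ["feeds", "fed"],
    "EAT": ["eats", "ate"],
}

def fill_dummy(sentence, dummy, base_verb):
    """Replace the dummy word with the first verb form listed for the base verb."""
    surface = VERB_FORMS[base_verb][0]
    return " ".join(surface if word == dummy else word for word in sentence.split())
```

Choosing the first entry is a simple default; it does not attempt to recover the tense of the original utterance.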
A further deletion error type of sentence to be analyzed may be "the doctor treated ate" which was actually produced by a speech recognition engine as were many of the other examples given. The system detects a syntax error from the sentence label, which is determinant noun verb verb. Further processing indicates a deletion error of the type indicated at (18). The sentence label is on the list and is thus categorized. The system detects and replaces the incorrect word "ate" and fills in the missing determinant, to produce the temporary sentence "the doctor treated the dummy_noun".
Knowing the error position, the system parses the sentence and searches the common sense knowledge base for a solution. The only possibilities are "DOCTOR TREAT X", giving the match "DOCTOR TREAT PATIENT" with a rank of 2, and "PEOPLE TREAT X" with a rank of 1.
Only one match is found, making "PATIENT" the final solution. The system then replaces the dummy noun with the match, to build the suggested sentence as "the doctor treated the patient".
The following is an actual example of the correction of a semantic or logic error. In this instance the sentence "the woman ate the apple" was outputted as the text string "the woman danced the apple". Word tagging carried out at (4) resulted in the sequence "Determinant Noun Verb Determinant Noun". This sequence is determined as being permissible at (5), which means that the syntax is valid. The text string is then sent for logic checking at (6) and is converted to all possible sentence forms at (27) such as "WOMAN DANCE APPLE", "PEOPLE DANCE APPLE", "WOMAN DANCE FRUIT", and "PEOPLE DANCE FRUIT".
Each word string is looked up in its corresponding validation knowledge base file and at (29) it is established that there is no match. The text string is therefore passed on for fixing a logic error at (30) as it makes no sense. The text string has already been parsed and the various possibilities are ranked at (32) in the manner indicated above. This is done by performing a lookup on the corresponding common sense knowledge base file for each of the above statements. Since it is desired to check for all possibilities, all word alignments are considered, i.e. a lookup is carried out for each of "WOMAN DANCE APPLE"; "X DANCE APPLE"; "WOMAN X APPLE"; and "WOMAN DANCE X". No match is found. A similar search is carried out with the other statements. In so doing one match is found, from "PEOPLE X FRUIT" on "PEOPLE EAT FRUIT". It has a ranking of 0, and it is the only match. "Eat" is therefore returned as the solution, and the position of the word is given so that the sentence can be built at (34). The final text will be "the woman eats the apple" rather than "the woman ate the apple", because the verb "EAT" could take many forms and in the current implementation a default is chosen, which is the first entry in the verb list.
A final example illustrates the benefit of separating the common sense knowledge base from the validation knowledge base. In the validation knowledge base there may be statements like "PEOPLE LOVE FRUIT" and "PEOPLE BURN FRUIT" so that sentences like "the girl loved the apple" or "the woman burned the apple" would not be rejected by the system as semantically incorrect. Had just one of these statements been on the common sense knowledge base file, there would have been two matches with the same ranking and the system would have been unable to correct the sentence "the woman danced the apple" in the example above. For instance, in the case of "the woman burned the apple", one of the converted sentences could be "PEOPLE BURN FRUIT", assuming we used the word "PEOPLE" to classify and represent words such as "woman", "man" or "girl", and likewise, assuming that "FRUIT" was the word or symbol chosen to represent "apple" and other words like "banana".
When doing the logic check as previously explained, "PEOPLE BURN FRUIT" is found in the validation knowledge base and the sentence is therefore assumed by the system to be semantically correct. If the statement "PEOPLE BURN FRUIT" had been present in the common sense knowledge base as well, correction of "the woman danced the apple" in the above example, would have yielded two matches with the same ranking, namely "burn" and "eat", and no automatic correction would be possible without further information. The common sense knowledge base can therefore be used to list only the most common or obvious knowledge associated with different sets of words, in order to allow automatic correction by the system, without optional further processing or intervention from the user.
It is envisaged that the invention will be particularly useful in applications in which the spectrum of language utilized is somewhat limited although it is also envisaged that the invention could be broadened considerably to apply to general purpose speech recognition systems. In the field of small vocabulary applications, it is envisaged that the invention will be particularly useful as applied to commanding robots, home automation, operating system and software commands, dialogue systems and videogames. Somewhat larger vocabulary applications include general and specialized dictation software.
It will be understood that by using the procedure generally outlined above, an enhanced speech recognition or optical character reader system is achieved and a reduction made of nonsensical errors.
Of course, numerous variations can be made to the implementation of the invention described above without departing from the scope hereof.
Claims
1. A system for processing electronically generated text including electronic text generating means (1 , 2) for generating an initial text string (3); first processing means for producing an intermediate text string; and second processing means having access to a knowledge base (28, 32) for processing at least intermediate text strings optionally as a sequence with one or more other successive intermediate text strings in order to compare the meaning thereof to items in the knowledge base and replace any words that may render the intermediate text string nonsensical in an attempt to correct errors in semantics and produce an optionally final processed text string, the system being characterised in that the first processing means is adapted to apply the steps of categorising each word as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category (4), creating a category sequence corresponding to the text string and comparing the category sequence to a plurality of predetermined permissible sequences to thereby check the syntax of the initial text string (5), wherein a text string (herein termed an intermediate text string) having a category sequence corresponding to a predetermined permissible sequence is passed for further processing and an initial text string having a category sequence not corresponding to a predetermined permissible sequence is treated in an effort to remedy the cause of non-correspondence (7).
2. A system for processing electronically generated text, the system including electronic text generating means (1 , 2) and computer means programmed to receive electronic initial text strings (3) generated by the electronic text generating means that are optionally embodied in the computer means and to produce intermediate text strings on the basis thereof; the computer means being further programmed to compare an intermediate text string optionally as a sequence with one or more other successive intermediate text strings with information contained in a knowledge base (28, 32) and replace any words in the intermediate text string that make it nonsensical in an attempt to correct errors in semantics and produce an optionally final processed text string; the system being characterised in that the computer means is still further programmed to categorise each word in the initial text string as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category and to create a category sequence corresponding to the initial text string (4); to compare the category sequence of the initial text string to a plurality of predetermined permissible category sequences to thereby check the syntax of the initial text string (5); to pass for further treatment a text string having a category sequence corresponding to one of a predetermined permissible sequence as an intermediate text string; and to treat an initial text string having a category sequence not corresponding to a predetermined permissible sequence, as may be possible, in an effort to remedy the cause of non-correspondence and in an attempt to correct errors (7).
3. A system as claimed in either one of claims 1 or 2 in which the electronic text generating means is selected from a speech recognition engine and an optical character reading device.
4. A system as claimed in any one of the preceding claims in which there are at least three categories including nouns, verbs, and determinants into which all words in text strings processed are categorised.
5. A system as claimed in claim 4 in which each initial text string is composed of a limited number of words containing either one determinant, one noun and one verb, or two determinants, two nouns and one verb in each instance in a logical sequence.
6. A system as claimed in either one of claims 4 or 5 in which the check of the category sequence of a text string against predetermined permissible sequences includes a check for an absence of a required word.
7. A system as claimed in any one of the preceding claims in which a classification method is used such that a single word or symbol can be used to represent a class or group of words.
8. A system as claimed in claim 7 in which items of information in the knowledgebase are rated according to the number of dictionary words they contain, such that a dictionary or specific word leads to a higher ranking, and a class word or symbol representing a group of words, leads to a lower ranking.
9. A system as claimed in any one of the preceding claims in which an insoluble situation arising from two different solutions being available, or in any other situation, is referred to a user for manual selection of an appropriate solution or re-entry of the relevant text string or utterance.
10. A system as claimed in any one of the preceding claims in which a text string having a category sequence not corresponding to a predetermined permissible sequence and that is treated in an effort to remedy the cause of non-correspondence is automatically subjected to a logic check.
11. A system as claimed in any one of the preceding claims in which the knowledgebase consists of two different knowledgebases, namely, a commonsense knowledgebase (28) employed primarily for correcting errors detected, and a validation knowledgebase (32) against which text strings are checked in order to achieve validation.
12. A method of processing text strings generated by electronic text generating means (1), the method comprising processing initial text strings (3) generated by the electronic text generating means to form first intermediate text strings and comparing a first intermediate text string optionally as a sequence with one or more other successive intermediate text strings with information contained in a knowledge base (28, 32) and replacing words in the intermediate text string that render it nonsensical in an attempt to correct errors in semantics and produce an optionally final processed text string, the method being characterized in that it includes the steps of categorising each word in an initial text string as belonging to one of a predetermined plurality of categories of parts of speech including nouns, verbs, and at least one other category to create a category sequence corresponding to the initial text string (4); comparing the category sequence of the initial text string with a plurality of predetermined permissible category sequences to thereby check the syntax of the initial text string (5); passing for further treatment as an intermediate text string a text string having a category sequence corresponding to one of a predetermined permissible sequence; and treating an initial text string having a category sequence not corresponding to one of the predetermined permissible sequences, as may be possible, in an effort to remedy the cause of non-correspondence (7).
13. A component of a system as claimed in any one of claims 1 to 11 comprising said first processing means and said second processing means, said component being particularly adapted to be combined with electronic text generating means (1, 2) in order to form such a system.
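The syntax-checking steps recited in claim 12 (categorise each word, build a category sequence, compare it with the permissible sequences, then either pass the string on or attempt a remedy) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the claims: the toy `LEXICON`, the `PERMISSIBLE` set, the function names, and in particular the remedy strategy (deleting a single word), which is only one possible way of treating a non-corresponding string.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon mapping words to part-of-speech categories.
LEXICON: Dict[str, str] = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "water": "NOUN", "fruit": "NOUN",
    "drinks": "VERB", "eats": "VERB",
}

# Hypothetical set of predetermined permissible category sequences.
PERMISSIBLE: Set[Tuple[str, ...]] = {
    ("DET", "NOUN", "VERB", "NOUN"),
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
    ("NOUN", "VERB", "NOUN"),
}


def categorise(words: List[str]) -> Tuple[str, ...]:
    """Create the category sequence corresponding to a list of words."""
    return tuple(LEXICON.get(w.lower(), "UNKNOWN") for w in words)


def is_permissible(sequence: Tuple[str, ...]) -> bool:
    """Check a category sequence against the permissible sequences."""
    return sequence in PERMISSIBLE


def remedy(words: List[str]) -> Optional[List[str]]:
    """Assumed remedy: delete one word at a time and return the first
    variant whose category sequence is permissible, or None."""
    for i in range(len(words)):
        candidate = words[:i] + words[i + 1:]
        if is_permissible(categorise(candidate)):
            return candidate
    return None


def check_syntax(text: str) -> Optional[List[str]]:
    """Pass a permissible string through unchanged; otherwise try the remedy."""
    words = text.split()
    if is_permissible(categorise(words)):
        return words
    return remedy(words)
```

In this sketch, `check_syntax("the cat drinks water")` passes the string through unchanged, while `check_syntax("the cat the drinks water")` drops the stray second "the". A string that cannot be repaired by a single deletion yields `None`, which a fuller system would presumably hand to some other treatment.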
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ZA2008/0744 | 2008-08-15 | ||
ZA200807044 | 2008-08-15 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2010018453A2 (en) | 2010-02-18 |
WO2010018453A3 (en) | 2011-04-14 |
Family
ID=41682291
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2009/006552 WO2010018453A2 (en) | 2008-08-15 | 2009-08-14 | System and method for processing electronically generated text |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2010018453A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107895193A (en) * | 2017-11-13 | 2018-04-10 | 北京神州泰岳软件股份有限公司 | A kind of construction of knowledge base method, parameter setting method and device based on dialogue |
CN108647239A (en) * | 2018-04-04 | 2018-10-12 | 顺丰科技有限公司 | Talk with intension recognizing method and device, equipment and storage medium |
CN109753640A (en) * | 2019-01-04 | 2019-05-14 | 江西理工大学应用科学学院 | A kind of text error correction method based on artificial intelligence |
EP3955099A1 (en) * | 2020-08-11 | 2022-02-16 | Beijing Xiaomi Mobile Software Co., Ltd. | Method and device for controlling the operation mode of a terminal device, and storage medium |
CN114611524A (en) * | 2022-02-08 | 2022-06-10 | 马上消费金融股份有限公司 | Text error correction method and device, electronic equipment and storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108305625B (en) * | 2018-01-29 | 2020-12-18 | 深圳春沐源控股有限公司 | Voice control method and device, electronic equipment and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4947438A (en) * | 1987-07-11 | 1990-08-07 | U.S. Philips Corporation | Process for the recognition of a continuous flow of spoken words |
US20020111803A1 (en) * | 2000-12-20 | 2002-08-15 | International Business Machines Corporation | Method and system for semantic speech recognition |
WO2004044888A1 (en) * | 2002-11-13 | 2004-05-27 | Schoenebeck Bernd | Voice processing system, method for allocating acoustic and/or written character strings to words or lexical entries |
US7383172B1 (en) * | 2003-08-15 | 2008-06-03 | Patrick William Jamieson | Process and system for semantically recognizing, correcting, and suggesting domain specific speech |
2009-08-14: WO application PCT/IB2009/006552 (WO2010018453A2, en), active, Application Filing
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107895193A (en) * | 2017-11-13 | 2018-04-10 | 北京神州泰岳软件股份有限公司 | A kind of construction of knowledge base method, parameter setting method and device based on dialogue |
CN107895193B (en) * | 2017-11-13 | 2020-03-13 | 中科鼎富(北京)科技发展有限公司 | Knowledge base construction method, and parameter setting method and device based on conversation |
CN108647239A (en) * | 2018-04-04 | 2018-10-12 | 顺丰科技有限公司 | Talk with intension recognizing method and device, equipment and storage medium |
CN109753640A (en) * | 2019-01-04 | 2019-05-14 | 江西理工大学应用科学学院 | A kind of text error correction method based on artificial intelligence |
EP3955099A1 (en) * | 2020-08-11 | 2022-02-16 | Beijing Xiaomi Mobile Software Co., Ltd. | Method and device for controlling the operation mode of a terminal device, and storage medium |
US11756545B2 (en) | 2020-08-11 | 2023-09-12 | Beijing Xiaomi Mobile Software Co., Ltd. | Method and device for controlling operation mode of terminal device, and medium |
CN114611524A (en) * | 2022-02-08 | 2022-06-10 | 马上消费金融股份有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN114611524B (en) * | 2022-02-08 | 2023-11-17 | 马上消费金融股份有限公司 | Text error correction method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2010018453A3 (en) | 2011-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9911413B1 (en) | Neural latent variable model for spoken language understanding | |
KR102256240B1 (en) | Non-factoid question-and-answer system and method | |
US5500920A (en) | Semantic co-occurrence filtering for speech recognition and signal transcription applications | |
He et al. | A data-driven spoken language understanding system | |
US20070094004A1 (en) | Conversation controller | |
KR101136007B1 (en) | System and method for anaylyzing document sentiment | |
JP2005010691A (en) | Apparatus and method for speech recognition, apparatus and method for conversation control, and program therefor | |
WO1996000436A1 (en) | Method and system for bootstrapping statistical processing into a rule-based natural language parser | |
WO2010018453A2 (en) | System and method for processing electronically generated text | |
EP2317507B1 (en) | Corpus compilation for language model generation | |
US8504359B2 (en) | Method and apparatus for speech recognition using domain ontology | |
CN107507613B (en) | Scene-oriented Chinese instruction identification method, device, equipment and storage medium | |
CN110866390B (en) | Method and device for recognizing Chinese grammar error, computer equipment and storage medium | |
JP6775465B2 (en) | Dialogue rule collation device, dialogue device, dialogue rule collation method, dialogue method, dialogue rule collation program, and dialogue program | |
Chistikov et al. | Improving prosodic break detection in a Russian TTS system | |
Spiegler | Machine learning for the analysis of morphologically complex languages | |
Li et al. | Discriminative data selection for lightly supervised training of acoustic model using closed caption texts | |
US11984116B2 (en) | Method and system for unsupervised discovery of unigrams in speech recognition systems | |
JP2005157602A (en) | Conversation control device, conversation control method, and those programs | |
Athanasopoulou et al. | Using lexical, syntactic and semantic features for non-terminal grammar rule induction in spoken dialogue systems | |
Wang et al. | Macrosyntactic Segmenters of a French spoken corpus | |
Esteve et al. | On the use of linguistic consistency in systems for human-computer dialogues | |
Wutiwiwatchai et al. | Hybrid statistical and structural semantic modeling for Thai multi-stage spoken language understanding | |
Bhowmik et al. | Development of A Word Based Spell Checker for Bangla Language | |
Wang et al. | Macrosyntactic segmenters of a spoken French Corpus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09806509; Country of ref document: EP; Kind code of ref document: A2 |
| NENP | Non-entry into the national phase in: | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 09806509; Country of ref document: EP; Kind code of ref document: A2 |