CN104391589B - A method for identifying mixed Chinese and English input content based on keystroke records - Google Patents
- Publication number
- CN104391589B (application number CN201410764964.3A)
- Authority
- CN
- China
- Prior art keywords
- user
- dictionary
- standard
- keyboard
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/02—Input arrangements using manually operated switches, e.g. using keyboards or dials
- G06F3/023—Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
- G06F3/0233—Character input methods
- G06F3/0237—Character input methods using prediction or retrieval techniques
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for identifying mixed Chinese and English input content based on keystroke records. The steps are: converting the user's keyboard messages into keyboard sequences; reading the record file and converting the current input state into a standard translation format; reading the standard translation format and looking up each string in the user-specific restoration dictionary and the standard restoration dictionary, or substituting a best candidate when no match is found, which completes the restoration step; and displaying the translation result so that it can be corrected. The invention associates the user's keyboard messages with keyboard sequences by message window and identifies the current input state with a window-based automatic state recognition algorithm, which improves recognition accuracy. For keyboard sequences that the user has omitted, it searches the dictionary and selects a best candidate as a substitute. For a single keyboard sequence string, a lookup algorithm based on reversed dictionary entries improves lookup efficiency. The invention also provides a user correction interface through which the user can manually correct the restored file.
Description
Technical field
The present invention relates to the computer field, and specifically to a method for identifying mixed Chinese and English input content based on keystroke records.
Background
With the development of information technology, computers have reached every aspect of daily life, and keyboard input, as the main mode of interaction, plays an important role in online communication and routine office work. However, there is still no effective solution to the problem of restoring a user's original input from keyboard messages. The field of input restoration has no mature technical scheme, and the main problems are as follows:
First, the user may switch windows at any time during input, and often does so frequently; ordinary methods cannot associate input messages with the window in which they were typed.
Second, because of the auto-completion function of input methods, the user may omit part of a keyboard sequence, typically the latter half of a pinyin spelling. This causes errors during restoration, because the correct result cannot be matched.
Third, the input method state is difficult to determine. There are many input methods on the market, their switching keys differ, and the ways of switching modes inside each input method differ, so the input method state is determined inaccurately when the user's input is restored.
Finally, the accuracy of the restored result is low, and homophones (characters with the same pronunciation but different written forms) occur with relatively high probability.
Solving these problems, directly or indirectly, would be a breakthrough in the field of keyboard input restoration.
Summary of the invention
The purpose of the present invention is to provide a method for identifying mixed Chinese and English input content based on keystroke records that has high lookup efficiency and high recognition accuracy, so as to solve the problems described in the background above.
To achieve this purpose, the present invention provides the following technical solution:
A method for identifying mixed Chinese and English input content based on keystroke records, with the following steps:
(1) While the user is typing, convert the user's keyboard messages into keyboard sequences, remove noise information from the keyboard sequences, merge the user's keyboard sequences according to the number of the Windows input box, and persist them;
(2) Read the record file, identify the current input state using the window-based automatic state recognition algorithm, and convert the recognition result into the standard translation format;
(3) Read the standard translation format; look up first in the user-specific restoration dictionary and then in the standard restoration dictionary, applying the lookup algorithm based on reversed dictionary entries to the character string in each standard-format unit to obtain the translation result; for strings that cannot be matched, find a best candidate in the dictionary as a substitute, which completes the restoration step;
(4) Present the translation result to the user, who modifies it through the user correction interface, correcting incorrectly translated results and homophones; add these modifications to the user-specific restoration dictionary and save the final result.
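Step (1) can be illustrated with a minimal sketch. The text does not specify which keys count as noise or how events are represented, so the `NOISE_KEYS` set, the `(window_id, key)` event tuples and the function name `merge_by_window` are all assumptions made for illustration only.

```python
from collections import OrderedDict

# Hypothetical noise keys; the patent text does not list which keys are noise.
NOISE_KEYS = {"Ctrl", "Alt", "Win", "F1", "F5"}


def merge_by_window(events):
    """Group (window_id, key) events by window number, dropping noise keys.

    Keys typed in the same window end up in one sequence even when the
    user switches back and forth between windows.
    """
    merged = OrderedDict()
    for window_id, key in events:
        if key in NOISE_KEYS:
            continue  # remove noise information from the sequence
        merged.setdefault(window_id, []).append(key)
    return merged


events = [(1, "n"), (1, "i"), (2, "h"), (1, "Ctrl"), (2, "i"), (1, "Space")]
merged = merge_by_window(events)
print(dict(merged))
```

The merged mapping could then be written to a record file for step (2); the persistence format is not described in the text and is left out here.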
Compared with the prior art, the beneficial effects of the present invention are as follows:
The invention associates the user's keyboard messages with keyboard sequences by message window and identifies the current input state with a window-based automatic state recognition algorithm, which improves recognition accuracy. For keyboard sequences that the user has omitted, it searches the dictionary and selects a best candidate as a substitute. For a single keyboard sequence string, the lookup algorithm based on reversed dictionary entries improves lookup efficiency. The invention also provides a user correction interface through which the user can manually correct the restored file.
Description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the flow chart that Keyboard Message is converted into keyboard sequence in the present invention.
Fig. 3 is the principle schematic of the lookup algorithm based on reverse dictionary entry in the present invention.
Detailed description
The technical solution of this patent is described in more detail below with reference to specific embodiments.
Referring to Figs. 1-3, a method for identifying mixed Chinese and English input content based on keystroke records has the following steps:
(1) While the user is typing, convert the user's keyboard messages into keyboard sequences, remove noise information from the keyboard sequences, merge the user's keyboard sequences according to the number of the Windows input box, and persist them;
(2) Read the record file, identify the current input state using the window-based automatic state recognition algorithm, and convert the recognition result into the standard translation format;
The standard translation format G = WQ consists of a window number W and an input sequence Q, where: W is a window number that identifies the keyboard input sequences typed in the same window, so that each input can be matched to its window even when windows are switched frequently; Q is the input sequence in the window identified by W, Q = T1, T2, T3, ..., a sequence made up of at least one input unit.
Each input unit consists of an input state, a character string and a separator, i.e. T = [State] S [Separator], where: T is the input unit, [State] is the input state of T, S is a character string, and [Separator] is a separator.
Input state: [State] ∈ {P, E, W}, where P denotes the Pinyin input method, E the English input method, and W the Wubi (five-stroke) input method.
Character string: S[i] ∈ {0-9, a-z, A-Z}; each character S[i] of the string S is a digit, an uppercase letter or a lowercase letter.
Separator: [Separator] ∈ {carriage return, newline, Space, Shift, Tab, Caps Lock, Esc, punctuation mark}. The separator delimits the user's input, and each input unit has exactly one input state.
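The structure G = WQ with T = [State] S [Separator] can be modelled as a small data type. The following is a sketch only: the class names `InputUnit` and `TranslationRecord` and the textual form produced by `serialize` are assumptions, since the text defines the components of the format but not a concrete encoding.

```python
from dataclasses import dataclass
from typing import List

# Input states named in the text: Pinyin, English, Wubi.
STATES = {"P": "Pinyin", "E": "English", "W": "Wubi"}


@dataclass
class InputUnit:
    state: str      # [State]: one of "P", "E", "W"
    text: str       # S: non-empty string over 0-9, a-z, A-Z
    separator: str  # [Separator]: e.g. "Space", "Enter", "Tab"

    def __post_init__(self):
        if self.state not in STATES:
            raise ValueError(f"unknown input state: {self.state!r}")
        if not (self.text.isascii() and self.text.isalnum()):
            raise ValueError(f"invalid character string: {self.text!r}")


@dataclass
class TranslationRecord:
    window: int             # window number W
    units: List[InputUnit]  # input sequence Q = T1, T2, ...

    def serialize(self) -> str:
        # Assumed textual encoding, chosen only for readability.
        body = "".join(f"[{u.state}]{u.text}[{u.separator}]" for u in self.units)
        return f"{self.window}:{body}"


rec = TranslationRecord(3, [InputUnit("P", "nihao", "Space"),
                            InputUnit("E", "Hello", "Enter")])
print(rec.serialize())
```

Validating the state and character set at construction time mirrors the constraints [State] ∈ {P, E, W} and S[i] ∈ {0-9, a-z, A-Z} stated above.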
The principle of the window-based automatic state recognition algorithm is as follows:
In step (2), the record file is first read and converted into the standard translation format:
G = W T1 T2 T3 ... Tn
The input state Ti[State] of each unit in the standard translation format is initially unknown. During normal typing every input has an input method state, but that state cannot be captured while the user is typing, so it cannot be read directly during recognition. Let P(i,x) denote the probability that the state of the i-th input unit is x, where x ∈ {P, E, W}. The input method state of each input unit may depend on the states of the preceding input units, and the influence factor differs with the distance between two input units. Let R(m,i) denote the influence factor of the state of the m-th input unit on the i-th input unit having state x. The input can also be matched against the user dictionary; let D(i,x) denote the probability, according to that match, that the state of the i-th input unit is x, where x ∈ {P, E, W}.
Let α be the influence factor of the preceding i-1 input units on the state of the current input i, and 1-α the influence factor of the dictionary on that state. Then:
P(i,x) = α F(i,x) + (1-α) D(i,x)
where F(i,x) is the influence of the states of the preceding i-1 input units on the i-th input unit having state x.
P(i,P) is the probability that the state is the Pinyin input method, P(i,E) the probability that it is the English input method, and P(i,W) the probability that it is the Wubi input method. The input state of the i-th input unit is the x with the largest of these three values.
In determining the input state, R(m,i), the influence of the state of the m-th input unit on the i-th input unit having state x, essentially reflects the relationship between the states of two input units that are i-m positions apart. A window length W can therefore be defined (not to be confused with the window number W or the Wubi state W above), meaning that for the i-th input unit only the states of the input units within the preceding W positions are significant. This reduces the number of parameters while still achieving the desired effect. After this improvement:
P(i,x) = α F(i,x) + (1-α) D(i,x)
R(l,y,x) denotes the influence factor, for two input units at distance l, of the earlier unit having state y on the later unit having state x. F(i,x) then depends only on the states inside the preceding window of W units and not on any other states.
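The combination P(i,x) = α F(i,x) + (1-α) D(i,x) followed by picking the largest value can be sketched as below. The text does not give a closed form for F(i,x) or concrete values for R(l,y,x), so two parts are assumptions: `influence` is a made-up R that favours nearby units with the same state, and `history_score` takes F to be the normalized sum of R over the preceding window. The dictionary probabilities D are supplied by the caller.

```python
STATES = ("P", "E", "W")


def influence(l, y, x):
    """Hypothetical R(l, y, x): closer units and equal states weigh more."""
    return (1.0 if y == x else 0.2) / l


def history_score(prev_states, x, window):
    """Assumed F(i, x): normalized sum of R over the last `window` states."""
    recent = prev_states[-window:] if window > 0 else []

    def raw(s):
        # enumerate from the nearest unit (distance 1) backwards
        return sum(influence(l, y, s)
                   for l, y in enumerate(reversed(recent), start=1))

    total = sum(raw(s) for s in STATES)
    return raw(x) / total if total else 1.0 / len(STATES)


def recognize_states(dict_probs, alpha=0.5, window=3):
    """dict_probs[i][x] plays the role of D(i, x); returns one state per unit."""
    states = []
    for d in dict_probs:
        p = {x: alpha * history_score(states, x, window)
                + (1 - alpha) * d.get(x, 0.0)
             for x in STATES}
        states.append(max(STATES, key=lambda x: p[x]))  # state with largest P(i, x)
    return states


dict_probs = [
    {"P": 0.8, "E": 0.1, "W": 0.1},
    {"P": 0.4, "E": 0.5, "W": 0.1},
    {"P": 0.1, "E": 0.9, "W": 0.0},
]
result = recognize_states(dict_probs)
print(result)
```

In this example the second unit is labelled Pinyin even though the dictionary slightly prefers English, because the preceding Pinyin unit contributes through F; the third unit switches to English once the dictionary evidence is strong enough.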
(3) Read the standard translation format; look up first in the user-specific restoration dictionary and then in the standard restoration dictionary, applying the lookup algorithm based on reversed dictionary entries to the character string in each standard-format unit to obtain the translation result; for strings that cannot be matched, find a best candidate in the dictionary as a substitute, which completes the restoration step;
(4) Present the translation result to the user, who modifies it through the user correction interface, correcting incorrectly translated results and homophones; add these modifications to the user-specific restoration dictionary and save the final result.
The user-specific restoration dictionary and the standard restoration dictionary are introduced because the way a user enters information is shaped by input method habits. A user typically uses one particular brand of input method; over long-term use the input method records the user's habits and matches later input according to them. The standard restoration dictionary is a system dictionary, similar to the standard dictionary of an input method, and reflects the input method's standard lexicon. The user-specific restoration dictionary is built from the user's corrections in the correction module of step (4): every modification is dynamically added to it, and later restoration passes can use it.
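The lookup order and the feedback from corrections can be sketched as follows. The class name `RestorationDictionaries` and its method names are assumptions for illustration, and plain Python dicts stand in for the indexed dictionary structure the patent describes.

```python
class RestorationDictionaries:
    """User-specific dictionary consulted first, then the standard one."""

    def __init__(self, standard):
        self.standard = dict(standard)  # standard restoration dictionary
        self.user = {}                  # user-specific restoration dictionary

    def lookup(self, keys):
        # User corrections take priority over the standard lexicon.
        if keys in self.user:
            return self.user[keys]
        return self.standard.get(keys)  # None when nothing matches

    def record_correction(self, keys, corrected):
        # Each manual correction from step (4) is added dynamically.
        self.user[keys] = corrected


dicts = RestorationDictionaries({"shi": "是"})
before = dicts.lookup("shi")
dicts.record_correction("shi", "时")
after = dicts.lookup("shi")
print(before, after)
```

Keeping corrections in a separate dictionary leaves the standard lexicon untouched, so the same standard dictionary could be shared between users.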
Lookup algorithm based on reversed dictionary entries: the in-memory data structure is the way the computer organizes dictionary entries in memory for efficient access. Each dictionary entry has the structure:
Item = {message[], results[], result_length, pointers[], pointer_length}
where message is the keyboard sequence of the entry, stored in reverse order; results holds the dictionary results matched by the entry; result_length is the length of results; pointers holds the local index; and pointer_length is the length of the local index.
results[] = {vector<result>}
pointers[] = {vector<pointer>}
An index is an in-memory data structure that speeds up access to the data in use. The index in this invention comprises a global index based on input units, global_index[], and a local in-memory index based on the character information of each entry. The global index global_index[] records the real memory address real_addr of dictionary entries according to their first two characters; a hash function identifies the indirect index address ind_addr, and each indirect index address ind_addr stores a real address real_addr.
The hash function is: h(k) = (int)(k - 'a');
The indirect index address is: ind_addr = h(message[0]) * 26 + h(message[1])
The real address is: real_addr = global_index[ind_addr]
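The three formulas above translate directly into code. In this sketch the "real address" is modelled as a position in a sorted Python list rather than a memory address, and `build_global_index` is a hypothetical helper name; messages are assumed to be lowercase a-z strings of length at least two.

```python
def h(k):
    """Hash of a single lowercase letter: h(k) = k - 'a'."""
    return ord(k) - ord("a")


def ind_addr(message):
    """Indirect index address from the first two characters of message."""
    return h(message[0]) * 26 + h(message[1])


def build_global_index(messages):
    """Map each two-character prefix to the first entry that starts with it.

    `messages` must be sorted; the stored value is a list position, which
    stands in for real_addr in this sketch.
    """
    global_index = [None] * (26 * 26)
    for pos, msg in enumerate(messages):
        slot = ind_addr(msg)
        if global_index[slot] is None:
            global_index[slot] = pos
    return global_index


messages = sorted(["abab", "acab", "acad", "ba"])
global_index = build_global_index(messages)
real_addr = global_index[ind_addr("acad")]
print(real_addr, messages[real_addr])
```

A lookup for a sequence starting with "ac" can therefore begin at position 1 and never touch entries under other two-letter prefixes.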
An ordinary dictionary search compares entries in memory one by one, and some of these comparisons are wasted. For example, if the entry being searched for is abab, the entry being compared is acab, and the next entry is acad, then acad need not be compared at all. The local index model based on entry character information therefore adds a pointers[] array to each dictionary entry to improve search efficiency: pointers[i] is the address of the next entry to examine when the first i-1 characters of check_item.message match the first i-1 characters of index_item.message but the i-th character differs.
The lookup algorithm based on reversed dictionary entries is as follows:
Input: the entry to look up, check_item
Output: its position pos in the dictionary
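The body of the algorithm is not reproduced in the text, so the following is one possible reading of the pointers[] description, not the patented procedure itself. It assumes entries are kept sorted by their reversed message, uses 0-based indices (pointers[i] applies when the first i characters match and the character at index i differs), and uses list positions in place of addresses. `build_entries` and `find` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    message: str        # keyboard sequence, stored reversed
    results: List[str]  # restored text for this sequence
    pointers: List[int] = field(default_factory=list)  # local skip index


def build_entries(table):
    """Sort entries by reversed message and fill in the skip pointers."""
    entries = sorted((Item(k[::-1], list(v)) for k, v in table.items()),
                     key=lambda e: e.message)
    for j, e in enumerate(entries):
        for i in range(len(e.message)):
            # pointers[i]: first later entry not sharing message[:i + 1]
            k = j + 1
            while k < len(entries) and entries[k].message[:i + 1] == e.message[:i + 1]:
                k += 1
            e.pointers.append(k)
    return entries


def find(entries, keys, start=0) -> Optional[int]:
    """Return the position of `keys` in entries, or None if absent."""
    query = keys[::-1]
    pos = start
    while pos < len(entries):
        msg = entries[pos].message
        if msg == query:
            return pos
        i = 0  # index of the first differing character
        while i < len(msg) and i < len(query) and msg[i] == query[i]:
            i += 1
        # Skip every entry that shares the mismatching prefix.
        pos = entries[pos].pointers[i] if i < len(msg) else pos + 1
    return None


entries = build_entries({"baba": ["A"], "baca": ["B"], "daca": ["C"], "ab": ["D"]})
pos = find(entries, "ab")
print(pos, entries[pos].results)
```

Here the reversed messages are abab, acab, acad and ba. Searching for "ab" (reversed: ba) fails on the first character of abab and jumps straight past acab and acad. The `start` argument could be taken from a global index so that the scan begins at the right two-character prefix.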
In step (3), the dictionary is matched using the lookup algorithm based on reversed dictionary entries. If no match is found, a method based on natural language processing is used: part-of-speech analysis, syntactic analysis and sentence-meaning analysis of the preceding input units are used to select a best candidate. This addresses the problem that, because the input method's auto-completion suggests the word the user intends to type, the user omits some characters.
The invention associates the user's keyboard messages with keyboard sequences by message window and identifies the current input state with a window-based automatic state recognition algorithm, which improves recognition accuracy. For keyboard sequences that the user has omitted, it searches the dictionary and selects a best candidate as a substitute. For a single keyboard sequence string, the lookup algorithm based on reversed dictionary entries improves lookup efficiency. The invention also provides a user correction interface through which the user can manually correct the restored file.
The preferred embodiments of this patent have been explained in detail above, but this patent is not limited to those embodiments. Within the knowledge of a person of ordinary skill in the relevant art, various changes can be made without departing from the purpose of this patent.
Claims (1)
1. A method for identifying mixed Chinese and English input content based on keystroke records, characterized by the following steps:
(1) while the user is typing, converting the user's keyboard messages into keyboard sequences, removing noise information from the keyboard sequences, merging the user's keyboard sequences according to the number of the Windows input box, and persisting them;
(2) reading the record file, identifying the current input state using the window-based automatic state recognition algorithm, and converting the recognition result into the standard translation format;
(3) reading the standard translation format, looking up first in the user-specific restoration dictionary and then in the standard restoration dictionary, applying the lookup algorithm based on reversed dictionary entries in the user-specific restoration dictionary and the standard restoration dictionary to the character string in each standard translation format to obtain the translation result, and, for strings that cannot be matched, finding a best candidate in the user-specific restoration dictionary and the standard restoration dictionary as a substitute, which completes the restoration step;
(4) presenting the translation result to the user, who modifies it through the user correction interface, correcting incorrectly translated results and homophones, adding these modifications to the user-specific restoration dictionary, and saving the final result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410764964.3A CN104391589B (en) | 2014-12-11 | 2014-12-11 | A method for identifying mixed Chinese and English input content based on keystroke records |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410764964.3A CN104391589B (en) | 2014-12-11 | 2014-12-11 | A method for identifying mixed Chinese and English input content based on keystroke records |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104391589A CN104391589A (en) | 2015-03-04 |
CN104391589B true CN104391589B (en) | 2018-09-28 |
Family
ID=52609501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410764964.3A Expired - Fee Related CN104391589B (en) | 2014-12-11 | 2014-12-11 | A method for identifying mixed Chinese and English input content based on keystroke records |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391589B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108388346A (en) * | 2018-02-28 | 2018-08-10 | 山东师范大学 | A kind of Intelligent input mechanism and input method based on ARM and camera |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1896923A (en) * | 2005-06-13 | 2007-01-17 | 余可立 | Method for inputting English Bashu railing Chinese morphology translation intermediate text by computer |
CN101403947A (en) * | 2008-11-19 | 2009-04-08 | 黄庆传 | Computer word input method, its keyboard and mouse |
CN103399766A (en) * | 2013-07-29 | 2013-11-20 | 百度在线网络技术(北京)有限公司 | Method and device for updating input method system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8713464B2 (en) * | 2012-04-30 | 2014-04-29 | Dov Nir Aides | System and method for text input with a multi-touch screen |
Also Published As
Publication number | Publication date |
---|---|
CN104391589A (en) | 2015-03-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20180928; termination date: 20181211 |