
CN1549192A - Computer identification and automatic inputting method for hand writing character font - Google Patents


Info

Publication number: CN1549192A
Authority: CN (China)
Prior art keywords: vector, hand, computer, font, written script
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CNA031190782A
Other languages: Chinese (zh)
Other versions: CN100485711C (en)
Inventors: 周非凡, 程卓, 凡东, 曾俊玲, 张惠捷
Current assignee: China University of Geosciences (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: China University of Geosciences
Events:
  • Application filed by China University of Geosciences
  • Priority to CNB031190782A (patent/CN100485711C/en)
  • Publication of CN1549192A (patent/CN1549192A/en)
  • Application granted
  • Publication of CN100485711C (patent/CN100485711C/en)
  • Anticipated expiration
  • Expired - Fee Related (current legal status)

Landscapes

  • Character Discrimination (AREA)

Abstract

A method for computer recognition and input of handwriting comprises the following steps: preprocessing the image of handwriting input through a scanner; extracting the handwritten glyphs, which includes line segmentation using the horizontal projection of the text lines and character segmentation using the vertical projection of the characters; building templates for the computer font and for the handwritten characters, which includes glyph feature vector extraction and classification; character matching, which includes extracting and matching the glyph features in the computer; and recognizing the handwriting by establishing the correspondence between handwritten characters and computer font characters. The invention is simple and convenient and facilitates human-machine interaction.

Description

Computer recognition and automatic input method for handwritten characters
Technical field
The present invention belongs to the field of Chinese information processing and relates in particular to a method for computer recognition and automatic input of handwritten characters.
Background art
Computer recognition and automatic input of handwritten characters is currently one of the most active problems in natural language processing. Such a system should be able to process arbitrary handwritten manuscripts. The handwriting tablets on the market and the Tablet PC released by Microsoft have, to some extent, solved the problem of time-consuming text entry and demonstrate the advantages of office automation. However, handwriting tablets and tablet PCs also have major drawbacks: they are expensive, beyond the means of ordinary users, and must be carried around. Moreover, computers currently cannot automatically recognize and input handwritten text such as manuscripts written on paper, or written historical materials such as handwriting and lettering printed on a carrier; such material must be recognized and entered manually.
Summary of the invention
The technical problem addressed by this invention is to provide a method for computer recognition and automatic input of handwritten characters. The method enables a computer not only to automatically recognize handwritten manuscripts input through a scanner, but also to recognize handwriting and lettering printed on a carrier and input through a scanner, and to convert the image information of the text into character codes that the computer can process directly, thereby accomplishing automatic computer input of the text.
The technical solution adopted by the invention comprises:
1) preprocessing the image of the handwriting input through a scanner;
2) extracting the handwritten glyphs, including line segmentation and character segmentation:
line segmentation uses the horizontal projection of the text lines;
character segmentation uses the vertical projection of the characters;
3) modeling the computer font, including extraction and classification of glyph feature vectors;
4) modeling the handwritten characters, with the same steps as modeling the computer font;
5) character matching, including extraction and matching of the glyph feature vectors:
the glyph feature vectors of the computer font are extracted by the computer-font modeling step;
matching of the glyph feature vectors comprises matching of single characters, and matching detection and error correction of sentences;
6) recognizing the handwritten characters, with the following steps:
after feature extraction, the features of the handwritten character are encoded according to the glyph feature vector classification method;
once each group of features has been encoded, the corresponding index value of each group is first looked up in the feature database;
once the corresponding index codes are found, the corresponding national-standard (GB) code is looked up from those index codes according to the correspondence rules of the mapping table, thereby establishing the correspondence between the handwritten character and the computer font character.
Steps 1) to 5) above are the steps of the automatic input method.
The main advantages of the invention are as follows:
First, the computer can automatically recognize handwritten manuscripts input through a scanner, and at the same time automatically recognize handwriting and lettering printed on a carrier and input through a scanner.
Second, the image information of the text can be converted into character codes that the computer can process directly, accomplishing automatic computer input of the text.
Third, it is easy to use: the writer need only provide the handwritten manuscript, and the writer or another person can operate the computer to scan handwritten material, such as manuscripts of all kinds, letters, notes, signatures, and written materials such as handwriting and lettering printed on a carrier, and have it recognized and input automatically. This genuinely solves the problem of material that cannot otherwise be entered and provides a convenient human-machine interface.
Fourth, no retyping is needed, which saves labor, time, and manpower. Used together with a printer, the system can print the written materials above, genuinely solving the problem of time-consuming input, and it can also remove the need for a photocopier.
Fifth, the application prospects are broad: the method suits offices, publishing houses, newspaper and periodical offices, and personal use, and its market potential is large.
Description of drawings
Fig. 1 is the main program flowchart of the invention.
Fig. 2 is a schematic diagram of the horizontal projection used for line segmentation.
Fig. 3 is a schematic diagram of the vertical projection used for character segmentation.
Fig. 4 is a schematic diagram of the upper, lower, left, and right projections of the image of a single handwritten character.
Fig. 5 is a schematic diagram of the quantized image, taking the left projection as an example.
Fig. 6 is a schematic diagram of the image after differentiation, taking the left projection as an example.
Detailed description of the embodiments
The invention is further described below with reference to embodiments and the accompanying drawings.
One. Process flow
As shown in Fig. 1, the method comprises:
1) preprocessing the image of the handwriting input through a scanner;
2) extracting the handwritten glyphs, including line segmentation and character segmentation:
line segmentation uses the horizontal projection of the text lines;
character segmentation uses the vertical projection of the characters;
3) modeling the computer font, including extraction and classification of glyph feature vectors;
4) modeling the handwritten characters, with the same steps as modeling the computer font;
5) character matching, including extraction and matching of the glyph feature vectors:
the glyph feature vectors of the computer font are extracted by the computer-font modeling step;
matching of the glyph feature vectors comprises matching of single characters, and matching detection and error correction of sentences;
6) recognizing the handwritten characters, with the following steps:
After feature extraction, the features of the handwritten character are encoded according to the glyph feature vector classification method.
Once each group of features has been encoded, the corresponding index value of each group is first looked up in the feature database.
Once the corresponding index codes are found, the internal code, i.e., the national-standard (GB) code, is looked up from the index values according to the correspondence rules of the mapping table, thereby establishing the correspondence between the handwritten character and the computer font character. During this lookup, however, several handwritten characters may map to the same computer font character, or a handwritten character may have no corresponding computer font character. Such problems are resolved with a corpus-based statistical language model, which determines the correspondence between the two probabilistically.
Steps 1) to 5) above are the steps of the automatic input method.
Two. Image preprocessing (known technique)
The handwritten page is first scanned and stored as an image. The image is then initialized and quantized into a dot matrix (including color intensity).
Removing regular "noise" such as the ruled grid of the paper: the grid lines are highly regular and generally differ in color from the handwriting, so selecting the points of that color and removing them achieves the goal.
Removing stains: a stain appears in the dot matrix as a continuous, generally fairly uniform patch of points. Given these characteristics, its edge can be found and the stain removed.
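The grid-removal step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes an RGB scan stored as nested lists, a known grid color, a per-channel tolerance, and a fixed luminance threshold for binarization (all parameter names and values are hypothetical). It also produces the two images used by the projections later on: an ink-intensity image standing in for f_1 (darker ink gives larger values, an assumed convention) and a 0/1 image standing in for f. Stain removal is not shown.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def remove_grid_and_binarize(
    image: List[List[RGB]],
    grid_color: RGB,           # hypothetical: colour of the printed grid lines
    tol: int = 30,             # hypothetical per-channel colour tolerance
    ink_threshold: int = 128,  # hypothetical luminance cut-off for "ink"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (f1, f): ink-intensity image and 0/1 image, with grid pixels blanked."""
    f1: List[List[int]] = []
    f: List[List[int]] = []
    for row in image:
        f1_row, f_row = [], []
        for r, g, b in row:
            is_grid = all(abs(c - gc) <= tol for c, gc in zip((r, g, b), grid_color))
            # integer BT.601 luminance: 0 = black, 255 = white; grid pixels become white
            lum = 255 if is_grid else (299 * r + 587 * g + 114 * b) // 1000
            f1_row.append(255 - lum)                      # darker ink -> larger value
            f_row.append(1 if lum < ink_threshold else 0)  # 1 = ink, 0 = background
        f1.append(f1_row)
        f.append(f_row)
    return f1, f
```

Integer luminance weights keep pure white at exactly 255, so blank paper maps cleanly to zero ink.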
Three. Extraction of the handwritten glyphs
1. Line segmentation:
Lines are separated by clear gaps, which appear in the binarized dot matrix as regions made up entirely of 0s. Line segmentation therefore uses the horizontal projection of the text lines. Its purpose is to compute the upper and lower bounds of each line's text pixels in a document image and thereby obtain the text lines.
Because writing pressure varies during handwriting, producing heavier and lighter strokes, using grayscale better reflects the difference between the gaps and the lines of handwriting.
Line segmentation works as follows: a set of parallel horizontal light rays illuminates the image, producing a projection along one coordinate direction, and the gray level of this projection is measured by how much "luminous flux" is blocked. The formula is
$$v(y) = \sum_{x=0}^{S_x} f_1(x,y)\, f(x,y) \qquad (1)$$
where f_1(x, y) is the grayscale text image, f(x, y) is the binary image of the document, and S_x is the size of the document image in the x direction.
There is generally a large spacing between the lines of a handwritten manuscript, but "noise" must also be taken into account, so a very small threshold v1 is set. A row whose value v(y) is below v1 is taken to be a gap between text lines; a row whose value is above v1 is taken to belong to the area occupied by the characters. In this way the text lines can be separated accurately.
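Formula (1) and the threshold v1 can be sketched as below. This is an illustrative sketch, assuming both images are nested lists indexed as [y][x] and that a row with v(y) exactly equal to v1 counts as a gap (the text does not specify the equality case). The function names are hypothetical.

```python
from typing import List, Tuple

def horizontal_projection(f1: List[List[int]], f: List[List[int]]) -> List[int]:
    """Formula (1): v(y) = sum over x of f1(x, y) * f(x, y); images are indexed [y][x]."""
    return [sum(g * b for g, b in zip(g_row, b_row)) for g_row, b_row in zip(f1, f)]

def segment_lines(f1: List[List[int]], f: List[List[int]], v1: int = 0) -> List[Tuple[int, int]]:
    """Return [top, bottom) row ranges whose projection exceeds the threshold v1."""
    lines, top = [], None
    for y, v in enumerate(horizontal_projection(f1, f)):
        if v > v1 and top is None:
            top = y                    # entering a text line
        elif v <= v1 and top is not None:
            lines.append((top, y))     # leaving a text line
            top = None
    if top is not None:                # a line that runs to the bottom edge
        lines.append((top, len(f)))
    return lines
```

For example, a four-row image whose rows 1 and 3 contain ink yields the two line ranges (1, 2) and (3, 4).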
2. Character segmentation:
Once the text lines are separated, segmentation between characters can be performed. Because recognition is based on feature vectors, each handwritten character must be segmented out of its line. There are spaces between Chinese characters, and these spaces can be used to separate the handwritten characters. Although there is generally enough space between characters to help separate them, handwriting often contains connecting strokes, so the width of the interval each character occupies cannot be determined in advance. The light-projection method is therefore used for separation: a set of parallel vertical light rays illuminates the image, producing a projection along one coordinate direction. If this "shadow" has gray levels, they are measured by how much "luminous flux" is blocked. The outline of the shadow is a curve, so the planar shape is converted into a plane curve. Because connecting strokes are drawn lightly and therefore have low gray levels, grayscale is used in the calculation to bring out the separation more clearly.
$$v(x) = \sum_{y=1}^{S_y} f_1(x,y)\, f(x,y) \qquad (2)$$
where f_1(x, y) is the grayscale text image, f(x, y) is the binary image of the document, and S_y is the size of the document image in the y direction.
The grayscale image is used because writers inevitably produce connecting strokes, which are generally lighter than normal strokes. These show up clearly in the grayscale image, so the gaps stand out more distinctly in v(x). The minimum min(x) of v(x) is detected and a threshold v2 is set: points with v(x) > v2 are taken as handwritten-character regions, and points with v(x) < v2 as gaps between characters.
Formulas (1) and (2) essentially locate each handwritten character, that is, they segment out the individual handwritten characters.
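Formula (2) and the threshold v2 can be sketched the same way, again assuming images indexed as [y][x]; the function names are hypothetical. The example deliberately includes a light connecting stroke in column 2 to show why the grayscale weighting matters.

```python
from itertools import groupby
from typing import List, Tuple

def vertical_projection(f1: List[List[int]], f: List[List[int]]) -> List[int]:
    """Formula (2): v(x) = sum over y of f1(x, y) * f(x, y); images are indexed [y][x]."""
    return [sum(f1[y][x] * f[y][x] for y in range(len(f))) for x in range(len(f[0]))]

def segment_characters(f1: List[List[int]], f: List[List[int]], v2: int) -> List[Tuple[int, int]]:
    """Return [left, right) column ranges with v(x) > v2 (character regions)."""
    spans, x = [], 0
    # group consecutive columns by whether they exceed the threshold
    for is_char, run in groupby(vertical_projection(f1, f), key=lambda v: v > v2):
        width = len(list(run))
        if is_char:
            spans.append((x, x + width))
        x += width
    return spans
```

In the test data the connecting stroke at column 2 has gray level 40, so v(2) = 40 falls below v2 = 100 and the two characters separate. With a purely binary projection that stroke pixel would count as much as any other ink pixel.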
Four. Modeling of the computer font
1. Glyph feature vector extraction:
1) Building the feature vector of the glyph matrix: the image of each single handwritten character obtained after segmentation is first normalized into a standard dot matrix, i.e., one whose horizontal and vertical extents (upper bounds) are equal, forming a 0/1 dot matrix. For example, the segmented image is normalized into a 48 × 48 dot matrix about its geometric center. This prepares the image for feature extraction; without this processing, character similarity cannot be compared correctly. The projection of the handwritten character is compared with the standard dot matrix and binarized; this is done in the image preprocessing step.
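The normalization to a standard square dot matrix can be sketched as follows. The patent specifies only the target (for example 48 × 48, about the geometric center); cropping to the ink bounding box, centering it in a square, and nearest-neighbor resampling are assumptions made here for illustration, and `normalize_glyph` is a hypothetical name.

```python
from typing import List

def normalize_glyph(binary: List[List[int]], size: int = 48) -> List[List[int]]:
    """Crop to the ink bounding box, pad to a centred square, resample to size x size."""
    rows = [y for y, r in enumerate(binary) if any(r)]
    if not rows:                                   # blank cell: nothing to normalise
        return [[0] * size for _ in range(size)]
    cols = [x for x in range(len(binary[0])) if any(r[x] for r in binary)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    h, w = bottom - top + 1, right - left + 1
    side = max(h, w)
    oy, ox = (side - h) // 2, (side - w) // 2     # offsets that centre the glyph
    square = [[0] * side for _ in range(side)]
    for y in range(h):
        for x in range(w):
            square[oy + y][ox + x] = binary[top + y][left + x]
    # nearest-neighbour resampling of the square onto a size x size grid
    return [[square[y * side // size][x * side // size] for x in range(size)]
            for y in range(size)]
```

Making the matrix square first keeps the horizontal and vertical extents equal, which is the property the text requires before glyphs of different sizes are compared.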
Then the image of the single handwritten character (for example, the character 中) is projected from the top, bottom, left, and right, yielding the images of four groups of feature vectors (see Fig. 4).
This figure reflects the rising and falling trends of the strokes. The waveforms in the figure are defined as the edge functions H1(X), H2(X), H1(Y), and H2(Y) of the glyph matrix. The edge functions carry rich information: nearly all the features of a handwritten character appear in them. In real text, however, different fonts and symbols vary, even characters in the same font differ in width and height, and segmentation does not always fall exactly at the junction of two characters; all of these strongly affect accurate extraction of the features above.
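The text does not define the edge functions precisely. One plausible reading, assumed here, is an outer-contour profile: for each column, the distance from the top (H1(X)) or bottom (H2(X)) edge to the first ink pixel, and for each row, the distance from the left (H1(Y)) or right (H2(Y)) edge. The sketch below implements that reading; the function name and the convention for empty columns or rows (full length) are hypothetical.

```python
from typing import Dict, List

def edge_functions(glyph: List[List[int]]) -> Dict[str, List[int]]:
    """Outer-contour profiles of a 0/1 glyph (an assumed reading of H1/H2)."""
    h, w = len(glyph), len(glyph[0])

    def first_ink(cells: List[int]) -> int:
        # distance from the viewing edge to the first ink pixel; full length if none
        return next((i for i, v in enumerate(cells) if v), len(cells))

    columns = [[glyph[y][x] for y in range(h)] for x in range(w)]
    return {
        "H1(X)": [first_ink(c) for c in columns],        # viewed from the top
        "H2(X)": [first_ink(c[::-1]) for c in columns],  # viewed from the bottom
        "H1(Y)": [first_ink(r) for r in glyph],          # viewed from the left
        "H2(Y)": [first_ink(r[::-1]) for r in glyph],    # viewed from the right
    }
```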
2) Establishing the edge functions of the glyph matrix: H1(X), H2(X), H1(Y), H2(Y). The edge functions are rough, jagged curves that are poorly suited to feature extraction, so they are quantized using formula (3). The quantized image is shown in Fig. 5, taking the left projection as an example.
3) Quantizing the edge functions, using the formula
$$h(x) = \sum_{x_1=0}^{b_1} \Bigl(H(x_1) + H\bigl(x_1 + \tfrac{b_1}{m}\bigr)\Bigr)\Bigl[u(x - x_1) - u\bigl(x - x_1 - \tfrac{b_1}{m}\bigr)\Bigr] / 2 \qquad (3)$$
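Formula (3) replaces the edge function on each interval by the average of its values at the interval's two ends. The sketch below assumes u(·) is the unit step function, b_1 is the length of the edge function, the garbled term is read as the interval width b_1/m, x_1 advances in steps of that width, and H(x_1 + b_1/m) is clamped to the last sample at the right boundary. These readings, and the name `quantize_edge`, are assumptions.

```python
from typing import List

def quantize_edge(H: List[float], m: int) -> List[float]:
    """Piecewise-constant quantisation in the spirit of formula (3)."""
    b1 = len(H)
    step = max(1, b1 // m)                  # assumed interval width b1/m
    h = [0.0] * b1
    for x1 in range(0, b1, step):
        right = H[min(x1 + step, b1 - 1)]   # clamp H(x1 + b1/m) at the last sample
        level = (H[x1] + right) / 2
        # u(x - x1) - u(x - x1 - step) is 1 exactly on x1 <= x < x1 + step
        for x in range(x1, min(x1 + step, b1)):
            h[x] = level
    return h
```

With H = 0, 2, 4, …, 14 and m = 4, each pair of samples collapses to one level, giving 2, 2, 6, 6, 10, 10, 13, 13.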
4) Extracting the feature vectors of the glyph matrix: the quantized edge functions built from the four edge functions H1(X), H2(X), H1(Y), and H2(Y) are each differentiated, yielding four combinations of vectors composed of impulse functions. The differentiated image is shown in Fig. 6, taking the left projection as an example.
Three groups of feature vectors can be extracted from each group of impulse functions as follows:
each impulse represents a direction. Taking the left projection as an example, a positive impulse is recorded as 1 and a negative impulse as 0, and these are arranged in order to form a feature vector group S1;
there is an interval between every two impulses, and the ratios of all intervals are recorded, e.g., a(1):a(2):a(3):…:a(n);
the impulses can differ in amplitude, and the ratios of all impulse amplitudes are recorded, e.g., b(1), b(2), b(3), …, b(n);
proceeding in the same way, vectors are obtained for the other directions, i.e., upper, lower, left, and right.
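Because the quantized edge function is piecewise constant, its derivative is a train of impulses located at the level changes. The sketch below takes a discrete first difference as the derivative (an assumption) and returns the three groups described above: the sign vector S1, the intervals a(·) between consecutive impulses, and the impulse amplitudes b(·). The ratios are implied by these raw lists; `impulse_features` is a hypothetical name.

```python
from typing import List, Tuple

def impulse_features(h: List[float]) -> Tuple[List[int], List[int], List[float]]:
    """Differentiate a quantised edge function and describe its impulses."""
    diff = [h[i] - h[i - 1] for i in range(1, len(h))]             # discrete derivative
    positions = [i for i, d in enumerate(diff) if d != 0]          # impulse locations
    signs = [1 if diff[i] > 0 else 0 for i in positions]           # sign vector S1
    intervals = [b - a for a, b in zip(positions, positions[1:])]  # a(1):...:a(n)
    amplitudes = [abs(diff[i]) for i in positions]                 # b(1), ..., b(n)
    return signs, intervals, amplitudes
```

Applied to the quantized sequence 2, 2, 6, 6, 10, 10, 13, 13, this yields three positive impulses two samples apart with amplitudes 4, 4, and 3.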
Vectors in the upper, lower, left, and right directions are likewise built for each character of the computer font.
2. Classification of the glyph feature vectors:
Because comparing feature values directly is computationally too expensive, an encoding-based approach to building the library is proposed.
1) Encoding
The amplitude vector reflects the undulation of the glyph. It is encoded as follows:
Consider an amplitude vector b(1), b(2), b(3), …, b(n), where n is a natural number. Stored as raw values, such data are hard to manage and retrieve. Therefore b(1) is coded as 1; b(2) is coded as 1 if b(2) > b(1), and as 0 otherwise; and so on. In general, writing c(i) for the code of b(i):
$$c(1) = 1, \qquad c(i) = \begin{cases} 1, & b(i) > b(i-1) \\ 0, & b(i) \le b(i-1) \end{cases} \qquad i = 2, \dots, n$$
For example, if an amplitude vector is 1:4:5:2:3:6, its code is 1:1:1:0:1:1.
The interval vector reflects the stroke distribution of the glyph and is encoded in the same way as the amplitude vector.
The sign vector was already encoded above and is likewise a vector of 1s and 0s.
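The rise/fall encoding shared by the amplitude and interval vectors is short enough to state directly. This is a minimal sketch of the rule as written (first element coded 1, each later element coded 1 if it exceeds its predecessor, else 0); `trend_code` is a hypothetical name.

```python
from typing import List, Sequence

def trend_code(values: Sequence[float]) -> List[int]:
    """c(1) = 1; c(i) = 1 if values[i] > values[i-1] else 0."""
    if not values:
        return []
    return [1] + [1 if cur > prev else 0 for prev, cur in zip(values, values[1:])]
```

It reproduces the worked example in the text: 1:4:5:2:3:6 encodes to 1:1:1:0:1:1.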
2) Examples
Encoding examples for the amplitude vector, the interval vector, and the sign vector are given in attached Tables one, two, and three, respectively.
Five. Modeling of the handwritten characters
The steps are the same as for modeling the computer font.
Six. Character matching
This step comprises extraction and matching of the glyph feature vectors.
The glyph feature vectors of the computer font are extracted by the computer-font modeling step.
Matching of the glyph feature vectors comprises matching of single characters, and matching detection and error correction of sentences.
1. Matching of single characters
1) To make each Chinese character correspond to an index number in the feature vector library, a feature database index table is built for the computer font. Reducing the similarity computations in the subsequent feature-vector matching, and thereby improving the system's recognition rate, is a major design feature of the invention.
The steps are as follows:
the codes of the feature vectors of the upper, lower, left, and right projections are combined to build a mixed feature vector library, and all mixed codes in the library are arranged in Gray-code order;
the character library codes are converted to binary;
a mapping table from the feature vector library to the character library codes is built (see Table seven); the character library codes use the national-standard GB encoding.
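Arranging codes "in Gray-code order" can be read as sorting each codeword by its position in the reflected binary Gray sequence. The sketch below assumes that reading and also forms the index code by concatenating the three per-vector indexes, as in the single row of Table seven. The function names and the dictionary used as a mapping table are hypothetical.

```python
from typing import List

def gray_rank(code: str) -> int:
    """Position of a reflected-binary Gray codeword in the Gray sequence."""
    value, rank = int(code, 2), 0
    while value:          # Gray -> binary: XOR of all right shifts
        rank ^= value
        value >>= 1
    return rank

def sort_gray(codes: List[str]) -> List[str]:
    """Arrange codewords in Gray-code order."""
    return sorted(codes, key=gray_rank)

def index_code(*indexes: str) -> str:
    """Concatenate per-vector indexes, as in the Table seven example."""
    return "".join(indexes)

# hypothetical mapping table built from the single row shown in Table seven
mapping = {index_code("00001", "00001", "00001"): "011010"}
```

As a consistency check against the source, the last four bits of the upper-projection column in Table four (0000, 0001, 0011, 0010) have Gray ranks 0, 1, 2, 3, i.e., they appear in exactly this order.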
2) An index table is built between the feature vector database and the character library: each Chinese character is encoded, and the known computer character codes are used to index the Chinese characters.
Building the feature vector database comprises:
taking the impulse functions on the X axis of the six feature vectors previously built for each Chinese character as an example, one list stores the ratios of the impulse intervals, one list stores the ratios of the impulse amplitudes, and one list stores the sign sequence values of the impulses;
three lists are built for the Y axis in the same way;
the lists are then encoded;
the lists are indexed in the following order:
X → Y,
sign vector → interval vector → amplitude ratio;
each element of the sign vector has only two possible values, positive or negative, represented by 1 and 0 and arranged in Gray-code order;
the interval vector is a ratio; the ratio is converted to integers and encoded in ascending order, starting from the first.
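The last step above is terse. One possible reading, assumed here, is that each interval ratio is reduced to its smallest integer form (so that the same stroke layout at different scales gives the same ratio) and the distinct ratios are then numbered 1, 2, 3, … in ascending order. The sketch implements that reading only; both function names are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import Dict, List, Sequence, Tuple

def integer_ratio(intervals: Sequence[int]) -> Tuple[int, ...]:
    """Reduce a ratio such as 4:8:6 to its smallest integer form 2:4:3."""
    g = reduce(gcd, intervals, 0)
    return tuple(v // g for v in intervals) if g else tuple(intervals)

def order_interval_codes(ratios: List[Sequence[int]]) -> Dict[Tuple[int, ...], int]:
    """Assign codes 1, 2, 3, ... to the distinct integer ratios in ascending order."""
    distinct = sorted({integer_ratio(r) for r in ratios})
    return {ratio: code for code, ratio in enumerate(distinct, start=1)}
```

Under this reading, 4:8:6 and 2:4:3 share one code, and 1:1 and 3:3 share another.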
3) Example of a feature vector library built from 5 feature vectors:
see attached Tables four, five, and six.
2. Matching detection of sentences
Sentence matching detection works as follows: a corpus built from phrases is checked using a trigram statistical language model.
The corpus is built by collecting statistics on commonly used sentences and phrases from a large body of practical text, from which the prior and posterior probabilities of each word are computed; the currently recognized word is then predicted from the words that have already appeared.
Let w_i be any word in the text. If its two preceding words w_{i-2} and w_{i-1} are known, the probability that w_i occurs can be predicted with the conditional probability P(w_i | w_{i-2} w_{i-1}). This is the idea of a statistical language model. In general, if a variable W denotes an arbitrary word sequence in the text consisting of n words in order, i.e., W = w_1 w_2 … w_n, then the statistical language model is the probability P(W) that the word sequence W occurs in the text. By the product rule of probability, P(W) expands to:
$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \dots w_{n-1})$$
This is too complex to compute. The problem is greatly simplified by assuming that the probability of any word w_i depends only on the two words preceding it. The language model is then called a trigram model:
$$P(W) \approx P(w_1)\,P(w_2 \mid w_1) \prod_{i=3}^{n} P(w_i \mid w_{i-2} w_{i-1})$$
In general, an N-gram model assumes that the probability of the current word depends only on the N−1 words preceding it. Importantly, all these probability parameters can be estimated from a large-scale corpus. For example, the trigram probability is
$$P(w_i \mid w_{i-2} w_{i-1}) \approx \frac{\operatorname{count}(w_{i-2} w_{i-1} w_i)}{\operatorname{count}(w_{i-2} w_{i-1})}$$
where count(…) denotes the number of times the given word sequence occurs in the whole corpus.
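The count-based trigram estimate and the approximate sentence probability can be sketched as follows. This is a plain maximum-likelihood version with no smoothing (a real system would need smoothing for unseen word sequences); the class name and the list-of-word-lists corpus format are assumptions.

```python
from collections import Counter
from typing import List, Sequence

class TrigramModel:
    """Maximum-likelihood unigram/bigram/trigram estimates from raw counts (no smoothing)."""

    def __init__(self, corpus: List[Sequence[str]]):
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        for sent in corpus:
            self.uni.update(sent)
            self.bi.update(zip(sent, sent[1:]))
            self.tri.update(zip(sent, sent[1:], sent[2:]))
        self.total = sum(self.uni.values())

    def p_tri(self, w2: str, w1: str, w: str) -> float:
        """P(w | w2 w1) ~ count(w2 w1 w) / count(w2 w1)."""
        denom = self.bi[(w2, w1)]
        return self.tri[(w2, w1, w)] / denom if denom else 0.0

    def sentence_prob(self, words: Sequence[str]) -> float:
        """P(W) ~ P(w1) P(w2 | w1) * product over i >= 3 of P(wi | wi-2 wi-1)."""
        if not words or not self.total:
            return 0.0
        p = self.uni[words[0]] / self.total
        if len(words) > 1:
            first = self.uni[words[0]]
            p *= self.bi[(words[0], words[1])] / first if first else 0.0
        for i in range(2, len(words)):
            p *= self.p_tri(words[i - 2], words[i - 1], words[i])
        return p
```

With the two-sentence corpus a b c and a b d, P(c | a b) = 1/2, and the whole sequence a b c scores 2/6 · 1 · 1/2 = 1/6.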
3. Matching error correction of sentences:
A probability model combined with code recognition is used to recognize handwritten characters accurately. The steps are as follows:
after the handwritten character has been mapped through its code to the GB library entry of the corresponding computer font character, the corpus is consulted to obtain the relatedness (density) between this word and the words preceding it. If the relatedness is too small, the process returns to the feature database;
the sign vector is shifted within a window of at most 5 code elements up or down, and the interval and amplitude vectors are shifted together within a window of at most 20 code elements up or down; the corpus is consulted once for every 10 shifts of each vector;
this continues until a match with a probability above 80% is found, which determines the correspondence between the handwritten character and that word and yields a higher recognition rate. Because the system directly embeds an existing corpus, no learning process is needed.
When the handwriting is highly non-standard, this error-correction step is indispensable.
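The control flow of the error-correction loop can be sketched as below, under heavy assumptions: the feature-database lookup after a given shift and the corpus relatedness score are supplied as hypothetical callables (`candidate_at`, `context_prob`), shifts are tried in the order 0, +1, −1, +2, −2, …, and the corpus is consulted once per batch of 10 shifts with the 80% acceptance threshold from the text. How the actual system moves the vectors is not specified beyond the window sizes.

```python
from typing import Callable, Iterable, List, Optional

def shift_window(limit: int) -> List[int]:
    """Offsets 0, +1, -1, +2, -2, ... up to +/-limit code elements."""
    out = [0]
    for k in range(1, limit + 1):
        out += [k, -k]
    return out

def correct_character(
    shifts: Iterable[int],
    candidate_at: Callable[[int], Optional[str]],  # hypothetical: lookup after shifting
    context_prob: Callable[[str], float],          # hypothetical: corpus relatedness
    accept: float = 0.8,
    batch: int = 10,
) -> Optional[str]:
    """Return the first candidate whose corpus score exceeds `accept`, else None."""
    shifts = list(shifts)
    for start in range(0, len(shifts), batch):
        # collect the candidates produced by this batch of shifts, then consult the corpus once
        cands = [c for c in map(candidate_at, shifts[start:start + batch]) if c is not None]
        if cands:
            best = max(cands, key=context_prob)
            if context_prob(best) > accept:
                return best
    return None
```

For the sign vector the window would be `shift_window(5)`; for the interval and amplitude vectors, `shift_window(20)`.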
Seven. In summary, through a series of modeling and encoding steps and, finally, the use of a corpus, the Chinese character recognition system built here uses diverse means such as segmentation, classification, and encoding to achieve computer recognition and automatic input of handwritten characters.
Eight. Attached tables
Table one: Amplitude vector
Table two: Interval vector
Table three: Sign vector
Sign:  +  −  +  …  +
Code:  1  0  1  …  1
Table four: Amplitude vector 1 (index value 1)
Upper   Lower   Left    Right
10000   10000   10000   10000
10001   10011   10001   10001
10011   10010   10010   10010
10010   10110   10011   10011
…       …       …       …
11111   11111   11111   11111
Table five: Interval vector 1 (index value 2)
Upper   Lower   Left    Right
10000   10000   10000   10000
10001   10011   10001   10001
10011   10010   10010   10010
10010   10110   10011   10011
…       …       …       …
11111   11111   11111   11111
Table six: Sign vector 1 (index value 3)
Upper   Lower   Left    Right
10000   10000   10000   10000
10001   10011   10001   10001
10011   10010   10010   10010
10010   10110   10011   10011
…       …       …       …
11111   11111   11111   11111
Table seven: Mapping table
Index 1   Index 2   Index 3   Index code        GB
00001     00001     00001     000010000100001   011010

Claims (8)

1. A method for computer recognition and automatic input of handwritten characters, comprising:
1) a step of preprocessing the image of the handwriting input through a scanner; characterized in that it further comprises:
2) extracting the handwritten glyphs, including line segmentation and character segmentation, wherein line segmentation uses the horizontal projection of the text lines and character segmentation uses the vertical projection of the characters;
3) modeling the computer font, including extraction and classification of glyph feature vectors;
4) modeling the handwritten characters, with the same steps as modeling the computer font;
5) character matching, including extraction and matching of the glyph feature vectors, wherein the glyph feature vectors of the computer font are extracted by the computer-font modeling step, and matching of the glyph feature vectors comprises matching of single characters, and matching detection and error correction of sentences;
6) recognizing the handwritten characters, with the following steps:
after feature extraction, the features of the handwritten character are encoded according to the glyph feature vector classification method;
once each group of features has been encoded, the corresponding index value of each group is first looked up in the feature database;
once the corresponding index codes are found, the corresponding national-standard (GB) code is looked up from those index codes according to the correspondence rules of the mapping table, thereby establishing the correspondence between the handwritten character and the computer font character;
wherein steps 1) to 5) above are the steps of the automatic input method.
2. The method for computer recognition and automatic input of handwritten characters according to claim 1, characterized in that line segmentation is performed by illuminating the image with a set of parallel horizontal light rays to obtain a projection along one coordinate direction, the gray level of the projection being measured by how much "luminous flux" is blocked, according to the formula
$$v(y) = \sum_{x=0}^{S_x} f_1(x,y)\, f(x,y) \qquad (1)$$
where f_1(x, y) is the grayscale text image, f(x, y) is the binary image of the document, and S_x is the size of the document image in the x direction.
3. The method for computer recognition and automatic input of handwritten characters according to claim 1, characterized in that character segmentation is performed by illuminating the image with a set of parallel vertical light rays to obtain a projection along one coordinate direction, the gray level of the projection being measured by how much "luminous flux" is blocked, according to the formula
$$v(x) = \sum_{y=1}^{S_y} f_1(x,y)\, f(x,y) \qquad (2)$$
where f_1(x, y) is the grayscale text image, f(x, y) is the binary image of the document, and S_y is the size of the document image in the y direction.
4. The method for computer recognition and automatic input of handwritten characters according to claim 1, characterized in that the glyph feature vectors are extracted as follows:
1) building the feature vector of the glyph matrix: the image of each single handwritten character obtained after segmentation is first normalized into a standard dot matrix whose horizontal and vertical extents are equal, forming a 0/1 dot matrix; the projection of the handwritten character is compared with the standard dot matrix and binarized, this being done in the image preprocessing step; the image of the single handwritten character is then projected from the top, bottom, left, and right to obtain four groups of feature vectors;
2) establishing the edge functions of the glyph matrix: H1(X), H2(X), H1(Y), H2(Y);
3) quantizing the edge functions, using the formula
$$h(x) = \sum_{x_1=0}^{b_1} \Bigl(H(x_1) + H\bigl(x_1 + \tfrac{b_1}{m}\bigr)\Bigr)\Bigl[u(x - x_1) - u\bigl(x - x_1 - \tfrac{b_1}{m}\bigr)\Bigr] / 2 \qquad (3)$$
4) extracting the feature vectors of the glyph matrix: the quantized edge functions built from the four edge functions H1(X), H2(X), H1(Y), and H2(Y) are each differentiated, yielding four combinations of vectors composed of impulse functions;
three groups of feature vectors are extracted from each group of impulse functions as follows:
each impulse represents a direction; a positive impulse is recorded as 1 and a negative impulse as 0, and these are arranged in order to form a feature vector group S1;
there is an interval between every two impulses, and the ratios of all intervals are recorded;
the ratios of the amplitudes of all impulses are recorded;
proceeding in the same way, vectors in the upper, lower, left, and right directions are obtained.
5. The computer identification and automatic input method for handwritten characters according to claim 1, characterized in that the glyph feature vectors are classified by building a coding database as follows:
Amplitude vector: reflects the rise and fall of the glyph; it is encoded as follows.
Given an amplitude vector b(1), b(2), b(3), ..., b(n), the encoding formula is as follows:
In the formula, b(1) is set to 1; if b(2) > b(1), then b(2) = 1, otherwise b(2) = 0; n is a natural number.
Blank (interval) vector: reflects the stroke distribution of the glyph; it is encoded in the same way as the amplitude vector.
Sign vector: its encoding was completed in the previous step; it is likewise a vector of 1s and 0s.
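The amplitude encoding is stated only for b(2) against b(1). A minimal Python sketch, assuming the same comparison applies to every adjacent pair and that each b(k) is compared with the original value b(k-1) rather than with its code, is shown below; the function name `relative_code` is hypothetical. Since the claim says the blank vector is encoded the same way, the same function would apply to it.

```python
from typing import List, Sequence


def relative_code(values: Sequence[float]) -> List[int]:
    """Encode b(1..n) as 1/0: b(1) -> 1; b(k) -> 1 if b(k) > b(k-1), else 0.

    Assumption: each b(k) is compared with the original b(k-1), not its code.
    """
    if not values:
        return []
    return [1] + [1 if cur > prev else 0 for prev, cur in zip(values, values[1:])]
```

For example, the amplitude ratios `[1.0, 0.5, 1.0]` encode to `[1, 0, 1]`.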
6. The computer identification and automatic input method for handwritten characters according to claim 1, characterized in that a single character is matched by the following steps:
From the codes of the feature vectors of the top, bottom, left, and right projections, build a mixed feature vector database, in which all the mixed codes are arranged in Gray code order;
Convert the character library codes to binary form;
Build a mapping table from the feature vector database to the character library codes, which use the national standard GB encoding;
Build an index table between the feature vector database and the character library, assign a code to each Chinese character, and use the known computer character codes to index the Chinese characters;
Building the feature vector database comprises:
1) Of the six feature vectors previously built for each Chinese character, taking the impulse functions on the X axis as an example: create one list storing the ratios of the intervals between the impulse functions, one list storing the ratios of their amplitudes, and one list storing their sign sequence;
2) Likewise, create three lists for the Y axis;
3) Then encode the lists;
4) Arrange the lists in the following index order:
X → Y;
sign vector → blank vector → amplitude ratio. The sign vector has only two possible values, positive or negative, represented by 1 and 0 and arranged in Gray code order; the blank vector is a ratio, which is rounded to an integer and then encoded in ascending order, starting from the first.
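The database construction and ordering of claim 6 might be sketched as follows. Several details are assumptions: the "national standard coding GB" is taken to be GB2312 (Python's built-in `gb2312` codec); sign vectors are compared first by length and then by their position in the reflected binary Gray sequence; blank-vector ratios are rounded to integers and compared in ascending order; amplitude ratios are compared as-is. `AxisLists`, `axis_key`, `gray_rank`, `gb_binary` and `build_index` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def gb_binary(ch: str) -> str:
    """GB2312 code of one Chinese character written as a bit string."""
    return "".join(f"{byte:08b}" for byte in ch.encode("gb2312"))


def gray_rank(bits: str) -> int:
    """Position of a codeword in the reflected binary Gray sequence (inverse Gray code)."""
    g = int(bits, 2)
    n, shift = g, g >> 1
    while shift:
        n ^= shift
        shift >>= 1
    return n


@dataclass
class AxisLists:
    signs: List[int]         # sign sequence of the impulses, 1 = positive, 0 = negative
    gap_ratios: List[float]  # blank (interval) vector
    amp_ratios: List[float]  # amplitude ratios


def axis_key(a: AxisLists) -> Tuple:
    """Order within one axis: sign vector -> blank vector -> amplitude ratio."""
    sign_bits = "".join(map(str, a.signs)) or "0"
    return (len(a.signs), gray_rank(sign_bits),
            [round(r) for r in a.gap_ratios], list(a.amp_ratios))


def build_index(entries: Dict[str, Tuple[AxisLists, AxisLists]]) -> List[Tuple[str, str]]:
    """Sort characters by X lists, then Y lists, and pair each with its binary GB code."""
    ordered = sorted(entries, key=lambda ch: axis_key(entries[ch][0]) + axis_key(entries[ch][1]))
    return [(ch, gb_binary(ch)) for ch in ordered]
```

For instance, `gb_binary("啊")` gives `1011000010100001` (GB2312 code 0xB0A1), and the two-bit sign vectors sort as `00, 01, 11, 10`, so neighbouring entries differ in a single bit.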
7. The computer identification and automatic input method for handwritten characters according to claim 1, characterized in that sentences are matched by detection, namely: the corpus built from phrases is checked using a trigram statistical language model.
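Claim 7 names a trigram statistical language model without giving its estimator. The sketch below assumes a character-level trigram model with add-k smoothing trained on a list of phrases; the class name, the padding symbols `^` and `$`, and the smoothing constant are illustrative choices, not taken from the patent.

```python
from collections import Counter
from typing import Iterable


class TrigramModel:
    """Character trigram model with add-k smoothing (an assumed estimator)."""

    def __init__(self, phrases: Iterable[str], k: float = 1.0) -> None:
        self.k = k
        self.tri: Counter = Counter()
        self.hist: Counter = Counter()  # counts of two-character histories
        self.vocab = set()
        for phrase in phrases:
            s = "^^" + phrase + "$"  # pad so the first characters also have a history
            self.vocab.update(s)
            for i in range(len(s) - 2):
                self.tri[s[i:i + 3]] += 1
                self.hist[s[i:i + 2]] += 1

    def prob(self, a: str, b: str, c: str) -> float:
        """Smoothed P(c | a, b)."""
        v = len(self.vocab)
        return (self.tri[a + b + c] + self.k) / (self.hist[a + b] + self.k * v)

    def sentence_prob(self, text: str) -> float:
        """Product of trigram probabilities over the padded sentence."""
        s = "^^" + text + "$"
        p = 1.0
        for i in range(len(s) - 2):
            p *= self.prob(s[i], s[i + 1], s[i + 2])
        return p
```

With the toy corpus `["中国人", "中国话"]`, `sentence_prob("中国人")` is larger than `sentence_prob("人国中")`, which is the kind of ranking a detection step of this sort relies on.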
8. The computer identification and automatic input method for handwritten characters according to claim 1, characterized in that sentence matching includes error correction, which combines a probability model with code recognition to recognize handwritten characters accurately, with the following steps:
After the handwritten character has been mapped by its code to the GB library entry of the corresponding computer font, the corpus is queried to obtain the correlation density between this character and the character preceding it; if the correlation density is too low, the process returns to the previous feature database;
The sign vector is shifted within a bound of no more than 5 code elements up or down, while the blank vector and the amplitude vector are shifted together within a bound of no more than 20 code elements up or down; the corpus is queried once for every 10 shifts of each vector;
This is repeated until a match with a probability above 80% is found, which determines the correspondence between the handwritten character and that computer character.
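The search in claim 8 can be outlined as a bounded search over code shifts. In this hypothetical sketch, `score(ds, dg)` stands in for the corpus lookup, which the patent does not define as a function: it returns the match probability obtained after shifting the sign vector by `ds` code elements and the blank and amplitude vectors together by `dg`. The 5- and 20-element bounds and the 80% threshold come from the claim; trying smaller total shifts first, and leaving out the detail of querying the corpus once per 10 shifts, are simplifications.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def shift_search(
    score: Callable[[int, int], float],
    sign_bound: int = 5,
    gap_amp_bound: int = 20,
    threshold: float = 0.8,
) -> Optional[Tuple[int, int]]:
    """Return the first (ds, dg) whose match probability exceeds threshold, or None.

    ds: shift of the sign vector, |ds| <= sign_bound
    dg: joint shift of the blank and amplitude vectors, |dg| <= gap_amp_bound
    Candidates are tried in order of increasing |ds| + |dg| (an assumed order).
    """
    offsets = product(range(-sign_bound, sign_bound + 1),
                      range(-gap_amp_bound, gap_amp_bound + 1))
    for ds, dg in sorted(offsets, key=lambda o: (abs(o[0]) + abs(o[1]), o)):
        if score(ds, dg) > threshold:
            return ds, dg
    return None  # no candidate reached the threshold within the bounds
```

Returning `None` corresponds to the case where no shifted code reaches the threshold inside the stated bounds.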

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB031190782A CN100485711C (en) 2003-05-16 2003-05-16 Computer identification and automatic inputting method for hand writing character font

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB031190782A CN100485711C (en) 2003-05-16 2003-05-16 Computer identification and automatic inputting method for hand writing character font

Publications (2)

Publication Number Publication Date
CN1549192A true CN1549192A (en) 2004-11-24
CN100485711C CN100485711C (en) 2009-05-06

Family

ID=34320842

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB031190782A Expired - Fee Related CN100485711C (en) 2003-05-16 2003-05-16 Computer identification and automatic inputting method for hand writing character font

Country Status (1)

Country Link
CN (1) CN100485711C (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100405278C (en) * 2005-09-14 2008-07-23 株式会社东芝 Character reader, character reading method, and character reading program
CN100578432C (en) * 2007-12-04 2010-01-06 哈尔滨工业大学深圳研究生院 Method for directly writing handwriting information
CN101881999A (en) * 2010-06-21 2010-11-10 安阳师范学院 Oracle video input system and implementation method
CN101393601B (en) * 2007-09-21 2011-08-17 汉王科技股份有限公司 Method for identifying mathematical formula of print form
CN102456136A (en) * 2010-10-29 2012-05-16 方正国际软件(北京)有限公司 Image-text splitting method and system
CN101571921B (en) * 2008-04-28 2012-07-25 富士通株式会社 Method and device for identifying key words
CN103064946A (en) * 2012-12-26 2013-04-24 天津三星通信技术研究有限公司 Method and device for storing original handwriting and method and device for searching original handwriting
CN103778250A (en) * 2014-02-19 2014-05-07 张朝亮 Implement method for Chinese wubi cursive script dictionary query system
CN105160342A (en) * 2015-08-11 2015-12-16 成都数联铭品科技有限公司 HMM-GMM-based automatic word picture splitting method and system
CN106096524A (en) * 2016-06-01 2016-11-09 广东小天才科技有限公司 Method and device for acquiring beauty degree of Chinese characters
CN109299663A (en) * 2018-08-27 2019-02-01 刘梅英 Hand-written script recognition methods, system and terminal device
CN109446873A (en) * 2018-08-27 2019-03-08 刘梅英 Hand-written script recognition methods, system and terminal device
CN110126484A (en) * 2019-05-30 2019-08-16 深圳龙图腾创新设计有限公司 A kind of printing device
CN110580351A (en) * 2017-07-04 2019-12-17 艾朝君 chinese character and Italian intercommunication mutual recognition technical method

Families Citing this family (18)

Publication number Priority date Publication date Assignee Title
CN100405278C (en) * 2005-09-14 2008-07-23 株式会社东芝 Character reader, character reading method, and character reading program
CN101393601B (en) * 2007-09-21 2011-08-17 汉王科技股份有限公司 Method for identifying mathematical formula of print form
CN100578432C (en) * 2007-12-04 2010-01-06 哈尔滨工业大学深圳研究生院 Method for directly writing handwriting information
CN101571921B (en) * 2008-04-28 2012-07-25 富士通株式会社 Method and device for identifying key words
CN101881999B (en) * 2010-06-21 2012-11-21 安阳师范学院 Oracle video input system and implementation method
CN101881999A (en) * 2010-06-21 2010-11-10 安阳师范学院 Oracle video input system and implementation method
CN102456136B (en) * 2010-10-29 2013-06-05 方正国际软件(北京)有限公司 Image-text splitting method and system
CN102456136A (en) * 2010-10-29 2012-05-16 方正国际软件(北京)有限公司 Image-text splitting method and system
CN103064946A (en) * 2012-12-26 2013-04-24 天津三星通信技术研究有限公司 Method and device for storing original handwriting and method and device for searching original handwriting
CN103064946B (en) * 2012-12-26 2015-10-28 天津三星通信技术研究有限公司 Original handwriting store method and device, original handwriting search method and device
CN103778250A (en) * 2014-02-19 2014-05-07 张朝亮 Implement method for Chinese wubi cursive script dictionary query system
CN103778250B (en) * 2014-02-19 2017-02-15 张朝亮 Implement method for universal Chinese wubi cursive script dictionary query system
CN105160342A (en) * 2015-08-11 2015-12-16 成都数联铭品科技有限公司 HMM-GMM-based automatic word picture splitting method and system
CN106096524A (en) * 2016-06-01 2016-11-09 广东小天才科技有限公司 Method and device for acquiring beauty degree of Chinese characters
CN110580351A (en) * 2017-07-04 2019-12-17 艾朝君 chinese character and Italian intercommunication mutual recognition technical method
CN109299663A (en) * 2018-08-27 2019-02-01 刘梅英 Hand-written script recognition methods, system and terminal device
CN109446873A (en) * 2018-08-27 2019-03-08 刘梅英 Hand-written script recognition methods, system and terminal device
CN110126484A (en) * 2019-05-30 2019-08-16 深圳龙图腾创新设计有限公司 A kind of printing device

Also Published As

Publication number Publication date
CN100485711C (en) 2009-05-06

Similar Documents

Publication Publication Date Title
CN1877598A (en) Method for gathering and recording business card information in mobile phone by using image recognition
CN1549192A (en) Computer identification and automatic inputting method for hand writing character font
CN1126608C (en) Method and system for recognising routing information on letters and parcels
CN1145872C (en) Method for automatically cutting and identiying hand written Chinese characters and system for using said method
CN1276384C (en) Video stream classifiable symbol isolation method and system
CN1752992A (en) Character recognition apparatus, character recognition method, and character recognition program
CN1655583A (en) Systems and methods for generating high compression image data files having multiple foreground planes
CN1177407A (en) Method and system for velocity-based head writing recognition
CN101046848A (en) Image processing apparatus and image processing method
CN105308944A (en) Classifying objects in images using mobile devices
CN1945599A (en) Image processing device, image processing method, and computer program product
CN1458791A (en) Sectioned layered image system
CN104123550A (en) Cloud computing-based text scanning identification method
CN1141666C (en) Online character recognition system for recognizing input characters using standard strokes
CN1041773C (en) Character recognition method and apparatus based on 0-1 pattern representation of histogram of character image
CN1251130C (en) Method for identifying multi-font multi-character size print form Tibetan character
CN1310182C (en) Method, device and storage medium for enhancing document, image and character recognition
CN1664846A (en) On-line hand-written Chinese characters recognition method based on statistic structural features
CN1147652A (en) Construction method of data base for writing identifying system
CN1269060C (en) Method and system of digitizing ancient Chinese books and automatizing the content search
CN1238816C (en) Font recogtnizing method based on single Chinese characters
CN1740943A (en) A file enciphering method
CN1317664C (en) Confused stroke order library establishing method and on-line hand-writing Chinese character identifying and evaluating system
CN1308889C (en) Method, device and storage medium for character recognition
KR100655916B1 (en) Document image processing and verification system for digitalizing a large volume of data and method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090506

Termination date: 20100516