JPS59128681A - Character reader - Google Patents
Info
- Publication number
- JPS59128681A
- Authority
- JP
- Japan
- Prior art keywords
- character
- similarity
- dictionary
- characters
- character string
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Character Discrimination (AREA)
Abstract
Description
[Detailed Description of the Invention]
[Technical Field of the Invention]
The present invention relates to a character reading device that uses a word dictionary to recognize and read a character string, for example one made up of a plurality of characters, written on a form or the like.
Conventionally, a character reading device of this kind holds a reference pattern for every character to be recognized among the characters written on a form or the like, and determines each character by calculating the similarity between the input character and those reference patterns. For example, when reading Japanese text that contains kanji, the jōyō (regular-use) kanji are in wide use in Japan, so recognizing about 1,000 to 2,000 character types covers nearly all the characters in use. However, kanji outside the jōyō set, although used infrequently, are not entirely absent, and including them requires recognition of about 5,000 to 10,000 character types. In that case the similarity between the input character and the reference patterns of all recognition target characters must be calculated regardless of how often each character is used, so the processing time grows as the number of recognition target characters increases.
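The exhaustive matching described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patent does not say how similarity is computed, so cosine similarity over feature vectors is an assumption, and the names `cosine_similarity`, `recognize_exhaustive`, and the toy patterns are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize_exhaustive(features, reference_patterns):
    """Compare the input with every reference pattern.

    Returns (best_character, best_similarity, number_of_comparisons).
    """
    best_char, best_sim, comparisons = None, -1.0, 0
    for char, pattern in reference_patterns.items():
        comparisons += 1
        sim = cosine_similarity(features, pattern)
        if sim > best_sim:
            best_char, best_sim = char, sim
    return best_char, best_sim, comparisons

# Toy reference patterns with made-up 3-dimensional features.
patterns = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
result = recognize_exhaustive([0.9, 0.1, 0.0], patterns)
```

The comparison count always equals the size of the dictionary, which is the cost the text identifies as the drawback.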
The present invention was made to eliminate the above drawback of the conventional device. When recognizing characters recorded on a form or the like, it first obtains the similarity of the input characters using a recognition dictionary of frequently used characters and selects character strings from a word dictionary by focusing on the characters with high similarity. Only if a selected character string contains infrequently used characters does it additionally obtain similarities using a recognition dictionary of infrequently used characters. The invention thereby provides a character reading device that can recognize character strings in a short time.
An embodiment of the present invention is described below. FIG. 1 is a block diagram showing the internal components of a character reading device according to one embodiment of the invention. In the figure, 1 is a form, 2 is feature extraction means, 3 is first similarity calculation means, 4 is a first recognition dictionary, 5 is determination means, 6 is a word dictionary, 7 is a second recognition dictionary, and 8 is second similarity calculation means.
Next, the operation of the character reading device configured as shown in FIG. 1 is described. First, the characters written on the form 1 are scanned by a scanning device (not shown), and the feature extraction means 2 extracts the features of each character. The first similarity calculation means 3 then calculates similarities from the features of the input character extracted by the feature extraction means 2 and the features of the reference patterns stored in the first recognition dictionary 4. The first recognition dictionary 4 stores the reference patterns of frequently used characters. The determination means 5 focuses on the recognition candidate characters with high similarity and selects from the word dictionary 6 the character strings that contain those candidate characters. If every character in the selected character strings has a reference pattern in the first recognition dictionary 4, the per-character similarities obtained by the first similarity calculation means 3 are summed for each character string, and the character string with the largest sum is taken as the recognition result. If a character string selected from the word dictionary 6 contains a character that has no reference pattern in the first recognition dictionary 4, that is, a character whose reference pattern is in the second recognition dictionary 7, the similarity is additionally obtained for that character alone using the second similarity calculation means 8, and the character string with the largest sum of per-character similarities is taken as the recognition result.
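The operation just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the patented implementation: cosine similarity, the `threshold` that decides what counts as a "high" similarity, the function names, and the toy Latin-letter dictionaries are all hypothetical. The sketch also assumes every word in the word dictionary is built only from characters found in one of the two recognition dictionaries.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def read_string(features, first_dict, second_dict, word_dict, threshold=0.9):
    """Two-stage recognition of one character string.

    features:    one feature vector per input character
    first_dict:  frequent character -> reference pattern
    second_dict: infrequent character -> reference pattern
    word_dict:   list of known character strings
    threshold:   hypothetical cut-off for a "high" similarity
    """
    # Stage 1: compare each input character with the frequent characters only.
    stage1 = [{c: cosine(f, p) for c, p in first_dict.items()} for f in features]

    # Keep words of matching length that contain a high-similarity
    # candidate character at the position where it was observed.
    candidates = [
        w for w in word_dict
        if len(w) == len(features)
        and any(stage1[i].get(c, 0.0) >= threshold for i, c in enumerate(w))
    ]

    best_word, best_total = None, -1.0
    for w in candidates:
        total = 0.0
        for i, c in enumerate(w):
            if c in stage1[i]:
                total += stage1[i][c]  # reuse the stage-1 result
            else:
                # Stage 2: consult the second dictionary for this character only.
                total += cosine(features[i], second_dict[c])
        if total > best_total:
            best_word, best_total = w, total
    return best_word, best_total

# Toy data: "a" and "b" are frequent, "x" is infrequent.
first = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]}
second = {"x": [0.0, 0.0, 1.0]}
words = ["ab", "xb", "ax"]
word, total = read_string([[0.1, 0.0, 1.0], [0.0, 1.0, 0.1]], first, second, words)
```

With this toy input, "ab" and "xb" are selected because "b" scores highly in the second position, and the second dictionary is consulted only for "x".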
FIGS. 2, 3, and 4 show specific examples of characters, character strings, similarities, and so on, used to explain the operation of the character reading device of FIG. 1.
In FIG. 2, 9 denotes the input characters "楔形" (wedge shape). 10 denotes example characters whose reference patterns are in the first recognition dictionary 4; these are the characters 13, 14, and 15: "象", "形", and "状". 11 denotes example characters whose reference patterns are in the second recognition dictionary 7; these are the characters 16 and 17: "楔" and "牙". 12 denotes example character strings in the word dictionary 6; these are the character strings 18, 19, 20, and 21: "象形", "楔形", "楔状", and "象牙".
FIG. 3 shows the case where, after the first similarity calculation means 3 has processed the input characters 9 "楔形", the candidate character with high similarity is the candidate character 22 "形", and no candidate character with high similarity exists for the character 16 "楔". Among the character string examples 12 in the word dictionary 6, the character strings 18 and 19, "象形" and "楔形", which have "形" in the second position, are selected as candidate character strings.
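The selection step of FIG. 3 can be sketched as a position-aware filter over the word dictionary. The function name `select_candidates` and the mapping from position to candidate characters are hypothetical, and the word list is the one read from FIG. 2.

```python
def select_candidates(word_dict, high_similarity):
    """Return the words that contain a high-similarity candidate character
    at the position where it was observed.

    high_similarity maps a 0-based position to a set of candidate characters.
    """
    return [
        w for w in word_dict
        if any(i < len(w) and w[i] in chars for i, chars in high_similarity.items())
    ]

# Character strings 18 to 21 of the word dictionary 6.
word_dictionary = ["象形", "楔形", "楔状", "象牙"]

# Only "形" in the second position (index 1) scored highly in stage 1.
candidates = select_candidates(word_dictionary, {1: {"形"}})
```

Here `candidates` holds "象形" and "楔形", the two strings with "形" in the second position.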
FIG. 4 shows the similarities of the character strings 18 and 19, "象形" and "楔形", to the input characters 9 "楔形". The similarity for the character string 18 "象形" is obtained from the similarity 0.50 for "象", denoted by reference numeral 23, and the similarity 0.95 for "形", denoted by reference numeral 24, both obtained by the first similarity calculation means 3. The similarity for the character string 19 "楔形" is obtained from the similarity 0.90 for "楔", denoted by reference numeral 25 and obtained by the second similarity calculation means 8, and the similarity 0.95 for "形", denoted by reference numeral 26 and obtained by the first similarity calculation means 3. That is, the similarity for the character string 18 "象形" is 0.50 + 0.95 = 1.45 and that for the character string 19 "楔形" is 0.90 + 0.95 = 1.85, so the input characters 9 "楔形" are recognized as the character string 19 "楔形", which has the higher similarity.
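The arithmetic of FIG. 4 can be reproduced directly. The similarity values are the ones stated in the text, while the variable and function names are hypothetical.

```python
# Per-character similarities from FIG. 4 (reference numerals 23 to 26).
similarities = {
    "象形": [0.50, 0.95],  # both from the first similarity calculation means 3
    "楔形": [0.90, 0.95],  # 楔 from the second means 8, 形 from the first means 3
}

def string_similarity(per_character):
    """Similarity of a whole string: the sum of its per-character similarities."""
    return sum(per_character)

totals = {word: string_similarity(sims) for word, sims in similarities.items()}
recognized = max(totals, key=totals.get)
```

The totals come to about 1.45 and 1.85 (up to floating-point rounding), so `recognized` is "楔形".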
Therefore, whereas a conventional character reading device of this kind calculates the similarity between the input character and every reference pattern in the recognition dictionary, the character reading device of the present invention narrows down the candidate character strings by focusing on characters that are frequently used and highly similar to the input characters, and obtains a similarity for an infrequently used character only when such a character appears in a candidate character string. Similarity calculation is thus needed only for the characters in the first recognition dictionary 4 and for those characters of the second recognition dictionary 7 that appear in the candidate character strings, so character strings can be recognized at very high speed. In practice, this is easily realized by using, for example, a high-speed ROM (read-only memory) as the first recognition dictionary 4 and, for example, a low-speed, large-capacity magnetic disk as the second recognition dictionary 7.
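A rough count of similarity calculations shows where the saving comes from. The figures below are illustrative assumptions taken from the ranges in the background section (about 2,000 frequent character types out of about 10,000 in total), applied to a two-character input that needs one second-dictionary lookup, as in the FIG. 4 example. The function names are hypothetical.

```python
def exhaustive_cost(n_chars, n_all_patterns):
    """Comparisons when every input character is matched against every pattern."""
    return n_chars * n_all_patterns

def two_stage_cost(n_chars, n_frequent_patterns, n_rare_lookups):
    """Comparisons when only frequent patterns are matched up front and
    infrequent characters are looked up individually for candidate strings."""
    return n_chars * n_frequent_patterns + n_rare_lookups

baseline = exhaustive_cost(2, 10_000)   # 20000
tiered = two_stage_cost(2, 2_000, 1)    # 4001
```

Under these assumed numbers the two-stage scheme performs roughly a fifth of the comparisons, and the slower second dictionary is touched only once.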
Although the above embodiment describes the case where the input character string has two characters, the invention is not limited to this. It suffices that the number of input characters matches the number of characters of the character strings in the word dictionary; the number of characters itself is not restricted.
As described above, the present invention uses recognition dictionaries arranged in a hierarchy of two stages according to whether characters are used frequently or infrequently. In the first stage, the similarity of the input characters is obtained using the recognition dictionary of frequently used characters, and candidate character strings are selected from the word dictionary by focusing on the characters with high similarity. If the candidates contain infrequently used characters, the similarity is additionally obtained in the second stage using the recognition dictionary of infrequently used characters. This configuration minimizes the number of similarity calculations and allows input character strings to be recognized at very high speed.
FIG. 1 is a block diagram showing the internal components of a character reading device according to one embodiment of the present invention. FIGS. 2, 3, and 4 show specific examples of characters, character strings, similarities, and so on, used to explain the operation of the character reading device of FIG. 1.
In the figures, 1 is the form, 2 the feature extraction means, 3 the first similarity calculation means, 4 the first recognition dictionary, 5 the determination means, 6 the word dictionary, 7 the second recognition dictionary, and 8 the second similarity calculation means.
Agent: 葛野信一
Claims (1)
A character reading device for reading characters recorded on a form or the like, comprising: feature extraction means for scanning a character and extracting its features; a first recognition dictionary storing reference patterns of first recognition target character types; first similarity calculation means for obtaining a similarity from the features of an input character and the features of the reference patterns in the first recognition dictionary; a second recognition dictionary storing reference patterns of second recognition target character types other than the first recognition target character types; second similarity calculation means for obtaining a similarity from the features of the input character and the features of the reference patterns in the second recognition dictionary; a word dictionary storing character strings composed of the first and second recognition target character types; and determination means for recognizing and determining the character string, wherein a character string containing a character of high similarity obtained by the first similarity calculation means is selected from the word dictionary; when the selected character string consists only of first recognition target character types, the similarity of the whole character string is obtained from the per-character similarities obtained by the first similarity calculation means; and when the selected character string contains a second recognition target character type, the similarity to that second recognition target character type is obtained by the second similarity calculation means, and the similarity of the whole character string is obtained from the per-character similarities to the first recognition target character types obtained by the first similarity calculation means and the per-character similarities to the second recognition target character types obtained by the second similarity calculation means, whereby the character string is recognized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58003001A JPS59128681A (en) | 1983-01-12 | 1983-01-12 | Character reader |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58003001A JPS59128681A (en) | 1983-01-12 | 1983-01-12 | Character reader |
Publications (1)
Publication Number | Publication Date |
---|---|
JPS59128681A (en) | 1984-07-24 |
Family
ID=11545123
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP58003001A Pending JPS59128681A (en) | 1983-01-12 | 1983-01-12 | Character reader |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS59128681A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013246721A (en) * | 2012-05-28 | 2013-12-09 | Nippon Telegr & Teleph Corp <Ntt> | Character string recognition device, character string recognition program, and storage medium |
-
1983
- 1983-01-12 JP JP58003001A patent/JPS59128681A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2973944B2 (en) | Document processing apparatus and document processing method | |
JP2726568B2 (en) | Character recognition method and device | |
JP2734386B2 (en) | String reader | |
JPH07200745A (en) | How to compare at least two image sections | |
US20010051965A1 (en) | Apparatus for rough classification of words, method for rough classification of words, and record medium recording a control program thereof | |
CN110180186A (en) | A kind of topographic map conversion method and system | |
JPS59128681A (en) | Character reader | |
JPH06215184A (en) | Extraction area labeling device | |
JPH08287188A (en) | Character string recognition device | |
JP3151866B2 (en) | English character recognition method | |
JP4328511B2 (en) | Pattern recognition apparatus, pattern recognition method, program, and storage medium | |
JP2746345B2 (en) | Post-processing method for character recognition | |
JP2671533B2 (en) | Character string recognition method and apparatus thereof | |
JP2947832B2 (en) | Word matching method | |
JP4209511B2 (en) | Character recognition method, character recognition device, and computer-readable recording medium recording a program for causing a computer to execute the character recognition method | |
JP2006072520A (en) | Information processor, its method and its program recording medium | |
JPH0746363B2 (en) | Drawing reader | |
JPH11203406A (en) | Character segmenting method, character recognizing method, character recognition device, and recording medium | |
JP2976445B2 (en) | Character recognition device | |
JP2002092549A (en) | Character recognition method and storage medium | |
JP3100786B2 (en) | Character recognition post-processing method | |
JP2727755B2 (en) | Character string recognition method and apparatus | |
JP2973898B2 (en) | Character recognition method and device | |
JPS60110089A (en) | Character recognizer | |
JP2839515B2 (en) | Character reading system |