JPS59128681A

JPS59128681A - Character reader

Info

Publication number: JPS59128681A
Application number: JP58003001A
Authority: JP
Inventors: Keiji Kobayashi; 啓二小林; Masataka Yamamoto; 山本　勝敬
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1983-01-12
Filing date: 1983-01-12
Publication date: 1984-07-24

Abstract

PURPOSE: To recognize a character string in a short time by using a recognition dictionary split into a first stage and a second stage according to how frequently each character is used, and by selecting candidate character strings from a word dictionary while focusing on characters with high similarity. CONSTITUTION: A character entered on a business form 1 is scanned by a scanner in the character reader, and a feature extracting means 2 extracts the features of the character. A first similarity calculating means 3 then calculates the similarity between the extracted feature values of the input character and the feature values of the reference patterns stored in a first recognition dictionary 4, which holds the reference patterns of frequently used characters. A deciding means 5 receives the output of the means 3, focuses on recognized candidate characters with high similarity, and selects from a word dictionary 6 the character strings that contain those candidate characters. If a character string selected from the dictionary 6 contains a character that has no reference pattern in the dictionary 4, a second similarity calculating means 8 obtains the similarity for that character using a second recognition dictionary 7. In this way the character string can be recognized in a short time.

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a character reading device that uses a word dictionary to recognize and read a character string consisting of, for example, a plurality of characters written on a form or the like.

[Prior art]

Conventionally, a character reading device of this type holds a reference pattern for every character to be recognized on a form or the like. It determines each character by calculating the similarity between the input character and those reference patterns.

For example, consider reading Japanese text that contains kanji. Because the common-use (Jōyō) kanji are widely adopted in Japan, recognizing about 1,000 to 2,000 character types covers almost all characters in use. Kanji outside the common-use set are used less frequently, but they cannot be ruled out. Including them raises the number of character types to be recognized to about 5,000 to 10,000.

In that case, the similarity between the input character and the reference pattern of every recognition target character must be calculated, regardless of how frequently each character is used. The drawback is that processing time grows as the number of recognition target characters increases.
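The conventional approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual method:

- The patent does not specify the feature representation or the similarity measure. Feature vectors and cosine similarity are assumptions.
- `recognize_exhaustive` is a hypothetical name, and the reference vectors in the usage lines are invented toy data.

The point is the count: one similarity calculation is performed per reference pattern, so the work grows linearly with the number of recognition target characters.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize_exhaustive(feature, dictionary):
    """Compare one input feature vector with every reference pattern.

    Returns the best-matching character and the number of similarity
    calculations performed, which always equals len(dictionary).
    """
    best_char, best_sim, count = None, -1.0, 0
    for char, reference in dictionary.items():
        sim = cosine_similarity(feature, reference)
        count += 1
        if sim > best_sim:
            best_char, best_sim = char, sim
    return best_char, count


# Hypothetical toy dictionary of three reference patterns.
refs = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
print(recognize_exhaustive([0.9, 0.1], refs))  # ('A', 3)
```

With 10,000 reference patterns, every input character would cost 10,000 such calculations. This is the cost the invention sets out to reduce.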

[Summary of the invention]

This invention was made to eliminate the above drawbacks of the conventional device. When recognizing characters recorded on a form or the like, it proceeds in three steps:

1. It obtains the similarity of the input characters using a recognition dictionary of frequently used characters.
2. It selects character strings from a word dictionary by focusing on the characters with high similarity.
3. Only if a selected string contains a less frequently used character, it further obtains the similarity using a recognition dictionary of less frequently used characters.

The object of the invention is thus to provide a character reading device that can recognize character strings in a short time.

[Embodiments of the invention]

Embodiments of the present invention are described below. FIG. 1 is a block diagram showing the internal components of a character reading device according to one embodiment of the invention. In the figure:

1 is a form,
2 is a feature extraction means,
3 is a first similarity calculation means,
4 is a first recognition dictionary,
5 is a determination means,
6 is a word dictionary,
7 is a second recognition dictionary, and
8 is a second similarity calculation means.

Next, the operation of the character reading device configured as shown in FIG. 1 is explained.

First, the characters written on the form 1 are scanned by a scanning device (not shown), and the feature extraction means 2 extracts the features of the characters. Next, the first similarity calculation means 3 calculates the similarity between the feature values of the input character extracted by the feature extraction means 2 and the feature values of the reference patterns stored in the first recognition dictionary 4. The first recognition dictionary 4 stores the reference patterns of frequently used characters.

The determination means 5 focuses on recognition candidate characters with high similarity and selects from the word dictionary 6 the character strings that contain those candidate characters.

If every character in a selected character string has a reference pattern in the first recognition dictionary 4, the per-character similarities obtained by the first similarity calculation means 3 are summed for each character string. The character string with the largest sum is taken as the recognition result.

A character string selected from the word dictionary 6 may instead contain a character that has no reference pattern in the first recognition dictionary 4, that is, a character whose reference pattern is in the second recognition dictionary 7. In that case, the similarity is additionally obtained for that character only, using the second similarity calculation means 8. The character string with the largest sum of per-character similarities is again taken as the recognition result.
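The procedure described above can be sketched in Python as follows. It is a sketch under stated assumptions, not the patented implementation:

- The patent does not define the features, the similarity measure, or what counts as "high" similarity.
- Cosine similarity, the `threshold` value of 0.8, and the names `recognize_string`, `first_dict`, and `second_dict` are all hypothetical.
- The toy vectors in the usage lines are invented. Only the characters and word list come from FIG. 2.

Dictionary 7 is consulted only for characters of candidate strings that are missing from dictionary 4.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize_string(features, first_dict, second_dict, words, threshold=0.8):
    """Two-stage word recognition sketch; names and threshold are hypothetical.

    features:    one feature vector per input character
    first_dict:  {char: reference vector} for frequent characters (dictionary 4)
    second_dict: {char: reference vector} for rare characters (dictionary 7)
    words:       entries of the word dictionary (dictionary 6)
    """
    # Stage 1: compare every input character with dictionary 4 only.
    stage1 = [{c: cosine_similarity(f, ref) for c, ref in first_dict.items()}
              for f in features]
    # (position, character) pairs whose stage-1 similarity counts as "high".
    candidates = {(i, c) for i, sims in enumerate(stage1)
                  for c, s in sims.items() if s >= threshold}

    best_word, best_score = None, -math.inf
    for word in words:
        if len(word) != len(features):
            continue  # word length must equal the number of input characters
        if not any((i, c) in candidates for i, c in enumerate(word)):
            continue  # keep only words containing a high-similarity candidate
        score = 0.0
        for i, c in enumerate(word):
            if c in first_dict:
                score += stage1[i][c]  # reuse the stage-1 result
            elif c in second_dict:
                # Stage 2: consult dictionary 7 only for this character.
                score += cosine_similarity(features[i], second_dict[c])
            else:
                break  # character in neither dictionary: word is not scored
        else:
            if score > best_score:
                best_word, best_score = word, score
    return best_word, best_score


# Toy data around FIG. 2 (vectors invented for illustration).
first = {"象": [1, 0, 0, 0], "形": [0, 1, 0, 0], "状": [0, 0, 1, 0]}
second = {"楔": [0, 0, 0, 1], "牙": [1, 0, 0, 1]}
words = ["象形", "楔形", "楔状", "象牙"]
print(recognize_string([[0, 0, 0, 1], [0, 1, 0, 0]], first, second, words))
# ('楔形', 2.0)
```

In this run, 楔状 and 象牙 are skipped because neither contains a stage-1 candidate at the matching position. The dictionary-7 pattern for 楔 is compared only once, while scoring 楔形.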

FIGS. 2, 3, and 4 are diagrams showing specific examples of characters, character strings, similarities, and the like, for explaining the operation of the character reading device shown in FIG. 1.

In FIG. 2:

- 9 is the input character string 「楔形」 ("wedge shape").
- 10 is an example of the characters whose reference patterns are in the first recognition dictionary 4. This character example 10 consists of the characters 「象」, 「形」, and 「状」, indicated by 13, 14, and 15.
- 11 is an example of the characters whose reference patterns are in the second recognition dictionary 7. This character example 11 consists of the characters 「楔」 and 「牙」, indicated by 16 and 17.
- 12 is an example of the character strings in the word dictionary 6. This character string example 12 consists of the character strings 「象形」 (pictograph), 「楔形」 (wedge shape, as in cuneiform), 「楔状」 (wedge-like), and 「象牙」 (ivory), indicated by 18, 19, 20, and 21.

FIG. 3 shows the result of processing the input characters 9, 「楔形」, with the first similarity calculation means 3. There is a high-similarity candidate character 22, 「形」, but no candidate with high similarity exists for the character 16, 「楔」. Among the character string examples 12 in the word dictionary 6, the character strings 18 and 19, 「象形」 and 「楔形」, have 「形」 as their second character. They are therefore selected as candidate character strings.
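The selection step of FIG. 3 can be sketched as a simple filter. Assumptions:

- Candidates are matched by position, following the patent's reference to 「形」 being the second character.
- The entry length must equal the number of input characters.
- `select_candidate_strings` and the `{position: set}` layout for the stage-1 candidates are hypothetical.

```python
def select_candidate_strings(words, candidates, length):
    """Return word-dictionary entries of the given length that contain a
    high-similarity candidate character at the same position.

    candidates: {position: set of high-similarity characters from stage 1}
    """
    return [w for w in words
            if len(w) == length
            and any(c in candidates.get(i, set()) for i, c in enumerate(w))]


word_dictionary = ["象形", "楔形", "楔状", "象牙"]  # strings 18 to 21 in FIG. 2
stage1_candidates = {1: {"形"}}  # FIG. 3: only 「形」 (22), second position
print(select_candidate_strings(word_dictionary, stage1_candidates, 2))
# ['象形', '楔形']
```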

FIG. 4 shows the similarity of each of the character strings 18 and 19, 「象形」 and 「楔形」, to the input characters 9, 「楔形」.

The similarity of the character string 18, 「象形」, is obtained from two values, both given by the first similarity calculation means 3:

- the similarity 0.50 for 「象」 (reference 23), and
- the similarity 0.95 for 「形」 (reference 24).

The similarity of the character string 19, 「楔形」, is obtained from:

- the similarity 0.90 for 「楔」 (reference 25), given by the second similarity calculation means 8, and
- the similarity 0.95 for 「形」 (reference 26), given by the first similarity calculation means 3.

That is, the similarity of the character string 18, 「象形」, is 0.50 + 0.95 = 1.45, and that of the character string 19, 「楔形」, is 0.90 + 0.95 = 1.85. The input characters 9 are therefore recognized as the character string 19, 「楔形」, which has the higher similarity.
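The arithmetic of FIG. 4 can be reproduced directly from the values stated above. The dictionary layout is only an illustrative way to hold those per-character similarities. Rounding to two decimals hides floating-point noise in the sums.

```python
# Per-character similarities from FIG. 4 (reference numerals in comments).
similarities = {
    "象形": [0.50,   # 23: 「象」, first similarity calculation means 3
             0.95],  # 24: 「形」, first similarity calculation means 3
    "楔形": [0.90,   # 25: 「楔」, second similarity calculation means 8
             0.95],  # 26: 「形」, first similarity calculation means 3
}
totals = {word: round(sum(vals), 2) for word, vals in similarities.items()}
result = max(totals, key=totals.get)  # string with the largest total similarity
print(totals)  # {'象形': 1.45, '楔形': 1.85}
print(result)  # 楔形
```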

A conventional character reading device of this type calculates the similarity between the input character and every reference pattern in the recognition dictionary. By contrast, the character reading device of this invention works as follows:

- It narrows down the candidate character strings by focusing on frequently used characters with high similarity to the input characters.
- It computes the similarity for a less frequently used character only when such a character appears in a candidate character string.

Similarity calculations are therefore needed only for the characters in the first recognition dictionary 4, and for those characters of the second recognition dictionary 7 that appear in the candidate character strings. As a result, character strings can be recognized at very high speed.

Concretely, the device is easily realized by using, for example, a high-speed ROM (read-only memory) as the first recognition dictionary 4 and a low-speed, large-capacity magnetic disk as the second recognition dictionary 7.
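A rough count of similarity calculations makes the saving concrete. This is an estimate under assumptions, not a figure from the patent:

- $N_1$ and $N_2$ are the numbers of reference patterns in dictionaries 4 and 7.
- $n$ is the number of input characters.
- $k$ is the number of (position, character) pairs from dictionary 7 that occur in the candidate strings.
- The cost of candidate selection itself is ignored.
- The sizes below are hypothetical, chosen to match the ranges in the prior-art discussion. Only $n = 2$ and $k = 1$ (just 「楔」 in FIG. 4) come from the example.

```latex
C_{\text{conv}} = n\,(N_1 + N_2), \qquad
C_{\text{two-stage}} = n\,N_1 + k, \quad k \le n\,N_2

% Hypothetical sizes: N_1 = 2000, N_2 = 8000, n = 2, k = 1
C_{\text{conv}} = 2 \times 10\,000 = 20\,000, \qquad
C_{\text{two-stage}} = 2 \times 2000 + 1 = 4001
```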

Although the above embodiment describes an input character string of two characters, the invention is not limited to this case. It suffices that the number of input characters matches the number of characters in the word-dictionary entry. The number of characters is otherwise unrestricted.

[Effect of the invention]

As explained above, this invention uses recognition dictionaries arranged in a two-stage hierarchy, according to whether a character's usage frequency is high or low.

In the first stage, it obtains the similarity of the input characters using the recognition dictionary of frequently used characters. It then focuses on the characters with high similarity and selects candidate character strings from the word dictionary.

In the second stage, if those candidates contain less frequently used characters, it obtains their similarity using the recognition dictionary of less frequently used characters.

This configuration keeps the number of similarity calculations to a minimum and has the excellent effect of recognizing input character strings at extremely high speed.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the internal components of a character reading device according to one embodiment of the invention. FIGS. 2, 3, and 4 are diagrams showing specific examples of characters, character strings, similarities, and the like, for explaining the operation of the character reading device of FIG. 1.

In the figures:

1 is a form,
2 is a feature extraction means,
3 is a first similarity calculation means,
4 is a first recognition dictionary,
5 is a determination means,
6 is a word dictionary,
7 is a second recognition dictionary, and
8 is a second similarity calculation means.

Agent: Shin'ichi Kuzuno (葛野信一)

Claims

[Claims]

A character reading device that reads characters recorded on a form or the like, comprising:

a feature extraction means that scans the characters and extracts their features;
a first recognition dictionary storing the reference patterns of a first recognition target character type;
a first similarity calculation means that calculates similarity from the features of an input character and the features of the reference patterns in the first recognition dictionary;
a second recognition dictionary storing the reference patterns of a second recognition target character type;
a second similarity calculation means that calculates similarity from the features of the input character and the features of the reference patterns in the second recognition dictionary;
a word dictionary storing character strings composed of characters of the first and second recognition target character types; and
a determination means that recognizes and determines the character string;

wherein the determination means selects from the word dictionary a character string containing a character with high similarity as obtained by the first similarity calculation means;
if the selected character string consists only of characters of the first recognition target character type, the determination means determines the similarity of the entire character string from the per-character similarities obtained by the first similarity calculation means; and
if the selected character string contains a character of the second recognition target character type, the determination means obtains the similarity to that character with the second similarity calculation means, and determines the similarity of the entire character string from the per-character similarities to characters of the first recognition target character type obtained by the first similarity calculation means and the per-character similarities to characters of the second recognition target character type obtained by the second similarity calculation means, thereby recognizing the character string.