JPS6079486A

JPS6079486A - Character recognizing system

Info

Publication number: JPS6079486A
Application number: JP58186755A
Authority: JP
Inventors: Saiji Kageyama; 斎司蔭山; Hirohide Endo; 遠藤　裕英
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1983-10-07
Filing date: 1983-10-07
Publication date: 1985-05-07
Also published as: JPH0479036B2

Abstract

PURPOSE: To reduce the cost of an OCR by generating the OCR standard patterns from a character generator. CONSTITUTION: An observing part 21 converts an input character pattern 20a into an electric signal 21a. This electric signal 21a is converted into a unit character pattern 22a by a pre-processing part 22, and then into a feature 23a by a feature extracting part 23. Meanwhile, a character generator 24 successively generates font patterns 24a, a feature extracting part 25 extracts a feature from each of them, and OCR standard patterns 25a are obtained. A matching part 26 then evaluates the similarity between the feature 23a and each standard pattern 25a. Finally, a deciding part 27 determines, based on the evaluated similarity, whether an answer category 27a is output or the output of an answer category is refused (reject 27b).

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of application of the invention]

The present invention relates to a character recognition system, and in particular to a method of creating and referencing standard patterns in the character recognition process.

[Background of the invention]

When building a device that has both a character recognition function (deciding the character category of a given printed or handwritten character pattern) and a character display and printing function (displaying or printing the font that corresponds to a given character code), it is usual to provide both a standard pattern memory for character recognition (hereinafter also called the OCR standard pattern memory) and a pattern memory for character display and printing (a character generator, hereinafter also called the character display and printing memory). However, when the number of character categories is large, as with kanji (1,945 categories for the Joyo kanji, 2,965 categories for JIS level 1, and 6,349 categories for JIS levels 1 and 2 combined), implementing both memories in the device makes it expensive.

As one way to solve this problem, a method has been proposed [Application A 49-262'18] in which only the OCR standard pattern memory is implemented in the device and the fonts for character display and printing are created from the OCR standard patterns. However, this prior invention has a drawback: when the representation (feature representation) of the OCR standard patterns is degenerate, the font, that is, the character shape used for display or printing, cannot be restored from the standard pattern. Examples of feature representations from which the character shape cannot be restored include (a) the feature representation called the marginal distribution, (b) the feature representation called the stroke density function, and (c), (d) the feature representations called peripheral features. Of these, (c) is called the first-order peripheral feature and (d) the second-order peripheral feature.
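To make the notion of a degenerate feature representation concrete, the following minimal Python sketch shows two different binary patterns that share the same marginal distributions, so neither can be recovered from the feature alone. The function name `marginals` and the 2x2 toy patterns are illustrative assumptions, not taken from the patent.

```python
from typing import List, Tuple

Pattern = List[List[int]]  # binary character pattern, stored as a list of rows


def marginals(z: Pattern) -> Tuple[List[int], List[int]]:
    """Return (column sums, row sums) of a binary pattern."""
    column_sums = [sum(col) for col in zip(*z)]
    row_sums = [sum(row) for row in z]
    return column_sums, row_sums


# Two distinct toy patterns (hypothetical examples).
diagonal = [[1, 0],
            [0, 1]]
anti_diagonal = [[0, 1],
                 [1, 0]]

# The patterns differ, yet their marginal distributions coincide,
# so the mapping from pattern to feature cannot be inverted.
same_feature = marginals(diagonal) == marginals(anti_diagonal)
```

Because many shapes map to one feature value, a font cannot in general be rebuilt from such standard patterns, which is the drawback described above.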

Features (a) through (d) have been shown to be effective in recognizing printed or handwritten kanji, so the above drawback was a major obstacle to realizing a kanji recognition device, which is an important aim of the prior invention.

Feature representation (a) is described in detail in [Nakata, Nakano, et al., "Recognition of Chinese Characters," Proc. of the Conference on Machine Perception of Patterns and Pictures, held at the NPL, Teddington (April 1972)].

The other feature representations listed above are described in detail in [Hagita et al., "Classification of handwritten kanji using three outline features," Trans. IECE Japan, Vol. 63-D, No. 12, pp. 1096-1101 (December 1980)].

[Purpose of the invention]

An object of the present invention is to eliminate the drawback of the prior invention described above and to provide a method of creating and referencing standard patterns in the character recognition process.

[Summary of the invention]

To achieve the above object, the present invention implements in the device only the pattern memory for character display and printing (the character generator), with no OCR standard pattern memory, and performs character recognition by creating OCR standard patterns from the fonts generated by the character generator and referencing them as they are created. This resolves the drawback of the prior invention: degenerate feature representations from which the character shape cannot be restored can now also be used as OCR standard patterns.

In addition, because the OCR standard pattern memory is no longer needed, the memory capacity stays at the same level as in the prior approach. Furthermore, when printed character patterns are recognized, the OCR standard patterns that the method of the present invention creates from the font, which is the source of the printed character patterns, have higher character quality than conventional OCR standard patterns created after printed patterns are converted into electrical signals by a scanner.

As a consequence, character recognition accuracy can also be raised, especially for high-quality character patterns.

The basic procedure of the present invention is shown in the PAD (Program Analysis Diagram) of FIG. 2.

Four notes on this PAD follow.

Note 1. The feature $f$ obtained in step (2) and the features $g_{i1}, \ldots, g_{ij}, \ldots, g_{iJ(i)}$ of the OCR standard patterns obtained in step (3-1) must be in corresponding forms so that they can be matched in step (3-2). For example, when $f$ and $g_{ij}$ ($1 \le j \le J(i)$) are expressed as vectors, their numbers of dimensions and the range of each dimension must correspond.

Note 2. The preprocessing in step (3-1) (mainly normalization of size, slant, and so on) is often unnecessary. A typical case is when the unknown character pattern $U$ can be normalized in step (2) to match the font pattern $P_i$.

Note 3. As a special case of step (2), nothing may be applied to $U$, so that $f$ is $U$ itself.

Note 4. As a special case of step (3-1), nothing may be applied to $P_i$, so that $g_{i1} = P_i$ and $J(i) = 1$.
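The procedure and notes above can be sketched in Python as a loop that generates each font pattern on demand, derives its standard pattern, and matches it immediately, so that no standard pattern memory is kept. Everything named here is a hypothetical stand-in: `FONT_ROM` and `character_generator` imitate a character generator, `flatten` is the trivial feature of Notes 3 and 4 (the pattern itself, with J(i) = 1), and the pixel mismatch count is an assumed matching measure, since the text does not fix one.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Pattern = List[List[int]]  # binary pattern stored as a list of rows

# Hypothetical stand-in for the character generator's font memory.
FONT_ROM: Dict[str, Pattern] = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}


def character_generator() -> Iterator[Tuple[str, Pattern]]:
    """Yield (category, font pattern P_i) one at a time."""
    for category, pattern in FONT_ROM.items():
        yield category, pattern


def flatten(pattern: Pattern) -> List[int]:
    """Trivial feature extraction: the pattern itself as a vector."""
    return [pixel for row in pattern for pixel in row]


def mismatches(f: List[int], g: List[int]) -> int:
    """Assumed matching measure: number of differing elements."""
    return sum(a != b for a, b in zip(f, g))


def score_against_fonts(
    unknown: Pattern,
    extract: Callable[[Pattern], List[int]] = flatten,
) -> Dict[str, int]:
    f = extract(unknown)                         # step (2): feature f of U
    scores: Dict[str, int] = {}
    for category, font in character_generator():
        g = extract(font)                        # step (3-1): standard pattern, J(i) = 1
        scores[category] = mismatches(f, g)      # step (3-2): matching
    return scores


# A "T" with one missing pixel in the stem.
noisy_t = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]
scores = score_against_fonts(noisy_t)
```

The same extraction function is applied to the unknown pattern and to every font pattern, which is one simple way to satisfy the requirement of Note 1 that the two features have matching dimensions and ranges.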

[Embodiments of the invention]

Embodiments of the present invention are described below.

FIG. 3 is a diagram showing one embodiment of the present invention.

In the figure, an input character pattern 20a is first input to the observation unit 21. The observation unit 21 converts the input character pattern 20a into an electrical signal 21a. The electrical signal 21a is processed by 22, which applies preprocessing, segmentation, or both, and is thereby converted into a unit character pattern 22a. The preprocessing may instead be performed inside the feature extraction unit 23. The unit character pattern 22a is converted into a feature 23a by the feature extraction unit 23. In this way the input character pattern 20a is converted into the feature 23a.

Meanwhile, the character generator 24 generates font patterns 24a one after another, and 25 applies preprocessing (mainly normalization) and feature extraction to each font pattern 24a to obtain an OCR standard pattern 25a.

An example of obtaining the OCR standard pattern 25a by applying preprocessing and feature extraction to the font pattern 24a is explained with reference to FIG. 3(a).

Let the font pattern 31 be denoted $z(x, y)$, $x = 1, \ldots, N_1$, $y = 1, \ldots, N_2$. For this font pattern $z(x, y)$, the (x-axis) marginal distribution 32, $P_1(x)$, and the (y-axis) marginal distribution 33, $P_2(y)$, are computed by the following equations and used as the OCR standard pattern 25a.

$$P_1(x) = \sum_{y=1}^{N_2} z(x, y), \quad x = 1, \ldots, N_1$$

$$P_2(y) = \sum_{x=1}^{N_1} z(x, y), \quad y = 1, \ldots, N_2$$

Various other methods known in the field of character recognition can also be applied as the preprocessing and feature extraction used to obtain OCR standard patterns from the font pattern 24a.
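As an illustration of the marginal-distribution feature, the short Python sketch below sums a binary font pattern along each axis. It assumes the pattern is stored as a list of rows, so that `z[y][x]` holds z(x, y) with zero-based indices; concatenating P1 and P2 into a single vector in `standard_pattern` is also an assumption, as the text does not say how the two distributions are combined.

```python
from typing import List, Tuple

Pattern = List[List[int]]  # z[y][x] holds z(x, y), zero-based


def marginal_distributions(z: Pattern) -> Tuple[List[int], List[int]]:
    """Return (P1, P2): P1[x] sums over y, P2[y] sums over x."""
    p1 = [sum(column) for column in zip(*z)]  # one value per x, length N1
    p2 = [sum(row) for row in z]              # one value per y, length N2
    return p1, p2


def standard_pattern(z: Pattern) -> List[int]:
    """Assumed feature vector: P1 followed by P2."""
    p1, p2 = marginal_distributions(z)
    return p1 + p2


# Toy 3x3 font pattern shaped like a "T" (hypothetical data).
glyph_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]
p1, p2 = marginal_distributions(glyph_t)
```

For the toy glyph, the vertical stem concentrates the mass of P1 in the middle column and the horizontal bar concentrates the mass of P2 in the top row.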

Next, the matching unit 26 evaluates the similarity between the feature 23a and each standard pattern 25a (feature representation). The decision unit 27 then decides, based on the similarity evaluation values, whether to output an answer category 27a or to refuse to output an answer category (reject 27b).
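The matching and decision stages can be sketched as follows. The text fixes neither the similarity measure nor the rejection rule, so this Python sketch assumes a city-block distance (smaller means more similar) and rejects when the best match is too far away or too close to the runner-up; `max_distance` and `min_margin` are hypothetical parameters, and a return value of `None` stands for reject 27b.

```python
from typing import Dict, Optional, Sequence


def city_block(f: Sequence[float], g: Sequence[float]) -> float:
    """Assumed dissimilarity: sum of absolute element differences."""
    return sum(abs(a - b) for a, b in zip(f, g))


def decide(
    f: Sequence[float],
    standards: Dict[str, Sequence[float]],
    max_distance: float,
    min_margin: float,
) -> Optional[str]:
    """Return the answer category, or None to signal a reject."""
    ranked = sorted((city_block(f, g), category) for category, g in standards.items())
    best_distance, best_category = ranked[0]
    if best_distance > max_distance:
        return None  # nothing is similar enough
    if len(ranked) > 1 and ranked[1][0] - best_distance < min_margin:
        return None  # best and second-best candidates are too close to call
    return best_category


# Hypothetical standard patterns (marginal distributions of toy glyphs).
standards = {"T": [1, 3, 1, 3, 1, 1], "L": [3, 1, 1, 1, 1, 3]}

accepted = decide([1, 3, 1, 3, 1, 1], standards, max_distance=2, min_margin=2)
rejected = decide([2, 2, 1, 2, 1, 2], standards, max_distance=5, min_margin=2)
```

Here the first feature matches "T" exactly and is accepted, while the second is equally distant from both categories and is rejected by the margin test.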

Control of signals such as 20a, 21a, and so on is performed by the control unit 28. In FIG. 3, the control signal lines are omitted to avoid cluttering the figure. The processing units and storage units 21, 22, ..., 28 can be realized with photoelectric conversion devices, microprocessors, memories, and the like. Non-volatile memory is used for the character generator 24.

FIG. 4 is a diagram showing another embodiment of the present invention.

This embodiment is a device that integrates a word processor (or character printing device) 41 with a character recognition device 42, and in which the two share a character generator 43.

The character recognition device 42 creates standard patterns by applying preprocessing, feature extraction, and the like to the font patterns of the character generator 43, and references them.

The word processor (or character printing device) 41 has the character generator generate fonts and uses them to display or print characters.

[Effect of the invention]

According to the present invention, OCR standard patterns can be created from a low-priced character generator and referenced, so the price of an OCR (particularly a kanji OCR) can be lowered. In addition, using this character generator, which serves to create the OCR standard patterns, for character display and printing as well further improves cost performance.

FIG. 2 is a diagram showing the PAD for carrying out the present invention, FIG. 3 is a diagram showing the configuration of one embodiment of the present invention, and FIG. 4 is a diagram showing the configuration of another embodiment of the present invention.

Claims

[Claims]

1. A character recognition method for character recognition processing in which the similarity between an unknown character pattern and a number of standard character patterns is evaluated and the output of an answer category is decided based on the evaluated similarity, characterized in that the standard character patterns are obtained by applying preprocessing, feature extraction, or other conversion to font patterns generated by a character generator.

2. A character recognition method as set forth in claim 1, characterized in that the font patterns generated by the character generator are also used for displaying or printing characters.