
JPS6079486A - Character recognizing system - Google Patents

Character recognizing system

Info

Publication number
JPS6079486A
Authority
JP
Japan
Prior art keywords
character
pattern
feature
standard
ocr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP58186755A
Other languages
Japanese (ja)
Other versions
JPH0479036B2 (en)
Inventor
Saiji Kageyama
斎司 蔭山
Hirohide Endo
遠藤 裕英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP58186755A
Publication of JPS6079486A
Publication of JPH0479036B2
Legal status: Granted

Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To reduce the cost of an OCR by generating the OCR standard patterns from a character generator. CONSTITUTION: An observing part 21 converts an input character pattern 20a into an electric signal 21a. A pre-processing part 22 converts this electric signal 21a into a unit character pattern 22a, which a feature extracting part 23 converts into a feature 23a. Meanwhile, a character generator 24 successively generates font patterns 24a, and a feature extracting part 25 extracts features from them to obtain OCR standard patterns 25a. A matching part 26 then evaluates the similarity between the feature 23a and each standard pattern 25a. Finally, a deciding part 27 determines, based on the evaluated similarity, whether to output an answer category 27a or to refuse the output (reject 27b).

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Application of the Invention] The present invention relates to a character recognition system, and in particular to a method of creating and referencing standard patterns in the character recognition process.

[Background of the Invention]

When building a device that has both a character recognition function (determining a single character category for a given printed or handwritten character pattern) and a character display and printing function (displaying or printing the font that corresponds to a given character code), it is usual to provide both a standard pattern memory for character recognition (hereinafter called the OCR standard pattern memory) and a pattern memory for character display and printing (a character generator, hereinafter also called the character display and printing memory). However, when the number of character categories is large, as it is for kanji (1945 categories for the Joyo kanji, 2965 for JIS level 1, and 6349 for JIS levels 1 and 2 combined), installing both memories in the device makes it expensive.

As one way of solving this problem, a method has been proposed [Application No. 49-262'18] in which only the OCR standard pattern memory is installed in the device, and the fonts for character display and printing are created from the OCR standard patterns. However, this prior invention has a drawback: when the representation (feature representation) of the OCR standard patterns is degenerate, the font used for display or printing, that is, the character shape, cannot be restored from the standard pattern. Examples of feature representations from which the character shape cannot be restored are ① the feature representation called the marginal distribution, ② the feature representation called the stroke density function, and ③, ④ the feature representations called peripheral features. Of these, ③ is called the first-order peripheral feature and ④ the second-order peripheral feature.
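As a small illustration of why a degenerate feature representation cannot be inverted, the Python sketch below builds two different 2x2 binary patterns whose marginal distributions (row and column sums) are identical. The patterns and the helper name `marginals` are hypothetical and chosen only for this example; they do not come from the patent.

```python
def marginals(z):
    """Return (row sums, column sums) of a binary pattern given as a list of rows."""
    row_sums = [sum(row) for row in z]
    col_sums = [sum(col) for col in zip(*z)]
    return row_sums, col_sums


# Two different shapes: a diagonal stroke and an anti-diagonal stroke.
diag = [[1, 0],
        [0, 1]]
anti = [[0, 1],
        [1, 0]]

# The shapes differ, yet their marginal distributions coincide, so neither
# shape can be recovered from the marginal distribution alone.
print(diag != anti, marginals(diag) == marginals(anti))
```

Because several shapes map to the same feature, a font for display or printing cannot in general be rebuilt from such a standard pattern.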

Features ① to ④ have been shown to be effective for recognizing printed or handwritten kanji, so the above drawback was a major obstacle to realizing a kanji recognition device, which was an important aim of the prior invention.

Feature representation ① is described in detail in Nakata, Nakano, et al., "Recognition of Chinese Characters," Proc. of the Conference on Machine Perception of Patterns and Pictures, held at the NPL, Teddington (April 1972).

Feature representations ② to ④ are described in detail in Hagita et al., "Classification of Handwritten Kanji Using Three Outline Features," IECE Transactions (Shingakuron), Vol. 63-D, No. 12, pp. 1096-1101 (December 1980).

[Object of the Invention]

An object of the present invention is to eliminate the drawbacks of the prior invention described above and to provide a method of creating and referencing standard patterns in the character recognition process.

[Summary of the Invention]

To achieve the above object, the present invention installs in the device only the pattern memory for character display and printing (the character generator), not an OCR standard pattern memory, and performs character recognition by creating the OCR standard patterns from the fonts that the character generator produces and referencing them as they are created. As a result, the drawback of the prior invention is resolved, and degenerate feature representations from which the character shape cannot be restored can also be used as OCR standard patterns.

In addition, because the OCR standard pattern memory is no longer needed, the memory capacity can be kept about the same as in the conventional device. Furthermore, when printed character patterns are recognized, the OCR standard patterns created by the method of the present invention from the font, which is the source of the printed character patterns, have higher character quality than conventional OCR standard patterns created by converting printed patterns into electric signals with a scanner.

Reflecting this feature, character recognition accuracy can also be raised, particularly for high-quality character patterns.

The basic procedure of the present invention is shown in the PAD (Program Analysis Diagram) of FIG. 2.

Four notes on this PAD follow.

① The feature $f$ obtained in step (2) and the features $g_{i1}, \ldots, g_{ij}, \ldots, g_{iJ(i)}$ of the OCR standard patterns obtained in step (3-1) must be in corresponding forms so that they can be matched in step (3-2). For example, when $f$ and $g_{ij}$ ($1 \le j \le J(i)$) are expressed as vectors, their numbers of dimensions and the range of each dimension must correspond.

② The preprocessing in step (3-1) (mainly normalization of size, slant, and so on) is often unnecessary. A typical case is when the unknown character pattern $U$ can be normalized in step (2) to match the font pattern $P_i$.

③ As a special case of step (2), nothing may be applied to $U$, so that $f$ is $U$ itself.

④ As a special case of step (3-1), nothing may be applied to $P_i$, so that $g_{i1} = P_i$ and $J(i) = 1$.
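The notes above can be read against a minimal Python sketch of the basic procedure. It is an illustration under stated assumptions, not the patented implementation: the function name `recognize`, its parameters, the dictionary of font patterns, and the negative Hamming distance used as the similarity are all hypothetical. Only the step labels (2), (3-1), and (3-2) come from the text.

```python
def recognize(unknown, fonts, extract_unknown, extract_font, similarity, threshold):
    """Return the best-matching category, or None when the match is too weak.

    fonts maps a category to its font pattern P_i; extract_font returns the
    list of standard-pattern features derived from one font pattern.
    """
    f = extract_unknown(unknown)                      # step (2): feature f of U
    best_category, best_score = None, float("-inf")
    for category, pattern in fonts.items():
        for g in extract_font(pattern):               # step (3-1): created on the fly
            if len(g) != len(f):                      # f and g must have matching forms
                raise ValueError("f and g must have the same number of dimensions")
            score = similarity(f, g)                  # step (3-2): matching
            if score > best_score:
                best_category, best_score = category, score
    if best_score < threshold:                        # decision: reject weak matches
        return None
    return best_category


def flatten(pattern):
    """Identity-style feature: the pattern itself, flattened to one list."""
    return [pixel for row in pattern for pixel in row]


def neg_hamming(f, g):
    """Assumed similarity: minus the number of differing pixels."""
    return -sum(a != b for a, b in zip(f, g))


FONTS = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
unknown = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]           # an "I" with one noisy pixel
answer = recognize(unknown, FONTS, flatten, lambda p: [flatten(p)], neg_hamming, threshold=-2)
```

Here `flatten` applied to both the unknown pattern and the font pattern corresponds to the two special cases in which no conversion is applied, and no table of standard patterns is stored between calls.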

[Embodiments of the Invention]

Embodiments of the present invention are described below.

FIG. 3 is a diagram showing an embodiment of the present invention.

In the figure, an input character pattern 20a is first input to an observation unit 21. The observation unit 21 converts the input character pattern 20a into an electric signal 21a. The electric signal 21a is processed by unit 22, which applies preprocessing, segmentation, or both, and is thereby converted into a unit character pattern 22a. The preprocessing may instead be performed inside the feature extraction unit 23. The unit character pattern 22a is converted into a feature 23a by the feature extraction unit 23. In this way the input character pattern 20a is converted into the feature 23a.

Meanwhile, while a character generator 24 is made to generate font patterns 24a one after another, unit 25 applies preprocessing (mainly normalization) and feature extraction to each font pattern 24a to obtain an OCR standard pattern 25a.

An example of obtaining the OCR standard pattern 25a by applying preprocessing and feature extraction to the font pattern 24a is explained with reference to FIG. 3(a).

The font pattern 31 is written as $z(x, y)$, $x = 1, \ldots, N_1$, $y = 1, \ldots, N_2$. For this font pattern $z(x, y)$, the (x-axis) marginal distribution 32, $P_1(x)$, and the (y-axis) marginal distribution 33, $P_2(y)$, are calculated by the following formulas to obtain the OCR standard pattern 25a.

$$P_1(x) = \sum_{y=1}^{N_2} z(x, y), \quad x = 1, \ldots, N_1$$

$$P_2(y) = \sum_{x=1}^{N_1} z(x, y), \quad y = 1, \ldots, N_2$$

Various methods known in the field of character recognition can also be applied as the preprocessing and feature extraction used to obtain OCR standard patterns from the font pattern 24a.
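The marginal distributions described above can be computed in a few lines. The following Python sketch is an illustration only: the function names, the row-major list layout (`z[y][x]`, zero-based), and the 3x3 "T" glyph are assumptions made for the example and do not come from the patent.

```python
def marginal_distributions(z):
    """Compute P1 (indexed by x) and P2 (indexed by y) of a binary font pattern.

    z is a list of N2 rows, each holding N1 pixels, so z[y][x] stores the
    pixel that the text writes as z(x + 1, y + 1).
    """
    n2 = len(z)
    n1 = len(z[0])
    p1 = [sum(z[y][x] for y in range(n2)) for x in range(n1)]  # sum over y for each x
    p2 = [sum(z[y][x] for x in range(n1)) for y in range(n2)]  # sum over x for each y
    return p1, p2


def standard_pattern(z):
    """Concatenate both marginal distributions into one feature vector."""
    p1, p2 = marginal_distributions(z)
    return p1 + p2


# Hypothetical 3x3 glyph resembling the letter "T".
glyph_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]

p1, p2 = marginal_distributions(glyph_t)
```

The concatenated vector has $N_1 + N_2$ elements, far fewer than the $N_1 N_2$ pixels of the font pattern, which is one way to see that the representation is degenerate.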

Next, a matching unit 26 evaluates the similarity between the feature 23a and each standard pattern 25a (in its feature representation). A decision unit 27 then determines, based on the evaluated similarity, whether to output an answer category 27a or to refuse to output an answer category (reject 27b).
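The patent does not state which similarity measure or rejection rule the matching and decision units use. The Python sketch below shows one common choice purely as an assumption: cosine similarity for matching, and rejection when the best score is too low or too close to the runner-up. The function names and the threshold values 0.9 and 0.05 are hypothetical.

```python
import math


def cosine_similarity(f, g):
    """Assumed similarity measure between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(f, g))
    norm = math.sqrt(sum(a * a for a in f)) * math.sqrt(sum(b * b for b in g))
    return dot / norm if norm else 0.0


def decide(scores, min_similarity=0.9, min_margin=0.05):
    """Return the answer category, or None to signal a reject.

    scores maps each category to its similarity with the unknown feature.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    best_category, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    # Reject when the best match is weak or ambiguous.
    if best < min_similarity or best - runner_up < min_margin:
        return None
    return best_category
```

For instance, `decide({"A": 0.97, "B": 0.80})` selects "A", while `decide({"A": 0.97, "B": 0.95})` rejects because the two candidates are too close.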

Control of the signals 20a, 21a, and so on is performed by a control unit 28. In FIG. 3 the control signal lines are omitted to avoid cluttering the diagram. The processing units and storage units 21, 22, ..., 28 can be realized with a photoelectric conversion device, a microprocessor, memory, and the like. Non-volatile memory is used for the character generator 24.

FIG. 4 is a diagram showing another embodiment of the present invention.

This embodiment is a device that integrates a word processor (or character printing device) 41 and a character recognition device 42, with a character generator 43 shared between them.

The character recognition device 42 applies preprocessing, feature extraction, and so on to the font patterns of the character generator 43 to create standard patterns, and references them.

The word processor (or character printing device) 41 has the character generator generate fonts and uses them to display or print characters.
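The sharing arrangement of this embodiment can be pictured with a short Python sketch. Everything in it is hypothetical and chosen for illustration: the class `CharacterGenerator`, the text rendering with `#` and `.`, the row-sum feature, and the single 3x3 glyph. The point it tries to show is only that one font store serves both the display or printing side and the recognition side.

```python
class CharacterGenerator:
    """Hypothetical stand-in for the shared character generator 43."""

    def __init__(self, glyphs):
        self._glyphs = dict(glyphs)   # character code -> bitmap (list of rows)

    def codes(self):
        return list(self._glyphs)

    def font_pattern(self, code):
        return self._glyphs[code]


def render(generator, code):
    """Display or printing side (41): turn a font pattern into printable text."""
    rows = generator.font_pattern(code)
    return "\n".join("".join("#" if pixel else "." for pixel in row) for row in rows)


def iter_standard_patterns(generator, extract):
    """Recognition side (42): derive standard patterns lazily, without storing them."""
    for code in generator.codes():
        yield code, extract(generator.font_pattern(code))


def row_sums(pattern):
    """Assumed feature extractor: one sum per row of the font pattern."""
    return [sum(row) for row in pattern]


generator = CharacterGenerator({"T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]]})
printed = render(generator, "T")
patterns = list(iter_standard_patterns(generator, row_sums))
```

Using a generator function for the recognition side mirrors the idea that standard patterns are created while they are referenced, so no second pattern memory is needed.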

[Effects of the Invention]

According to the present invention, the OCR standard patterns can be created from a low-cost character generator and referenced, so the price of an OCR (especially a kanji OCR) can be lowered. In addition, by also using this character generator, which serves to create the OCR standard patterns, for character display and printing, cost performance can be improved further.

FIG. 2 is a diagram showing the PAD for carrying out the present invention, FIG. 3 is a diagram showing the configuration of one embodiment of the present invention, and FIG. 4 is a diagram showing the configuration of another embodiment of the present invention.

Claims (1)

[Claims] 1. A character recognition system in which, in character recognition processing that evaluates the similarity between an unknown character pattern and a plurality of standard character patterns and determines the output of an answer category based on the evaluated similarity, each font pattern generated by a character generator, or the result of applying a conversion such as preprocessing or feature extraction to each such font pattern, is used as the standard character pattern. 2. The character recognition system according to claim 1, in which the font patterns generated by the character generator are also used for character display, printing, and the like.
JP58186755A 1983-10-07 1983-10-07 Character recognizing system Granted JPS6079486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58186755A JPS6079486A (en) 1983-10-07 1983-10-07 Character recognizing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58186755A JPS6079486A (en) 1983-10-07 1983-10-07 Character recognizing system

Publications (2)

Publication Number Publication Date
JPS6079486A (en) 1985-05-07
JPH0479036B2 (en) 1992-12-14

Family

ID=16194068

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58186755A Granted JPS6079486A (en) 1983-10-07 1983-10-07 Character recognizing system

Country Status (1)

Country Link
JP (1) JPS6079486A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245606A (en) * 2019-06-13 2019-09-17 广东小天才科技有限公司 Text recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
JPH0479036B2 (en) 1992-12-14

Similar Documents

Publication Publication Date Title
CN109684911B (en) Expression recognition method and device, electronic equipment and storage medium
JP3139521B2 (en) Automatic language determination device
US6178261B1 (en) Method and system for extracting features in a pattern recognition system
WO2018223994A1 (en) Method and device for synthesizing chinese printed character image
JP3163185B2 (en) Pattern recognition device and pattern recognition method
CN110798636A (en) Subtitle generating method and device and electronic equipment
CN112241667A (en) Image detection method, device, equipment and storage medium
US20030156754A1 (en) Method and system for extracting title from document image
Belay et al. Factored convolutional neural network for amharic character image recognition
Reza et al. Table localization and segmentation using GAN and CNN
CN111079374A (en) Font generation method, device and storage medium
CN114863431A (en) Text detection method, device and equipment
Dey et al. Clean Text and Full-Body Transformer: Microsoft's Submission to the WMT22 Shared Task on Sign Language Translation
JPS6079486A (en) Character recognizing system
JP5769029B2 (en) Character recognition device, recognition dictionary generation device, and normalization method
US20240169758A1 (en) Method and system for increasing face images
Rajashekararadhya et al. Zone-based hybrid feature extraction algorithm for handwritten numeral recognition of two popular Indian scripts
Kumar et al. PCA-based offline handwritten character recognition system
Desai et al. Adversarial Network for Photographic Image Synthesis from Fine-grained Captions
Saeed Automatic recognition of handwritten arabic text: A survey
JPH06251156A (en) Pattern recognizing device
Samanta et al. Fast character recognition using Kohonen Neural Network
Philip et al. A novel bilingual OCR system based on column-stochastic features and SVM classifier for the specially enabled
Ravi et al. Fuzzy logic based character recognizer
JPH03282689A (en) Character recognizing device