
JPH03161886A - Method for correcting misreading of OCR

Info

Publication number: JPH03161886A
Application number: JP1302536A
Authority: JP
Inventors: Hiroyuki Katsuyama; 勝山　弘之; Naoji Matsunoshita; 松野下　直司; Shigeaki Hitomi; 人見　茂明
Original assignee: YUNIKOSU KK
Current assignee: YUNIKOSU KK
Priority date: 1989-11-21
Filing date: 1989-11-21
Publication date: 1991-07-11

Abstract

PURPOSE: To make misread characters easy to find by storing each of a plurality of read characters both as an image and under its assigned character code, and displaying the character images grouped by character code. CONSTITUTION: A plurality of characters read by an optical character reader (OCR) 1 are stored as images, and the character code assigned to each character is stored in association with its image. When the images are shown on a display device such as a CRT, the character images are listed in alignment, one group per character code. That is, characters that the OCR 1 recognized as the same character are displayed together as a group, regardless of whether they were read correctly or misread. Because the character images are displayed on the display device 4 as groups of the same character, a misread character can easily be detected.

Description

DETAILED DESCRIPTION OF THE INVENTION (Industrial Application Field) The present invention relates to OCR (optical character reader) and, in particular, to a method for correcting misread characters and reject characters that have been read and recognized by OCR.

(Background of the Invention) When character data such as letters, numerals, and symbols is input to a computer, wrong characters are sometimes entered (misreading). Finding and correcting these misread characters is very laborious. Ultimately, the only option is to compare the original data with the computer output by eye to locate and correct them.

The following measures have been used to make misreadings easier to find.

First, there is verification. It is inefficient because the same data must be processed by three times as many people and three times as many machines.

Calculation checks do not solve the problem either. A hash total reveals only which page contains a misreading. Total checks, check-digit schemes such as modulus 10 and modulus 11, range checks, and master-file checks reveal which part of the material contains a misreading, but not which character is wrong. In the end, therefore, that part of the original data and the corresponding part of the computer output must be compared by eye.
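A check digit shows this limitation clearly. The following Python sketch uses one common weighted modulus 11 scheme: weights 2, 3, 4, ... are applied from the rightmost digit, and the check value is 11 minus the remainder. The patent names modulus 11 but no particular weighting, so the scheme and the function names are assumptions. Two different single-digit misreadings fail the check in the same way. The check flags the field, not the character.

```python
def mod11_check_digit(digits: str) -> int:
    """Weighted modulus 11 check digit for a string of decimal digits.

    Assumed scheme: weights 2, 3, 4, ... from the rightmost digit;
    check = 11 - (sum % 11), with 11 mapped to 0. A result of 10 has no
    single-digit check value, so this sketch raises ValueError for it.
    """
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=2))
    check = 11 - total % 11
    if check == 11:
        return 0
    if check == 10:
        raise ValueError("no single-digit mod-11 check digit for this number")
    return check


def field_is_valid(field: str) -> bool:
    """True if the last digit of `field` is the correct check digit for the rest."""
    body, check = field[:-1], int(field[-1])
    try:
        return mod11_check_digit(body) == check
    except ValueError:
        return False


print(field_is_valid("123455"))  # True
print(field_is_valid("723455"))  # False: first digit misread, 1 -> 7
print(field_is_valid("128455"))  # False: third digit misread, 3 -> 8
```

Both failures look identical to the check. It can say that the field is wrong, but an operator must still compare the field with the original to find the wrong digit.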

Moreover, methods other than verification and hash totals leave some areas that the search for misreadings cannot cover at all. In short, there are parts of the data where misreadings cannot even be looked for.

Misreadings also occur when data is input by OCR, so dealing with misread characters is a problem there as well.

In OCR, characters whose data cannot be read correctly fall into the following two types.

The first is the reject character. For a reject character, the arithmetic unit could not determine which character the image read by the OCR corresponds to.

The second is the misread character: a character that was read incorrectly. In other words, the character specified by the original document differs from the character recognized by the OCR.

Of these, reject characters are easy to correct. The OCR side knows which characters were rejected, so it can extract and present them.

Misread characters, however, cannot be identified on the OCR side, because the OCR regards them as correctly read. They must therefore be found and corrected by the methods described above. This is as laborious as key input, so data entry remains very time-consuming even though OCR is used.

Therefore, an object of the present invention is to provide a method for easily finding and correcting misread characters among characters read by OCR.

(Structure of the Invention) To achieve the above object, the invention according to claim 1 is a method for correcting OCR misreading that includes the following steps:
- storing a plurality of characters read by OCR as images;
- also storing each character under the corresponding character code assigned to it;
- listing the images of the characters on a display device, one group for each character code.
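The grouping step of claim 1 can be sketched as a mapping from each assigned character code to the stored images. In this Python sketch, `ReadCharacter`, `group_by_code`, and `list_display` are hypothetical names. A short string label stands in for the stored image bits. The patent does not prescribe any particular data structure.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReadCharacter:
    """One character as read by the OCR (hypothetical structure)."""
    image_id: str   # stand-in for the stored image bits
    code: str       # character code assigned by recognition
    position: int   # index of the character in the source data


def group_by_code(chars: list[ReadCharacter]) -> dict[str, list[ReadCharacter]]:
    """Group stored images under the code each was assigned.

    Every image goes into the group for its assigned code, whether or not
    the assignment is correct, so a misread image ends up among images of
    a different character.
    """
    groups: dict[str, list[ReadCharacter]] = defaultdict(list)
    for ch in chars:
        groups[ch.code].append(ch)
    return dict(groups)


def list_display(groups: dict[str, list[ReadCharacter]]) -> list[str]:
    """One text row per code group: the code, then the images listed after it."""
    return [f"{code}: " + " ".join(ch.image_id for ch in groups[code])
            for code in sorted(groups)]


chars = [ReadCharacter("img0", "1", 0), ReadCharacter("img1", "0", 1),
         ReadCharacter("img2", "1", 2), ReadCharacter("img3", "1", 3)]
for row in list_display(group_by_code(chars)):
    print(row)   # "0: img1", then "1: img0 img2 img3"
```

Here `img3` stands for an image that is really a 7 but was assigned the code "1". It appears in the "1" row, where an operator can spot it as the odd one out.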

The invention according to claim 2 is the OCR misreading correction method according to claim 1, in which an image listed on the display device is designated and the correct character is keyed in to correct the OCR misreading.

(Operation and Effects) With the configuration described above, the present invention provides the following effects.

According to the invention of claim 1, the plurality of characters read by OCR are first stored as images. Each character is also stored under the character code assigned to it. These images are then shown on a display device such as a CRT, LCD, or PDP, listed by group of identical character codes. In other words, all characters that the OCR recognized as the same character are displayed together as one group, whether or not they were misread.

Therefore, according to the invention of claim 1, the images are arranged on the display device in groups of the same character. An operator who corrects misreadings while watching the display need only look, within each group, for an image that differs from the rest. Misread characters can therefore be found very easily.

Because misread characters are easy to find and correct, people who fill in OCR forms need not be forced to use the rigid "OCR character" style of writing, which differs from one type of OCR to another. This reduces the burden on the person filling in the form and can greatly widen the use and application of OCR.

According to the invention of claim 2, the operator designates the image bits on the display device with a cursor or similar means and keys in the correct character. The misreading is corrected immediately, so correction work is very quick and easy.
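Correction by designation (claim 2) amounts to mapping the cursor position back to the stored character and overwriting its code. This minimal Python sketch assumes that the cursor is identified by the displayed group (row) and a column within that row. All names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def group_positions(codes: list[str]) -> dict[str, list[int]]:
    """Map each assigned character code to the positions of the images under it."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pos, code in enumerate(codes):
        groups[code].append(pos)
    return dict(groups)


def correct_at_cursor(codes: list[str], groups: dict[str, list[int]],
                      row_code: str, column: int, keyed_in: str) -> int:
    """Overwrite the code of the image under the cursor; return its position.

    The cursor is modelled as (row_code, column): the displayed group and
    the image's column within that group's row.
    """
    if len(keyed_in) != 1:
        raise ValueError("key in exactly one character")
    pos = groups[row_code][column]
    codes[pos] = keyed_in
    return pos


codes = ["1", "0", "1", "1"]            # the image at position 3 is really a 7
groups = group_positions(codes)         # {"1": [0, 2, 3], "0": [1]}
correct_at_cursor(codes, groups, "1", 2, "7")
print(codes)                            # ['1', '0', '1', '7']
```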

(Embodiment) The illustrated embodiment is described below.

FIG. 1 is a block diagram of an embodiment of the OCR misreading correction method according to the present invention.

In FIG. 1, a plurality of characters read by the OCR 1 are input to the control/arithmetic unit 2 of a computer, which includes a CPU, RAM, and ROM. As in conventional systems, the corresponding character codes are assigned to the characters and stored in the RAM, and various kinds of data processing are then performed.

In this description, the OCR 1 is assumed to be a simple input device without an arithmetic section, and the control/arithmetic unit 2 performs character recognition. When the misreading correction method of the present invention is carried out on a separate computer, however, the OCR 1 becomes a data processing device with its own control/arithmetic section, output section, and so on.

Furthermore, in the misreading correction method of this embodiment, each image read by the OCR 1 is stored in the memory 3 as image bits, associated with the character code assigned to it. Images can be stored in several ways, for example as bits or by a vector method. Here they are assumed to be stored as bits. The image bits have been converted into a digital signal, but they keep almost exactly the form of the character on the form read by the OCR 1. That character may have been handwritten, printed by a printer, typeset, or stamped with a rubber stamp. The digitized image bits are displayed on the CRT 4 through the control/arithmetic unit 2, for example at 16 bits per square millimetre.
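The display density quoted above can be restated in more familiar units. Assume the 16 bits per square millimetre form a square grid of 4 by 4 dots. The dot pitch is then 0.25 mm, which is 101.6 dots per inch. The sketch below also sizes one stored character image for a hypothetical 5 mm by 5 mm cell at one bit per dot. The cell size is an assumption for illustration only.

```python
from __future__ import annotations

import math


def grid_for_density(bits_per_mm2: float) -> tuple[float, float]:
    """Return (dot pitch in mm, dots per inch), assuming a square dot grid."""
    dots_per_mm = math.sqrt(bits_per_mm2)
    return 1 / dots_per_mm, 25.4 * dots_per_mm


def cell_bytes(width_mm: float, height_mm: float, bits_per_mm2: float) -> int:
    """Bytes for one 1-bit-per-dot character cell, rounded up to whole bytes."""
    bits = round(width_mm * height_mm * bits_per_mm2)
    return math.ceil(bits / 8)


pitch, dpi = grid_for_density(16)
print(pitch, round(dpi, 1))   # 0.25 101.6
print(cell_bytes(5, 5, 16))   # 50 (a hypothetical 5 mm x 5 mm cell = 400 bits)
```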

FIG. 2 shows the algorithm of the misreading correction method of the present invention.

In step 1, the control/arithmetic unit 2 retrieves from the memory 3 the group of characters to which a particular character code has been assigned. It then lists them on the CRT 4, the display device. That is, every set of image bits assigned that character code is displayed on the CRT 4, whether or not it was misread.

FIG. 3 shows an example of the image bit groups displayed on the CRT 4. The boxed characters 0, 1, and 2 at the left edge of the screen are the specific character types. The handwritten image bits listed to the right of each type are those that the control/arithmetic unit 2 recognized as that type and assigned its character code, whether or not they were misread. Here it is easy to see that the image bit group for "1" contains a misreading, marked e.

FIG. 3 shows one full screen, but the operator can choose the listing method that makes checking easiest. A whole screen can be displayed at once. Misreadings can be sought even more carefully and easily, however, by displaying one character at a time at fixed intervals, or one line at a time and checking each line in turn.

In step 2, the operator looks for misreadings by eye. If one is found, the operator points to the misread character with the cursor and keys in the correct character. In the example of FIG. 3, the cursor is moved to position e and "7" is keyed in, which completes the correction.

The operator continues to find and correct misreadings. When none remain, correction of the displayed character group is complete, and in step 3 the system waits for input, for example the Return key.

In step 4, if another character group remains, the process returns to step 1 to display it. If not, all correction work is complete.
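The loop of FIG. 2 (steps 1 to 4) can be sketched in Python as follows. The operator is modelled by a hypothetical callback. It receives one displayed group and returns the corrections keyed in for it, as (column, character) pairs. Returning from the callback stands in for the Return key of step 3. This is an illustration under those assumptions, not the embodiment's actual program.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def review(codes: list[str],
           review_group: Callable[[str, list[int]], Iterable[tuple[int, str]]]
           ) -> list[str]:
    """Run the FIG. 2 loop over a copy of `codes` and return the corrected copy."""
    codes = list(codes)
    groups: dict[str, list[int]] = defaultdict(list)
    for pos, code in enumerate(codes):
        groups[code].append(pos)
    for code in sorted(groups):                 # step 4: next group, if any
        positions = groups[code]                # step 1: display this group
        for column, keyed_in in review_group(code, positions):
            codes[positions[column]] = keyed_in # step 2: cursor + key-in
        # step 3: the callback has returned, i.e. the operator pressed Return
    return codes


def scripted_operator(code: str, positions: list[int]) -> list[tuple[int, str]]:
    """Hypothetical operator: spots the third image in the "1" row as a 7."""
    return [(2, "7")] if code == "1" else []


print(review(["1", "0", "1", "1"], scripted_operator))  # ['1', '0', '1', '7']
```

Groups are built once before the loop. A corrected character is therefore not shown again under its new code, which matches a single pass over the groups.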

Reject characters can also be corrected.

In FIG. 4, the label "RJ" appears at the top of the specific characters at the left edge. The images listed to its right are the image bits of reject characters that the OCR 1 could not decipher. These reject characters can be corrected in the same way as the misread characters described above, by pointing to them with the cursor and keying in the correct character. In this example, "3", "5", "6", and "2" are entered from left to right.
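Reject characters fit the same scheme if they are collected under a reserved label, shown as "RJ" in FIG. 4. In this Python sketch, a reject is represented as `None` in the list of recognized codes. That representation is an assumption. The keyed-in characters are applied to the rejects from left to right, as in the FIG. 4 example.

```python
from __future__ import annotations

from typing import Optional

REJECT = "RJ"   # display label for the reject group, as in FIG. 4


def group_with_rejects(codes: list[Optional[str]]) -> dict[str, list[int]]:
    """Group positions by code, putting unrecognized characters under REJECT."""
    groups: dict[str, list[int]] = {REJECT: []}
    for pos, code in enumerate(codes):
        groups.setdefault(REJECT if code is None else code, []).append(pos)
    return groups


def key_in_rejects(codes: list[Optional[str]], keyed: str) -> list[Optional[str]]:
    """Return a copy with the rejects filled left to right from `keyed`."""
    rejects = [pos for pos, code in enumerate(codes) if code is None]
    if len(keyed) != len(rejects):
        raise ValueError("one keyed-in character is needed per reject")
    fixed = list(codes)
    for pos, ch in zip(rejects, keyed):
        fixed[pos] = ch
    return fixed


codes = ["1", None, "0", None, None, "4", None]
print(group_with_rejects(codes)[REJECT])   # [1, 3, 4, 6]
print(key_in_rejects(codes, "3562"))       # ['1', '3', '0', '5', '6', '4', '2']
```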

The image bits shown in FIGS. 3 and 4 can also be output to the printer 5 in FIG. 1 for a further check.

Thus, according to the embodiment described above, OCR misreadings and rejects can be found and corrected easily.

In practice, various checks can often be applied to the entry-sheet data, particularly numerical data. In that case, the CPU can run calculation checks on the read character data. The character images of parts that pass are not displayed on the CRT; only the parts caught by a check are displayed, as possible misreadings. This reduces the number of characters shown on the CRT and so lightens the screen-checking burden on the corrector.
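The filtering described above can be sketched with a total check, one of the calculation checks mentioned in the background. The `Record` layout (line items plus a written total) is hypothetical. Only records that fail the check are passed on for display, so the corrector inspects only their character images. A misreading that happens to leave the check satisfied would not be shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical numeric record read by the OCR: line items and a written total."""
    record_id: str
    items: list[int]
    total: int


def passes_total_check(rec: Record) -> bool:
    """Total check: the line items must add up to the written total."""
    return sum(rec.items) == rec.total


def records_to_display(records: list[Record]) -> list[Record]:
    """Keep only records that fail the check; their characters go to the screen."""
    return [rec for rec in records if not passes_total_check(rec)]


records = [Record("A", [120, 30], 150),   # consistent: hidden from the corrector
           Record("B", [170, 30], 150)]   # 120 misread as 170: shown
print([r.record_id for r in records_to_display(records)])   # ['B']
```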

An embodiment of the present invention has been described above. The invention is not limited to that embodiment, however, and may be modified as appropriate within the scope of its gist.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an embodiment of the OCR misreading correction method according to the present invention. FIG. 2 is an algorithm showing the steps of the same embodiment. FIG. 3 shows a CRT screen displayed in the same embodiment. FIG. 4 shows a CRT screen that includes a display of reject characters. Reference numerals: 1, OCR; 2, control/arithmetic unit; 3, memory; 4, CRT. The misread character is marked e in FIG. 3.

Claims

[Claims]

(1) A method for correcting OCR misreading, comprising the steps of: storing a plurality of characters read by OCR as images; also storing each character under the corresponding character code assigned to it; and listing the images of the characters on a display device, one group for each character code.

(2) The method for correcting OCR misreading according to claim 1, wherein an OCR misreading is corrected by designating an image listed on the display device and keying in the correct character.