JP2000339408A

JP2000339408A - Character segment device

Info

Publication number: JP2000339408A
Application number: JP11146960A
Authority: JP
Inventors: Masato Minami (南 正人); Toshiyuki Koda (香田 敏行)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-05-26
Filing date: 1999-05-26
Publication date: 2000-12-08

Abstract

PROBLEM TO BE SOLVED: To determine more accurately the number of characters in a character block that contains multiple characters, by using the relationship between the characters and the frame lines and the positional relationship with the preceding and following characters. SOLUTION: A frame line detection part 2 extracts the frame lines of the whole input image, and a character block separation part 3 separates the image into individual character blocks. A character number judgment part 4 judges the number of characters contained in each character block. It uses not only peripheral information measured from the circumscribing rectangle to the character, but also the frame lines detected by the frame line detection part 2 and the positional relationship with the other character blocks separated by the character block separation part 3. After noise is removed by a noise removal part 5, a segmentation part 6 segments the characters one character at a time.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a character segmentation device and a character segmentation method used in OCR handwritten character recognition devices and the like.

[0002]

[Prior Art] One conventional method for determining the number of characters in a character block is the "optical character reading device" of JP-A-6-333089 (hereinafter, the first conventional character segmentation device). It determines the number of characters in a character block as follows. FIG. 27 shows its main configuration. An image input unit 201 optically scans a form on which characters are written and loads the character image data into memory. A projection unit 202 creates vertical projection data from the image data loaded into memory. From this projection data, a judgment unit 203 judges whether the block is a run of connected characters or a normal single character (FIG. 28(a)). When the judgment unit 203 judges that the block is connected characters, a deletion unit 204 deletes the portion that is unnecessary as a character (FIG. 28(b)). A character segmentation unit 205 then segments characters one at a time. It handles both the characters judged by the judgment unit 203 to be normal single characters and the characters from which the deletion unit 204 has removed unnecessary portions. A character recognition unit 206 performs character recognition on the result.
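The projection-based count of the first conventional device can be illustrated with a minimal Python sketch. It is not the implementation of JP-A-6-333089. It assumes the character block is given as a list of pixel rows (1 = ink, 0 = background), and `gap_threshold` is a hypothetical parameter marking the columns treated as gaps between characters.

```python
from typing import List

Image = List[List[int]]  # rows of pixels, 1 = ink, 0 = background


def vertical_projection(image: Image) -> List[int]:
    """Number of ink pixels in each column."""
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def count_by_projection(image: Image, gap_threshold: int = 0) -> int:
    """Count runs of columns whose projection exceeds gap_threshold (hypothetical)."""
    count = 0
    in_run = False
    for value in vertical_projection(image):
        if value > gap_threshold:
            if not in_run:
                count += 1  # a new run of ink columns starts here
            in_run = True
        else:
            in_run = False  # a gap column ends the current run
    return count


if __name__ == "__main__":
    separate = [[1, 0, 1],
                [1, 0, 1]]  # two strokes with an empty column between them
    joined = [[1, 1, 1],
              [1, 0, 1]]    # the same strokes joined by a bridge
    print(count_by_projection(separate), count_by_projection(joined))  # 2 1
```

The second call shows why projection alone under-counts when adjacent characters touch. The bridge column never drops to the gap threshold, so no split point is found.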

[0003] One conventional noise removal method is the "mail sorting device" of JP-A-8-16717 (hereinafter, the second conventional character segmentation device). It removes noise as follows. FIG. 29 shows its main configuration. A normalization unit 211 normalizes a graphic pattern, and a blocking unit 212 divides it into blocks. A position/area calculation unit 213 calculates the position and area of each connected block of the pattern. From the calculated areas, an area ratio calculation unit 214 calculates the area ratio of each block. For each block, a noise determination unit 216 reads from a noise determination table 215 the area ratio that serves as the noise criterion for that block's position. It compares this value with the calculated ratio to decide whether the block is noise. Finally, a noise removal unit 217 removes the blocks judged to be noise from the normalized graphic pattern.
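The area-ratio test of the second conventional device can be sketched as below. This is an illustrative reading, not the procedure of JP-A-8-16717. It makes three assumptions: each connected block is summarized by a normalized centroid and an ink area, the area ratio is taken against the total ink area, and the noise determination table is keyed by a coarse position class. `NOISE_TABLE`, `region_of`, and their values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    x: float   # normalized centroid x in [0, 1]
    y: float   # normalized centroid y in [0, 1]
    area: int  # number of ink pixels in the connected block


# Hypothetical table: minimum share of total ink a block needs to be kept,
# by position class.
NOISE_TABLE: Dict[str, float] = {"border": 0.10, "interior": 0.02}


def region_of(block: Block, margin: float = 0.15) -> str:
    """Classify a block as 'border' or 'interior' by its distance to the edges."""
    edge_distance = min(block.x, block.y, 1 - block.x, 1 - block.y)
    return "border" if edge_distance < margin else "interior"


def remove_noise(blocks: List[Block]) -> List[Block]:
    """Keep blocks whose area ratio reaches the table value for their position."""
    total = sum(b.area for b in blocks)
    if total == 0:
        return []
    return [b for b in blocks if b.area / total >= NOISE_TABLE[region_of(b)]]
```

The criterion is a minimum area share, so a large block passes the test wherever it lies. This is the limitation described as the second problem in [0004].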

[0004]

[Problems to be Solved by the Invention] However, the first conventional character segmentation device has a first problem, which arises in two cases. When adjacent characters are connected at several points, as in FIG. 30, there is no location where the projection is small, so the device determines fewer characters than there actually are. For a handwritten "4" whose top is open, as in FIG. 31, the projection reaches a minimum in the middle of the character, so the device determines more characters than there actually are. The second conventional character segmentation device has a second problem: it cannot remove large noise such as that shown in FIG. 32.

[0005]

[Means for Solving the Problems] To solve the first problem, the present invention obtains a first provisional number of characters from the positional relationship between the vertical frame lines and the characters. It then determines the final number of characters in the character block using this first provisional number as the base. Some cases defeat a count based only on projection or character-height information: two or more characters spanning several frames and connected at multiple points, or a handwritten "4" with an open top. Even then, the count can be corrected by taking into account the balance between the entry frames and the characters, which allows a more accurate determination of the number of characters.
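The first provisional number of characters can be sketched as follows. The sketch assumes frame lines are given as sorted x-coordinates whose list index serves as the frame line number. The rule used here is a hypothetical illustration: count the frame cells the block spans, then drop an end cell when the block overhangs it by less than `min_overhang` pixels. The section above does not give the exact decision rule.

```python
from bisect import bisect_left, bisect_right
from typing import List


def contained_frames(frames: List[int], left: int, right: int) -> List[int]:
    """Numbers (list indices) of frame lines lying strictly between left and right."""
    first = bisect_right(frames, left)   # first line with x > left
    last = bisect_left(frames, right)    # first line with x >= right
    return list(range(first, last))


def first_provisional_count(frames: List[int], left: int, right: int,
                            min_overhang: int) -> int:
    """Frame cells spanned by the block, minus end cells it barely enters."""
    inside = contained_frames(frames, left, right)
    if not inside:
        return 1  # the block lies within a single frame cell
    count = len(inside) + 1
    left_distance = frames[inside[0]] - left     # block left end to nearest line
    right_distance = right - frames[inside[-1]]  # nearest line to block right end
    if left_distance < min_overhang:
        count -= 1
    if right_distance < min_overhang:
        count -= 1
    return max(count, 1)
```

Take frame lines at x = 0, 50, 100, 150, 200 and `min_overhang=10`. A block from x = 45 to x = 95 spans two cells but extends only 5 pixels into the first, so it is counted as one character.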

[0006] To solve the second problem, the present invention assumes that a printed character string contains no handwriting, so that a character judged to be handwritten within it is noise. Each character block receives a printed/handwritten judgment. When the character blocks around a target block are judged to be printed and the target block itself is judged to be handwritten, the target block is determined to be noise and removed. This makes it possible to remove not only small noise but also large noise contained in a printed character string.

[0007]

[Embodiments of the Invention] The invention according to claim 1 is a character segmentation device for the case where a form is imaged and input on which a group of characters is written in horizontally continuous, non-separated frames. The device comprises:
- a binarization unit that binarizes the input image;
- a frame line detection unit that detects the coordinates of all vertical frame lines in the binarized image, numbers them in order, and stores the coordinates and number of each frame line;
- a character block separation unit that separates the binarized image into character blocks;
- a character number determination unit that determines the number of characters contained in each separated character block;
- a noise removal unit that removes noise from the group of separated character blocks; and
- a segmentation unit that, for each character block remaining after noise removal, segments as many characters as the character number determination unit determined.

The character number determination unit in turn comprises the following:
- A contained-frame-count calculation unit calculates the numbers and count of the frame lines contained in the character block. It works from the coordinates of the detected frame lines and the coordinates of the separated character block.
- A left frame line distance calculation unit selects, among the frame lines so identified, the one closest to the left end of the character block and computes its distance from that end. A right frame line distance calculation unit does the same for the right end.
- A frame line character count determination unit determines a first provisional number of characters. Its inputs are the number of contained frame lines, the left-end distance, and the right-end distance.
- An upper-end distance calculation unit finds, for each vertical line, the distance from the upper edge of the rectangle circumscribing the character block down to the block. A lower-end distance calculation unit does the same from the lower edge.
- A vertical distance character count determination unit determines a second provisional number of characters from these per-vertical-line distances.
- A first comparison character count determination unit determines the number of characters of the block from the first and second provisional numbers. Its result is the output of the character number determination unit.

The device assumes that, basically, each character corresponds one-to-one to an entry frame. The first provisional number comes from the positional relationship between the character block and the frame lines. The second comes from the background distances measured from the upper and lower edges of the circumscribing rectangle. The final count is the first provisional number corrected by the second. Methods that rely only on projection or character-height information miscount when adjacent characters are connected or when a handwritten "4" has an open top. Such errors are corrected by taking into account the balance between the positions of the entry frames and the characters, which allows a more accurate determination of the number of characters.
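The sketch below covers three steps: the upper-edge and lower-edge distance profiles, a second provisional count, and the first comparison. The block is assumed to be a 0/1 pixel array cropped to its circumscribing rectangle. Two choices are hypothetical and made for illustration, not taken from the patent: the thin-column criterion (`ratio`), and the rule that the profile-based count may move the frame-based count by at most one.

```python
from typing import List, Tuple

Image = List[List[int]]  # 0/1 pixels cropped to the circumscribing rectangle


def edge_distances(block: Image) -> Tuple[List[int], List[int]]:
    """Per column, background run length from the top edge and from the bottom edge."""
    height, width = len(block), len(block[0])
    top: List[int] = []
    bottom: List[int] = []
    for x in range(width):
        column = [block[y][x] for y in range(height)]
        if 1 in column:
            top.append(column.index(1))
            bottom.append(column[::-1].index(1))
        else:  # empty column: background over the full height
            top.append(height)
            bottom.append(height)
    return top, bottom


def second_provisional_count(block: Image, ratio: float = 0.6) -> int:
    """1 + number of interior runs of 'thin' columns (hypothetical criterion)."""
    height = len(block)
    top, bottom = edge_distances(block)
    thin = [t + b >= ratio * height for t, b in zip(top, bottom)]
    count, x = 1, 0
    while x < len(thin):
        if not thin[x]:
            x += 1
            continue
        start = x
        while x < len(thin) and thin[x]:
            x += 1
        if start > 0 and x < len(thin):  # ignore runs touching the rectangle edges
            count += 1
    return count


def first_comparison(first: int, second: int) -> int:
    """Frame-based count, moved by at most one toward the profile-based count."""
    return first + max(-1, min(1, second - first))
```

Anchoring the result on the frame-based count reflects the one-to-one frame premise above. The profile count only nudges it, so a spurious projection minimum cannot add several characters at once.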

[0008] The invention according to claim 2 is the character segmentation device according to claim 1 with two changes to the character number determination unit. First, it adds a front/rear block character count determination unit. For each separated character block, this unit corrects the first provisional number of characters using the positional relationship between the target block and the preceding and following blocks, giving a third provisional number of characters. Second, it replaces the first comparison character count determination unit with a second comparison character count determination unit. That unit determines the number of characters from the second and third provisional numbers, and its result is the output of the character number determination unit. The premise is that each frame should originally contain only one character. The frame containing the left end of the character block may also contain another block (the immediately preceding block). Likewise, the frame containing the right end may contain another block (the immediately following block). In either case, the front/rear block character count determination unit judges whether the target block or the neighbouring block is the character that belongs in that frame. The second comparison character count determination unit then corrects the number of characters of the block based on the result. The count therefore takes into account the balance with the counts of surrounding character blocks, which improves the accuracy of the determination.
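A possible form of the front/rear block correction is sketched below. Blocks are reduced to (left, right) x-coordinates, and frame lines to sorted x-coordinates. The ownership test is an assumption for illustration: it awards a shared frame cell to whichever block covers more of it horizontally. The section above states only that the device judges which block belongs in the shared frame.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (left, right) x-coordinates of a character block


def cell_of(frames: List[int], x: int) -> int:
    """Frame cell containing x; cell k lies between frames[k-1] and frames[k]."""
    return bisect_right(frames, x)


def cell_overlap(frames: List[int], cell: int, span: Span) -> float:
    """Horizontal length of span that falls inside the given frame cell."""
    lo = frames[cell - 1] if cell > 0 else float("-inf")
    hi = frames[cell] if cell < len(frames) else float("inf")
    return max(0, min(span[1], hi) - max(span[0], lo))


def third_provisional_count(frames: List[int], first: int, target: Span,
                            prev: Optional[Span] = None,
                            nxt: Optional[Span] = None) -> int:
    """Correct the first provisional count using the neighbouring blocks."""
    count = first
    sides = []
    if prev is not None:
        sides.append((prev, prev[1], target[0]))  # prev's right end vs target's left end
    if nxt is not None:
        sides.append((nxt, nxt[0], target[1]))    # next's left end vs target's right end
    for neighbour, neighbour_edge, target_edge in sides:
        cell = cell_of(frames, target_edge)
        if cell_of(frames, neighbour_edge) != cell:
            continue  # the two blocks do not share a frame cell on this side
        if cell_overlap(frames, cell, neighbour) > cell_overlap(frames, cell, target):
            count -= 1  # the shared cell is judged to belong to the neighbour
    return max(count, 1)
```

Consider frame lines at 0, 50, 100, 150, 200, a target block spanning 40 to 140, and a preceding block spanning 5 to 45. The two blocks share the first cell, and the preceding block covers most of it. The target's count therefore drops from 3 to 2.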

[0009] The invention according to claim 3 is a character segmentation device comprising:
- a binarization unit that binarizes an input image;
- a character block separation unit that separates the binarized image into character blocks;
- a character number determination unit that determines the number of characters in each character block;
- a noise removal unit that removes, from the group of separated character blocks, those judged to be noise; and
- a segmentation unit that, for each character block remaining after noise removal, segments as many characters as the character number determination unit determined.

The noise removal unit comprises a small block noise removal unit, a character type determination unit, and a character type noise removal unit. The small block noise removal unit judges small character blocks to be noise and removes them. The character type determination unit judges whether each separated character block is printed or handwritten. The character type noise removal unit removes a target block that is judged handwritten when the blocks before and after it are judged printed. This rests on the assumption that handwritten characters do not appear within a printed character string. Even large noise that ordinary small-noise removal cannot handle can therefore be removed if it lies within a printed character string.
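The character type noise rule can be sketched as follows. It assumes each block has already been labeled `"print"` or `"hand"` in reading order. Reading "before and after" as requiring both neighbours to exist and be printed is an assumption of this sketch.

```python
from typing import List


def remove_handwritten_in_print(kinds: List[str]) -> List[int]:
    """Indices of blocks to keep; kinds[i] is 'print' or 'hand' in reading order."""
    keep = []
    for i, kind in enumerate(kinds):
        prev_print = i > 0 and kinds[i - 1] == "print"
        next_print = i + 1 < len(kinds) and kinds[i + 1] == "print"
        if kind == "hand" and prev_print and next_print:
            continue  # handwriting inside a printed string is treated as noise
        keep.append(i)
    return keep
```

For example, `remove_handwritten_in_print(["print", "hand", "print"])` returns `[0, 2]`, dropping the middle block regardless of its size. A field written entirely by hand is left untouched.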

[0010] The invention according to claim 4 is the character segmentation device according to claim 3 for the case where a form is imaged and input on which multiple characters are written consecutively in the horizontal direction. Its character type determination unit comprises three units. A height determination unit judges the target block to be printed when its height is close to that of the preceding and following blocks. A coordinate determination unit judges the target block to be printed when the vertical coordinate of its upper end is close to that of the neighbours' upper ends, or when the vertical coordinate of its lower end is close to that of their lower ends. A handwriting determination unit judges every block that neither of these units judged printed to be handwritten. A faded printed character may differ in height from the adjacent printed blocks, so height alone cannot identify it as printed. The coordinate determination unit can still judge it printed if only its upper or lower vertical coordinate is close to theirs. More printed characters are therefore recognized as printed than with height determination alone.
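The height and coordinate tests can be sketched as below. Each block is reduced to the vertical coordinates of its top and bottom edges. `tol` is a hypothetical pixel tolerance for "close", and the sketch assumes a match against at least one neighbour is sufficient.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    top: int     # vertical coordinate of the block's upper edge
    bottom: int  # vertical coordinate of the block's lower edge

    @property
    def height(self) -> int:
        return self.bottom - self.top


def judge_types(boxes: List[Box], tol: int = 2) -> List[str]:
    """Label each block 'print' or 'hand' using its neighbours in reading order."""
    labels = []
    for i, box in enumerate(boxes):
        neighbours = [boxes[j] for j in (i - 1, i + 1) if 0 <= j < len(boxes)]
        similar_height = any(abs(box.height - n.height) <= tol for n in neighbours)
        similar_coord = any(
            abs(box.top - n.top) <= tol or abs(box.bottom - n.bottom) <= tol
            for n in neighbours
        )
        # Height test, then coordinate test; anything else is handwriting.
        labels.append("print" if similar_height or similar_coord else "hand")
    return labels
```

In `judge_types([Box(10, 30), Box(10, 22), Box(10, 30)])`, the middle block is shorter than its neighbours, as a faded printed character might be. It shares their top coordinate, so all three blocks are judged printed.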

[0011] The invention according to claim 5 is the character segmentation device according to claim 4 for a form on which multiple characters are written consecutively in the horizontal direction and whose character entry area is divided into fields. Its character type determination unit comprises a field determination unit. For each character block in a field, this unit judges the block printed if its height or vertical coordinates are close to those of other blocks in the same field, and judges all other blocks handwritten. All characters within one field basically consist either of regularly aligned printed characters in the same font or of handwriting. Using the height and coordinate information of every block in the field therefore brings as much information as possible to the printed/handwritten judgment. This raises its accuracy, so more of the handwritten-judged blocks lying among printed blocks, that is, more noise, can be found and removed.
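A field-level variant can be sketched by comparing each block with every other block in its field, not only with its neighbours. Two elements are assumptions of this sketch: the majority rule (a block is printed when it matches more than half of the other blocks in the field) and the tolerance `tol`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldBox:
    field: int   # index of the entry field the block belongs to
    top: int
    bottom: int


def similar(a: FieldBox, b: FieldBox, tol: int) -> bool:
    """Close in height, upper-edge coordinate, or lower-edge coordinate."""
    return (abs((a.bottom - a.top) - (b.bottom - b.top)) <= tol
            or abs(a.top - b.top) <= tol
            or abs(a.bottom - b.bottom) <= tol)


def judge_by_field(boxes: List[FieldBox], tol: int = 2) -> List[str]:
    """'print' if a block resembles most other blocks in its field, else 'hand'."""
    members: Dict[int, List[int]] = {}
    for i, box in enumerate(boxes):
        members.setdefault(box.field, []).append(i)
    labels = ["hand"] * len(boxes)
    for indices in members.values():
        for i in indices:
            others = [boxes[j] for j in indices if j != i]
            matches = sum(similar(boxes[i], other, tol) for other in others)
            if others and 2 * matches > len(others):
                labels[i] = "print"
    return labels
```

A block that is alone in its field has nothing to compare against and stays `"hand"`. Under the neighbour rule above, the type noise step would not remove it either.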

[0012] The invention according to claim 6 is a character segmentation method for a form on which a group of characters is written in non-separated frames and which is imaged and input. The method proceeds as follows:
- Binarize the input image and separate it into character blocks.
- Detect the coordinates of all vertical frame lines in the binary image, number them, and store the coordinates and number of each frame line.
- For each character block, calculate the numbers and count of the frame lines contained in the block from the detected frame line coordinates and the block coordinates.
- From those frame line numbers, obtain the numbers of the frame lines closest to the left end and to the right end of the block.
- Determine a first provisional number of characters from the number of contained frame lines, the distance between the block's left-end coordinate and the nearest contained frame line, and the distance between the block's right-end coordinate and the nearest contained frame line.
- For the rectangle circumscribing each block, obtain per vertical line the distances from the upper edge and from the lower edge of the rectangle to the block. Determine a second provisional number of characters from the magnitude and variation of these distances.
- Determine the actual number of characters in the block from the first and second provisional numbers.
- Remove the character blocks judged to be noise, and for each remaining block segment as many characters as were determined.

The method assumes that, basically, each character corresponds one-to-one to an entry frame. The first provisional number comes from the positional relationship between the character block and the frame lines. The second comes from the background distances measured from the upper and lower edges of the circumscribing rectangle. The final count is the first provisional number corrected by the second. Methods that rely only on projection or character-height information miscount when adjacent characters are connected or when a handwritten "4" has an open top. Such errors are corrected by taking into account the balance between the positions of the entry frames and the characters, which allows a more accurate determination of the number of characters.
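Detection and numbering of vertical frame lines can be sketched as follows. The criterion is a hypothetical simplification: a column belongs to a frame line when its ink covers at least `min_fraction` of the image height, and adjacent such columns merge into one line at their centre. The section above does not specify how frame lines are detected.

```python
from typing import Dict, List

Image = List[List[int]]  # rows of pixels, 1 = ink, 0 = background


def detect_vertical_frames(image: Image, min_fraction: float = 0.9) -> Dict[int, int]:
    """Return {frame line number: x coordinate}, numbered from left to right."""
    height, width = len(image), len(image[0])
    is_line = [
        sum(image[y][x] for y in range(height)) >= min_fraction * height
        for x in range(width)
    ]
    frames: Dict[int, int] = {}
    x = 0
    while x < width:
        if not is_line[x]:
            x += 1
            continue
        start = x
        while x < width and is_line[x]:
            x += 1  # extend over a thick line spanning several columns
        frames[len(frames)] = (start + x - 1) // 2  # centre column of the run
    return frames
```

A plain column sum can be fooled by a tall, straight character stroke. A real implementation would more likely require a long continuous vertical run, but the numbering step would be the same.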

[0013] The invention according to claim 7 is a character segmentation method for a form on which a group of characters is written in non-separated frames and which is imaged and input. The method proceeds as follows:
- Binarize the input image and separate it into character blocks.
- Detect the coordinates of all vertical frame lines in the input image, number them, and store the coordinates and number of each frame line.
- For each character block, calculate the numbers and count of the frame lines contained in the block from the detected frame line coordinates and the block coordinates.
- From those frame line numbers, obtain the numbers of the frame lines closest to the left end and to the right end of the block.
- Determine a first provisional number of characters from the number of contained frame lines, the distance between the block's left-end coordinate and the nearest contained frame line, and the distance between the block's right-end coordinate and the nearest contained frame line.
- For the rectangle circumscribing each block, obtain per vertical line the distances from the upper edge and from the lower edge of the rectangle to the block. Determine a second provisional number of characters from the magnitude and variation of these distances.
- Determine a third provisional number of characters from the positional relationship between the target block and the preceding and following blocks.
- Determine the actual number of characters in the block from the first, second, and third provisional numbers.
- Remove the character blocks judged to be noise, and for each remaining block segment as many characters as were determined.

The premise is that each frame should originally contain only one character. The frame containing the left end of a character block may also contain another block (the immediately preceding block). Likewise, the frame containing its right end may contain another block (the immediately following block). In either case, the method judges whether the target block or the neighbouring block is the character that belongs in that frame, and corrects the number of characters of the block accordingly. The count therefore takes into account the balance with the counts of surrounding character blocks, which improves the accuracy of the determination.

[0014] The invention according to claim 8 is a character segmentation method that proceeds as follows:
- Binarize an input image and separate it into character blocks.
- Detect the coordinates of all vertical frame lines in the binary image, number them, and store the coordinates and number of each frame line.
- Determine the number of characters for each character block.
- Judge each block as printed or handwritten.
- When the target block is judged handwritten and the surrounding blocks are judged printed, determine the target block to be noise and remove it.
- Also determine small character blocks to be noise and remove them.
- For each remaining block, segment as many characters as were determined.

The method assumes that handwritten characters do not appear within a printed character string. A handwritten block whose neighbours before and after are printed is therefore treated as noise. Even large noise that ordinary small-noise removal cannot handle can thus be removed if it lies within a printed character string.

[0015] The invention according to claim 9 is a recording medium that stores a program implementing, in software, the character segmentation method of any one of claims 6 to 8. This allows the character segmentation method of the present invention to be introduced easily into other systems. When the medium is combined with a system or recording medium that provides a recognition function, the characters contained in an input pattern can be read simply by acquiring that pattern. In addition, the improved recognition performance reduces the user's burden of checking and correcting results.

[0016] (Embodiment 1) A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the character segmentation device according to the first embodiment, and FIG. 24 shows its operating procedure. In FIG. 1, reference numeral 1 denotes a binarization unit, which binarizes the input image when it is supplied as a multi-level image (101 in FIG. 24). Reference numeral 2 denotes a frame line detection unit, which stores the coordinates and numbers of the vertical frame lines contained in the image binarized by the binarization unit 1 (102 in FIG. 24). Reference numeral 3 denotes a character block separation unit, which separates the image binarized by the binarization unit 1 into character blocks using vertical and horizontal projections (103 in FIG. 24). Reference numeral 4 denotes a character count determination unit, which determines the number of characters contained in each character block separated by the character block separation unit 3. Reference numeral 5 denotes a noise removal unit, which performs a noise judgment on each character block separated by the character block separation unit 3 and removes the blocks judged to be noise (112 in FIG. 24). Reference numeral 6 denotes a cut-out unit. For each character block that the noise removal unit 5 did not remove, it cuts out as many character images as the number determined by the character count determination unit 4 (113 in FIG. 24).

[0017] FIG. 2 is a block diagram of the character count determination unit 4 in this embodiment.

Reference numeral 11 denotes a contained-frame count calculation unit. It uses the coordinates and numbers of the frame lines detected by the frame line detection unit 2. For each character block separated by the character block separation unit 3, it calculates the numbers and the count of the frame lines contained in that block (104 in FIG. 24).

Reference numeral 12 denotes a left frame line distance calculation unit. It considers the frame lines whose numbers were calculated by the contained-frame count calculation unit 11 and finds the distance between the left end of the character block and the closest of those lines (105 in FIG. 24). Reference numeral 13 denotes a right frame line distance calculation unit. It finds the corresponding distance between the right end of the character block and the closest of those lines (106 in FIG. 24).

Reference numeral 14 denotes a frame line character count determination unit, which obtains the first provisional character number (107 in FIG. 24). It starts from the number of frame lines contained in the block (found by unit 11) plus 1. It subtracts 1 if the distance found by the left frame line distance calculation unit 12 is short, and a further 1 if the distance found by the right frame line distance calculation unit 13 is short.

Reference numeral 15 denotes a top distance calculation unit. For each vertical line, it finds the distance from the top edge of the rectangle circumscribing the character block down to the block (108 in FIG. 24). Reference numeral 16 denotes a bottom distance calculation unit. For each vertical line, it finds the distance from the bottom edge of the circumscribing rectangle up to the block (109 in FIG. 24).

Reference numeral 17 denotes a top/bottom distance character count determination unit. It takes the per-line distances from the rectangle's top edge (computed by unit 15) and from its bottom edge (computed by unit 16). From the magnitude and variation of these distances, it obtains the second provisional character number (110 in FIG. 24).

Reference numeral 18 denotes a first comparative character count determination unit. It determines the number of characters in the character block from the first provisional character number (from unit 14) and the second provisional character number (from unit 17) (111 in FIG. 24).

[0018] The operation of the character segmentation device configured as above is described below with reference to FIGS. 3 to 15. The example input image is a form in which characters are written in horizontally continuous frames printed in a non-dropout color, as shown in FIG. 3. In the following description, the horizontal direction is x and the vertical direction is y.

[0019] First, the frame line detection unit 2 detects the vertical frame lines, numbers them in order from the left, and stores the x coordinate of each (FIG. 4). Next, the binarization unit 1 binarizes the input image. The character block separation unit 3 then computes the projections of the whole image in the x and y directions and takes each range where the projection is non-zero as a character block (FIG. 5). Projections are not the only option: the character blocks may also be determined by labeling the black pixels and treating each label as a character block.
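The two steps above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation, and it rests on three assumptions. First, the binary image is a list of rows with 1 for black. Second, a column counts as part of a vertical frame line when at least `min_ratio` of its pixels are black; this is a hypothetical criterion. Third, frame-line pixels have already been erased from the image passed to `separate_blocks`, because otherwise the frame lines would join every block into a single projection run.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image indexed [y][x]; 1 = black


def detect_vertical_frames(img: Image, min_ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (frame number, x) for each vertical frame line, numbered from 1
    starting at the left. A run of adjacent line columns counts as one line."""
    height = len(img)
    width = len(img[0]) if height else 0
    # Hypothetical criterion: a column is "line" if it is mostly black.
    is_line = [sum(img[y][x] for y in range(height)) >= min_ratio * height
               for x in range(width)]
    frames: List[Tuple[int, int]] = []
    x = 0
    while x < width:
        if not is_line[x]:
            x += 1
            continue
        start = x
        while x < width and is_line[x]:
            x += 1
        frames.append((len(frames) + 1, (start + x - 1) // 2))  # run centre
    return frames


def separate_blocks(img: Image) -> List[Tuple[int, int, int, int]]:
    """Split a character-only image into blocks. Each run of columns with a
    non-zero x projection is one block, and its vertical extent comes from the
    y projection over those columns. Returns inclusive (left, top, right, bottom)."""
    height = len(img)
    width = len(img[0]) if height else 0
    col_sum = [sum(img[y][x] for y in range(height)) for x in range(width)]
    blocks = []
    x = 0
    while x < width:
        if col_sum[x] == 0:
            x += 1
            continue
        left = x
        while x < width and col_sum[x] > 0:
            x += 1
        right = x - 1
        rows = [y for y in range(height) if any(img[y][left:right + 1])]
        blocks.append((left, rows[0], right, rows[-1]))
    return blocks
```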

[0020] Next, the character count determination unit 4 determines the number of characters in each character block. Its operation is described in detail below using the configuration of FIG. 2.

First, the contained-frame count calculation unit 11 processes each character block separated by the character block separation unit 3. It compares the range of x coordinates occupied by the block with the x coordinate of each vertical frame line detected by the frame line detection unit 2. If a vertical frame line's x coordinate falls within the block's x range, the unit stores that line's number and x coordinate. It also stores the total number of vertical frame lines contained in the block (FIG. 6).

Next, the left frame line distance calculation unit 12 selects the smallest of the frame line numbers calculated by unit 11, which is the leftmost frame line. It finds the distance between that line's x coordinate and the x coordinate of the left end of the character block separated by the character block separation unit 3 (FIG. 7). In the same way, the right frame line distance calculation unit 13 selects the largest frame line number, which is the rightmost frame line. It finds the distance between that line's x coordinate and the x coordinate of the block's right end (FIG. 8).

The frame line character count determination unit 14 then computes the first provisional character number (FIG. 9). It starts from the number of frame lines contained in the block (found by unit 11) plus 1. It subtracts 1 if the distance found by the left frame line distance calculation unit 12 is shorter than a predetermined threshold, and a further 1 if the distance found by the right frame line distance calculation unit 13 is short.
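The first provisional character number can be sketched as below. The sketch makes two assumptions. `near` is a hypothetical value for the "predetermined threshold". The same threshold is applied to the right-hand distance, because the text only says that distance must be "short".

```python
from typing import List, Tuple


def first_provisional(left: int, right: int, frame_xs: List[int],
                      near: int = 3) -> Tuple[int, List[int]]:
    """Frame-based first provisional character number of one block.

    `frame_xs[n - 1]` is the x coordinate of frame line number n (numbered
    from the left). Returns the count and the numbers of the contained lines.
    """
    inside = [n for n, fx in enumerate(frame_xs, start=1) if left <= fx <= right]
    count = len(inside) + 1
    if inside:
        # Smallest number = leftmost contained line; largest = rightmost.
        if frame_xs[inside[0] - 1] - left < near:
            count -= 1   # block only pokes slightly into the frame on the left
        if right - frame_xs[inside[-1] - 1] < near:
            count -= 1   # block only pokes slightly into the frame on the right
    return count, inside
```

Take frame lines at x = 0, 20, 40, 60 and a block spanning x = 18 to 61. The block contains lines 2, 3 and 4, which gives 3 + 1 = 4 as the starting value. Both ends lie within `near` of a frame line, so 2 is subtracted and the result is 2 characters.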

[0021] Next, the top distance calculation unit 15 finds, for each x coordinate, the vertical distance (top distance) from the top edge of the rectangle circumscribing the character block down to the block (FIG. 10). Likewise, the bottom distance calculation unit 16 finds, for each x coordinate, the vertical distance (bottom distance) from the bottom edge of the circumscribing rectangle up to the block (FIG. 11).

The top/bottom distance character count determination unit 17 uses these per-x top and bottom distances to find candidate split points between characters. A candidate is any point where:
- either distance is large;
- the distance changes sharply between adjacent x coordinates; or
- the character height is small. The character height is the circumscribing rectangle's height minus the top and bottom distances.

The number of candidate split points plus 1 is the second provisional character number (FIG. 12). However, when two candidate split points are close together, one of them is deleted and the second provisional character number is reduced accordingly. As a deletion criterion, each candidate may be given a priority based on the following factors, and the lower-priority candidate is deleted:
- how the candidate was found (bottom distance, top distance, or character height);
- the magnitude of the distance;
- the character height;
- the candidate's distance from a frame line.
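The following sketch shows how per-column top and bottom distances can yield candidate split points. Several parts are assumptions for illustration: the thresholds (`height_ratio`, `dist_ratio`), the edge `margin`, and the rule of keeping the earlier of two close candidates. The criterion based on changes between adjacent columns and the priority scheme are not implemented.

```python
from typing import List


def split_candidates(block: List[List[int]], height_ratio: float = 0.5,
                     dist_ratio: float = 0.6, margin: int = 1,
                     min_gap: int = 3) -> List[int]:
    """Candidate split x positions inside one character block.

    `block` is the binary image cropped to the block's circumscribing
    rectangle (1 = black). A column is a candidate when its top or bottom
    distance is large, or its character height is small, relative to the
    rectangle height.
    """
    h, w = len(block), len(block[0])
    is_cand = []
    for x in range(w):
        ys = [y for y in range(h) if block[y][x]]
        top = ys[0] if ys else h                  # top distance
        bottom = (h - 1 - ys[-1]) if ys else h    # bottom distance
        char_h = max(h - top - bottom, 0)         # character height
        inner = margin <= x < w - margin          # ignore the block's edges
        is_cand.append(inner and (char_h < height_ratio * h
                                  or top > dist_ratio * h
                                  or bottom > dist_ratio * h))
    points: List[int] = []
    x = 0
    while x < w:
        if not is_cand[x]:
            x += 1
            continue
        start = x
        while x < w and is_cand[x]:
            x += 1
        centre = (start + x - 1) // 2             # one point per run of columns
        # Too close to the previous point: keep the earlier one (the patent
        # instead deletes whichever candidate has the lower priority).
        if not points or centre - points[-1] >= min_gap:
            points.append(centre)
    return points
```

The second provisional character number is then `len(split_candidates(block)) + 1`. For example, two solid glyphs joined only by a one-pixel bridge along the bottom row produce one candidate in the bridge region, which gives 2 characters.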

[0022] The first comparative character count determination unit 18 determines the number of characters in the character block. It uses the first provisional character number determined by the frame line character count determination unit 14 and the second provisional character number obtained by the top/bottom distance character count determination unit 17. Example determination rules are as follows.
(1) When the first provisional character number equals the second provisional character number:
 - Number of characters: the first provisional character number.
 - Split points: the candidate split points found by unit 17 are adopted.
(2) When the first provisional character number is less than the second:
 - Number of characters: the first provisional character number.
 - Split points: candidates are taken from those found by unit 17 in descending order of priority, up to the first provisional character number minus 1. A candidate is strong if, judged against a preset threshold, its bottom or top distance is very large, its change is very large, or its character height is very small. If the number m of strong candidates exceeds the first provisional character number minus 1, the number of characters is set to m + 1 and the number of split points to m.
(3) When the first provisional character number is greater than the second:
 - Number of characters: the second provisional character number.
 - Split points: the candidate split points found by unit 17 are adopted. However, a character delimited by the split points may be judged to be two or more characters wide. In that case, the x coordinates of the frame lines lying within it are added as split points and the number of characters is increased accordingly (FIG. 13).
This concludes the description of the operation of the character count determination unit 4.
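Rules (1) to (3) can be expressed as a single function. In this sketch, each candidate carries a hypothetical `priority` score and a `strong` flag. The flag marks the candidates that pass the "very large / very small" thresholds of rule (2). `char_w` is an assumed input giving the expected width of one character, for example the frame pitch; rule (3) uses it for the two-character-width test.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    x: int           # split position
    priority: float  # higher means more reliable (hypothetical score)
    strong: bool     # passes the "very large / very small" thresholds


def decide_count(first: int, cands: List[Candidate], left: int, right: int,
                 frame_xs: List[int], char_w: int) -> Tuple[int, List[int]]:
    """Apply rules (1) to (3); returns (number of characters, split xs)."""
    second = len(cands) + 1
    if first == second:                                   # rule (1)
        splits = [c.x for c in cands]
    elif first < second:                                  # rule (2)
        strong = [c.x for c in cands if c.strong]
        if len(strong) > first - 1:
            splits = strong                               # count becomes m + 1
        else:
            ranked = sorted(cands, key=lambda c: c.priority, reverse=True)
            splits = [c.x for c in ranked[:first - 1]]
    else:                                                 # rule (3)
        splits = [c.x for c in cands]
        bounds = [left] + sorted(splits) + [right]
        for a, b in zip(bounds, bounds[1:]):
            if b - a >= 2 * char_w:   # looks two or more characters wide
                splits += [fx for fx in frame_xs if a < fx < b]
    splits.sort()
    return len(splits) + 1, splits
```

In every branch, the character count equals the number of adopted split points plus 1, so the function returns only the split list and derives the count from it.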

[0023] Next, the noise removal unit 5 performs a noise judgment on each character block separated by the character block separation unit 3 and then removes the noise (FIG. 14). Finally, the cut-out unit 6 processes each character block that the noise removal unit 5 did not remove. Using a label separation method and character height information, it cuts out as many character images as the number determined by the character count determination unit 4 (FIG. 15).

[0024] In the prior art, when characters touch at several points, the projection sizes alone may not give the correct character count. According to the first embodiment, the character count determination unit 4 can still determine the number of characters correctly in such cases, because it uses the number of frame lines contained in the character block.

[0025] (Embodiment 2) FIG. 16 is a block diagram of the character count determination unit 4 of the character segmentation device according to the second embodiment of the present invention. Everything outside the character count determination unit 4 is configured and operates as in the first embodiment. The unit itself also matches the first embodiment, with two changes. A preceding/following block character count determination unit 51 is added, and the first comparative character count determination unit 18 is replaced by a second comparative character count determination unit 52.

[0026] The operation of the character count determination unit 4 in the second embodiment is described below with reference to FIG. 16 and the flowchart of FIG. 25, which shows the operating procedure. Reference numeral 51 denotes the preceding/following block character count determination unit. It works on the character block group separated by the character block separation unit 3, starting from the first provisional character number obtained by the frame line character count determination unit 14. From the positional relationship between the target character block and the preceding and following blocks, it obtains a third provisional character number (121 in FIG. 25). Reference numeral 52 denotes the second comparative character count determination unit. It determines the actual number of characters in the character block using two inputs: the second provisional character number obtained by the top/bottom distance character count determination unit 17, and the third provisional character number obtained by unit 51 (122 in FIG. 25).

[0027] The operation of the character count determination unit 4 configured as above is described below with reference to FIG. 17. The preceding/following block character count determination unit 51 starts from the first provisional character number obtained by the frame line character count determination unit 14. It then examines the character block group separated by the character block separation unit 3 and applies two checks:
- Left end: suppose the frame containing the left end of the target block (frame α) also contains part of the immediately preceding block. If the part of the preceding block inside frame α is larger than the part of the target block inside frame α, the character count is reduced by one.
- Right end: suppose the frame containing the right end of the target block (frame β) also contains part of the immediately following block. If the part of the following block inside frame β is larger than the part of the target block inside frame β, the count is reduced by a further one.

The resulting value is the third provisional character number.
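The correction made by the preceding/following block character count determination unit 51 can be sketched as follows. The sketch relies on three assumptions that the patent does not state. The "size" of a block inside a frame is measured as the width of their x-overlap. Frame lines are given as sorted x coordinates. The result is floored at 0.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (left, right) x extent


def _overlap(a: Span, b: Span) -> int:
    """Width of the x-range shared by two spans (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def _frame_cell(x: int, frame_xs: List[int]) -> Optional[Span]:
    """The frame (space between two adjacent frame lines) containing x."""
    i = bisect_right(frame_xs, x)
    if i == 0 or i == len(frame_xs):
        return None          # x lies outside the ruled area
    return frame_xs[i - 1], frame_xs[i]


def third_provisional(first: int, target: Span, prev: Optional[Span],
                      nxt: Optional[Span], frame_xs: List[int]) -> int:
    count = first
    alpha = _frame_cell(target[0], frame_xs)   # frame holding the left end
    if prev and alpha and _overlap(prev, alpha) > _overlap(target, alpha):
        count -= 1                             # frame alpha belongs to prev
    beta = _frame_cell(target[1], frame_xs)    # frame holding the right end
    if nxt and beta and _overlap(nxt, beta) > _overlap(target, beta):
        count -= 1                             # frame beta belongs to nxt
    return max(count, 0)                       # floor at 0: an assumption
```

The sketch measures the overlap within a frame cell rather than the whole block width. As a result, a large neighbouring block counts against the target only through the part that actually lies inside the shared frame.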

[0028] The second comparative character count determination unit 52 determines the actual number of characters in the character block. It uses the second provisional character number obtained by the top/bottom distance character count determination unit 17 and the third provisional character number obtained by the preceding/following block character count determination unit 51. One example rule is to reuse the rules described for the first comparative character count determination unit 18 in the first embodiment, with the first provisional character number replaced by the third.

[0029] According to the second embodiment, each frame is assumed to contain only one character. Under this principle, suppose the frame containing the left end of a character block also contains the immediately preceding block, or the frame containing its right end also contains the immediately following block. The embodiment then judges whether the target block or the adjacent block is the character that belongs in that frame, which improves the accuracy of the character count determination.

[0030] (Embodiment 3) FIG. 18 is a block diagram of the noise removal unit 5 of the character segmentation device according to the third embodiment of the present invention. Everything outside the noise removal unit 5 may be configured and operate as in the first or second embodiment. FIG. 26 is a flowchart of the operating procedure of Embodiment 3. In FIG. 18, reference numeral 21 denotes a small-block noise removal unit, which removes small noise from the character blocks separated by the character block separation unit 3 (131 in FIG. 26). Reference numeral 22 denotes a character type determination unit, which determines whether each character block separated by unit 3 is printed or handwritten. Reference numeral 23 denotes a character-type noise removal unit. It removes the target character block when the character type determination unit 22 judges that block handwritten and the blocks before and after it printed (135 in FIG. 26).

[0031] FIG. 19 is a block diagram of the character type determination unit 22. Reference numeral 31 denotes a height determination unit, which judges the target character block printed when its height is close to that of the preceding and following blocks (132 in FIG. 26). Reference numeral 32 denotes a coordinate determination unit. It judges the target block printed when either of the following holds (133 in FIG. 26):
- the vertical coordinate of its top edge is close to that of the top edges of the preceding and following blocks;
- the vertical coordinate of its bottom edge is close to that of their bottom edges.

Reference numeral 33 denotes a handwriting determination unit, which judges a character block handwritten if neither the height determination unit 31 nor the coordinate determination unit 32 has judged it printed (134 in FIG. 26).

[0032] The operation of the noise removal unit 5 configured as above is described below. The noise removal unit 5 performs a noise judgment on each character block separated by the character block separation unit 3 and then removes the noise. Its detailed operation is described with reference to FIGS. 20 to 23.

First, the small-block noise removal unit 21 uses the size of each character block separated by unit 3 to remove blocks smaller than a threshold (FIG. 20).

Next, the character type determination unit 22 judges each block printed or handwritten by the following procedure:
1. The height determination unit 31 compares the height of the target block with the heights of the preceding and following blocks. If the heights are nearly equal, it judges the target block printed (FIG. 21).
2. The coordinate determination unit 32 compares the y coordinate of the target block's top edge with those of the neighboring blocks' top edges. It also compares the y coordinate of the target block's bottom edge with those of their bottom edges. If either the top or the bottom y coordinates nearly match, it judges the target block printed (FIG. 22). A match at only one edge is sufficient so that a faint stamped character can still be judged printed even if part of it is lost during binarization.
3. The handwriting determination unit 33 judges a block handwritten if neither unit 31 nor unit 32 judged it printed.

This concludes the operation of the character type determination unit 22. Next, the character-type noise removal unit 23 removes the target block when unit 22 judges that block handwritten and the blocks before and after it printed (FIG. 23). The number of preceding and following blocks used in this judgment need not be one; it may be varied according to the image being recognized. Nor must every preceding and following block be judged printed. The decision may instead be based on the proportion or number of blocks judged printed.
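The three stages can be sketched as below: small-block removal, print/handwriting determination, and character-type noise removal. Boxes are hypothetical `(left, top, right, bottom)` tuples ordered left to right. `min_area`, the pixel tolerance `tol`, the neighbour count `k`, and the print `ratio` are illustrative parameters. The sketch treats a block as printed when it matches at least one neighbour, which is one reading of the text.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), y grows downward


def _close(a: int, b: int, tol: int) -> bool:
    return abs(a - b) <= tol


def is_print(i: int, boxes: List[Box], k: int = 1, tol: int = 2) -> bool:
    """Printed if the height, the top y or the bottom y of box i is close to
    that of at least one of the k neighbours on each side; else handwritten."""
    _, t, _, b = boxes[i]
    neighbours = boxes[max(0, i - k):i] + boxes[i + 1:i + 1 + k]
    for _, nt, _, nb in neighbours:
        if _close(b - t, nb - nt, tol):               # height determination
            return True
        if _close(t, nt, tol) or _close(b, nb, tol):  # coordinate determination
            return True
    return False


def remove_noise(boxes: List[Box], min_area: int = 10, k: int = 1,
                 tol: int = 2, ratio: float = 1.0) -> List[Box]:
    # 1. Small-block noise removal.
    kept = [bx for bx in boxes if (bx[2] - bx[0]) * (bx[3] - bx[1]) >= min_area]
    # 2. Print/handwriting determination for every remaining block.
    printed = [is_print(i, kept, k, tol) for i in range(len(kept))]
    # 3. Drop a handwritten block whose neighbours are (at least `ratio`) print.
    result = []
    for i, bx in enumerate(kept):
        near = printed[max(0, i - k):i] + printed[i + 1:i + 1 + k]
        if not printed[i] and near and sum(near) / len(near) >= ratio:
            continue
        result.append(bx)
    return result
```

Requiring a match with every neighbour would make the printed blocks next to a large noise blob look handwritten too, which is why the sketch uses the "at least one" reading. A faint stamp whose bottom edge is eroded still passes through the top-edge comparison.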

[0033] According to the third embodiment, the conventional method could not remove large noise, but this noise can now be removed when the surrounding characters are printed. In that case, the noise removal unit 5 judges the surrounding blocks printed and the target block handwritten, so the target block is judged to be noise and removed.

[0034]

EFFECTS OF THE INVENTION
As described above, methods that rely only on projections and character height can miscount characters when adjacent characters touch, or with a handwritten "4" whose top is open. The character segmentation device according to claim 1 of the present invention corrects such errors by taking into account the balance between the positions of the entry frames and the characters. This enables a more accurate determination of the number of characters.

[0035] The character segmentation device according to claim 2 judges whether the target character block or the immediately preceding or following block is the character that belongs in a frame. It then corrects the target block's character count based on the result. This allows the number of characters to be determined with regard to the balance of character counts among neighboring blocks, which improves the accuracy of the character count determination.

[0036] The character segmentation device according to claim 3 of the present invention assumes that handwritten characters are not mixed into a printed character string and performs printed/handwritten determination on each character block. When the target character block is judged handwritten and the character blocks before and after it are judged printed, the target character block is judged to be noise. As a result, even large noise that the usual small-noise removal method cannot remove can be removed, provided it lies within a printed character string.

[0037] With the character segmentation device according to claim 4 of the present invention, a faded printed character whose height differs from that of the neighboring blocks judged printed, and which therefore cannot be judged printed by the height test alone, can still be judged printed as long as the vertical coordinate of its upper or lower end approximates that of the neighboring printed blocks.
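A minimal sketch of the claim 4 test, assuming axis-aligned bounding boxes in image coordinates (y grows downward) and fixed pixel tolerances. The `Box` type, `judge_printed`, and the tolerance values are hypothetical. The coordinate test is what rescues a faded printed character whose height no longer matches its neighbors.

```python
from dataclasses import dataclass


@dataclass
class Box:
    top: int     # y of the upper edge (image y grows downward)
    bottom: int  # y of the lower edge

    @property
    def height(self) -> int:
        return self.bottom - self.top


def is_close(a: int, b: int, tol: int) -> bool:
    return abs(a - b) <= tol


def judge_printed(target: Box, prev: Box, nxt: Box,
                  height_tol: int = 3, coord_tol: int = 2) -> bool:
    """Hypothetical claim 4 rule; False means the block is judged handwritten."""
    # Height test: the height approximates both neighbours.
    if (is_close(target.height, prev.height, height_tol)
            and is_close(target.height, nxt.height, height_tol)):
        return True
    # Coordinate test: the upper or lower edge lines up with either neighbour.
    for nb in (prev, nxt):
        if (is_close(target.top, nb.top, coord_tol)
                or is_close(target.bottom, nb.bottom, coord_tol)):
            return True
    return False


prev, nxt = Box(10, 40), Box(11, 41)
print(judge_printed(Box(10, 30), prev, nxt))  # True: top edge matches
print(judge_printed(Box(0, 60), prev, nxt))   # False: judged handwritten
```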

[0038] With the character segmentation device according to claim 5 of the present invention, all characters within one field basically consist either of regularly aligned printed characters in a single font or of handwriting, so printed/handwritten determination is performed using the height and coordinate information of the character blocks across the entire field. Because as much information as possible is used, the accuracy of printed/handwritten determination increases, and more handwritten-judged blocks among the printed-judged blocks, that is, more noise, can be found and removed.
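The field-wide variant of claim 5 compares each block with every other block in the same field rather than only with its immediate neighbors. In this illustrative sketch, a block is labeled printed if its height, upper-end y, or lower-end y is within `tol` pixels of at least `min_matches` other blocks in the field; `classify_field`, `tol`, and `min_matches` are assumptions, not values from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    top: int     # y of the upper edge
    bottom: int  # y of the lower edge


def classify_field(boxes: List[Box], tol: int = 2,
                   min_matches: int = 1) -> List[str]:
    """Label every block in one field "printed" or "handwritten" (hypothetical)."""
    labels = []
    for i, b in enumerate(boxes):
        h = b.bottom - b.top
        matches = 0
        for j, other in enumerate(boxes):
            if i == j:
                continue
            same_height = abs(h - (other.bottom - other.top)) <= tol
            same_top = abs(b.top - other.top) <= tol
            same_bottom = abs(b.bottom - other.bottom) <= tol
            if same_height or same_top or same_bottom:
                matches += 1
        labels.append("printed" if matches >= min_matches else "handwritten")
    return labels


field = [Box(10, 40), Box(10, 40), Box(11, 41), Box(0, 70)]
print(classify_field(field))
```

Here the tall fourth block matches no other block in height or edge position, so it is labeled handwritten and becomes a noise candidate within the printed field.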

[0039] The character segmentation method according to claim 6 of the present invention corrects character-count errors made by methods that rely only on projection and character-height information, errors that occur when adjacent characters touch or when a character is open at the top, like a handwritten "4". It does so by taking into account the balance between the positions of the entry frame and the characters, which allows the number of characters to be determined more accurately.
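Claims 1 and 6 also derive a second provisional count from the distances between the circumscribing rectangle and the character, measured column by column. The sketch below computes those upper-end and lower-end distances as described, but the counting rule is a hypothetical stand-in for the patent's rule: a run of columns whose combined distance is at least `gap_ratio` of the block height is treated as a break between two characters. The function names and `gap_ratio` are assumptions.

```python
from typing import List, Tuple


def column_profiles(block: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Per-column upper-end and lower-end distances inside the bounding box.

    block[y][x] is 1 for ink and 0 for background; the array is assumed to be
    cropped to the rectangle circumscribing the character block. An empty
    column gets distance h at both ends.
    """
    h, w = len(block), len(block[0])
    top, bottom = [], []
    for x in range(w):
        ink_rows = [y for y in range(h) if block[y][x]]
        if ink_rows:
            top.append(ink_rows[0])
            bottom.append(h - 1 - ink_rows[-1])
        else:
            top.append(h)
            bottom.append(h)
    return top, bottom


def second_provisional_count(block: List[List[int]],
                             gap_ratio: float = 0.6) -> int:
    h = len(block)
    top, bottom = column_profiles(block)
    # A column is "thin" when its topmost and bottommost ink pixels are far
    # from the rectangle edges, i.e. the combined distance is large.
    thin = [t + b >= gap_ratio * h for t, b in zip(top, bottom)]
    count, seen_ink, in_gap = 1, False, False
    for is_thin in thin:
        if is_thin:
            in_gap = True
        else:
            if in_gap and seen_ink:
                count += 1  # a thin run lies between two inked runs
            seen_ink, in_gap = True, False
    return count


# Two solid squares joined by a one-pixel stroke in the middle row.
touching = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
print(second_provisional_count(touching))  # 2
```

Projection alone would see no empty column in this example; the upper and lower distances at the connecting column are what reveal the break.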

[0040] The character segmentation method according to claim 7 of the present invention determines whether the target character block or the character block immediately before or after it is the character that belongs in the frame, and corrects the character count of the block based on the result. This allows the character count to be determined in light of the balance of character counts with the surrounding character blocks, which improves the accuracy of character-count determination.

[0041] The character segmentation method according to claim 8 of the present invention assumes that handwritten characters are not mixed into a printed character string and performs printed/handwritten determination on each character block. When the target character block is judged handwritten and the character blocks before and after it are judged printed, the target character block is judged to be noise. As a result, even large noise that the usual small-noise removal method cannot remove can be removed, provided it lies within a printed character string.

[0042] With the recording medium according to claim 9 of the present invention, the character segmentation method of the present invention can easily be introduced into other systems. Combined with a system or recording medium that provides a recognition unit, it makes it possible to read the characters in an input pattern simply by acquiring that pattern, and the improved recognition performance reduces the user's burden of checking and correcting results.

[Brief description of the drawings]

FIG. 1 is a block diagram of the character segmentation device according to each embodiment of the present invention.

FIG. 2 is a block diagram of the character count determination unit in the first embodiment.

FIG. 3 is an explanatory diagram of an example input image in the first embodiment.

FIG. 4 is an explanatory diagram of an example of vertical frame line detection and numbering in the first embodiment.

FIG. 5 is an explanatory diagram of the method of extracting character blocks in the first embodiment.

FIG. 6 is an explanatory diagram of the method of storing the count and numbers of the frame lines contained in a character block in the first embodiment.

FIG. 7 is an explanatory diagram of how the distance between the selected frame line and the left end of a character block is obtained in the first embodiment.

FIG. 8 is an explanatory diagram of how the distance between the selected frame line and the right end of a character block is obtained in the first embodiment.

FIG. 9 is an explanatory diagram of the first provisional character count calculation method, based on the positional relationship between frame lines and a character block, in the first embodiment.

FIG. 10 is an explanatory diagram of the distance from the upper end of the circumscribing rectangle to the character in each column (upper end distance) in the first embodiment.

FIG. 11 is an explanatory diagram of the distance from the lower end of the circumscribing rectangle to the character in each column (lower end distance) in the first embodiment.

FIG. 12 is an explanatory diagram of the second provisional character count calculation method, using the upper end distance and the lower end distance, in the first embodiment.

FIG. 13 is an explanatory diagram of the character count correction method used when the character width is large in the first embodiment.

FIG. 14 is an explanatory diagram of noise removal in the first embodiment.

FIG. 15 is an explanatory diagram of an example of separating a character block that contains several characters into single characters in the first embodiment.

FIG. 16 is a block diagram of the character count determination unit in the second embodiment.

FIG. 17 is an explanatory diagram of correcting division candidate positions (character counts) based on the positional relationship with the preceding and following characters in the second embodiment.

FIG. 18 is a block diagram of the noise removal unit in the third embodiment.

FIG. 19 is a block diagram of the character type determination unit in the third embodiment.

FIG. 20 is an explanatory diagram of small-noise removal in the third embodiment.

FIG. 21 is an explanatory diagram of the printed-character determination method based on character block height in the third embodiment.

FIG. 22 is an explanatory diagram of the printed-character determination method based on the agreement of upper-end and lower-end y coordinates, which handles printed characters with missing parts, in the third embodiment.

FIG. 23 is an explanatory diagram of the method of removing large noise using the printed-character determination method in the third embodiment.

FIG. 24 is a flowchart of the operating procedure in the first embodiment.

FIG. 25 is a flowchart of the operating procedure in the second embodiment.

FIG. 26 is a flowchart of the operating procedure in the third embodiment.

FIG. 27 is a block diagram of the character segmentation device in the first conventional example.

FIG. 28 is an explanatory diagram of the character count determination and cutout method in the first conventional example.

FIG. 29 is a block diagram of the character segmentation device in the second conventional example.

FIG. 30 is an explanatory diagram of an example in which the first conventional example determines too few characters.

FIG. 31 is an explanatory diagram of an example in which the first conventional example determines too many characters.

FIG. 32 is an explanatory diagram of an example of noise that cannot be removed in the second conventional example.

[Explanation of symbols]

1 Binarization unit
2 Frame line detection unit
3 Character block separation unit
4 Character count determination unit
5 Noise removal unit
6 Cutout unit
11 Frame count calculation unit
12 Left frame line distance calculation unit
13 Right frame line distance calculation unit
14 Frame line character count determination unit
15 Upper end distance calculation unit
16 Lower end distance calculation unit
17 Upper/lower distance character count determination unit
18 First comparison character count determination unit
21 Small block noise removal unit
22 Character type determination unit
23 Character type noise removal unit
31 Height determination unit
32 Coordinate determination unit
33 Handwriting determination unit
51 Preceding/following block character count determination unit
52 Second comparison character count determination unit
201 Image input unit
202 Projection unit
203 Judgment unit
204 Deletion unit
205 Character cutout unit
206 Recognition unit
211 Normalization unit
212 Blocking unit
213 Position/area calculation unit
214 Area calculation unit
215 Noise determination table
216 Noise determination unit
217 Noise removal unit

Claims


1. A character segmentation device comprising: a binarization unit that binarizes an input image when a form in which a group of characters is written in horizontally continuous, non-separated entry frames is input as an image; a frame line detection unit that detects the coordinates of all vertical frame lines included in the image binarized by the binarization unit, numbers them in order, and stores the coordinates and number of each frame line; a character block separation unit that separates the image binarized by the binarization unit into character blocks; a character count determination unit that determines the number of characters included in each character block separated by the character block separation unit; a noise removal unit that removes noise from the group of character blocks separated by the character block separation unit; and a cutout unit that, for each character block left after the noise removal unit, cuts out as many characters as the number determined by the character count determination unit; wherein the character count determination unit comprises: a frame count calculation unit that calculates the count and numbers of the frame lines included in a character block from the coordinates of the frame lines detected by the frame line detection unit and the coordinates of the character block separated by the character block separation unit; a left frame line distance calculation unit that calculates the distance between the left end of the character block and the frame line closest to that left end among the frame lines whose numbers were calculated by the frame count calculation unit; a right frame line distance calculation unit that calculates the distance between the right end of the character block and the frame line closest to that right end among those frame lines; a frame line character count determination unit that determines a first provisional character count of the character block from the number of frame lines included in the character block as determined by the frame count calculation unit, the distance calculated by the left frame line distance calculation unit, and the distance calculated by the right frame line distance calculation unit; an upper end distance calculation unit that calculates, for each vertical line, the distance from the upper end of the rectangle circumscribing the character block to the character block; a lower end distance calculation unit that calculates, for each vertical line, the distance from the lower end of that rectangle to the character block; an upper/lower distance character count determination unit that obtains a second provisional character count of the character block from the per-line upper end distances and lower end distances; and a first comparison character count determination unit that determines the number of characters in the character block using the first provisional character count and the second provisional character count; the number of characters determined by the first comparison character count determination unit being the output of the character count determination unit.

2. The character segmentation device according to claim 1, wherein the character count determination unit further comprises: a preceding/following block character count determination unit that, for each character block separated by the character block separation unit, corrects the first provisional character count obtained by the frame line character count determination unit using the positional relationship between the target character block and the preceding and following character blocks, to obtain a third provisional character count; and, in place of the first comparison character count determination unit, a second comparison character count determination unit that determines the number of characters in the character block using the second provisional character count obtained by the upper/lower distance character count determination unit and the third provisional character count obtained by the preceding/following block character count determination unit; the number of characters determined by the second comparison character count determination unit being the output of the character count determination unit.

3. A character segmentation device comprising: a binarization unit that binarizes an input image; a character block separation unit that separates the image binarized by the binarization unit into character blocks; a character count determination unit that determines the number of characters included in each character block separated by the character block separation unit; a noise removal unit that removes character blocks judged to be noise from the group of character blocks separated by the character block separation unit; and a cutout unit that, for each character block left after the noise removal unit, cuts out as many characters as the number determined by the character count determination unit; wherein the noise removal unit comprises: a small block noise removal unit that judges small character blocks among the separated character blocks to be noise; a character type determination unit that performs printed/handwritten determination on each separated character block; and a character type noise removal unit that removes the target character block when the character type determination unit judges the target character block to be handwritten and judges the character blocks before and after it to be printed.

4. The character segmentation device according to claim 3, wherein, when a form in which a plurality of characters are written continuously in the horizontal direction is input as an image, the character type determination unit comprises: a height determination unit that judges the target character block to be printed when its height approximates that of the preceding and following character blocks; a coordinate determination unit that judges the target character block to be printed when the vertical coordinate of its upper end approximates that of the upper end of the preceding or following character block, or when the vertical coordinate of its lower end approximates that of the lower end of the preceding or following character block; and a handwriting determination unit that judges a character block not judged printed by the height determination unit or the coordinate determination unit to be handwritten.

5. A character segmentation device as described above, wherein, when a form in which a plurality of characters are written continuously in the horizontal direction and the character entry area is divided into fields is input as an image, the character type determination unit comprises a field determination unit that, among the character blocks in the same field, judges a character block whose height or vertical coordinate approximates that of other character blocks in the same field to be printed, and judges all other character blocks to be handwritten.

6. A character segmentation method comprising: when a form in which a group of characters is written in non-separated entry frames is input as an image, binarizing the input image and separating it into character blocks; detecting the coordinates of all vertical frame lines included in the binary image, numbering them, and storing the coordinates and number of each frame line; for each character block, calculating the count and numbers of the frame lines included in the character block from the coordinates of the detected frame lines and the coordinates of the separated character block, and determining from those frame line numbers the number of the frame line closest to the left end and the number of the frame line closest to the right end of the character block; determining a first provisional character count of the character block from the number of frame lines included in it, the distance between its left end coordinate and the included frame line closest to that left end, and the distance between its right end coordinate and the included frame line closest to that right end; obtaining, for the rectangle circumscribing each character block, the distance from the upper end of the rectangle to the character block and the distance from the lower end of the rectangle to the character block, and determining a second provisional character count of the character block from the magnitudes and changes of these distances; determining the actual number of characters in the character block from the first and second provisional character counts; removing character blocks judged to be noise from the group of character blocks; and, for each character block remaining after removal, cutting out as many characters as the determined number.

7. A character segmentation method comprising: when a form in which a group of characters is written in non-separated entry frames is input as an image, binarizing the input image and separating it into character blocks; detecting the coordinates of all vertical frame lines included in the binary image, numbering them, and storing the coordinates and number of each frame line; for each character block, calculating the count and numbers of the frame lines included in the character block from the coordinates of the detected frame lines and the coordinates of the separated character block, and determining from those frame line numbers the number of the frame line closest to the left end and the number of the frame line closest to the right end of the character block; determining a first provisional character count of the character block from the number of frame lines included in it, the distance between its left end coordinate and the included frame line closest to that left end, and the distance between its right end coordinate and the included frame line closest to that right end; obtaining, for the rectangle circumscribing each character block, the distance from the upper end of the rectangle to the character block and the distance from the lower end of the rectangle to the character block, and determining a second provisional character count of the character block from the magnitudes and changes of these distances; determining a third provisional character count from the positional relationship between the character block and the preceding and following character blocks; determining the actual number of characters in the character block from the first, second, and third provisional character counts; removing character blocks judged to be noise from the group of character blocks; and, for each character block remaining after removal, cutting out as many characters as the determined number.

8. A character segmentation method comprising: binarizing an input image and separating it into character blocks; detecting the coordinates of all vertical frame lines included in the binary image, numbering them, and storing the coordinates and number of each frame line; determining the number of characters in each character block; then performing printed/handwritten determination on each character block and, when the target character block is judged handwritten and the character blocks around it are judged printed, judging the target character block to be noise and removing it; judging small character blocks in the group of character blocks to be noise and removing them; and, for each remaining character block, cutting out as many characters as the determined number.

9. A recording medium on which is recorded a program that implements the character segmentation method according to claim 6 in software.