JP7508212B2 - Image processing device, image processing method, and program

Info

Publication number: JP7508212B2
Application number: JP2019196215A
Authority: JP
Inventors: 泰輔石黒
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-10-29
Filing date: 2019-10-29
Publication date: 2024-07-01
Anticipated expiration: 2039-10-29
Also published as: JP2021071763A

Description

The present invention relates to a technique for identifying inverted regions in a document image.

When OCR processing is performed on a document image obtained with a scanner or camera, a known technique improves OCR accuracy by identifying regions in which the background and the character portions are inverted (inverted regions), such as white knockout characters, and then undoing the inversion. Patent Document 1 discloses a method that performs edge extraction on an input image obtained by reading a document or the like, identifies inverted regions by analyzing the edge extraction result, and undoes the inversion in the image portions where those regions exist.

Patent Document 1: JP 2010-186246 A

However, with the technique of Patent Document 1, the processing load of edge extraction grows with the content of the document image. For example, for a document image containing many characters, edge extraction must be performed on every character and the inverted region determination must be performed on each piece of extracted edge information, so the processing cost becomes excessive.

An image processing device according to the present disclosure comprises: processing means for performing, on a document image that includes an inverted region containing characters whose background and foreground are inverted and normal characters located outside the inverted region, morphological processing including at least a contraction process for erasing the normal characters, thereby generating an enhanced image in which connected pixel blocks that may constitute the inverted region remain; analysis means for analyzing a partial image of the document image corresponding to a candidate region for the inverted region, the candidate region being identified based on information on the connected pixel blocks remaining in the enhanced image; and identification means for identifying, based on the analysis result from the analysis means, the inverted region in the document image containing the characters whose background and foreground are inverted, wherein, when there are a plurality of candidate regions, the identification means integrates the candidate regions based on their positional relationship and identifies the inverted region based on the analysis result for the partial image of the document image corresponding to the integrated candidate region.

According to the technique of the present disclosure, inverted regions can be identified accurately while keeping the processing cost down, even when the document image contains a large amount of content.

FIG. 1 is a diagram showing the configuration of an information processing system.
FIG. 2 is a functional block diagram showing the software configuration related to identifying an inverted region included in a document image, according to Embodiment 1.
FIG. 3 is a flowchart showing the general flow of processing in the image processing device according to Embodiment 1.
FIG. 4(a) shows an example of a binary image, FIG. 4(b) shows an example of the result of restoring an inverted region in the binary image to a non-inverted state, FIG. 4(c) shows an example of an enhanced image, and FIG. 4(d) shows an example in which the outer frame of the inverted region is kept.
FIG. 5 is a flowchart showing details of the inverted region determination process according to Embodiment 1.
FIG. 6(a) shows an example of a binary image, FIG. 6(b) shows an example of an enhanced image, and FIG. 6(c) shows an example of the result of restoring an inverted region in the binary image to a non-inverted state.
FIG. 7 is a flowchart showing details of the inverted region determination process according to Embodiment 2.
FIG. 8(a) shows an example of a binary image, FIG. 8(b) shows an example of an enhanced image, FIG. 8(c) shows an example of a provisional integrated region, and FIG. 8(d) shows an example of the result of restoring an inverted region in the binary image to a non-inverted state.
FIG. 9 is a flowchart showing details of the candidate region re-evaluation process according to a modification of Embodiment 2.
FIG. 10(a) shows an example of a partial binary image, and FIG. 10(b) is a schematic diagram of a reduced binary image.
FIG. 11 shows the result of projecting a partial binary image in the x-axis direction.

Embodiments for carrying out the present invention are described below with reference to the drawings. The following embodiments do not limit the invention according to the claims, and not every combination of features described in the embodiments is essential to the solution provided by the invention.

[Embodiment 1]
<Hardware Configuration>
FIG. 1 is a diagram showing the configuration of an information processing system according to this embodiment. The information processing system consists of a scanner device 100 and an image processing device 110. The hardware configurations of the scanner device 100 and the image processing device 110 are briefly described below with reference to FIG. 1.

The scanner device 100 has a control unit 101, an image reading unit 102, and a communication unit 103. The control unit 101 is composed of a CPU, RAM, ROM, and the like, and controls the scanner device 100 as a whole when the CPU loads a predetermined program stored in the ROM into the RAM and executes it. The image reading unit 102 optically reads a document placed on a platen (not shown) and generates a scanned image. The communication unit 103 is a communication interface that exchanges image data and the like with an external device (here, the image processing device 110) via a network.

The image processing device 110 has a control unit 111, a mass storage unit 112, a display unit 113, an input unit 114, and a communication unit 115. The control unit 111 is composed of a CPU, RAM, ROM, and the like, and controls the image processing device 110 as a whole when the CPU loads a predetermined program stored in the ROM into the RAM and executes it. The mass storage unit 112 is, for example, an HDD or SSD, and stores various data and programs. The display unit 113 is, for example, a liquid crystal display, and presents various information to the user. The input unit 114 is a keyboard, a mouse, or the like, and accepts various operations from the user. The display unit 113 and the input unit 114 may be integrated, as in a touch panel. The communication unit 115 is a communication interface that exchanges image data and the like with an external device (here, the scanner device 100) via a network.

In this embodiment, the image reading unit 102 of the scanner device 100 scans a paper document such as a form and generates a scanned image. The communication unit 103 transmits the generated scanned image data to the image processing device 110. In the image processing device 110, the communication unit 115 receives the scanned image data and stores it in the mass storage unit 112. The configuration of the information processing system shown in FIG. 1 is only an example; for instance, the scanner device 100 and the image processing device 110 may be integrated, or the image processing device 110 may be a cloud-type server computer connected via the Internet.

FIG. 2 is a functional block diagram showing the software configuration of the image processing device 110 according to this embodiment that relates to identifying inverted regions included in a document image. In this specification, the term "inverted region" means a region of a document image in which a foreground portion such as a character string is knocked out of a colored ground (background), so that the background and the foreground are inverted.

The image processing device 110 has a binarization unit 200 and an inverted region processing unit 210. The inverted region processing unit 210 further includes a morphology processing unit 211, a connected pixel block extraction unit 212, an inverted region determination unit 213, and a pixel inversion unit 214. The functions of these units are realized when the CPU of the control unit 111 executes a program stored in the ROM. The image processing device 110 may also have functions other than those shown in FIG. 2, such as an OCR function.

The function of each unit is described below with reference to the flowchart shown in FIG. 3. Before the series of processes in the image processing device 110 shown in FIG. 3, the user optically reads a paper document with the scanner device 100 to generate a scanned image. The scanned image data is then sent to the image processing device 110, and when the user issues a predetermined start instruction on the image processing device 110, the series of processes shown in the flowchart of FIG. 3 begins. The description here assumes that a scanned image of a receipt has been input as the image to be processed. In the following description, the symbol "S" denotes a step.

In S301, the binarization unit 200 binarizes the input scanned image to generate the binary image that the inverted region processing unit 210 will process. Binarization quantizes image data made up of multi-level gradation values, for example 256 levels (8 bits), into binary image data of "0" and "1". A known binarization method such as the discriminant analysis method can be used here. FIG. 4(a) shows the result of binarizing the scanned image of a receipt. The binary image data obtained by binarization is stored in the RAM of the control unit 111. Binarization is not essential; it is sufficient to obtain an image in which the foreground and background portions of the input scanned image are distinguished.
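The binarization in S301 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the discriminant analysis method refers to Otsu's threshold selection, that the input is an 8-bit grayscale image held as a list of rows, and that black pixels are encoded as 1 (the source only says the output values are "0" and "1"). The function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(gray):
    """Return the threshold t (0..255) that best separates dark and light pixels.

    The score w0 * w1 * (m0 - m1)^2 is proportional to the between-class
    variance used by the discriminant analysis (Otsu) method.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no separation possible
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(gray):
    """Assumed convention: dark pixels (<= threshold) become 1 (black), others 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

Any other method that separates foreground from background would serve equally well here, since the source states that binarization itself is not essential.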

S302 to S305 correspond to the process of identifying inverted regions in the binary image generated in S301 and restoring the identified inverted regions to a non-inverted state. This series of processes yields a binary image, as shown in FIG. 4(b), in which the character string "金額" (amount), rendered as white knockout characters within the inverted region 400, has been converted to normal characters. The details are described below.

First, in S302, the morphology processing unit 211 performs morphological processing on the binary image generated in S301 to delete, from that binary image, pixels in image areas presumed not to be inverted regions. Specifically, a contraction (erosion) process is first performed on the binary image, then an expansion (dilation) process is performed on the contracted image, and this is repeated a predetermined number of times. The contraction process performed first has the effect of eliminating thin lines made up of black pixels in the binary image. As a result, the black foreground pixels that make up normal characters, ruled lines, and the like in the image to be processed disappear through the contraction process. An inverted region, on the other hand, is formed as a stroke thicker than characters or ruled lines. Although the contraction process also removes some of the black pixels that make up the background within the inverted region, that background has a larger area than the black foreground pixels of normal characters and therefore tends to survive.

In other words, applying the contraction process deletes only the black pixels that make up normal characters, ruled lines, and the like, and leaves the black pixels that make up inverted regions. Then, by applying an expansion process of the same scale to the contracted image, the black pixels that were not lost in the contraction are restored to their state before contraction. The purpose of the morphological processing is to emphasize image areas in the image to be processed that are likely to be inverted regions. Therefore, the contraction process for erasing normal characters and ruled lines is essential, but performing the expansion process the same number of times is not. Applying the morphological processing described above to the binary image yields a binary image in which only black pixel areas that may be inverted regions remain. An image obtained by morphological processing in which the image areas that are candidates for inverted regions are emphasized is hereinafter called an "enhanced image". FIG. 4(c) shows the enhanced image that results from applying morphological processing to the binary image shown in FIG. 4(a). The black pixels that make up normal characters, ruled lines, and the like have disappeared, and only the black pixels within the inverted region 400 containing the white knockout characters "金額" (amount) and the black pixels that make up the horizontally long rectangle at the bottom of the receipt remain.
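The contraction-then-expansion step of S302 can be illustrated with a small pure-Python sketch. The square structuring element, its half-width `k`, the `rounds` parameter, and the encoding of black pixels as 1 are assumptions made for illustration; the source does not specify the kernel shape or the number of repetitions.

```python
def erode(img, k=1):
    """Contraction: a pixel stays black (1) only if every pixel in its
    (2k+1) x (2k+1) window is black. Pixels outside the image count as white."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w
                     and img[y + dy][x + dx] == 1
                     for dy in range(-k, k + 1)
                     for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]


def dilate(img, k=1):
    """Expansion: a pixel becomes black (1) if any pixel in its
    (2k+1) x (2k+1) window is black."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w
                     and img[y + dy][x + dx] == 1
                     for dy in range(-k, k + 1)
                     for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]


def enhance(binary, k=1, rounds=1):
    """Contract, then expand by the same amount, repeated `rounds` times."""
    img = binary
    for _ in range(rounds):
        img = dilate(erode(img, k), k)
    return img
```

With `k=1`, a one-pixel-wide vertical line vanishes during contraction, while a solid 5 x 5 block shrinks to 3 x 3 and is restored to 5 x 5 by the expansion. White strokes inside a block are not restored faithfully by this sequence, which matches the source's later remark that knockout characters may be crushed in the enhanced image.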

In S303, the connected pixel block extraction unit 212 extracts connected pixel blocks (hereinafter "CC") from the enhanced image obtained in S302. CC stands for "Connected Component" and here means a block of black pixels that are connected in the enhanced image. Black pixels belonging to the same black pixel block extracted from the enhanced image are given the same label, and each block is identified as a region that is a candidate for an inverted region (hereinafter a "candidate region"). From the enhanced image shown in FIG. 4(c), two black pixel blocks are extracted: a black pixel block 401 corresponding to the inverted region 400 and a black pixel block 402 corresponding to the band-shaped object at the bottom. Each is given a different label and identified as a candidate region. Information on the identified candidate regions is stored in the RAM of the control unit 111.
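The labeling in S303 can be sketched with a breadth-first flood fill. The use of 8-connectivity and the representation of each candidate region as an inclusive bounding box `(x0, y0, x1, y1)` are assumptions for illustration; the source does not state the connectivity or how candidate regions are stored. `label_components` is a hypothetical name.

```python
from collections import deque


def label_components(img):
    """Label 8-connected black (1) pixels.

    Returns (labels, boxes): `labels` has the same shape as `img` with 0 for
    white pixels and 1, 2, ... for each block; `boxes` maps each label to its
    inclusive bounding box (x0, y0, x1, y1).
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    boxes = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or labels[y][x]:
                continue  # white pixel, or already labeled
            next_label += 1
            labels[y][x] = next_label
            x0 = x1 = x
            y0 = y1 = y
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            boxes[next_label] = (x0, y0, x1, y1)
    return labels, boxes
```

Because the enhanced image contains only a handful of large blocks, the number of components returned here stays small regardless of how many characters the original document holds.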

In S304, the inverted region determination unit 213 acquires the partial binary image corresponding to each candidate region identified in S303, performs a predetermined analysis on that partial binary image, and determines the inverted regions in the binary image based on the analysis result. In the case of the enhanced image shown in FIG. 4(c), of the black pixel blocks 401 and 402 serving as candidate regions, the black pixel block 401 is determined to be an inverted region. Details of the inverted region determination process are described later.

In S305, the pixel inversion unit 214 inverts the pixels of the binary image obtained in S301 that correspond to the inverted region determined in S304. Specifically, for each pixel in the inverted region, the pixel value is changed to the value other than its current one (that is, "1" becomes "0" and "0" becomes "1"), so that a white pixel becomes a black pixel and a black pixel becomes a white pixel. This pixel inversion changes the pixel attribute only for the pixels that make up the inverted region in the binary image. In other words, for the pixels that make up normal characters and ruled lines in the binary image, black pixels remain black and white pixels remain white; only in the white knockout character portion are black pixels changed to white and white pixels changed to black. This yields the binary image shown in FIG. 4(b), in which only the inverted region has been restored to a non-inverted state. Because the character string within the inverted region is inverted, the resulting binary image contains no white knockout characters. Performing OCR processing on this binary image improves OCR accuracy.

In this embodiment, all pixels within the image area determined to be an inverted region are inverted, but some pixels may be left as they are. For example, by not inverting the pixels in the outer frame portion of the inverted region, the outer frame of the inverted region (delimiter information equivalent to a ruled line) can be kept, as shown in FIG. 4(d). In that case, it is sufficient to leave unchanged only the pixels within a predetermined width from the outer edge of the region determined to be an inverted region. Additional processing of this kind does not significantly increase the processing load. Keeping delimiter information equivalent to ruled lines in this way makes it possible to use it during OCR processing as delimiter information for the character strings obtained as character recognition results.
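The pixel inversion of S305, including the variant that keeps the outer frame, might be sketched as below. It assumes the inverted region is given as an inclusive bounding box `(x0, y0, x1, y1)` and that black is encoded as 1; the `frame` parameter (the width left unchanged at the edge) and the function name `invert_region` are hypothetical.

```python
def invert_region(binary, box, frame=0):
    """Return a copy of `binary` with the pixels inside `box` flipped (1 <-> 0).

    Pixels within `frame` pixels of the box edge are left unchanged, so a
    non-zero `frame` preserves the outer frame of the inverted region.
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in binary]  # leave the input image untouched
    for y in range(y0 + frame, y1 - frame + 1):
        for x in range(x0 + frame, x1 - frame + 1):
            out[y][x] = 1 - out[y][x]
    return out
```

Calling `invert_region(img, box)` flips the whole region, while `invert_region(img, box, frame=1)` keeps a one-pixel border that can later act as a delimiter for OCR.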

(Details of the inverted region determination process)
Next, the details of the inverted region determination process (S304) described above are explained with reference to the flowchart of FIG. 5.

In S501, the information on all candidate regions identified in S303 and the binary image data obtained in S301 are read from the RAM. In the following S502, a binary image of the portion corresponding to the candidate region of interest among all the candidate regions read in S501 (hereinafter a "partial binary image") is generated. Specifically, the image area corresponding to the candidate region of interest is cut out of the binary image acquired in S501.

Next, in S503, the black pixel density of the partial binary image obtained in S502 is calculated, that is, the proportion of black pixels among all the pixels that make up the partial binary image. Then, in S504, the contour lines of the white pixel portions in the partial binary image obtained in S502 are extracted. The contour lines extracted here are the straight-line portions of components such as characters; for example, several tens of contour lines are extracted per character.
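The cropping of S502 and the density calculation of S503 can be sketched as follows, assuming the candidate region is an inclusive bounding box `(x0, y0, x1, y1)` and black pixels are encoded as 1. The function names are hypothetical. Contour line extraction (S504) is not shown, because the source does not describe the extraction method in enough detail to reproduce it.

```python
def crop(binary, box):
    """Cut the partial binary image for an inclusive box (x0, y0, x1, y1)
    out of the original binary image (not the enhanced image)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in binary[y0:y1 + 1]]


def black_pixel_density(part):
    """Fraction of pixels in the partial binary image that are black (1)."""
    total = sum(len(row) for row in part)
    black = sum(sum(row) for row in part)
    return black / total if total else 0.0
```

For a 3 x 5 partial image with a single white pixel, `black_pixel_density` returns 14/15, roughly 0.93.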

Then, in S505, whether the candidate region of interest is an inverted region is determined based on the black pixel density calculated in S503 and the number of contour lines extracted in S504. Specifically, a threshold is defined in advance for each of the black pixel density and the number of contour lines. If both values fall within the range defined by their thresholds, the candidate region of interest is regarded as containing characters and is determined to be an inverted region. If either or both of the black pixel density and the number of contour lines fall outside the range defined by the thresholds, the candidate region of interest is regarded as containing no characters and is determined not to be an inverted region. The threshold for the black pixel density is chosen by considering that the closer the black pixel density is to 100%, the fewer white pixels form characters within the inverted region, and that the more strokes a character has (the more complex it is), the lower the black pixel density becomes. In other words, a value such as 75% may be set, based on the black pixel density expected for the most complex characters that could be used as white knockout characters.

The threshold for the number of contour lines may be set by considering the average number of contour lines per character and the expected number of characters in an inverted region. Adding the number of contour lines as a determination index improves the determination accuracy. If the candidate region of interest is determined to be an inverted region, the process proceeds to S506. If it is determined not to be an inverted region, the process proceeds to S507.
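The decision in S505 can be illustrated as a pair of range checks. This sketch makes several assumptions that the source does not confirm: each "range of the threshold" is modeled as a closed interval, the 75% example value is read as the lower bound for the black pixel density, and the remaining bounds (0.98, 20, 2000) are purely hypothetical placeholders. The contour count is taken as an input computed elsewhere.

```python
def is_inverted_region(density, contour_count,
                       density_range=(0.75, 0.98),   # 0.75 from the text; 0.98 hypothetical
                       contour_range=(20, 2000)):    # both bounds hypothetical
    """Return True when both indices fall inside their allowed ranges.

    density       -- black pixel density of the partial binary image (0.0 to 1.0)
    contour_count -- number of contour lines extracted from its white pixels
    """
    lo_d, hi_d = density_range
    lo_c, hi_c = contour_range
    density_ok = lo_d <= density <= hi_d
    contours_ok = lo_c <= contour_count <= hi_c
    return density_ok and contours_ok
```

Under these placeholder ranges, a solid black band (density 1.0, no contour lines) is rejected, while a region with density 0.85 and 120 contour lines is accepted. In practice the ranges would be tuned to the fonts and character counts expected in the target documents.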

In S506, the candidate region of interest is set as an inverted region. Specifically, information such as a flag indicating that the candidate region of interest is an inverted region is generated and stored in the RAM of the control unit 111.

In S507, it is determined whether processing has been completed for all of the candidate regions read in S501. If any candidate region remains unprocessed, the process returns to S502, selects the next candidate region of interest, and continues. If all candidate regions have been processed, this process ends.

The above is the content of the inverted region determination process. Note that the black pixel density calculation in S503 and the contour line extraction in S504 use the partial binary image cut out of the binary image, not the enhanced image obtained by morphological processing. The reason is that, in the enhanced image, the contraction and expansion processes may crush the strokes of the white knockout characters, so that the determination of character likeness based on black pixel density and the number of contour lines may not work as expected. Using the binary image allows a more accurate determination. The indices of character likeness are also not limited to black pixel density and the number of contour lines. For example, connected pixel blocks (CCs) may be extracted from the white pixels in the partial binary image, and character likeness may be determined based on the size of the extracted blocks (the larger the block, the more white pixels form a character) or on the spacing between blocks (the more strokes a character has, the narrower the spacing).

As described above, according to this embodiment, pixels that are unlikely to form an inverted region are excluded at an early stage from the pixels targeted by pixel inversion. This reduces the processing cost of searching for inverted regions. In addition, because the determination of whether a region is an inverted region is performed on partial binary images after information such as characters and the ruled lines of tables has been excluded from the document image, the processing depends little on the number of characters or the number of table cells.

[Embodiment 2]
In Embodiment 1, candidate regions are identified from the enhanced image, and inverted regions are determined using the binary image corresponding to each candidate region (the partial binary image). However, the candidate region on which the partial binary image is based may not be appropriate. For example, when the outline of the white pixels forming a character comes close to the edge of the inverted region, only a thin line of black pixels lies between the outer edge of the inverted region and the white pixels forming the character, and the morphological processing may remove the black pixels in that gap. FIG. 6(a) shows a binary image obtained by binarizing a scanned image of an estimate (quotation) that has two inverted regions 600a and 600b, and FIG. 6(b) shows the enhanced image obtained by applying morphological processing to that binary image. In the enhanced image shown in FIG. 6(b), the inverted region 600b is split into six small regions 611 to 616. When the connected pixel block extraction process (S303) is applied to this enhanced image, each of the black pixel blocks corresponding to the small regions 611 to 616 is identified as a separate candidate region. As a result, the expected inverted region determination result is not obtained. An aspect that can deal with this problem is therefore described as Embodiment 2. Descriptions of content shared with Embodiment 1, such as the basic configuration of the information processing system, are omitted; the following describes the inverted region determination process, which is where the embodiments differ.

(Details of the inverted region determination process)
FIG. 7 is a flowchart showing details of the inverted region determination process according to this embodiment. The description below follows the flowchart in FIG. 7.

S701 is the same as S501 in the flow of FIG. 5 of Embodiment 1: the information on all candidate regions generated in S303 and the binary image data obtained in S301 are read from the RAM.

In the next step, S702, all candidate regions read in S701 are re-evaluated. In this embodiment, the re-evaluation consists of integrating the read candidate regions based on their positional relationship. More specifically, for each candidate region, it is determined whether the distance to an adjacent candidate region is within a predetermined distance, and candidate regions that lie within that distance of one another are integrated into a single, larger candidate region. In the example of FIG. 6(b) described above, the six candidate regions corresponding to the six small regions 611 to 616 are integrated into one, and the region indicated by the dashed frame is newly identified as a candidate region. The information on the candidate regions that were integrated is no longer needed and is therefore deleted (erased from the RAM). The relatively large region 610, which corresponds to the inverted region 600a, has no other candidate region within the threshold distance, so it is kept as is without being integrated.
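The integration in S702 can be sketched as follows. This is only an illustration under stated assumptions: candidate regions are axis-aligned boxes `(x0, y0, x1, y1)` with exclusive upper bounds, the distance between two regions is taken as the larger of the horizontal and vertical gaps between their boxes (the embodiment does not specify the distance measure), and the names `box_gap`, `group_by_distance`, and `integrate` are hypothetical.

```python
def box_gap(a, b):
    """Gap between two boxes: 0 if they touch or overlap on both axes (assumed metric)."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return max(dx, dy)


def group_by_distance(boxes, max_gap):
    """Group box indices so that boxes within max_gap of each other share a group."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def union_box(boxes):
    """Smallest box enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def integrate(boxes, max_gap):
    """Replace each group of nearby candidate boxes by its enclosing box."""
    return [union_box([boxes[i] for i in group])
            for group in group_by_distance(boxes, max_gap)]


candidates = [(0, 0, 10, 10), (12, 0, 20, 10), (100, 100, 120, 120)]
merged = integrate(candidates, max_gap=3)
# The first two boxes are 2 pixels apart and are merged; the third is kept as is.
```

The grouping is transitive, so a chain of small regions such as 611 to 616 ends up in one group even if the first and last of them are farther apart than the threshold.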

The processes in S703 to S708 correspond to S502 to S507 in the flow of FIG. 5 of Embodiment 1 and do not differ from them, so their description is omitted.

The above is the content of the inverted region determination process according to this embodiment. In this embodiment, even when something that should have been extracted as a single black pixel block is extracted as several separate black pixel blocks, the candidate region integration allows it to be handled as a single black pixel block (candidate region) that is a candidate for an inverted region. In the example of FIG. 6(b), the candidate region indicated by the dashed rectangle is generated by the integration, and the processes from S703 onward are applied to the integrated candidate region. As a result, the image area corresponding to the whole of the inverted region 600b is set as an inverted region in S707 and can be appropriately de-inverted in S305. FIG. 6(c) shows the result of applying the processing of this embodiment to the binary image of the estimate. Both the inverted region 600a and the inverted region 600b are appropriately de-inverted.

<Modification>
When the re-evaluation described above is performed, candidate regions may be integrated excessively. FIG. 8(a) shows a binary image obtained by binarizing a scanned image of an estimate that has four inverted regions 800a to 800d. FIG. 8(b) shows the enhanced image obtained by applying the morphology processing to that binary image. In the enhanced image of FIG. 8(b), the inverted region 800b is split into six small regions 811 to 816, in the same way as the inverted region 600b in FIG. 6(b), and the enhanced image as a whole contains nine candidate regions 810 to 818. If the integration of the re-evaluation described above is applied to these nine candidate regions as is, the candidate region 817 and the candidate region 818 are integrated because they are close to each other. However, the candidate region 817 and the candidate region 818 are merely close to each other; they do not constitute a single inverted region. The new candidate region obtained by integrating the two therefore contains an area that is not actually an inverted region, yet the whole of it is de-inverted together, so that pixels that do not belong to any inverted region are also inverted. A mode in which it is determined, prior to the integration, whether integration is permissible is therefore described as a modification.

FIG. 9 is a flowchart showing details of the candidate region re-evaluation process according to this modification. The description below follows the flowchart in FIG. 9.

In S901, candidate regions that may be integrated are identified from among all the candidate regions read in S701, and information representing each group of the identified candidate regions (hereinafter, "provisional integration information") is generated. The method described for S702 is applied when generating the provisional integration information. Specifically, candidate regions whose distance to another candidate region is within a predetermined threshold are identified, and information is generated that indicates the region obtained by provisionally combining those candidate regions into one (hereinafter, "provisional integrated region") and that makes it possible to identify each candidate region that may be integrated. FIG. 8(c) shows, for the nine candidate regions 810 to 818 of FIG. 8(b), a provisional integrated region 819 obtained by combining the candidate regions 811 to 816 and a provisional integrated region 820 obtained by combining the candidate regions 817 and 818. Provisional integration information is generated for each of these two provisional integrated regions 819 and 820. The subsequent processes S902 to S905 are executed for each piece of provisional integration information obtained in S901.

First, in S902, for each piece of provisional integration information generated in S901, the region to be evaluated in order to decide whether integration is permissible (hereinafter, "evaluation region") is identified. Specifically, using the information on the candidate regions included in the provisional integration information, the part of the provisional integrated region that remains after excluding those candidate regions is identified as the evaluation region. Then, in S903, based on the evaluation region identified in S902 and the binary image read in S701, the black pixel density in the image area of the binary image corresponding to the evaluation region is calculated.
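S902 and S903 can be sketched together as below. This is an illustration under assumptions: boxes are `(x0, y0, x1, y1)` with exclusive upper bounds, the provisional integrated region is taken to be the enclosing box of the grouped candidate regions, the binary image is a nested list with 1 for black, and the name `evaluation_density` is hypothetical. What should happen when the evaluation region is empty is not stated in the text; the sketch returns `None` in that case and leaves the decision to the caller.

```python
def evaluation_density(binary, boxes):
    """Black pixel density in the provisional integrated region minus the candidate boxes.

    Returns None when the candidate boxes cover the whole provisional
    integrated region, i.e. the evaluation region is empty (assumed behavior).
    """
    # Provisional integrated region: enclosing box of the grouped candidates.
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)

    total = black = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            inside_candidate = any(
                b[0] <= x < b[2] and b[1] <= y < b[3] for b in boxes)
            if inside_candidate:
                continue  # only the area between the candidates is evaluated
            total += 1
            black += binary[y][x]
    return black / total if total else None


# Two candidates separated by a 2-pixel-wide strip that is black in the
# binary image, as for small regions inside one inverted region.
binary = [[1] * 10 for _ in range(4)]
density = evaluation_density(binary, [(0, 0, 4, 4), (6, 0, 10, 4)])
```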

Next, in S904, whether each provisional integrated region may be integrated is determined based on the black pixel density calculated in S903. Specifically, integration is determined to be permissible when the calculated black pixel density is at or above a predetermined level, and not permissible when it is below that level. The threshold corresponding to this level may be lower than the threshold used in S505, because if a certain amount of black pixels is confirmed to be present, the evaluation region can be presumed to be an area that was originally black (the "ground" of the inverted region). Using the black pixel density for this determination makes it possible to judge whether the part that integration would newly set as a candidate region is plausible as a candidate for an inverted region in the binary image. For example, in the case of the provisional integrated region 820, the part 821 of the binary image of FIG. 8(a) corresponding to the evaluation region contains no black pixels (the black pixel density is extremely low). In contrast, in the case of the provisional integrated region 819, the black pixel density in the part of the binary image of FIG. 8(a) corresponding to the evaluation region is high. Therefore, integration is determined to be permissible for the provisional integrated region 819 and not permissible for the provisional integrated region 820.
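The decision in S904 amounts to a threshold test per provisional integrated region. The sketch below is only illustrative: the function name `apply_integration_decision`, the box format `(x0, y0, x1, y1)`, the coordinates, and the threshold value are assumptions, and the density values are made-up numbers chosen to mirror the qualitative description of regions 819 and 820.

```python
def apply_integration_decision(groups, densities, min_density):
    """Integrate a group of candidate boxes only if its evaluation-region density is high enough.

    groups      -- list of lists of boxes (one list per provisional integrated region)
    densities   -- black pixel density of each group's evaluation region
    min_density -- assumed threshold; integration is allowed at or above it
    """
    result = []
    for boxes, density in zip(groups, densities):
        if len(boxes) > 1 and density >= min_density:
            # Integration permitted: keep one enclosing candidate region.
            result.append((min(b[0] for b in boxes), min(b[1] for b in boxes),
                           max(b[2] for b in boxes), max(b[3] for b in boxes)))
        else:
            # Integration rejected: the original candidate regions are kept.
            result.extend(boxes)
    return result


# Hypothetical coordinates and densities standing in for regions 819 and 820.
group_819 = [(10, 10, 30, 20), (32, 10, 50, 20)]
group_820 = [(10, 40, 50, 50), (10, 53, 50, 63)]
final = apply_integration_decision(
    [group_819, group_820], densities=[0.9, 0.0], min_density=0.3)
# group_819 becomes one box; the two boxes of group_820 stay separate.
```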

The above is the content of the candidate region re-evaluation process according to this modification. FIG. 8(d) shows the result of applying the processing of this modification. The inverted regions 800c and 800d in the binary image of FIG. 8(a) are each processed as separate inverted regions and are appropriately de-inverted. By determining, in the re-evaluation of the candidate regions, whether integration is permissible before performing it, unnecessary integration is avoided, which in turn suppresses the inversion of pixels that do not belong to an inverted region.

As described above, according to this embodiment, re-evaluating the identified candidate regions makes it possible to identify the inverted regions in a document image more accurately.

[Embodiment 3]
In Embodiments 1 and 2, the enhanced image has the same size as the binary image of the input scanned image. Next, an aspect in which the enhanced image is generated from a reduced binary image is described as Embodiment 3. The description below covers the differences from Embodiments 1 and 2, namely the generation of the enhanced image and the subsequent identification of candidate regions.

First, once a binary image has been generated from the input scanned image (S301), the binary image generated in S301 is reduced before the enhanced image is generated. Next, the contraction and expansion processing (the morphology processing) is applied to the reduced binary image to generate the enhanced image (S302). Black pixel blocks are then extracted from the enhanced image, and the extracted black pixel blocks are converted back to the pre-reduction size, based on the scaling factor used in the reduction, to identify the candidate regions (S303). At this stage the size of the candidate regions is the same as when no reduction is performed, so the processes from S304 onward can be applied unchanged.
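The reduction and the conversion of extracted blocks back to the original scale can be sketched as follows. The text does not state how the reduction is performed; this sketch assumes an integer reduction factor and a rule in which a reduced pixel is black if any pixel in its source block is black. The names `reduce_binary` and `restore_box` are hypothetical, and boxes are `(x0, y0, x1, y1)` with exclusive upper bounds.

```python
def reduce_binary(img, factor):
    """Shrink a binary image (1 = black) by an integer factor using an 'any black' rule."""
    h, w = len(img), len(img[0])
    rh, rw = -(-h // factor), -(-w // factor)  # ceiling division
    out = [[0] * rw for _ in range(rh)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                out[y // factor][x // factor] = 1
    return out


def restore_box(box, factor, width, height):
    """Map a box found in the reduced image back to original image coordinates."""
    x0, y0, x1, y1 = box
    return (x0 * factor, y0 * factor,
            min(x1 * factor, width), min(y1 * factor, height))


# 4x6 image whose right four columns are black.
img = [[0, 0, 1, 1, 1, 1] for _ in range(4)]
small = reduce_binary(img, 2)            # 2x3 image
box_small = (1, 0, 3, 2)                 # black block found in the reduced image
box_full = restore_box(box_small, 2, 6, 4)
```

With a factor of 2 the morphology processing and block extraction operate on a quarter of the pixels, while `restore_box` yields coordinates that can be used directly against the full-size binary image.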

Using a reduced binary image in this way lowers the load of the morphology processing and of the connected pixel block extraction, and the smaller image size also reduces the memory capacity needed for temporary storage.

Note, however, that when a reduced binary image is used, inverted regions separated by a narrow gap may be joined together. FIG. 10(a) shows a partial binary image obtained by binarizing a table portion of a scanned image that includes inverted regions. The partial binary image of FIG. 10(a) contains four inverted regions 1001 to 1004. FIG. 10(b) is a schematic diagram of the reduced binary image obtained by reducing the partial binary image of FIG. 10(a) at a predetermined scaling factor. The four inverted regions 1001 to 1004 of FIG. 10(a) have been joined by the reduction into a single black pixel block 1005. Reduction can thus make the boundaries between multiple inverted regions disappear. If the pixel inversion process (S305) is applied to inverted regions that have been unintentionally joined in this way, pixels in parts that are not actually inverted regions are also inverted.

To prevent this problem, in the re-evaluation of the candidate regions (S702) described in Embodiment 2, a region division process using the pre-reduction binary image is applied to each candidate region prior to the integration. Specifically, the partial image corresponding to the candidate region is cut out from the pre-reduction binary image, and projections of the cut-out partial image are taken in the x-axis and y-axis directions. The resulting projections are then analyzed to identify the coordinate positions at which the projection falls below a predetermined threshold. The region made up of the identified coordinate positions is determined to be a region excluded from the inverted region candidates (hereinafter, "exclusion region"), and the candidate region is divided so that it does not include the exclusion region. A specific example follows. FIG. 11 shows the result of taking the projection of the partial binary image of FIG. 10(a) in the x-axis direction. In this case, the three regions below the threshold th (between x1 and x2, between x3 and x4, and between x5 and x6) are determined to be exclusion regions. As shown in FIG. 10(c), the candidate region corresponding to the black pixel block 1005 in FIG. 10(b) is divided into four regions 1006 to 1009, with the three exclusion regions as boundaries. Each of these four regions 1006 to 1009 is newly identified as a candidate region. After that, as described in the modification of Embodiment 2, provisional integration information is generated for all candidate regions, including the candidate regions obtained by division, and whether integration is permissible is determined for each piece of provisional integration information. The candidate regions obtained by division are not integrated again at this point. As described above, whether integration is permissible is determined based on the black pixel density between the candidate regions in the binary image, and the image area between candidate regions obtained by projection-based division normally has a low black pixel density, so integration is determined not to be permissible.
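The projection-based division can be sketched for the x-axis direction as follows; the y-axis direction would be handled the same way with rows and columns swapped. The sketch assumes the projection is the count of black pixels per column inside the candidate box, that columns with a count below the threshold form the exclusion regions, and that boxes are `(x0, y0, x1, y1)` with exclusive upper bounds. The name `split_by_x_projection` is hypothetical.

```python
def split_by_x_projection(binary, box, th):
    """Split a candidate box at columns whose black pixel count is below th.

    binary -- pre-reduction binary image as nested lists, 1 = black
    box    -- candidate region (x0, y0, x1, y1), upper bounds exclusive
    th     -- assumed threshold on the per-column black pixel count
    """
    x0, y0, x1, y1 = box
    # Projection onto the x axis: number of black pixels in each column.
    profile = [sum(binary[y][x] for y in range(y0, y1)) for x in range(x0, x1)]

    parts, start = [], None
    for i, count in enumerate(profile):
        if count >= th and start is None:
            start = i                                   # a kept run begins
        elif count < th and start is not None:
            parts.append((x0 + start, y0, x0 + i, y1))  # run ends at an exclusion column
            start = None
    if start is not None:
        parts.append((x0 + start, y0, x1, y1))
    return parts


# Four black cells separated by three 1-pixel white gaps, as in a table header row.
row = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
binary = [row[:] for _ in range(3)]
parts = split_by_x_projection(binary, (0, 0, 11, 3), th=1)
# Three exclusion columns yield four new candidate regions.
```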

As described above, using a reduced binary image lowers the load of the morphology processing and of the connected pixel block extraction, and also reduces the capacity required of the temporary storage device. In addition, applying the region division process to the candidate regions handles cases in which narrowly spaced inverted regions are joined by the reduction, so that the processing load can be reduced while accuracy is maintained.

(Other Embodiments)
The present invention can also be realized by a process in which a program implementing one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

Claims

1. An image processing device comprising:
a processing means for performing morphology processing, including at least a contraction process for erasing normal characters, on a document image that includes an inverted region containing characters in which the background and foreground are inverted and normal characters located in an area other than the inverted region, to generate an enhanced image in which connected pixel blocks that may constitute the inverted region remain;
an analysis means for analyzing a partial image of the document image corresponding to a candidate region for the inverted region, the candidate region being identified based on information on the connected pixel blocks remaining in the enhanced image; and
an identification means for identifying, based on the analysis result of the analysis means, the inverted region in the document image that contains the characters in which the background and foreground are inverted,
wherein, when there are a plurality of candidate regions, the identification means integrates the candidate regions based on their positional relationship, and identifies the inverted region based on an analysis result of a partial image of the document image corresponding to the integrated candidate region.

2. The image processing device according to claim 1, wherein the processing means generates the enhanced image by performing, as the morphology processing, an expansion process on the result of the contraction process.

3. The image processing device according to claim 1 or 2, wherein the document image is a binary image obtained by binarizing a scanned image of a document.

4. The image processing device according to claim 1, wherein the analysis result includes a black pixel density in the partial image and a number of contours of white pixel portions in the partial image.

5. The image processing device according to claim 1, wherein the analysis result is a size of a pixel cluster formed by connected white pixels in the partial image or a distance between such pixel clusters.

6. The image processing device according to claim 1, wherein the integration is a process of combining candidate regions into one when the distance between each candidate region and an adjacent candidate region is within a certain distance.

7. The image processing device according to claim 6, wherein, even when the distance is within the certain distance, the identification means does not perform the integration if the black pixel density in the image area of the document image lying between the candidate regions is below a certain level.

8. The image processing device according to any one of claims 1 to 7, wherein the document image is a reduced binary image obtained by performing a reduction process at a predetermined scaling factor on a binary image obtained by binarizing a scanned image of a document, and
the identification means performs, for each of the candidate regions, a region division process using the binary image before the reduction process, and performs the integration on the candidate regions obtained by the region division process.

9. The image processing device according to claim 1, further comprising:
an inversion means for inverting each pixel constituting the inverted region identified in the document image; and
a character recognition means for performing character recognition processing on an image obtained by inverting each pixel constituting the inverted region.

10. An image processing device comprising:
a processing means for performing morphology processing, including at least a contraction process for erasing normal characters, on a document image that includes an inverted region containing characters in which the background and foreground are inverted and normal characters located in an area other than the inverted region, to generate an enhanced image in which connected pixel blocks that may constitute the inverted region remain;
an analysis means for analyzing a partial image of the document image corresponding to a candidate region for the inverted region, the candidate region being identified based on information on the connected pixel blocks remaining in the enhanced image; and
an identification means for identifying, based on the analysis result of the analysis means, the inverted region in the document image that contains the characters in which the background and foreground are inverted,
wherein the analysis result includes at least the number of contours of white pixel portions in the partial image.

11. An image processing method comprising:
a generating step of performing morphology processing, including at least a contraction process for erasing normal characters, on a document image that includes an inverted region containing characters in which the background and foreground are inverted and normal characters located in an area other than the inverted region, to generate an enhanced image in which connected pixel blocks that may constitute the inverted region remain;
an analysis step of analyzing a partial image of the document image corresponding to a candidate region for the inverted region, the candidate region being identified based on information on the connected pixel blocks remaining in the enhanced image; and
an identifying step of identifying, based on the analysis result of the analysis step, the inverted region in the document image that contains the characters in which the background and foreground are inverted,
wherein, in the identifying step, when there are a plurality of candidate regions, the candidate regions are integrated based on their positional relationship, and the inverted region is identified based on an analysis result of a partial image of the document image corresponding to the integrated candidate region.

12. An image processing method comprising:
a generating step of performing morphology processing, including at least a contraction process for erasing normal characters, on a document image that includes an inverted region containing characters in which the background and foreground are inverted and normal characters located in an area other than the inverted region, to generate an enhanced image in which connected pixel blocks that may constitute the inverted region remain;
an analysis step of analyzing a partial image of the document image corresponding to a candidate region for the inverted region, the candidate region being identified based on information on the connected pixel blocks remaining in the enhanced image; and
an identifying step of identifying, based on the analysis result of the analysis step, the inverted region in the document image that contains the characters in which the background and foreground are inverted,
wherein the analysis result includes at least the number of contours of white pixel portions in the partial image.

13. A program for causing a computer to function as the image processing device according to any one of claims 1 to 10.