JP7570843B2

JP7570843B2 - IMAGE PROCESSING APPARATUS, IMAGE FORMING SYSTEM, IMAGE PROCESSING METHOD, AND PROGRAM

Info

Publication number: JP7570843B2
Application number: JP2020132464A
Authority: JP
Inventors: 泰志富久
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-08-04
Filing date: 2020-08-04
Publication date: 2024-10-22
Anticipated expiration: 2040-08-04
Also published as: JP2022029228A

Description

The present disclosure relates to a technique for extracting the character strings of predetermined items from a scanned image.

One known method extracts the character strings in a scanned image as digital data by performing optical character recognition (OCR) on the image obtained by scanning a document. Another detects the type of form shown in the scanned image, identifies the positions of predetermined items according to that form type, and extracts the character strings of those items from the OCR results.

Patent Document 1 describes a method in which information on the ruled-line features of forms is registered in a database in advance, and a form whose ruled-line features match those of an input image is identified from the database.

Patent Document 1: JP 2009-230498 A

For example, with a form that has multiple entry areas for handwriting, as shown in FIG. 3, a document in which handwritten characters have been written in either entry area 301 or entry area 302 is subjected to OCR processing. In such a form, each entry area may contain a field for the same item, such as "telephone number" or "email address." As a result, when the character string of a predetermined item is to be extracted from the scanned image, it cannot be determined which entry area's field the string should be taken from, and the desired character string may not be extracted.

An image processing apparatus according to the technology of the present disclosure includes: an image acquisition means for acquiring a scanned image obtained by reading a form; a separation means for separating, from the scanned image, an image of the handwritten characters included in the scanned image and an image of the area other than the handwritten characters; a detection means for detecting character string areas in each of the images obtained by the separation, thereby detecting the character string areas of handwritten characters and the character string areas of printed characters included in the scanned image; a data acquisition means for acquiring form data for a form that has a plurality of areas for writing handwritten characters and in which handwritten characters have been written in one of those areas, the form data including, for each such area, first position information on the character string areas of handwritten characters, and second position information on the character string areas of printed characters; an identification means for identifying that the arrangement of the character string areas of the form and that of the character string areas of the scanned image are similar when the degree of coincidence between pixel information of the character string areas of the form, based on the position information included in the form data, and pixel information of the character string areas of the scanned image exceeds a threshold; and an extraction means for extracting, when the character string areas of printed characters of the form based on the second position information included in the form data are identified as similar to the character string areas of printed characters detected from the scanned image, the character string of a predetermined item associated with the character string area of handwritten characters in the form data, from the character string area of handwritten characters detected from the scanned image that is further identified as similar to the character string area of handwritten characters of the form based on the first position information included in that form data.

According to the technology of the present disclosure, a handwritten character string can be extracted from a scanned image obtained by reading a form that has multiple areas for handwriting and in which handwritten characters have been written in one of those areas.

FIG. 1 is a block diagram showing the configuration of an image forming system.
FIG. 2 is a diagram showing the functional configuration of the image forming apparatus.
FIG. 3 is a diagram showing an example of a blank form.
FIG. 4 is a diagram showing form data.
FIG. 5 is a flowchart showing a process of extracting character information for metadata items.
FIG. 6 is a diagram showing an example of scanned image data and a processing result.
FIG. 7 is a diagram showing an example of scanned image data and a processing result.
FIG. 8 is a diagram showing an example of a content confirmation screen.
FIG. 9 is a diagram showing an example of a content confirmation screen.
FIG. 10 is a diagram showing an example of a correction confirmation screen.
FIG. 11 is a diagram showing form data.
FIG. 12 is a flowchart showing a process of extracting character information for metadata items.

Embodiments of the technology of the present disclosure are described in detail below with reference to the drawings. The following embodiments do not limit the technology of the present disclosure as set out in the claims, and not all combinations of the features described in the embodiments are essential to its solution. Identical components are given the same reference numbers, and duplicate descriptions are omitted.

<Embodiment 1>
[Overview of Image Forming System]
FIG. 1 is a diagram showing the configuration of an image forming system. As shown in FIG. 1, the image forming system includes an image forming apparatus 100, a host computer 170, and a server 191 (which may be a cloud server). In this embodiment, the image forming apparatus 100 is described as a multifunction printer (MFP) that integrates multiple functions such as printing, scanning, and FAX. The server 191 is described as having a document management function. The image forming apparatus 100, the host computer 170, and the server 191 are connected via a network 190, such as a LAN (Local Area Network), so that they can communicate with one another. Multiple image forming apparatuses, host computers, and servers may be connected, and other devices may be connected as well. The network 190 may be a wired network, a wireless network, or a combination of the two.

The image forming apparatus 100 includes a control device 110, a reader device 120, a printer device 130, an operation unit 140, and a storage device 150. The control device 110 is a control board (controller) that performs overall control of the image forming apparatus 100. The control device 110 includes a CPU 111, a ROM 112, a RAM 113, and an image processing unit 114. The CPU 111 controls each block in the control device 110 via a system bus (not shown). For example, the CPU 111 executes the functions of the image forming apparatus 100 by reading and executing programs stored in the ROM 112, the RAM 113, the storage device 150, or another storage medium. The ROM 112 stores, for example, control programs and the tables and setting data needed to execute the functions of the image forming apparatus 100. The RAM 113 is used, for example, as work memory for the CPU 111.

The image processing unit 114 performs various kinds of image processing, such as conversion, correction, editing, and compression/decompression, on scanned image data generated when the reader device 120 reads a document and on image data received from outside. The image processing unit 114 may be implemented in hardware or in software. The image forming apparatus 100 of this embodiment thus also functions as an image processing apparatus.

The reader device 120 has a scanner engine. It optically reads a document placed on a platen (not shown) or fed from an automatic document feeder (ADF, not shown) and generates scanned image data. The printer device 130 has a printer engine that supports a recording method such as inkjet or electrophotography, and forms an image on a recording medium.

The storage device 150 stores, for example, image data, device information such as modes and licenses, and user information such as address books and customizations. The operation unit 140 has operation keys for accepting user operations and a liquid crystal panel for displaying settings and user interface screens, and outputs information on the accepted user operations to the control device 110. The image forming apparatus 100 is not limited to the configuration shown in FIG. 1 and may include other components depending on the functions it can execute. For example, it may include the components needed for a FAX function or for short-range wireless communication.

The server 191 includes a control device 198, an operation unit 195, a storage device 196, and a display unit 197. The control device 198 is a control board (controller) that performs overall control of the server 191. The control device 198 includes a CPU 192, a ROM 193, and a RAM 194. The CPU 192 controls each block in the control device 198 via a system bus (not shown). For example, the CPU 192 executes the functions of the server 191 by reading and executing programs stored in the ROM 193, the RAM 194, the storage device 196, or another storage medium. The ROM 193 stores, for example, control programs such as an operating system (OS) and the tables and setting data needed to execute the functions of the server 191. The RAM 194 is used, for example, as work memory for the CPU 192. The storage device 196 stores, for example, application programs, data, user information, and device information. The operation unit 195 has a keyboard, a pointing device, and the like for accepting user operations, and outputs information on the accepted user operations to the control device 198. The display unit 197 is, for example, a liquid crystal display, and displays user interface screens and information.

[Functional Configuration of Image Forming Apparatus]
FIG. 2 is a diagram for explaining the functional configuration of the image forming apparatus 100 that relates to the extraction of metadata items. The image forming apparatus 100 of this embodiment has an image data acquisition unit 201, a character string area detection unit 202, a character recognition unit 203, a handwritten character separation unit 204, a character information extraction unit 205, and a display control unit 206.

The image data acquisition unit 201 functions as an image acquisition unit that acquires the scanned image data generated when the reader device 120 reads a document. The character string area detection unit 202 detects the character string areas present in the scanned image. Any known detection method may be used, such as extracting rectangular areas presumed to be characters from an image binarized with a given threshold.
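The binarize-then-extract-rectangles approach mentioned above can be sketched as follows. This is a minimal illustration, not the method used in the patent: it assumes a grayscale image given as a list of rows of 0-255 values, and the function names `binarize` and `detect_regions` are hypothetical. It returns one bounding box per connected ink component; a practical detector would additionally merge neighboring boxes into character strings.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Return a 0/1 grid: 1 where a pixel is darker than the threshold (ink)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def detect_regions(binary):
    """Return bounding boxes (x, y, width, height) of 4-connected ink components."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


# Tiny made-up image: a 2x2 dark blob and a 1x2 dark stroke on a white page.
sample = [
    [255, 255, 255, 255, 255, 255],
    [255, 0, 0, 255, 255, 255],
    [255, 0, 0, 255, 40, 255],
    [255, 255, 255, 255, 40, 255],
]
regions = detect_regions(binarize(sample))
```

For the sample grid, `regions` holds two boxes, one for each dark component, in scan order.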

The character recognition unit 203 performs optical character recognition (OCR) on the character string areas. The character recognition unit 203 of this embodiment can perform OCR with a method suited to the classification of each character string area. For example, handwritten characters are lighter than printed characters, and their shapes vary widely from writer to writer. For this reason, when handwritten characters are processed, the character recognition unit 203 uses an OCR method different from the one used for printed characters. One way to recognize handwritten characters is to raise the contrast and apply stronger noise removal during binarization before performing OCR. Alternatively, the character recognition unit 203 may perform handwriting-specific OCR using a recognition engine trained with a neural network that takes the preceding and following characters into account, rather than judging each character by its image features alone. Providing OCR processing dedicated to handwritten characters makes it possible to improve recognition accuracy for both handwritten and printed characters.

The handwritten character separation unit 204 separates the handwritten characters from the scanned image and generates a handwritten character image, which represents the handwritten characters in the scanned image, and a non-handwritten image. One way to separate the handwritten character image is to register an image of the form template, which contains no handwriting, in advance and to derive the difference between that image and the scanned image. The separation method is not limited to this. For example, a neural network may be trained on the handwritten character string areas and the remaining background areas of scanned image data and then used to decide, for each pixel or region, whether it is handwriting, thereby generating the handwritten character image and the non-handwritten image.
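The template-difference approach can be illustrated with the following minimal sketch. It assumes the scanned image and the registered blank template are already aligned, have the same size, and are grayscale lists of rows; the function name `separate_handwriting` and the `tolerance` parameter are hypothetical and are not taken from the patent.

```python
def separate_handwriting(scan, template, tolerance=40, background=255):
    """Split an aligned scan into (handwriting image, non-handwritten image).

    A pixel is treated as handwriting when it differs from the blank
    template by more than `tolerance`; everything else is form content.
    """
    handwriting, other = [], []
    for scan_row, template_row in zip(scan, template):
        hw_row, other_row = [], []
        for s, t in zip(scan_row, template_row):
            if abs(s - t) > tolerance:
                hw_row.append(s)        # keep the pen stroke
                other_row.append(t)     # restore what the template shows here
            else:
                hw_row.append(background)
                other_row.append(s)
        handwriting.append(hw_row)
        other.append(other_row)
    return handwriting, other


# Made-up 2x3 example: one printed pixel (0) and one handwritten pixel (30).
template = [[255, 0, 255], [255, 255, 255]]
scan = [[255, 0, 255], [30, 255, 255]]
handwriting_image, other_image = separate_handwriting(scan, template)
```

A fixed per-pixel tolerance is sensitive to skew and scanner noise, which is one reason alignment before separation matters for this kind of differencing.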

The character information extraction unit 205 acquires form data, described later, and identifies the character string areas registered in the form data whose arrangement is similar to that of the character string areas in the scanned image. It then extracts, from the character strings obtained through character recognition by the character recognition unit 203, the character strings that correspond to metadata items such as document name, name, and telephone number. The processing of the character information extraction unit 205 is described in detail later.

The display control unit 206 performs control for displaying the confirmation screen and the correction screen on the liquid crystal panel of the operation unit 140. These screens are described later.

The function of each unit in FIG. 2 is realized by the CPU 111 loading program code stored in the ROM 112 or the storage device 150 into the RAM 113 and executing it. Alternatively, some or all of these functions may be implemented in hardware such as an ASIC or an electronic circuit.

[Forms Subject to Character Information Extraction]
The image forming apparatus 100 performs a character information extraction process that extracts the character strings of predetermined items, such as document name, name, and telephone number, from the image data of a scanned image obtained by reading a document that contains both handwritten and printed characters, such as an application form.

In the character information extraction process, the position information of the character string areas registered in the database for each form is searched for character string areas whose arrangement is similar to that of the character string areas in the scanned image. The character string recognized in the scanned image from the same area as the character string area of an extraction target item, which is associated with the identified character string areas of the form data, is then extracted as the character string of that item. If no similar character string areas are registered, the form shown in the scanned image is registered as a new form.
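Once a registered form has been matched, picking the recognized string that lies in the same area as each item's registered area could look like the sketch below. The patent does not state how "the same area" is decided; the largest-overlap rule used here is an assumption, as are the function names and the sample coordinates and strings. Boxes are `(x, y, width, height)` tuples, and an item whose position is unknown is represented by `None`.

```python
def overlap_area(a, b):
    """Area of the intersection of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return w * h if w > 0 and h > 0 else 0


def extract_items(item_boxes, ocr_results):
    """Map each metadata item to the OCR string whose box overlaps it most.

    item_boxes:  {item name: registered box, or None if unknown}
    ocr_results: [(box detected in the scan, recognized string), ...]
    """
    extracted = {}
    for item, item_box in item_boxes.items():
        best_text, best_area = None, 0
        if item_box is not None:
            for box, text in ocr_results:
                area = overlap_area(item_box, box)
                if area > best_area:
                    best_text, best_area = text, area
        extracted[item] = best_text
    return extracted


# Made-up registered positions and OCR output, for illustration only.
item_boxes = {
    "name": (10, 10, 50, 10),
    "telephone number": (10, 30, 50, 10),
    "corporate name": None,
}
ocr_results = [
    ((12, 11, 40, 8), "Taro Yamada"),
    ((11, 31, 45, 8), "03-1234-5678"),
    ((10, 60, 50, 10), "unrelated text"),
]
extracted = extract_items(item_boxes, ocr_results)
```

Items with no overlapping recognized string, or with no registered position, come back as `None`, which leaves room for a later confirmation or correction step.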

FIG. 3 is a diagram showing a form 300 on which no handwritten characters have been written. As the form 300 illustrates, documents to be read include those based on a form in which only one of two areas, the "individual member" entry area 301 or the "corporate member" entry area 302, is filled in by hand. The content written by hand differs between the "individual member" and "corporate member" entry areas, but the two areas also share items such as "telephone number" and "email address."

A document filled in by hand in the "individual member" entry area 301 of the form 300 and a document filled in by hand in the "corporate member" entry area 302 both contain multiple fields for the same items, such as "telephone number" and "email address." It therefore cannot be determined whether the character string the user wants to extract belongs to an item in entry area 301 or to an item in entry area 302, and the desired character string may not be extracted.

In this embodiment, therefore, the information on the character string areas of printed characters in a scanned image and the information on the character string areas of handwritten characters are separated, and each is registered in the database. When a predetermined item is extracted from a scanned image, the document type is determined from both the character string areas of printed characters and the character string areas of handwritten characters in that image, and the character string of the predetermined item is then extracted.
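The two-stage determination described above, printed layout first and handwritten layout second, can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the degree of coincidence is computed here as intersection-over-union of rasterized box masks, the threshold of 0.8 is arbitrary, and the names `coincidence`, `find_form`, and the dictionary keys are hypothetical.

```python
def rasterize(boxes, width, height):
    """Paint (x, y, w, h) boxes into a 0/1 mask of the given page size."""
    mask = [[0] * width for _ in range(height)]
    for x, y, w, h in boxes:
        for yy in range(max(0, y), min(height, y + h)):
            for xx in range(max(0, x), min(width, x + w)):
                mask[yy][xx] = 1
    return mask


def coincidence(boxes_a, boxes_b, width, height):
    """Intersection-over-union of the two box layouts (0.0 if both are empty)."""
    mask_a = rasterize(boxes_a, width, height)
    mask_b = rasterize(boxes_b, width, height)
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    return inter / union if union else 0.0


def find_form(scan_printed, scan_handwritten, forms, size, threshold=0.8):
    """Return the name of the first form whose printed AND handwritten layouts match."""
    width, height = size
    for form in forms:
        if coincidence(scan_printed, form["printed"], width, height) <= threshold:
            continue  # different form template
        if coincidence(scan_handwritten, form["handwritten"], width, height) > threshold:
            return form["name"]
    return None  # caller may register the scan as a new form


# Two made-up registrations of the same template, filled in different areas.
printed = [(0, 0, 10, 2)]
forms = [
    {"name": "individual", "printed": printed, "handwritten": [(0, 4, 5, 2)]},
    {"name": "corporate", "printed": printed, "handwritten": [(0, 8, 5, 2)]},
]
matched = find_form(printed, [(0, 8, 5, 2)], forms, size=(10, 12))
```

Because both registrations share the same printed layout, only the handwritten layout distinguishes them, which is the situation the embodiment is designed to resolve.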

[Registration of Form Data]
FIG. 4 is a diagram showing an example of form data, stored in the server 191, that represents information on registered forms (documents). The image forming apparatus 100 uses the form data to extract the character strings of the metadata items contained in a scanned image.

The form data contains registered image information and metadata item information. Specifically, the image information consists of image data, handwritten character BS information, and printed character BS information. Scanned image 400 is the scanned image data of a registered document. The handwritten character BS information and the printed character BS information are position information indicating the arrangement of the character string areas of the registered document. BS stands for block selection.

Handwritten character BS image 401 depicts, as an image, the handwritten character BS information, which is the position information of the character string areas that contain handwritten characters in the registered document (handwritten character string areas). The solid-line rectangles in handwritten character BS image 401 indicate the positions and sizes of the handwritten character string areas. Printed character BS image 402 depicts, as an image, the printed character BS information, which is the position information of the character string areas that contain printed characters in the registered document (printed character string areas). The solid-line rectangles in printed character BS image 402 indicate the positions and sizes of the printed character string areas.

In this embodiment, therefore, the position information of the printed character string areas and that of the handwritten character string areas are registered separately as the character string area information for a registered document. In the following, a handwritten character BS image refers to the position information of the character string areas of handwritten characters, and a printed character BS image refers to the position information of the character string areas of printed characters.

The dashed rectangles in handwritten character BS image 401 and printed character BS image 402 indicate the position information of the character string areas that contain the character strings of metadata items. The metadata items in this embodiment are "document name," "application date," "name," "corporate name," and "telephone number," and the position information corresponding to each item is registered as metadata item information. Position information 403 corresponds to "document name," position information 404 to "application date," position information 405 to "name," and position information 406 to "telephone number." For "corporate name," information indicating that the position is unknown is registered.
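One way to picture a form data record is as a small structure that holds the two sets of block-selection rectangles together with the per-item positions. The sketch below is only an illustration: the class name `FormData`, its field names, and every coordinate value are hypothetical, and the scanned image data that the form data also contains is omitted. An item whose position is registered as unknown is represented by `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class FormData:
    """One registered form: BS rectangles plus metadata item positions."""
    name: str
    handwritten_bs: List[Box]   # handwritten character string areas
    printed_bs: List[Box]       # printed character string areas
    metadata_items: Dict[str, Optional[Box]] = field(default_factory=dict)

    def region_for(self, item: str) -> Optional[Box]:
        """Registered position of an item, or None if unknown or unregistered."""
        return self.metadata_items.get(item)

    def unknown_items(self) -> List[str]:
        """Items that are registered but whose position is unknown."""
        return [item for item, box in self.metadata_items.items() if box is None]


# Made-up coordinates for a form filled in as an individual member.
individual_form = FormData(
    name="membership application (individual member)",
    handwritten_bs=[(120, 40, 80, 12), (120, 100, 160, 14), (120, 130, 160, 14)],
    printed_bs=[(60, 10, 200, 20), (20, 100, 80, 14), (20, 130, 80, 14)],
    metadata_items={
        "document name": (60, 10, 200, 20),
        "application date": (120, 40, 80, 12),
        "name": (120, 100, 160, 14),
        "telephone number": (120, 130, 160, 14),
        "corporate name": None,  # position registered as unknown
    },
)
```

Keeping the handwritten and printed rectangles in separate fields mirrors the separate registration described above.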

In this embodiment, the image forming apparatus 100 acquires the registered form data from the server 191 each time a scan is performed, so that registered information can be shared among multiple image forming apparatuses. Alternatively, the registered form data may be stored in the storage device 150 of each image forming apparatus 100.

[Flow of Character Information Extraction Process]
FIG. 5 is a flowchart for explaining the details of the character information extraction process. The series of processes shown in the flowchart of FIG. 5 is performed by the CPU 111 of the image forming apparatus 100 loading program code stored in the ROM 112 into the RAM 113 and executing it. Some or all of the functions of the steps in FIG. 5 may instead be implemented in hardware such as an ASIC or an electronic circuit. The symbol "S" in the description of each process denotes a step in the flowchart, and the same applies to the flowcharts that follow.

In S501, when the CPU 111 receives an instruction from the user via the operation unit 140 to scan a document, it instructs the reader device 120 to scan the document. The reader device 120 scans the document and generates the corresponding scanned image data. The image data acquisition unit 201 then acquires the scanned image data of the document from the reader device 120.

FIG. 6 is a diagram showing an example of the data generated by the processing of this flowchart. Scanned image 600 in FIG. 6 was obtained by scanning a document in which the "individual member" entry area 301 of the form 300 has been filled in.

FIG. 7 is a diagram showing an example of the data generated by the processing of this flowchart for a document different from the one used to obtain the data in FIG. 6. Scanned image 700 in FIG. 7 was obtained by scanning a document in which the "corporate member" entry area 302 of the form 300 has been filled in.

In S502, the handwritten character separation unit 204 separates the handwritten characters from the scanned image data generated in S501 and generates a handwritten character image and a non-handwritten image. Before S502, the scanned image may be aligned by determining the skew that occurs during scanning, any image displacement from how the document was set, differences in image orientation, and so on. Aligning the scanned image improves the accuracy with which the difference image is extracted. Applying this step to scanned image 600 in FIG. 6 yields handwritten character image 601, which contains the extracted handwritten characters, and non-handwritten image 602. Applying it to scanned image 700 in FIG. 7 yields handwritten character image 701 and non-handwritten image 702. Because scanned images 600 and 700 were obtained by scanning documents handwritten on the same form, non-handwritten images 602 and 702 are, in principle, identical.

In S503, processing for character recognition is performed on non-handwritten images 602 and 702 generated in S502. In this step, before OCR, the character string area detection unit 202 performs BS processing, a block division process that analyzes the form structure of the image and divides the image into background areas and rectangular areas (blocks) that represent character string areas.

Printed character BS images 604 and 704 depict, for the purpose of explanation, the coordinate information (position information) of the character string areas obtained by performing BS processing on images 602 and 702 other than handwritten characters. The character string areas drawn as rectangles in printed character BS images 604 and 704 are printed character string areas. Because images 602 and 702 are identical, printed character BS images 604 and 704 are also identical.

The character recognition unit 203 then performs OCR processing on each character string area of printed character BS images 604 and 704 and obtains the character codes of the character string in each area. Areas other than character string areas therefore need not undergo OCR processing, which reduces the OCR processing load and improves character recognition accuracy.

In S504, processing for character recognition is performed on handwritten character images 601 and 701 generated in S502. As in S503, the character string area detection unit 202 performs BS processing on handwritten character images 601 and 701 before OCR processing. Handwritten character BS images 603 and 703 depict, for the purpose of explanation, the coordinate information of the handwritten character string areas obtained by that BS processing. The rectangles in these images represent the handwritten character string areas.

The character recognition unit 203 performs OCR processing on each character string area to obtain the character codes of the character string in that area. The OCR method used in this step may be the same as the one used in S503, or it may be a different method specialized for recognizing handwritten characters.

In S505, the character information extraction unit 205 acquires, via the I/F of the document management function of the server 191, the form data of the registered documents described with reference to FIG. 4, which is stored in the storage device 196 of the server 191.

In S506, the character information extraction unit 205 determines whether the form data contains a registered printed character string area whose layout is similar to that of the printed character string area of the scanned image. If such an area is registered, the process proceeds to S507; if not, it proceeds to S510.

In S507, the character information extraction unit 205 determines whether the form data contains a registered handwritten character string area whose layout is similar to that of the handwritten character string area of the scanned image. If such an area is registered (YES in S507), the process proceeds to S508.

In the determinations of S506 and S507, character string areas with similar layouts are identified based on the degree of match between the layouts of the character string areas. In S506, for example, the pixels of the printed character string areas derived from the coordinate information of the scanned image are compared with the pixels of the printed character string areas derived from the coordinate information of each printed character string area registered in the form data. If the comparison finds a registered printed character string area whose degree of match exceeds a threshold, the layout of the printed character string area of the scanned image is identified as similar to the layout of that registered area.

Character string areas with similar layouts could also be identified by comparing the image data itself, but using the coordinate information of the character string areas reduces the processing load of the similarity determination. Registering coordinate information as the position information of the character string areas also keeps down the size of the information registered as form data.
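The coordinate-based comparison described above can be illustrated with a minimal Python sketch. The description only states that pixels derived from the coordinate information are compared and that the degree of match is tested against a threshold; the rectangle representation `(x, y, width, height)`, the intersection-over-union measure, the threshold value 0.8, and the function names are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Assumed representation of one character string area: (x, y, width, height).
Rect = Tuple[int, int, int, int]


def covered_pixels(rects: List[Rect]) -> Set[Tuple[int, int]]:
    """Return the set of pixel coordinates covered by the given rectangles."""
    pixels = set()
    for x, y, w, h in rects:
        for px in range(x, x + w):
            for py in range(y, y + h):
                pixels.add((px, py))
    return pixels


def layout_match(scan_rects: List[Rect], registered_rects: List[Rect]) -> float:
    """Degree of match between two layouts, here intersection over union (an assumption)."""
    a = covered_pixels(scan_rects)
    b = covered_pixels(registered_rects)
    if not a and not b:
        return 1.0  # two empty layouts are treated as identical
    return len(a & b) / len(a | b)


def is_similar(scan_rects: List[Rect], registered_rects: List[Rect],
               threshold: float = 0.8) -> bool:
    """A layout is treated as similar when the degree of match exceeds the threshold."""
    return layout_match(scan_rects, registered_rects) > threshold
```

Because only rectangles are compared, no image data has to be loaded for the check, which is the load reduction the description attributes to using coordinate information.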

In S508, the character information extraction unit 205 acquires coordinate information indicating the position of each metadata item stored in association with the position information of the registered character string areas identified as similar in S506 and S507. The character string recognized from the character string area of the scanned image that corresponds to the acquired coordinates is then extracted as the character string for that item. Each extracted character string is associated with its item and output to the display control unit 206 so that a screen for user confirmation can be displayed.
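The extraction in S508 can be sketched as a lookup from registered item positions to recognized areas. The following Python sketch is illustrative only: the description does not specify how an acquired coordinate is matched to a character string area, so the largest-overlap rule, the data shapes, and the names `overlap_area` and `extract_items` are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation of an area: (x, y, width, height).
Rect = Tuple[int, int, int, int]


def overlap_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two rectangles (0 when they do not overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def extract_items(item_positions: Dict[str, Rect],
                  recognized: List[Tuple[Rect, str]]) -> Dict[str, Optional[str]]:
    """Map each metadata item to the OCR text of the area that best overlaps its position.

    item_positions: registered position of each metadata item (from the form data).
    recognized: (area, recognized text) pairs obtained from the scanned image.
    Items with no overlapping area are mapped to None (not extracted).
    """
    result: Dict[str, Optional[str]] = {}
    for item, pos in item_positions.items():
        best = max(recognized, key=lambda r: overlap_area(pos, r[0]), default=None)
        if best is not None and overlap_area(pos, best[0]) > 0:
            result[item] = best[1]
        else:
            result[item] = None
    return result
```

An item that maps to `None` corresponds to a blank metadata item display area on the confirmation screen.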

FIG. 8 shows an example of a confirmation screen 800 on which the user checks the character string that the character information extraction unit 205 extracted for each metadata item. The confirmation screen is displayed on the liquid crystal panel of the operation unit 140. Confirmation screen 800 in FIG. 8 is displayed as a result of performing S502 to S508 on scanned image 600 in FIG. 6. Scanned image 600 is displayed in preview display area 801. Metadata item display areas 803 to 807 show the character string extracted for each item by the character information extraction unit 205.

Assume that, at the start of this flowchart, the form data contains what is shown in FIG. 4. In this case, the layout of the printed character string areas shown by printed character BS image 604 of scanned image 600 is determined to be similar to the layout of the printed character string areas shown by registered printed character BS image 402 in FIG. 4. The layout of the handwritten character string areas shown by handwritten character BS image 603 of scanned image 600 is likewise determined to be similar to the layout of the handwritten character string areas shown by registered handwritten character BS image 401 in FIG. 4.

The coordinates indicating the positions of the metadata items, which are enclosed by dashed lines in registered printed character BS image 402 and handwritten character BS image 401 in FIG. 4, are therefore acquired. Among the character strings obtained by OCR processing of the scanned image, the character string recognized in the character string area at the position of each metadata item is determined to be the character string of that item. As a result, confirmation screen 800 in FIG. 8 is displayed.

Accordingly, in metadata item display area 803 of confirmation screen 800, the character string in the character string area of scanned image 600 indicated by position information 403 is extracted and displayed as the document name, "Membership application form". Similarly, "January 31, 2020" is extracted from position information 404 as the application date and "Shimomaruko Taro" is extracted from position information 405 as the name, and both are displayed on the confirmation screen. "03-3758-xxxx" is likewise extracted from position information 406 as the telephone number and displayed. In this way, the character strings extracted for the metadata items can be shown on the confirmation screen.

If a character string area whose layout is similar to the printed character string area is registered but no character string area whose layout is similar to the handwritten character string area is registered (NO in S507), the process proceeds to S509. In S509, the character information extraction unit 205 acquires coordinate information indicating the position of each metadata item stored in association with the printed character string area identified as similar. The character string recognized from the character string area of the scanned image indicated by the acquired coordinates is then extracted as the character string for that item. Each extracted character string is associated with its item and output to the display control unit 206 so that a screen for user confirmation can be displayed.
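The branching among S506 to S510 described so far can be summarized in a small Python sketch. The function name `next_step` and its boolean parameters are hypothetical; the returned labels are the step names used in the description.

```python
def next_step(printed_layout_registered: bool,
              handwritten_layout_registered: bool) -> str:
    """Return the step that follows the similarity determinations of S506 and S507.

    printed_layout_registered: result of S506 (a similar printed layout is registered).
    handwritten_layout_registered: result of S507 (a similar handwritten layout is registered).
    """
    if not printed_layout_registered:
        # NO in S506: nothing can be extracted automatically, go to the confirmation screen.
        return "S510"
    if handwritten_layout_registered:
        # YES in S507: extract items tied to both the printed and handwritten layouts.
        return "S508"
    # NO in S507: extract only the items tied to the printed layout.
    return "S509"
```

In the description, both S508 and S509 are followed by the confirmation screen of S510; the sketch only shows which extraction step, if any, runs first.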

FIG. 9 shows an example of a confirmation screen on which the user checks the character string that the character information extraction unit 205 extracted for each item in S509. Confirmation screen 900 in FIG. 9 is displayed as a result of performing S502 to S507 and S509 on scanned image 700 in FIG. 7. Scanned image 700 is displayed in preview display area 901, and the character strings of the metadata items extracted in S509 are displayed in metadata item display areas 903 to 907.

Assume that, at the start of this flowchart, the form data contains what is shown in FIG. 4. When the scanned image from which character information is to be extracted is scanned image 700, the layout of the printed character string areas shown by printed character BS image 704 of scanned image 700 is determined to be similar to the layout of the printed character string areas shown by registered printed character BS image 402 in FIG. 4. However, the layout of the handwritten character string areas shown by handwritten character BS image 703 of scanned image 700 is not similar to the layout of the handwritten character string areas shown by registered handwritten character BS image 401. For scanned image 700, it is therefore determined in S507 that no handwritten character string area with a layout similar to that of the scanned image is registered in the form data. In this case, unlike the case of scanned image 600, the character strings of some metadata items are not extracted.

In metadata item display area 903 of confirmation screen 900, the character string in the printed character string area of scanned image 700 indicated by position information 403 is extracted and displayed as the document name, "Membership application form". Similarly, "January 31, 2020" is extracted from the printed character string area indicated by position information 404 and displayed as the application date. However, registered printed character BS image 402 contains no character string areas at the positions indicated by position information 405 and 406 for the name and telephone number items. The character information extraction unit 205 therefore extracts the character strings of only some of the metadata items from scanned image 700.

In S510, the display control unit 206 displays the confirmation screen described above on the liquid crystal panel of the operation unit 140 and accepts a correction operation or a registration operation from the user.

File name display areas 802 and 902 of confirmation screens 800 and 900 in FIGS. 8 and 9 show a file name built from the metadata items according to a preset naming rule. Assume, for example, that the naming rule is "Document name_Application date_Name.pdf" or "Document name_Application date_Corporate name.pdf". On confirmation screen 900 in FIG. 9, neither "Name" nor "Corporate name", one of which is required to build the file name, has been extracted, and both fields are blank. The file name therefore contains "*", which indicates that the metadata could not be obtained.

Confirmation screens 800 and 900 display an item correction button 508 and a registration completion button 509. When the user presses the item correction button 508, a correction screen for correcting the character strings shown in the metadata item display areas is displayed. The correction screen accepts operations for correcting the metadata items from the user. On confirmation screen 900 in FIG. 9, part of the metadata required for the file name is blank, so the registration completion button 509 is grayed out and cannot be pressed.
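The naming rule and the grayed-out registration completion button can be illustrated together. In this Python sketch the dictionary keys, the function name `build_file_name`, and the returned `complete` flag are assumptions; only the rule "Document name_Application date_Name.pdf" (or "..._Corporate name.pdf") and the "*" placeholder come from the description.

```python
from typing import Dict, Optional, Sequence, Tuple

MISSING = "*"  # placeholder shown when a required metadata item was not obtained


def build_file_name(items: Dict[str, Optional[str]],
                    name_candidates: Sequence[str] = ("name", "corporate_name"),
                    extension: str = ".pdf") -> Tuple[str, bool]:
    """Build a file name from the metadata items and report whether it is complete.

    The third component is the first non-empty value among name_candidates,
    mirroring the two alternative naming rules in the description.
    """
    third = next((items.get(k) for k in name_candidates if items.get(k)), None)
    parts = [items.get("document_name"), items.get("application_date"), third]
    complete = all(parts)  # False if any component is missing or empty
    file_name = "_".join(p if p else MISSING for p in parts) + extension
    return file_name, complete
```

Under these assumptions, a user interface could enable the registration completion button only when `complete` is true, which matches the behavior described for confirmation screen 900.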

FIG. 10 shows the correction screen displayed on the liquid crystal panel of the operation unit 140 after an operation to correct a metadata item is accepted. Correction screen 1000 in FIG. 10 is the screen displayed when metadata item display area 906 for "Corporate name" is selected on confirmation screen 900 in FIG. 9 and the item correction button 508 is pressed.

On correction screen 1000, the character string areas in the scanned image that the user can select are highlighted. In this embodiment, the character string areas that the user can select as the character string of a metadata item are displayed as dashed frames superimposed on scanned image 700. When the user selects frame 1010, which indicates "Corporate name", the character string "XXX Co., Ltd." recognized from the character string area indicated by frame 1010 is displayed in metadata item display area 1006. In this way, the character string of an item that the character information extraction unit 205 could not extract can be obtained by accepting an instruction from the user.

Once the character string for the corporate name metadata item is accepted, the file name according to the naming rule is determined, and the user can press the registration completion button 509. Similarly, because the user's selection of frame 1011 has been accepted, the telephone number character string "03-3756-xxxx" is displayed in metadata item display area 1007.

In S511, when the display control unit 206 accepts the user's press of the registration completion button 509, information indicating the content of the correction instruction is sent to the storage device 196 of the server 191 via the I/F of the document management function of the server 191. If the printed character BS indicating the printed character string areas of the scanned image has not been registered, the image data of the scanned image and the information on its printed character BS and handwritten character BS are sent as information to be newly registered. If the handwritten character BS indicating the handwritten character string areas of the scanned image has not been registered, the image data of the scanned image and the information on its handwritten character BS are sent as information to be newly registered.

The server 191 registers the corrections or the new information in the form data stored in the storage device 196. If the form data is stored in the image forming apparatus 100, the image forming apparatus 100 registers the corrections or the new information in the form data.

FIG. 11 shows updated form data. It is an example of the form data that results when the form data, which was in the state shown in FIG. 4 at the start of this flowchart, is updated by performing the processing of this flowchart on scanned image 700.

As a comparison of FIG. 11 with FIG. 4 shows, the image data of scanned image 700 has been registered as new information. Scanned image 700 is a document on the same form as registered scanned image 400, but because its writing area differs, it is registered as new information. In addition, as handwritten character BS image 1101 shows, the coordinate information of the handwritten character string areas of scanned image 700, which differs from handwritten character BS image 401, is newly registered.

The comparison of FIG. 11 with FIG. 4 also shows that, although the new information of scanned image 700 has been registered, the position information of the printed character string areas of scanned image 700 has not been added in FIG. 11. Scanned image 700 is based on the same form as scanned image 400. In such a case, printed character BS image 402, which indicates the same position information as that of the printed character string areas of the newly registered scanned image, is already registered. The position information of the printed character string areas of scanned image 700 therefore need not be registered.

In this embodiment, information common to multiple images, such as the information on the printed character string areas shown by a printed character BS image, is thus registered as shared information. This keeps down the data size of the form data. It also reduces the amount of data to be searched, which lowers the processing load of a search.

If the determination in S506 is NO, no handwritten character string area or printed character string area with a layout similar to those of the scanned image is registered in the form data. After this flowchart ends, the position information of the handwritten character string areas and of the printed character string areas of the scanned image is therefore registered as form data for a new document.

In addition, position information 1107 for "Corporate name" and position information 1108 for "Telephone number", which were not registered in the form data of FIG. 4, are registered as position information of metadata items in FIG. 11. The position information of metadata items may be registered separately for each registered scanned image, but in this embodiment common metadata item position information is registered in association with the multiple scanned images that share the printed character string area information. That is, in FIG. 11, position information 403 to 406 and position information 1107 and 1108 of the metadata items are associated with both handwritten character BS image 401 of scanned image 400 and handwritten character BS image 1101 of scanned image 700. Associating common metadata item position information with multiple scanned images in this way keeps down the data size of the registered form data.
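The sharing of the printed layout and of the metadata item positions across scanned images can be pictured as a small data structure. The Python sketch below is hypothetical: the class `FormEntry`, its field names, and the string image identifiers are assumptions, and the actual form data format is not specified in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed representation of an area: (x, y, width, height).
Rect = Tuple[int, int, int, int]


@dataclass
class FormEntry:
    """Form data for one form: one shared printed layout, many handwritten layouts."""
    printed_layout: List[Rect]  # stored once, shared by all scans of the form
    item_positions: Dict[str, List[Rect]] = field(default_factory=dict)
    handwritten_layouts: Dict[str, List[Rect]] = field(default_factory=dict)

    def register_scan(self, image_id: str, handwritten_layout: List[Rect],
                      new_item_positions: Dict[str, Rect]) -> None:
        """Add a scan of the same form without duplicating the printed layout."""
        self.handwritten_layouts[image_id] = handwritten_layout
        for item, rect in new_item_positions.items():
            positions = self.item_positions.setdefault(item, [])
            if rect not in positions:
                positions.append(rect)  # an item may end up with several positions
```

Registering a second scan adds only its handwritten layout and any new item positions, which is the size reduction the description attributes to shared registration.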

As shown in FIG. 11, registering scanned image 700 as new information results in two pieces of position information, 406 and 1108, being registered for the "Telephone number" metadata item. When multiple character string areas are registered for the same item in this way, the character string of that item is extracted from whichever of those areas is detected in the scanned image being processed. Alternatively, the choice between position information 406 and position information 1108 for "Telephone number" may depend on whether the check box at "Individual member" or the one at "Corporate member" is checked. As a further alternative, the choice may depend on whether the position information of the handwritten character string areas of the scanned image being processed is more similar to registered handwritten character BS image 401 or to handwritten character BS image 1101.
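The first of these selection strategies, using whichever registered position has a detected character string area in the scanned image, can be sketched as follows. The function names and the overlap test are assumptions for illustration; the check box and layout-similarity alternatives are not shown.

```python
from typing import List, Optional, Tuple

# Assumed representation of an area: (x, y, width, height).
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one pixel."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def pick_position(candidates: List[Rect], detected: List[Rect]) -> Optional[Rect]:
    """Return the first registered position that has a detected area in the scan.

    candidates: positions registered for one metadata item (for example 406 and 1108).
    detected: handwritten character string areas detected in the scanned image.
    """
    for pos in candidates:
        if any(overlaps(pos, d) for d in detected):
            return pos
    return None  # the item was not filled in at any registered position
```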

When scanned image data is newly registered, it may be registered not as an ordinary PDF that merely embeds the scanned image data but as data converted, using the character information from the OCR result, into a document format such as a full-text searchable PDF. Alternatively, the scanned image data may be registered as is in an image format such as JPEG or PNG.

As described above, according to this embodiment, the character strings of the metadata items are extracted using both the position information of the printed character string areas and the position information of the handwritten character string areas of the scanned image. The desired character strings can therefore be extracted from scanned images of documents in which different writing areas of the same form were filled in.

In the above description, the processing of the flowchart in FIG. 5 is performed on the image forming apparatus 100. Alternatively, to distribute the processing load, the scanned image data generated in S501 may be sent to the server 191 via the network 190, and the server 191 may perform all processing other than accepting operations from the user.

In the above description, OCR processing in S503 and S504 follows immediately after BS processing, but OCR processing may instead be performed after the character string areas of the metadata items have been extracted. In that case, OCR processing may be performed only on the extracted character string areas of the metadata items. If no similar character string area can be identified from the form data, or if there is a metadata item for which no character string can be extracted, OCR processing may then be performed on all of the detected character string areas.

<Embodiment 2>
This embodiment describes character information extraction processing that uses a method different from that of Embodiment 1. The description focuses on the differences from Embodiment 1. Unless otherwise specified, the configuration and processing are the same as in Embodiment 1.

FIG. 12 is a flowchart for explaining the details of the character information extraction processing of this embodiment. S1201 is the same as S501, so its description is omitted.

In S1202, the character string area detection unit 202 performs BS processing on the scanned image data acquired in S1201 to detect the positions of the character string areas in the scanned image.

In S1203, the character information extraction unit 205 acquires, via the I/F of the document management function of the server 191, the form data of the registered documents stored in the storage device 196 of the server 191.

In S1204, the character information extraction unit 205 compares the position information of the character string areas of the scanned image obtained by the BS processing in S1202 with the position information of the printed character string areas registered in the form data. It then determines whether a printed character string area whose layout is similar to that of the character string areas of the scanned image is registered in the form data.

The similarity determination in this step may use the same method as described in Embodiment 1. To improve the accuracy of the determination, the pixels of handwritten characters may be excluded from the pixels derived from the coordinate information of the character string areas of the scanned image before the determination in this step is performed.

If no printed character string area similar to the character string areas of the scanned image is registered (NO in S1204), S1207 to S1209 are performed. S1207 to S1209 are the same as S502 to S504, so their description is omitted. Processing equivalent to S506 to S509 may be performed between S1209 and S1210.

In this embodiment, processing equivalent to S502 to S504 is thus performed only when the form data does not contain a printed character string area whose layout is similar to that of the character string areas in the scanned image. Limiting the cases in which the handwritten character image is separated from the scanned image and OCR processing is performed on each resulting image reduces the processing time.

On the other hand, if a printed character string area similar to the character string areas of the scanned image is registered in the form data (YES in S1204), the process proceeds to S1205. In S1205, the character recognition unit 203 acquires the position information of the metadata items associated with the registered character string area identified as similar. The character recognition unit 203 then performs OCR processing only on the character string areas of the scanned image that correspond to the positions of the metadata items. If it can be determined whether a character string area in the scanned image is a handwritten character string area or a printed character string area, the OCR method may be switched for each character string area accordingly.

In S1206, the character information extraction unit 205 extracts, as the character string of each item, the character string obtained by OCR processing of the character string area of the corresponding metadata item. The extracted character strings are associated with their items and output to the display control unit 206, which displays a screen on which the user can check them. If there is a metadata item for which no character string could be extracted, the processes of S1207 to S1209 may be performed so that the character string of the handwritten character string area can be registered.

When the process of S1206 or S1209 is completed, the processes of S1210 to S1211 are performed. These processes are the same as those of S510 to S511, so their description is omitted.

As described above, according to this embodiment, the processing time can be reduced by limiting the target of OCR processing to the character string areas of the metadata items.
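The overall branch of this embodiment can be summarized in a short sketch. The function `extract_items` and the callables `targeted_ocr` and `full_ocr` are hypothetical; they stand in for S1205 to S1206 and S1207 to S1209, respectively. Filling missing items from the fallback result is one possible reading of the optional fallback and is an assumption.

```python
from typing import Callable, Dict, Optional

Items = Dict[str, Optional[str]]


def extract_items(similar_form: Optional[dict],
                  targeted_ocr: Callable[[dict], Items],
                  full_ocr: Callable[[], Items]) -> Items:
    """Choose between OCR limited to metadata items and OCR of all areas."""
    if similar_form is None:
        # NO in S1204: separate handwriting and run OCR on every area.
        return full_ocr()
    # YES in S1204: run OCR only on the metadata item areas.
    items = targeted_ocr(similar_form)
    if any(text is None for text in items.values()):
        # Optional fallback (assumption): fill missing items from a full pass.
        fallback = full_ocr()
        for name, text in items.items():
            if text is None:
                items[name] = fallback.get(name)
    return items
```

The full pass runs only when no similar form is registered or when an item could not be extracted, so the common case stays on the shorter path.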

<Other embodiments>
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (e.g., an ASIC) that implements one or more of the functions.

100 Image forming device
111 CPU
201 Image data acquisition unit
202 Character string area detection unit
205 Character information extraction unit

Claims

1. An image processing device comprising:
an image acquisition means for acquiring a scanned image obtained by scanning a document;
a separation means for separating, from the scanned image, an image of handwritten characters and an image of the area other than the handwritten characters included in the scanned image;
a detection means for detecting character string areas in each of the images obtained by the separation, thereby detecting a character string area of handwritten characters included in the scanned image and a character string area of printed characters included in the scanned image;
a data acquisition means for acquiring form data including, for a form on which handwritten characters are written, first position information of a character string area of handwritten characters for each of a plurality of areas for writing handwritten characters and second position information of a character string area of printed characters for each of the areas;
a specifying means for specifying that an arrangement of the character string area of the form and an arrangement of the character string area of the scanned image are similar when a degree of coincidence between pixel information of the character string area of the form based on position information included in the form data and pixel information of the character string area of the scanned image exceeds a threshold value; and
an extraction means for extracting, when the character string area of printed characters of the form based on the second position information included in the form data and the character string area of printed characters detected from the scanned image have been specified as being similar, a character string of a predetermined item associated with the character string area of handwritten characters in the form data, from a character string area of handwritten characters detected from the scanned image that has further been specified as being similar to the character string area of handwritten characters of the form based on the first position information included in the form data.

2. The image processing device according to claim 1, wherein the extraction means extracts, as the character string of the predetermined item, a character string included in a character string area of the scanned image that corresponds to the character string area of the predetermined item among the character string areas of printed characters and handwritten characters of the form data.

3. The image processing device according to claim 1, further comprising a character recognition means for performing character recognition processing on a character string area included in the scanned image,
wherein the extraction means extracts, as the character string of the predetermined item, a character string obtained as a result of character recognition by the character recognition means on a character string area of handwritten characters detected from the scanned image that corresponds to the character string area of the predetermined item among the character string areas of the form data.

4. The image processing device according to claim 3, wherein the character recognition means performs character recognition processing on a character string area of handwritten characters by a method different from the method used for character recognition processing on a character string area of printed characters.

5. The image processing device according to claim 3, wherein the character recognition means performs character recognition processing on all character string areas detected by the detection means when the specifying means is unable to specify the predetermined item, or when the predetermined items include an item from which the extraction means cannot extract a character string.

6. The image processing device according to claim 1, further comprising:
a receiving means for receiving, based on an instruction from a user, a character string area included in the scanned image that corresponds to the predetermined item; and
a transmission means for transmitting data for registering, in the form data, information indicating an arrangement of the character string areas of printed characters and the character string areas of handwritten characters included in the scanned image and information indicating the position of the character string area corresponding to the predetermined item.

7. The image processing device according to claim 6, further comprising a display control means for displaying the scanned image on a display unit with the character string areas of the scanned image highlighted,
wherein the receiving means receives a character string area selected by a user from among the highlighted character string areas displayed on the display unit as the character string area of the scanned image corresponding to the predetermined item.

8. The image processing device according to claim 6, further comprising a registration means for registering the data transmitted from the transmission means in the form data,
wherein, when position information of a character string area of printed characters detected from the scanned image is registered in the form data and position information of a character string area of handwritten characters detected from the scanned image is not registered in the form data, the registration means registers the position information of the character string area of handwritten characters detected from the scanned image as new information.

An image forming system comprising:
the image processing device according to claim 6 or 7; and
a server having a registration means for registering, in the form data, the data transmitted from the transmission means of the image processing device.

An image processing method comprising:
an image acquisition step of acquiring a scanned image obtained by scanning a document;
a separation step of separating, from the scanned image, an image of handwritten characters and an image of the area other than the handwritten characters included in the scanned image;
a detection step of detecting character string areas in each of the images obtained by the separation, thereby detecting a character string area of handwritten characters included in the scanned image and a character string area of printed characters included in the scanned image;
a data acquisition step of acquiring form data including, for a form on which handwritten characters are written, first position information of a character string area of handwritten characters for each of a plurality of areas for writing handwritten characters and second position information of a character string area of printed characters for each of the areas;
a specifying step of specifying that an arrangement of the character string area of the form and an arrangement of the character string area of the scanned image are similar when a degree of coincidence between pixel information of the character string area of the form based on position information included in the form data and pixel information of the character string area of the scanned image exceeds a threshold value; and
an extraction step of extracting, when the character string area of printed characters of the form based on the second position information included in the form data and the character string area of printed characters detected from the scanned image have been specified as being similar, a character string of a predetermined item associated with the character string area of handwritten characters in the form data, from a character string area of handwritten characters detected from the scanned image that has further been specified as being similar to the character string area of handwritten characters of the form based on the first position information included in the form data.

A program for causing a computer to function as each of the means of the image processing device according to any one of claims 1 to 8.