
JPH08123901A - Character extraction device and character recognition device using this device - Google Patents

Character extraction device and character recognition device using this device

Info

Publication number
JPH08123901A
Authority
JP
Japan
Prior art keywords
character
color
image data
binary image
picture data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP6285930A
Other languages
Japanese (ja)
Inventor
Shunichi Oi
俊一 大井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP6285930A
Publication of JPH08123901A
Legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

PURPOSE: To extract only the character portion from color image data. CONSTITUTION: A color input means 1 reads a color printed document or the like to obtain color image data. A color space conversion means 3 converts the color image data into position information indicating a position in a color space. A dividing means 5 uses the position information to obtain the color distribution of the color image data in the color space and divides that distribution into a plurality of color ranges, which serve as thresholds for binarizing the color image data. For each color range, a binarizing means 6 binarizes the color image data to generate binary image data in which pixels inside the color range are the character portion and all other pixels are the background portion. A character extraction means 8 obtains the size of the circumscribed rectangle of the character portion of each binary image and stores in a character extraction result storage means 9 only the binary image data whose size falls within a predetermined range, and a character pattern recognition means 10 takes the binary image data in the character extraction result storage means 9 as input and recognizes the character pattern.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character extraction device that extracts a character portion from input color image data, and to a character recognition device that uses this character extraction device.

[0002]

[Prior Art] When characters are to be recognized in color image data obtained by reading a color printed document or the like in which the lightness difference between the character portion and the background portion is small, the character portion is extracted using differences in color rather than differences in lightness.

[0003] As a conventional technique of this kind, a device is known that obtains, from the color distribution of color image data obtained by reading a color printed document or the like, a plurality of color ranges that can each be regarded as a single color; generates, for each color range, binary image data in which the pixels included in that color range are the character portion and the other pixels are the background portion; and includes pattern recognition means that performs character pattern recognition on each of the binary images so generated (see, for example, Japanese Patent Laid-Open No. 3-14077).

[0004]

[Problems to Be Solved by the Invention] For each of the color ranges obtained by the dividing means, the conventional device described above binarizes the color image data using that color range, generating binary image data in which the pixels included in the color range are the character portion and the other pixels are the background portion, and treats every such binary image as a character extraction result. The extraction results therefore also include binary image data whose character portion does not correspond to an actual character portion. In addition, because character pattern recognition is also performed on binary image data whose character portion does not correspond to an actual character portion, character pattern recognition takes a long time.

[0005] An object of the present invention is therefore to provide a character extraction device whose character extraction result consists only of binary image data in which only the actual character portion has been binarized, and to provide a character recognition device that can perform character pattern recognition in a short time.

[0006]

[Means for Solving the Problems] To achieve the object of obtaining only binary image data in which only the actual character portion is binarized, the character extraction device of the present invention is a character extraction device having means that obtains, from the color distribution of input color image data, a plurality of color ranges serving as thresholds for binarizing the color image data, binarizes the input color image data using each obtained color range, and generates binary image data in which pixels within the color range are the character portion and the other pixels are the background portion. The device further comprises character extraction means that obtains, for each binary image so generated, the size of the circumscribed rectangle of the character portion and stores in character extraction result storage means only the binary image data whose size falls within a predetermined range.

[0007] To achieve the object of performing character pattern recognition in a short time, the character recognition device of the present invention adds, to the character extraction device configured as above, character pattern recognition means that takes as input the binary image data stored in the character extraction result storage means of the character extraction device and performs character pattern recognition.

[0008]

[Operation] For each binary image generated per color range, the character extraction means obtains the size of the circumscribed rectangle of the character portion and stores in the character extraction result storage means, as the character extraction result, only the binary image data whose size falls within a predetermined range.

[0009] The character pattern recognition means takes as input the binary image data stored in the character extraction result storage means and performs character pattern recognition.

[0010]

[Embodiment] Next, an embodiment of the present invention will be described in detail with reference to the drawings.

[0011] FIG. 1 is a block diagram of an embodiment of the present invention, which comprises color input means 1 such as a color scanner, color image storage means 2, color space conversion means 3, conversion result storage means 4, dividing means 5, binarizing means 6, binary image storage means 7, character extraction means 8, character extraction result storage means 9, and character pattern recognition means 10.

[0012] The color input means 1 scans a color printed document and stores the resulting color image data, consisting of red, green, and blue (RGB) components, in the color image storage means 2.

[0013] The color space conversion means 3 converts the color image data of each pixel stored in the color image storage means 2 into position information indicating the corresponding position in a color space whose axes are two color components (a, b), and stores the result in the conversion result storage means 4. Here, a and b are given by equations (1) and (2) below.

[0014]

$$a = 500\left[\left(\frac{X}{X_n}\right)^{1/3} - \left(\frac{Y}{Y_n}\right)^{1/3}\right] \tag{1}$$

$$b = 200\left[\left(\frac{Y}{Y_n}\right)^{1/3} - \left(\frac{Z}{Z_n}\right)^{1/3}\right] \tag{2}$$

[0015] Here, X, Y, and Z are related to the RGB system by equations (3), (4), and (5) below, and Xn, Yn, and Zn are the tristimulus values of a perfectly diffusing surface under the same illumination as the object color in question.

$$X = 0.49000R + 0.31000G + 0.20000B \tag{3}$$

$$Y = 0.17697R + 0.81240G + 0.01063B \tag{4}$$

$$Z = 0.01000G + 0.99000B \tag{5}$$
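The conversion from RGB to the (a, b) plane described by equations (1) to (5) can be sketched in Python as follows. This is an illustrative sketch only, not the patent's implementation: the function names and the choice of R = G = B = 255 as the reference white that yields Xn, Yn, Zn are assumptions made for the example.

```python
def rgb_to_xyz(r, g, b):
    """Equations (3) to (5): linear map from RGB to tristimulus values X, Y, Z."""
    x = 0.49000 * r + 0.31000 * g + 0.20000 * b
    y = 0.17697 * r + 0.81240 * g + 0.01063 * b
    z = 0.01000 * g + 0.99000 * b
    return x, y, z


def rgb_to_ab(r, g, b, white=(255.0, 255.0, 255.0)):
    """Equations (1) and (2): position of one pixel in the (a, b) color plane.

    `white` is an assumed reference white in RGB; its tristimulus values
    play the role of Xn, Yn, Zn.
    """
    xn, yn, zn = rgb_to_xyz(*white)
    x, y, z = rgb_to_xyz(r, g, b)
    fx = (x / xn) ** (1.0 / 3.0)
    fy = (y / yn) ** (1.0 / 3.0)
    fz = (z / zn) ** (1.0 / 3.0)
    return 500.0 * (fx - fy), 200.0 * (fy - fz)


if __name__ == "__main__":
    for pixel in [(255, 255, 255), (128, 128, 128), (255, 0, 0)]:
        print(pixel, rgb_to_ab(*pixel))
```

Because each row of coefficients in equations (3) to (5) sums to 1, neutral grays map to the origin of the (a, b) plane under this assumed white, so the plane separates pixels by hue and saturation rather than by lightness.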

[0016] Based on the position information of each pixel stored in the conversion result storage means 4, the dividing means 5 obtains the color distribution, in the above color space, of the color image data stored in the color image storage means 2, and divides that distribution into a plurality of color ranges to be used when binarizing the color image data stored in the color image storage means 2. In doing so, the dividing means 5 divides the color distribution into the number of color ranges that makes the variance ratio = (variance between color ranges / variance within color ranges) as large as possible. A large variance ratio means that the boundaries between the color ranges are clear.
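The variance ratio used as the division criterion can be illustrated with a short Python sketch. The text does not say how candidate divisions are generated or exactly how the two variances are normalized, so the function below is an assumption: it scores one given assignment of (a, b) points to color ranges using summed squared distances, and the name `variance_ratio` is hypothetical.

```python
def variance_ratio(points, labels):
    """Ratio of between-range scatter to within-range scatter.

    points: list of (a, b) positions in the color plane.
    labels: list of color-range identifiers, one per point.
    """
    n = len(points)
    mean_a = sum(p[0] for p in points) / n
    mean_b = sum(p[1] for p in points) / n

    # Group the points by the color range they were assigned to.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    between = 0.0
    within = 0.0
    for members in groups.values():
        ca = sum(p[0] for p in members) / len(members)
        cb = sum(p[1] for p in members) / len(members)
        # Scatter of this range's centroid around the overall mean.
        between += len(members) * ((ca - mean_a) ** 2 + (cb - mean_b) ** 2)
        # Scatter of the members around their own centroid.
        within += sum((p[0] - ca) ** 2 + (p[1] - cb) ** 2 for p in members)

    if within == 0.0:
        return float("inf")
    return between / within


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 10), (11, 10)]
    print(variance_ratio(pts, [0, 0, 1, 1]))  # compact, well separated ranges
    print(variance_ratio(pts, [0, 1, 0, 1]))  # ranges that mix both clusters
```

A division that groups nearby colors together scores far higher than one that mixes distant colors, which matches the statement that a large ratio corresponds to clear boundaries between color ranges.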

[0017] For each color range obtained by the dividing means 5, the binarizing means 6 binarizes the color image data stored in the color image storage means 2 using that color range, generating binary image data in which pixels included in the color range are the character portion and all other pixels are the background portion, and stores the generated binary image data in the binary image storage means 7.

[0018] For each binary image stored in turn in the binary image storage means 7, the character extraction means 8 has a function of obtaining the size of the circumscribed rectangle of the character portion; a function of judging whether the obtained size falls within a predetermined range; and a function of, when it does, extracting the contour line of the character portion and storing in the character extraction result storage means 9 binary image data in which the values of the region inside the contour line are set to the value indicating the character portion.

[0019] The character pattern recognition means 10 takes as input the binary image data stored in the character extraction result storage means 9 and performs character pattern recognition.

[0020] Next, the operation of this embodiment will be described.

[0021] Suppose that the color image data shown in FIG. 2 has been read by the color input means 1 and stored in the color image storage means 2. In FIG. 2, 21 denotes the character portion to be extracted, and 22 and 23 denote background portions.

[0022] When the color image data shown in FIG. 2 is stored in the color image storage means 2, the color space conversion means 3 converts the data for each pixel of the color image data into position information indicating the corresponding position in the color space whose axes are the two color components (a, b), and stores the result in the conversion result storage means 4.

[0023] Based on the position information of each pixel stored in the conversion result storage means 4, the dividing means 5 obtains the color distribution, in the above color space, of the color image data stored in the color image storage means 2, and divides that distribution into the number of color ranges that makes the variance ratio as large as possible. Having divided the color distribution into a plurality of color ranges, the dividing means 5 passes information indicating each color range to the binarizing means 6. Suppose, for example, that the color distribution is divided into three color ranges 31 to 33 as shown in FIG. 3; the dividing means 5 then passes information indicating each of the color ranges 31 to 33, such as a mathematical expression, to the binarizing means 6. In the following description, color range 31 is assumed to correspond to the character portion 21 shown in FIG. 2, and color ranges 32 and 33 to the background portions 22 and 23, respectively.

[0024] On receiving the information indicating the color ranges 31 to 33 from the dividing means 5, the binarizing means 6 selects one of them and then judges, for the color image data of each pixel stored in the color image storage means 2, whether the position in the above color space corresponding to that pixel's color image data is included in the color range indicated by the selected information.

[0025] If the position is judged to be included, the binarizing means 6 sets the binary image data for that pixel to the value indicating the character portion ("1" in this embodiment); if not, it sets the binary image data for that pixel to the value indicating the background portion ("0" in this embodiment). It stores the value at the position corresponding to that pixel in the binary image storage means 7. When this processing has been performed for all pixels, the binarizing means 6 notifies the character extraction means 8 to that effect.
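The per-pixel test described above can be sketched in Python. The text only says that a color range is conveyed as "information such as a mathematical expression", so representing it as a predicate over (a, b), and the rectangular `box_range` helper in particular, are assumptions made for illustration.

```python
def box_range(a_min, a_max, b_min, b_max):
    """Hypothetical color range: an axis-aligned box in the (a, b) plane."""
    def contains(a, b):
        return a_min <= a <= a_max and b_min <= b <= b_max
    return contains


def binarize(ab_pixels, in_color_range):
    """Return a binary image: 1 where the pixel's (a, b) position lies in the
    selected color range (character portion), 0 elsewhere (background)."""
    return [
        [1 if in_color_range(a, b) else 0 for (a, b) in row]
        for row in ab_pixels
    ]


if __name__ == "__main__":
    image = [
        [(0.0, 0.0), (50.0, 40.0)],
        [(55.0, 45.0), (-30.0, 10.0)],
    ]
    reddish = box_range(40.0, 60.0, 30.0, 50.0)
    for row in binarize(image, reddish):
        print(row)
```

Calling `binarize` once per color range yields one binary image per range, which is what the character extraction step then examines.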

[0026] Suppose, for example, that the binarizing means 6 selects the information indicating color range 32 shown in FIG. 3. The binary image data stored in the binary image storage means 7 is then as shown in FIG. 4(a), and when storage of the binary image data is complete, the binarizing means 6 notifies the character extraction means 8 to that effect. In FIG. 4, hatched areas represent "1" and blank areas represent "0".

[0027] On receiving the notification from the binarizing means 6, the character extraction means 8 searches the binary image data stored in the binary image storage means 7 for a portion in which "1"s are connected and obtains the size of the circumscribed rectangle of that portion. The character extraction means 8 then judges whether the obtained size falls within a predetermined range. This range extends from slightly smaller than the smallest to slightly larger than the largest of the circumscribed-rectangle sizes of the characters to be extracted.
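Finding a connected portion of "1"s and measuring its circumscribed rectangle can be sketched as below. The text does not state the connectivity rule or how the size range is expressed, so 8-connectivity and separate (width, height) bounds are assumptions, and the function names are hypothetical.

```python
from collections import deque


def component_boxes(binary):
    """Return (width, height) of the circumscribed rectangle of every
    8-connected group of 1-pixels, in raster-scan order of discovery."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one connected portion.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, bottom, left, right = y, y, x, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((right - left + 1, bottom - top + 1))
    return boxes


def size_in_range(box, min_size, max_size):
    """True if the box's width and height both lie within the allowed range."""
    w, h = box
    return min_size[0] <= w <= max_size[0] and min_size[1] <= h <= max_size[1]


if __name__ == "__main__":
    img = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    for box in component_boxes(img):
        print(box, size_in_range(box, (2, 2), (4, 4)))
```

In practice the bounds would be set slightly outside the smallest and largest expected character sizes, as the paragraph above describes.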

[0028] If the size is judged to fall within the range, the character extraction means 8 extracts the contour line of the portion in which "1"s are connected, stores in the character extraction result storage means 9 binary image data in which the values inside the contour line are set to "1", instructs the character pattern recognition means 10 to start recognition, and, after the recognition processing of the character pattern recognition means 10 has finished, requests the next binary image data from the binarizing means 6. If the size is judged not to fall within the range, the character extraction means 8 requests the next binary image data from the binarizing means 6.
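The accept-or-skip decision made for each binary image can be summarized in a short Python sketch. It is a simplification under stated assumptions: every "1" pixel in an image is treated as a single portion, the contour-filling and recognition steps are omitted, and the names `bounding_size` and `extract_characters` are hypothetical.

```python
def bounding_size(binary):
    """Width and height of the rectangle enclosing all 1-pixels, or (0, 0)."""
    rows = [y for y, row in enumerate(binary) if 1 in row]
    cols = [x for row in binary for x, value in enumerate(row) if value == 1]
    if not rows:
        return 0, 0
    return max(cols) - min(cols) + 1, max(rows) - min(rows) + 1


def extract_characters(binary_images, min_size, max_size):
    """Keep only the binary images whose circumscribed rectangle has a width
    and height inside the allowed range; the others are skipped."""
    accepted = []
    for binary in binary_images:
        w, h = bounding_size(binary)
        if min_size[0] <= w <= max_size[0] and min_size[1] <= h <= max_size[1]:
            accepted.append(binary)
    return accepted


if __name__ == "__main__":
    glyph = [
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    # Background image: the complement of the glyph, spanning the whole frame.
    background = [[1 - v for v in row] for row in glyph]
    kept = extract_characters([background, glyph], (1, 1), (3, 3))
    print(len(kept))
```

Only images that pass this check would reach the recognition stage, which is where the time saving claimed for the device comes from.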

[0029] Here, the binary image storage means 7 holds the binary image data shown in FIG. 4(a), and the size of the circumscribed rectangle of the portion in which "1"s are connected does not fall within the predetermined range, so the character extraction means 8 requests the next binary image data from the binarizing means 6.

[0030] When the next binary image data is requested, the binarizing means 6 selects one of the pieces of information indicating the unprocessed color ranges 31 and 33 (suppose it selects the information indicating color range 33) and performs the same processing as described above based on the selected information. As a result, the binary image data shown in FIG. 4(b) is stored in the binary image storage means 7.

[0031] When the binary image data shown in FIG. 4(b) is stored, the character extraction means 8 performs the same processing as described above. In this case too, the size of the circumscribed rectangle of the portion in which "1"s are connected does not fall within the predetermined range, so the character extraction means 8 requests the next binary image data from the binarizing means 6.

[0032] At the time of this request, the only information indicating an unprocessed color range is that for color range 31, so the binarizing means 6 selects it and performs the same processing as described above. As a result, the binary image data shown in FIG. 4(c) is stored in the binary image storage means 7. The binarizing means 6 also notifies the character extraction means 8 that processing has been completed for all color ranges.

[0033] When the binary image data shown in FIG. 4(c) is stored in the binary image storage means 7, the character extraction means 8 performs the same processing as described above. In this case, the size of the circumscribed rectangle of the portion 41 in which "1"s are connected falls within the predetermined range, so the character extraction means 8 obtains the contour line of the portion 41. It then stores in the character extraction result storage means 9 binary image data, as shown in FIG. 5, in which all values in the region inside the contour line are set to "1". That is, the binary image data stored in the character extraction result storage means 9 is the data in which the "0" portion 42 in the binary image storage means 7 has been converted to "1". The character extraction means 8 then instructs the character pattern recognition means 10 to start recognition processing. Because the binarizing means 6 has already notified it that processing is complete for all color ranges, the character extraction means 8 ends its processing after performing the above.
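The effect of setting every value inside the contour line to "1" can be reproduced with a hole-filling sketch in Python. Note the substitution: instead of tracing the contour as the text describes, this sketch flood-fills the background from the image border and treats any "0" pixel it cannot reach as enclosed. That gives the same result for a portion whose holes are fully enclosed, but it is an assumed alternative technique, and the name `fill_enclosed` is hypothetical.

```python
from collections import deque


def fill_enclosed(binary):
    """Return a copy of the image in which 0-pixels that are not connected
    (4-connectivity) to the image border are set to 1."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    outside = [[False] * width for _ in range(height)]
    queue = deque()

    # Seed the search with every background pixel on the image border.
    for y in range(height):
        for x in range(width):
            on_border = y in (0, height - 1) or x in (0, width - 1)
            if on_border and binary[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))

    # Spread through background pixels reachable from the border.
    while queue:
        cy, cx = queue.popleft()
        for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
            if (0 <= ny < height and 0 <= nx < width
                    and binary[ny][nx] == 0 and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))

    # Everything that is not outside background becomes character portion.
    return [[0 if outside[y][x] else 1 for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    ring = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in fill_enclosed(ring):
        print(row)
```

In the example, the single "0" enclosed by the ring plays the role of portion 42 and is converted to "1", while the surrounding background stays "0".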

[0034] When instructed by the character extraction means 8 to start recognition processing, the character pattern recognition means 10 takes as input the binary image data shown in FIG. 5 stored in the character extraction result storage means 9 and performs character pattern recognition.

[0035]

[Effects of the Invention] As described above, the character extraction device of the present invention includes character extraction means that obtains, for each binary image generated per color range, the size of the circumscribed rectangle of the character portion and stores in the character extraction result storage means only the binary image data whose size falls within a predetermined range. The character extraction result can therefore be limited to binary image data whose character portion corresponds to an actual character portion.

[0036] The character recognition device of the present invention includes character pattern recognition means that performs character pattern recognition on binary image data input from the character extraction result storage means, which holds only binary image data whose character portion corresponds to an actual character portion. Character recognition processing can therefore be performed at high speed.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the present invention.

FIG. 2 is a diagram showing an example of input color image data.

FIG. 3 is a diagram for explaining an example of the processing of the dividing means 5.

FIG. 4 is a diagram showing the processing results of the binarizing means 6.

FIG. 5 is a diagram showing an example of the contents of the character extraction result storage means 9.

[Explanation of Symbols]

1 … Color input means
2 … Color image storage means
3 … Color space conversion means
4 … Conversion result storage means
5 … Dividing means
6 … Binarizing means
7 … Binary image storage means
8 … Character extraction means
9 … Character extraction result storage means
10 … Character pattern recognition means
21 … Character portion
22, 23 … Background portions
31 to 33 … Color ranges

Claims (3)

[Claims]

[Claim 1] A character extraction device having means that obtains, based on the color distribution of input color image data, a plurality of color ranges serving as thresholds for binarizing the color image data, binarizes the input color image data using each of the obtained color ranges, and generates binary image data in which pixels included in the color range are the character portion and the other pixels are the background portion, the character extraction device comprising: character extraction means that obtains, for each of the generated binary image data, the size of the circumscribed rectangle of the character portion and stores in character extraction result storage means only the binary image data whose obtained size falls within a predetermined range.

[Claim 2] A character recognition device comprising: the character extraction device according to claim 1; and character pattern recognition means that takes as input the binary image data stored in the character extraction result storage means of the character extraction device and performs character pattern recognition.

[Claim 3] The character recognition device according to claim 2, wherein the character extraction means, when storing binary image data in the character extraction result storage means, extracts the contour line of the character portion and treats all pixels within the contour line as the character portion.
JP6285930A 1994-10-26 1994-10-26 Character extraction device and character recognition device using this device Pending JPH08123901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP6285930A JPH08123901A (en) 1994-10-26 1994-10-26 Character extraction device and character recognition device using this device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP6285930A JPH08123901A (en) 1994-10-26 1994-10-26 Character extraction device and character recognition device using this device

Publications (1)

Publication Number Publication Date
JPH08123901A true JPH08123901A (en) 1996-05-17

Family

ID=17697845

Family Applications (1)

Application Number Title Priority Date Filing Date
JP6285930A Pending JPH08123901A (en) 1994-10-26 1994-10-26 Character extraction device and character recognition device using this device

Country Status (1)

Country Link
JP (1) JPH08123901A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0858048A1 (en) * 1995-08-10 1998-08-12 Nec Corporation Apparatus of optically reading character and method thereof
US6563949B1 (en) 1997-12-19 2003-05-13 Fujitsu Limited Character string extraction apparatus and pattern extraction apparatus
US6701008B1 (en) 1999-01-19 2004-03-02 Ricoh Company, Ltd. Method, computer readable medium and apparatus for extracting characters from color image data
US6987879B1 (en) 1999-05-26 2006-01-17 Ricoh Co., Ltd. Method and system for extracting information from images in similar surrounding color
KR100625755B1 (en) * 2004-03-10 2006-09-20 후지쯔 가부시끼가이샤 Character recognition apparatus, character recognition method, medium processing method and computer readable recording medium having character recognition program
US7221796B2 (en) 2002-03-08 2007-05-22 Nec Corporation Character input device, character input method and character input program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62202291A (en) * 1986-02-28 1987-09-05 Nec Corp Noise deciding system for character recognizing device
JPH0259979A (en) * 1988-08-26 1990-02-28 Toshiba Corp Document and image processor
JPH047792A (en) * 1990-04-26 1992-01-13 Fujitsu Ltd Method and device for extraction of character
JPH05266251A (en) * 1992-03-24 1993-10-15 Toshiba Corp Character extracting device

Similar Documents

Publication Publication Date Title
KR100339691B1 (en) Apparatus for recognizing code and method therefor
US6865290B2 (en) Method and apparatus for recognizing document image by use of color information
US6347156B1 (en) Device, method and storage medium for recognizing a document image
JP2008148298A (en) Method and apparatus for identifying regions of different content in image, and computer readable medium for embodying computer program for identifying regions of different content in image
JP5337563B2 (en) Form recognition method and apparatus
JP3753357B2 (en) Character extraction method and recording medium
US7123768B2 (en) Apparatus and method for detecting a pattern
US6924909B2 (en) High-speed scanner having image processing for improving the color reproduction and visual appearance thereof
US8670623B2 (en) Image processing apparatus, image conversion method, and computer-readable storage medium for computer program based on calculated degree of complexity
EP0933719A2 (en) Image processing method and apparatus
JP4582200B2 (en) Image processing apparatus, image conversion method, and computer program
JPH08123901A (en) Character extraction device and character recognition device using this device
EP0600613A2 (en) Improvements in image processing
JP3955467B2 (en) Image processing program and image processing apparatus
JP2760791B2 (en) Image information processing device
JP3966448B2 (en) Image processing apparatus, image processing method, program for executing the method, and recording medium storing the program
JP2800192B2 (en) High-speed character / graphic separation device
JP3210378B2 (en) Image input device
JP2000163512A (en) Method and device for document picture and record medium
JPH0795397A (en) Original reader
JP2853141B2 (en) Image area identification device
JP2001307097A (en) Device and method for detecting pattern
JPH10200767A (en) Image processing unit
JPH0962782A (en) Document reader
JPH04288773A (en) Attribute discriminating method