JP2004287829A

JP2004287829A - Image data classifying device and program

Info

Publication number: JP2004287829A
Application number: JP2003078823A
Authority: JP
Inventors: Hitoshi Okamoto; 仁岡本; Kagenori Nagao; 景則長尾; Masayuki Hisatake; 真之久武; Shinichi Yada; 伸一矢田
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-03-20
Filing date: 2003-03-20
Publication date: 2004-10-14
Anticipated expiration: 2023-03-20
Also published as: JP4329370B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for classifying image data more efficiently while reducing the load on the user.

SOLUTION: An image data classification device 1 has a storage device 11 that stores image data and the feature amounts of the image data, an operation input unit 14a, and a display unit 14b. When an instruction to include image data stored in the storage device 11 in a certain group is given through the operation input unit 14a, the device determines whether it is appropriate to include the image data in the group by comparing the feature amount of the image data with the feature amount associated with the group. If it is appropriate, the image data is included in the group and stored in the storage device 11; if it is not, a message to that effect is output on the display unit 14b.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、画像データを分類するための技術に関する。
【０００２】
【従来の技術】
近年、紙に書かれた文書をスキャナに読み込ませて電子化し画像データとしてコンピュータ装置で管理したり、デジタルカメラで撮影された画像を表す画像データをコンピュータ装置に保存して管理したりすることが行われている。画像データを管理する方法としては、紙に書かれた文書を管理する方法と同様に、種類毎に分類して管理するという方法がある。画像データを種類毎に分類する方法は、例えば、以下の通りである。ユーザが、例えば風景、人物、文書といった画像データによって表される画像の種類を予め把握しておき、その種類に対応するフォルダに画像データを格納する。ユーザがこのような分類作業を画像データ１つ１つに対して行うことにより、複数の画像データが種類毎に分類される。
しかし、分類すべき画像データが大量にある場合、画像データ１つ１つに対してこのような分類作業を行うことは、ユーザにとってはかなり面倒である。また、ユーザが分類を間違える可能性も高い。
このような問題を解決可能な技術が特許文献１及び特許文献２に記載されている。特許文献１及び特許文献２には、画像データを自動的に分類して保存する装置が記載されている。特許文献１に示される装置では、画像データのレイアウトを解析し、この解析結果に基づいて、画像データを分類する。レイアウトの解析結果としては、画像データのＢｌｏｃｋ数、Ｔｅｘｔ属性数、他属性のＢｌｏｃｋとの重なり具合度数、Ｔａｂｌｅ属性数とその大きさ、Ｐｉｃｔｕｒｅ数とその大きさなどの情報が挙げられる。特許文献２に示される装置では、画像データが入力されると、当該画像データによって表される画像の特徴を表す特徴量を算出し、この特徴量を用いて、当該画像データと当該装置に既に記憶されている画像データとの関連性をクラスタリング手法により評価し、関連性の高い画像データのグループ毎に分類して記憶する。画像の特徴としては、例えば、画像全体の色調、縦横比、輝度や色の分布状態などの特徴が挙げられる。
【０００３】
【特許文献１】
特開２００１―１０１２１３号公報
【特許文献２】
特開２００１―２５６２４４号公報
【０００４】
【発明が解決しようとする課題】
しかし、いずれの装置においても、画像データが自動的に分類されるため、その分類結果は必ずしもユーザの意図と合致するとは限らない。従って、ユーザは、どのように分類されたのかを確かめなければならない。そのため、ユーザは、例えばコンピュータ装置に画像データの分類先を表示させ、その表示を見て分類結果を確認する必要がある。また、その分類先がユーザの意図に反するものであった場合には、結局、ユーザ自身が分類作業を行わなければならない。
【０００５】
本発明は、以上説明した事情に鑑みてなされたものであり、ユーザの負荷を軽減し、より効率的に、画像データを分類する技術を提供することを目的とする。
【０００６】
【課題を解決するための手段】
本発明は、画像データと画像データの特徴量を記憶する蓄積装置と、操作入力部と、出力部と、前記蓄積装置に記憶された画像データをあるグループに含めるべき旨の指示が前記操作入力部を介して与えられた場合、当該画像データの特徴量と、当該グループに対応付けられた特徴量とを比較することにより、当該画像データを当該グループに含めることが妥当か否かを判断し、妥当である場合には当該画像データを当該グループに含めて前記蓄積装置に格納し、妥当でない場合には妥当でない旨のメッセージを前記出力部により出力する分類登録部とを有することを特徴とする画像データ分類装置を提供する。
【０００７】
本発明によれば、前記蓄積装置に記憶された画像データをあるグループに含めるべき旨の指示が前記操作入力部を介して与えられた場合、当該画像データの特徴量と、当該グループに対応付けられた特徴量とが比較されることにより、当該画像データを当該グループに含めることが妥当か否かが判断され、妥当である場合には当該画像データが当該グループに含めて前記蓄積装置に格納され、妥当でない場合には妥当でない旨のメッセージが前記出力部により出力される。
【０００８】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態を詳細に説明する。なお、実施の形態を説明するための全図において、同一機能を有するものは同一符号を付け、その繰り返しの説明は省略する。
（１．構成）
＜全体構成＞
図１は、本発明に係る画像データ分類装置１のハードウェア構成を示すブロック図である。
ユーザインターフェース１４は、操作入力部１４ａと表示部１４ｂとから構成される。操作入力部１４ａは、数字、文字、コマンドなどを入力するための複数のキーを備え、これらのキーの操作に応じた操作信号を出力する。表示部１４ｂは、図示しない表示パネルと、この表示パネルを駆動する駆動回路とを有する。なお、表示部１４ｂの代わりに、情報を音声にて出力する音声出力部を備えても良い。
【０００９】
ネットワークインターフェース１５は、ネットワークを介して制御部１０を他の装置に接続し、両者の間のデータの授受を制御する。
画像入力装置１２は、原稿から画像を読み取り、アナログ画像信号を出力するスキャナと、スキャナから出力されるアナログ画像信号をディジタル画像信号に変換して出力するＡ／Ｄ変換回路とを有する。また、好ましい態様において、画像入力装置１２は、原稿受けにセットされた原稿の束から原稿を１枚ずつ取り出してスキャナに出力するＡＤＦ（ＡｕｔｏｍａｔｉｃＤｏｃｕｍｅｎｔＦｅｅｄｅｒ；自動文書送り機構）を有する。
【００１０】
蓄積装置１１は、ハードディスクドライブやＤＶＤ−ＲＡＭ（ＤｉｇｉｔａｌＶｉｄｅｏＤｉｓｃ−ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）ドライブ等の大容量記憶装置から構成される。制御部１０は、以上説明した装置各部とバス１６を介して接続されている。この制御部１０は、画像データ分類装置１全体を制御する手段であり、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）と、ＲＡＭ（ＲａｎｄａｍＡｃｃｓｅｓｓＭｅｍｏｒｙ）と、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）とを備える（いずれも図示略）。ここで、ＣＰＵは、ＲＯＭに格納されたプログラムを実行することにより、画像データ分類装置１の各部の制御を行う。
【００１１】
図２は、制御部１０に係る機能的構成を示すブロック図である。制御部１０は、特徴量抽出部１０ｂと、類似度評価部１０ｃと、分類登録部１０ａとを有する。これらの実体は、制御部１０のＣＰＵによって実行されるソフトウェアモジュールである。
特徴量抽出部１０ｂは、画像データに対して特徴量抽出処理を行い、画像データによって表される画像の特徴量を算出する。画像の特徴とは、例えば、画像の濃度、縦横比、輝度や色の分布状態、エッジの分布状態、平坦なエリアの分布状態などである。
このような特徴量を算出する手順は例えば以下の通りである。まず、特徴量抽出部１０ｂは、画像データで表される画像の領域を縦方向にＸ分割、横方向にＹ分割する。すなわち、一の画像データで表される画像をＸ＊Ｙ個の領域に分割する。次に、各々の領域についてそれぞれ、所定の算出方法で特徴量を算出する。この特徴量については、各種のものが考えられるが、本実施形態では、その領域の色のＲ（赤色）、Ｇ（緑色）、Ｂ（青色）の各成分を求める。そして、Ｒ、Ｇ、Ｂの各成分をＬ＊ａ＊ｂ空間（特徴色空間）での色成分を表す３個の量に変換する。これら３個の量を１つの領域に対応した特徴量とする。以上の結果、画像データから、（Ｘ＊Ｙ＊３）個の特徴量が算出される。図５においては、一例としてＸ＝６、Ｙ＝４の場合が示されており、この画像データからは計７２個の特徴量が算出されていることが示されている。この７２個の特徴量は、画像データによって表される画像の特徴を表す７２次元ベクトルを構成する。
【００１２】
類似度評価部１０ｃは、ある画像データの特徴量と別の画像データまたは画像データ群から得られた特徴量の比較を行い、両者に対応した各画像が同一種類の画像と認められる程度に類似しているか否かの判断をするモジュールである。対比の対象となる特徴量は、制御部１０によって類似度評価部１０ｃに与えられる。
類似度評価部１０ｃによって行われる判断は次の通りである。まず、各画像データから得られたｎ個の特徴量は、ある画像の特徴を表すｎ次元のベクトルを構成する。そこで、類似度評価部１０ｃは、対比すべき２つの画像データに対応した各ｎ個の特徴量が与えられた場合、一方のｎ個の特徴量を成分とするｎ次元ベクトルと他方のｎ個の特徴量を成分とするｎ次元ベクトルとのｎ次元空間での距離を求める。そして、この距離に基づいて、２つの画像データによって表される各画像が同一種類に属すると認められる程度に類似しているかの判断を行うのである。図６および図７は、この類似度評価部１０ｃによる処理を説明する図である。これらの図には、２つの画像データに対応した各特徴量を各々成分とする２つのベクトルとして、便宜上、３次元ベクトルαおよびβが示されている。そして、図６には、ベクトルαとベクトルβとの間のユークリッド距離が短い状態が示され、図７には、ベクトルαとベクトルβとの間のユークリッド距離が長い状態が示されている。類似度評価部１０ｃは、これらの２つのベクトルαおよびβのユークリッド距離を、対比される２つの画像の類似度とみなし、類似度が所定値以上である（ユークリッド距離が所定長以下である）場合には、２つの画像が同一種類に属すると判断し、図７に示すように類似度が所定値未満である（ユークリッド距離が所定長よりも長い）場合には、２つの画像が同一種類に属しないと判断するのである。
なお、算出する距離はユークリッド距離に限らず、例えば、マハラノビス距離などの他の算出方式による距離であっても構わない。
【００１３】
分類登録部１０ａは、以上説明した特徴量抽出部１０ｂおよび類似度評価部１０ｃを利用して、蓄積装置１１に蓄積された画像データをユーザが適切なフォルダに分類する作業を補助するための処理を行う。なお、この処理については、説明の重複を避けるため、本実施形態の動作の説明において明らかにする。
【００１４】
（２．動作）
図８は、ユーザが蓄積装置１１に蓄積された画像データを分類する際の画像データ分類装置１の動作を示すフローチャートである。
以下説明する動作は、次のような状況において行われるものである。
まず、画像入力装置１２を介して入力された画像データあるいはネットワークインタフェース１５を介して受信された画像データなどの各種の画像データが、あるフォルダに納められ、蓄積装置１１に格納されている。
そして、ユーザは、操作入力部１４ａから指示を与え、このフォルダの内容を表示部１４ｂに表示させている。図３は、このときの表示部１４ｂの表示内容を例示している。この図に示すように、表示部１４ｂには、未分類の画像データを格納するための未分類フォルダ１１ａと、風景フォルダ、人物フォルダ、静物フォルダといった特定の種類の画像データを格納するために作成された各種の分類フォルダ１１ｂと、これらのいずれのフォルダにも格納されていない画像データを各々表すアイコンが表示されている。
この状態において、ユーザが、操作入力部１４ａにより所定の操作を行うと、制御部１０では、分類登録部１０ａが動作を開始する。
そして、ユーザが、操作入力部１４ａにおけるマウスにより、ある画像データのアイコンをある分類フォルダ１１ｂのアイコン内に入れる操作をすると、分類登録部１０ａにより、図８に示すルーチンが行われる。
【００１５】
まず、分類登録部１０ａは、マウスによって指示されたアイコンが表している対象画像データを蓄積装置１１から読み出し（ステップＳ１０）、対象画像データに対応した特徴量が蓄積装置１１に記憶されているか否かを判断する（ステップＳ１１）。この判断結果が否定的である場合、分類登録部１０ａは、対象画像データを特徴量抽出部１０ｂに引き渡し、特徴量を算出する（ステップＳ１２）。そして、算出した特徴量を、対象画像データに対応付けて蓄積装置１１に格納する。次いで、分類フォルダ１１ｂのうちユーザが対象画像データのアイコンを入れようとしたアイコンを求め、このアイコンによって表されている分類フォルダ１１ｂ（対象フォルダ）に対応付けられた特徴量が蓄積装置１１に格納されているか否かを判断する。該当する特徴量が蓄積装置１１に格納されていない場合には（ステップＳ１３：ＹＥＳ）、対象画像データを対象フォルダに格納し（ステップＳ１６）、対象画像データの特徴量により対象フォルダの特徴量を更新する（ステップＳ１７）。ここでは、対象フォルダに対応付けられた特徴量がないので、分類登録部１０ａは、対象画像データの特徴量そのものを対象フォルダに対応付けて蓄積装置１１に格納し、図８に示すルーチンを終了する。
【００１６】
対象フォルダに対応付けられた特徴量が蓄積装置１１に格納されている場合（ステップＳ１３：ＮＯ）、分類登録部１０ａは、当該特徴量とステップＳ１２で算出した特徴量と類似度評価部１０ｃに引き渡す（ステップＳ１４）。そして、両特徴量によって表される画像が同一種類に属すると認められる程度に類似しているとの判断結果が類似度評価部１０ｃから得られ、対象画像データを対象フォルダに格納することが妥当であると認められる場合（ステップＳ１５：ＹＥＳ）、分類登録部１０ａは、対象画像データを対象フォルダに格納する（ステップＳ１６）。次いで、対象画像データに係る特徴量を用いて、対象フォルダに対応付けられた特徴量を更新する（ステップＳ１７）。このステップＳ１７の処理の態様には様々なものが考えられるが、１つ具体例を挙げると次の通りである。まず、分類登録部１０ａは、対象フォルダに対象画像データを格納すると、その時点において対象フォルダに格納されている全ての画像データ（新たに格納された対象画像データを含む）について、各々に対応付けられている各特徴量を蓄積装置１１から読み出す。ここで、１つの画像データに対応付けられた特徴量は、画像の特徴を表すベクトルを構成しているのは既に述べた通りである。分類登録部１０ａは、対象フォルダ内の各画像データに対応付けられた各特徴量、すなわち、画像の特徴を表す複数のベクトルの重心を求め、これを当該対象フォルダに対応付けて蓄積装置１１に格納するのである。なお、対象フォルダに対応付ける特徴量を他の算出方法によって求めるようにしても良い。また、特徴量を求めるタイミングは、画像データが分類フォルダ１１ｂに格納された直後から新たな画像データが分類フォルダ１１ｂに格納される直前までのいつでも良い。
【００１７】
一方、ステップＳ１５において、対象画像データを対象フォルダに格納することが妥当でないと判定した場合には、「この画像データはこのフォルダには適していません。それでも格納しますか？」という警告メッセージを表示部１４ｂに表示させる（ステップＳ１８）。この警告メッセージに肯定的に応答する指示がユーザから入力されると（ステップＳ１９：ＹＥＳ）、分類登録部１０ａは、ステップＳ１６〜Ｓ１７の処理を行う。
しかし、ステップＳ１８において表示された警告メッセージに否定的に応答する指示がユーザから入力されると（ステップＳ１９：ＮＯ）、分類登録部１０ａは、対象画像データを未分類フォルダ１１ａに格納する（ステップＳ２０）。
【００１８】
なお、図示は省略したが、分類登録部１０ａは、未分類フォルダ１１ａに格納された画像データについて、分類フォルダ１１ｂに格納させる指示があった場合には、その画像データについて、図８に示すルーチンを実行するように構成されている。従って、ユーザは、未分類フォルダ１１ａに格納された画像データを適切な分類フォルダ１１ｂに移動することができる。
また、既に分類フォルダ１１ｂに格納された画像データを他の分類フォルダ１１ｂに格納させる指示があった場合についても同様である。
【００１９】
以上のようにして、ユーザが画像データを所望のフォルダに格納させようとする際、当該フォルダに既に格納されている画像データと、格納させる対象の画像データとの類似性が判定されることにより、格納対象の画像データを所望のフォルダに格納させることが適切であるか否かが判定される。その判定結果が否定的である場合には、ユーザに対して警告メッセージが出される。従って、ユーザは、画像データの格納先に深い注意を払わなくても、画像データが不適切な格納先に格納されることを防ぐことができる。
【００２０】
（３．変形例）
以上、本発明の実施形態について説明したが、本発明はその主要な特徴から逸脱することなく他の様々な形態で実施することが可能である。なお、変形例としては、例えば、以下のようなものが考えられる。
【００２１】
＜変形例１＞
本変形例では、属性情報が操作入力部１４ａから入力された場合に、制御部１０は、画像入力装置１２を介して入力された画像データに基づいてジョブパッケージを作成する際に、この属性情報をジョブパッケージに含め、蓄積装置１１に格納する。属性情報は、例えば、文書データの名称、サイズ、作成の日付、データフォーマットの種類等についての情報である。そして、上述のステップＳ２７又はステップＳ６１において、制御部１０は、発見されたジョブパッケージ内の画像データに基づいてサムネイル画像を表示部１４ｂに表示させる代わりに、ジョブパッケージ内の属性情報を表示させる。
更に、画像データと対応付けられた属性情報を特徴量として用いても良い。
【００２２】
＜変形例２＞
画像入力装置１２は、磁気ディスクやメモリカード等の媒体に対して情報の読み書きを行うリード／ライト装置とその制御手段とから構成しても良い。また、画像入力部装置１２は、アプリケーションソフトウェアで作成されたデータからビットマップ形式の画像データを生成する装置であっても良い。
【００２３】
＜変形例３＞
画像データ分類装置１は、更に、画像出力装置を備える構成であっても良い。画像出力装置は、プリンタとこのプリンタの駆動制御を行うプリンタ制御回路とを有し、画像データに表される画像を紙に出力するものであっても良い。また、画像出力装置１３は、磁気ディスクやメモリカード等の媒体に対して情報の読み書きを行うリード／ライト装置とその制御手段とから構成しても良い。このような画像出力装置１３では、画像を紙にではなく、磁気ディスクやメモリカード等の媒体に出力する。
または画像出力装置１３は、ネットワークインターフェース１５を介してデータの授受を行うデータ転送装置であっても良い。このような画像出力装置１３では、画像を出力するのではなく、ＨＴＭＬ（ＨｙｐｅｒＴｅｘｔＭａｒｋｕｐＬａｎｇｕａｇｅ）などのデータ形式に変換されたデータを出力しても良い。
また、画像出力装置１３は、例えば、ＣＲＴ（ＣａｔｈｏｄｅＲａｙＴｕｂｅ）とその制御手段とで構成されるものであっても良い。
【００２４】
＜変形例４＞
複数の画像データをまとめて１つの分類フォルダに格納させるような構成であっても良い。このような構成においては、制御部１０は、複数の画像データに係る各特徴量を取得し、これらの各特徴量と、分類先のフォルダとしてユーザに選択された分類フォルダ（対象フォルダ）に係る特徴量とをそれぞれ比較する。そして、対象フォルダに格納することが妥当ではないと判定した画像データについては、上述の同様にして、警告メッセージを表示させ、対象フォルダに格納することが妥当であると判定した画像データについては、対象フォルダに格納する。このような構成によれば、ユーザの分類作業に係る手間や時間を更に削減することができる。
【００２５】
＜変形例５＞
画像データから算出する特徴量は、画像全体の色調、縦横比、輝度や色の分布状態、エッジの分布状態、平坦なエリアの分布状態のどれか一つでもよいし、複数であってもよい。例えば、画像データに表される画像の領域を２４分割した場合、一つの領域から色調、エッジ状態を抽出し、特徴量を１４４次元のベクトルとして表してもよい。また、主成分解析等の手法を用いて、できるだけ画像データの特徴を損なうことなく特徴量の数を減らしてもよい。これにより、類似度評価部１０ｃによって行われる計算量を減少することができる。従って、算出すべき特徴量の種類数は問われない。また、分割する領域の数は問われない。
【００２６】
＜変形例６＞
上述の実施形態においては、画像データに係る特徴を、画像全体の色調、縦横比、輝度や色の分布状態、エッジの分布状態、平坦なエリアの分布状態であるとした。このような特徴量を用いて画像データを特定することは、当該画像データに表される画像が原稿の全部またはほとんど全ての部分を占めている場合には非常に有効である。しかしながら、契約書等の文書のように文字が原稿のほとんど全ての部分を占めている場合には、あまり有効ではないこともある。従って、本変形例においては、画像データに表される画像が文書を含む場合に、当該画像データに係る特徴を、文字の高さ、字間、行間、縦書き・横書き等の文書フォーマットに係る特徴として取り扱う。以下、文書フォーマットに係る特徴量を求める方法について説明する。
スキャナ等の画像入力装置１２により読み込まれた画像から、文書フォーマットに係る特徴を求めるには、各種従来技術が適用可能である。そのような従来技術の一例が特開平５−１０８７９３の段落０００９から００１２の部分に開示されている。当該技術においては、横書きであることを前提として行間ｂと、字間ｄを求めている。しかし、原稿が横書きであるか縦書きであるか分からない場合もある。そこで、通常は行間の方が字間より大きいことを利用し、当該技術により行間として求めたｂと字間として求めたｄを比較して大きい方を行間、小さい方を字間とし、ｂの方が大きい場合は横書きとし、ｄの方が大きい場合は縦書きとする、という方法で、縦書きか横書きかを判別する。
【００２７】
また、文書フォーマットに係る特徴を求める他の方法は以下の通りである。画像データに表される画像を複数の領域に分割する。次に分割された領域の各々について、文字の高さ、字間、行間、縦書き・横書きなど情報を含む文書フォーマットに係る特徴量を算出する。例えば、文字の高さ「１０ｐｔ」、字間「１５ｐｔ」、行間「１０．５ｐｔ」、縦書き・横書き「０」という値が得られる。ここでは、縦書きの場合には「０」を、横書きの場合は「１」を対応付けるものとする。
【００２８】
次に、算出された全ての領域に係る特徴量のなかで最も頻繁に出現する値を決定し、これをそのページの特徴量とする。例えば、全２４個の領域のうち、２０個の領域において文字の高さが「１０ｐｔ」であり、３個の領域において「１２ｐｔ」、一つの領域では「１６ｐｔ」であった場合は、当該ページの文字の高さにかかる特徴量として「１０ｐｔ」が算出される。字間、行間、縦書き・横書きについても同様である。
【００２９】
一般的に言えば、表題や見出しがページ全体に占める割合は、文書を特徴付けている本文に比べて小さい。従って、最頻出の値は、本文の特徴を表した量であると推定することができる。これによって、文書に表題や見出し部が存在したとしても、画像データから的確に特徴量を得ることができる。
以上のようにして、１つの画像データから１つの特徴ベクトルが求められる。以下に、４個の成分をもつ４次元ベクトルで表される特徴ベクトルのｆの一例を示す。
ｆ＝（文字の高さ、字間、行間、［縦書き：０，横書き：１］）
【００３０】
【発明の効果】
本発明によれば、ユーザの負荷を軽減し、より効率的に、画像データを分類することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る画像データ分類装置１のハードウェア構成を示すブロック図である。
【図２】同実施形態に係る制御部１０の機能的構成を示すブロック図である。
【図３】同実施形態に係る分類登録部１０ａの機能を説明するための図である。
【図４】同実施形態に係る分類登録部１０ａの機能を説明するための図である。
【図５】同実施形態に係る特徴量抽出部１０ｂの機能を説明するための図である。
【図６】同実施形態に係る類似度評価部１０ｃの機能を説明するための図である。
【図７】同実施形態に係る類似度評価部１０ｃの機能を説明するための図である。
【図８】同実施形態に係る画像データ分類装置１の動作の流れを示すフローチャートである。
【符号の説明】
１・・・画像データ分類装置、１０・・・制御部、１０ａ・・・、分類登録部、１０ｂ・・・特徴量抽出部、１０ｃ・・・類似度評価部、１１・・・蓄積装置、１２・・・画像入力装置、１４・・・ユーザーインターフェース、１４ａ・・・操作入力部、１４ｂ・・・表示部、１５・・・ネットワークインターフェース、１６・・・バス。
[0001]
[Technical field of the invention]
The present invention relates to a technique for classifying image data.
[0002]
[Prior art]
In recent years, documents written on paper have been read with a scanner, digitized, and managed as image data on a computer device, and image data representing images captured with a digital camera has been stored and managed on a computer device. One way to manage image data, as with documents written on paper, is to classify it by type. Image data can be classified by type as follows, for example. The user first identifies the type of image represented by the image data, such as a landscape, a person, or a document, and stores the image data in the folder corresponding to that type. By performing this classification operation on each piece of image data, the user classifies a plurality of image data by type.
However, when there is a large amount of image data to classify, performing this operation on every piece of image data is quite troublesome for the user. The user is also likely to misclassify some of the data.
Techniques that can solve this problem are described in Patent Document 1 and Patent Document 2, which disclose devices that automatically classify and store image data. The device of Patent Document 1 analyzes the layout of image data and classifies the image data based on the analysis result. The layout analysis result includes information such as the number of blocks in the image data, the number of text attributes, the degree of overlap with blocks of other attributes, the number and size of table attributes, and the number and size of pictures. In the device of Patent Document 2, when image data is input, a feature amount representing the features of the image represented by the image data is calculated. Using this feature amount, the relevance between the input image data and the image data already stored in the device is evaluated by a clustering technique, and the image data is classified and stored by group of highly related image data. The features of the image include, for example, the color tone of the entire image, the aspect ratio, and the distribution of luminance and color.
[0003]
[Patent Document 1]
JP 2001-101213 A
[Patent Document 2]
JP 2001-256244 A
[0004]
[Problems to be solved by the invention]
However, because both devices classify image data automatically, the classification result does not always match the user's intention. The user therefore has to check how the data was classified, for example by having a computer device display the classification destination of the image data and inspecting that display. If the classification destination is contrary to the user's intention, the user must in the end perform the classification work manually.
[0005]
The present invention has been made in view of the circumstances described above, and an object of the present invention is to provide a technique for reducing the load on a user and more efficiently classifying image data.
[0006]
[Means for Solving the Problems]
The present invention provides an image data classification device comprising: a storage device that stores image data and feature amounts of the image data; an operation input unit; an output unit; and a classification registration unit. When an instruction to include image data stored in the storage device in a certain group is given through the operation input unit, the classification registration unit determines whether it is appropriate to include the image data in the group by comparing the feature amount of the image data with the feature amount associated with the group. If it is appropriate, the classification registration unit includes the image data in the group and stores it in the storage device; if it is not, the classification registration unit outputs a message to that effect through the output unit.
[0007]
According to the present invention, when an instruction to include image data stored in the storage device in a certain group is given through the operation input unit, the feature amount of the image data is compared with the feature amount associated with the group to determine whether it is appropriate to include the image data in the group. If it is appropriate, the image data is included in the group and stored in the storage device; if it is not, a message to that effect is output by the output unit.
[0008]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In all the drawings for describing the embodiments, components having the same function are denoted by the same reference numerals, and repeated description thereof will be omitted.
(1. Configuration)
<Overall configuration>
FIG. 1 is a block diagram showing a hardware configuration of an image data classification device 1 according to the present invention.
The user interface 14 includes an operation input unit 14a and a display unit 14b. The operation input unit 14a includes a plurality of keys for inputting numbers, characters, commands, and the like, and outputs an operation signal corresponding to the operation of these keys. The display unit 14b has a display panel (not shown) and a drive circuit for driving the display panel. Note that an audio output unit that outputs information in audio may be provided instead of the display unit 14b.
[0009]
The network interface 15 connects the control unit 10 to another device via a network, and controls data transfer between the two.
The image input device 12 has a scanner that reads an image from a document and outputs an analog image signal, and an A / D conversion circuit that converts an analog image signal output from the scanner into a digital image signal and outputs the digital image signal. In a preferred embodiment, the image input device 12 has an ADF (Automatic Document Feeder) that takes out documents one by one from a bundle of documents set in a document receiver and outputs the documents to a scanner.
[0010]
The storage device 11 is a large-capacity storage device such as a hard disk drive or a DVD-RAM (Digital Video Disc-Random Access Memory) drive. The control unit 10 is connected to each of the units described above via the bus 16. The control unit 10 controls the entire image data classification device 1 and includes a CPU (Central Processing Unit), a RAM (Random Access Memory), and a ROM (Read Only Memory), none of which are shown. The CPU controls each unit of the image data classification device 1 by executing a program stored in the ROM.
[0011]
FIG. 2 is a block diagram illustrating a functional configuration according to the control unit 10. The control unit 10 includes a feature amount extraction unit 10b, a similarity evaluation unit 10c, and a classification registration unit 10a. These entities are software modules executed by the CPU of the control unit 10.
The feature amount extraction unit 10b performs a feature amount extraction process on the image data, and calculates a feature amount of an image represented by the image data. The characteristics of the image include, for example, image density, aspect ratio, luminance and color distribution, edge distribution, and flat area distribution.
Such a feature amount is calculated, for example, as follows. First, the feature amount extraction unit 10b divides the area of the image represented by the image data into X parts in the vertical direction and Y parts in the horizontal direction. That is, the image represented by one piece of image data is divided into X*Y regions. Next, a feature amount is calculated for each region by a predetermined calculation method. Various feature amounts are conceivable; in this embodiment, the R (red), G (green), and B (blue) components of the color of the region are obtained. The R, G, and B components are then converted into three quantities representing the color components in the L*a*b space (feature color space). These three quantities are the feature amounts for one region. As a result, (X*Y*3) feature amounts are calculated from the image data. FIG. 5 shows the case of X = 6 and Y = 4 as an example, in which a total of 72 feature amounts are calculated from the image data. These 72 feature amounts constitute a 72-dimensional vector representing the features of the image represented by the image data.
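The grid-based extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how a region's color is obtained or which RGB-to-L*a*b conversion is used, so the sketch takes the mean color of each region and assumes sRGB input with a D65 white point. The names `rgb_to_lab` and `extract_features` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def _srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for a channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert an sRGB color (0-255 per channel) to CIE L*a*b* (D65).

    Assumption: the patent only says "L*a*b space"; sRGB/D65 is a guess.
    """
    rl, gl, bl = (_srgb_to_linear(v / 255.0) for v in (r, g, b))
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t: float) -> float:
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def extract_features(image: Sequence[Sequence[Pixel]],
                     x_div: int = 6, y_div: int = 4) -> List[float]:
    """Split the image into x_div (vertical) * y_div (horizontal) regions
    and return 3 L*a*b* values per region, i.e. x_div * y_div * 3 numbers."""
    height, width = len(image), len(image[0])
    if height < x_div or width < y_div:
        raise ValueError("image is smaller than the requested grid")
    features: List[float] = []
    for i in range(x_div):
        top, bottom = i * height // x_div, (i + 1) * height // x_div
        for j in range(y_div):
            left, right = j * width // y_div, (j + 1) * width // y_div
            count = (bottom - top) * (right - left)
            sums = [0, 0, 0]
            for row in image[top:bottom]:
                for r, g, b in row[left:right]:
                    sums[0] += r
                    sums[1] += g
                    sums[2] += b
            # Mean region color (an assumption), converted to L*a*b*.
            features.extend(rgb_to_lab(*(s / count for s in sums)))
    return features
```

With the default X = 6 and Y = 4 the function returns 72 values, matching the 72-dimensional vector of FIG. 5.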
[0012]
The similarity evaluation unit 10c is a module that compares the feature amount of one piece of image data with the feature amount obtained from another piece of image data or from a group of image data, and determines whether the corresponding images are similar enough to be regarded as images of the same type. The feature amounts to be compared are supplied to the similarity evaluation unit 10c by the control unit 10.
The similarity evaluation unit 10c makes this determination as follows. The n feature amounts obtained from each piece of image data form an n-dimensional vector representing the features of an image. When the n feature amounts of each of the two pieces of image data to be compared are given, the similarity evaluation unit 10c obtains the distance in n-dimensional space between the n-dimensional vector whose components are the n feature amounts of one and the n-dimensional vector whose components are the n feature amounts of the other. Based on this distance, it determines whether the images represented by the two pieces of image data are similar enough to be regarded as belonging to the same type. FIG. 6 and FIG. 7 illustrate this processing. For convenience, these figures show three-dimensional vectors α and β as the two vectors whose components are the feature amounts of the two pieces of image data. FIG. 6 shows a state in which the Euclidean distance between vector α and vector β is short, and FIG. 7 shows a state in which it is long. The similarity evaluation unit 10c uses the Euclidean distance between the two vectors α and β as the measure of similarity between the two images being compared. When the similarity is equal to or greater than a predetermined value (the Euclidean distance is equal to or less than a predetermined length), it determines that the two images belong to the same type. When the similarity is less than the predetermined value (the Euclidean distance is longer than the predetermined length), as shown in FIG. 7, it determines that the two images do not belong to the same type.
Note that the distance to be calculated is not limited to the Euclidean distance, and may be a distance by another calculation method such as a Mahalanobis distance.
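A minimal sketch of the distance-based judgment described above, using the Euclidean distance. The function names and the `max_distance` threshold parameter are hypothetical; the text only speaks of a "predetermined length" and gives no value.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two feature vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_same_type(a: Sequence[float], b: Sequence[float],
                 max_distance: float) -> bool:
    """True when the distance is at or below the predetermined length,
    i.e. the two images are judged to belong to the same type."""
    return euclidean_distance(a, b) <= max_distance
```

Swapping in another metric such as the Mahalanobis distance would only change `euclidean_distance`; the threshold comparison stays the same.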
[0013]
The classification registration unit 10a uses the feature amount extraction unit 10b and the similarity evaluation unit 10c described above to perform processing that assists the user in classifying the image data stored in the storage device 11 into appropriate folders. To avoid redundancy, this processing is explained in the description of the operation of the present embodiment.
[0014]
(2. Operation)
FIG. 8 is a flowchart showing the operation of the image data classification device 1 when the user classifies the image data stored in the storage device 11.
The operation described below is performed in the following situation.
First, various types of image data such as image data input via the image input device 12 or image data received via the network interface 15 are stored in a certain folder and stored in the storage device 11.
The user then gives an instruction from the operation input unit 14a to display the contents of this folder on the display unit 14b. FIG. 3 illustrates the display contents of the display unit 14b at this time. As shown in the figure, the display unit 14b displays icons representing an unclassified folder 11a for storing unclassified image data, various classification folders 11b created to store image data of specific types, such as a landscape folder, a person folder, and a still-life folder, and the image data not stored in any of these folders.
In this state, when the user performs a predetermined operation with the operation input unit 14a, the classification registration unit 10a in the control unit 10 starts operating.
Then, when the user uses the mouse of the operation input unit 14a to drop the icon of a piece of image data into the icon of a classification folder 11b, the classification registration unit 10a performs the routine shown in FIG. 8.
[0015]
First, the classification registration unit 10a reads the target image data represented by the icon designated with the mouse from the storage device 11 (step S10), and determines whether a feature amount corresponding to the target image data is stored in the storage device 11 (step S11). If the result is negative, the classification registration unit 10a passes the target image data to the feature amount extraction unit 10b and has the feature amount calculated (step S12). The calculated feature amount is stored in the storage device 11 in association with the target image data. Next, the classification registration unit 10a identifies the classification folder 11b icon into which the user tried to drop the icon of the target image data, and determines whether a feature amount associated with the classification folder 11b represented by that icon (the target folder) is stored in the storage device 11. If no such feature amount is stored in the storage device 11 (step S13: YES), the target image data is stored in the target folder (step S16), and the feature amount of the target folder is updated with the feature amount of the target image data (step S17). Because no feature amount is yet associated with the target folder in this case, the classification registration unit 10a stores the feature amount of the target image data itself in the storage device 11 in association with the target folder, and ends the routine shown in FIG. 8.
[0016]
When a feature amount associated with the target folder is stored in the storage device 11 (step S13: NO), the classification registration unit 10a passes that feature amount and the feature amount calculated in step S12 to the similarity evaluation unit 10c (step S14). If the similarity evaluation unit 10c returns the result that the images represented by the two feature amounts are similar enough to be regarded as belonging to the same type, so that storing the target image data in the target folder is judged appropriate (step S15: YES), the classification registration unit 10a stores the target image data in the target folder (step S16). Next, it updates the feature amount associated with the target folder using the feature amount of the target image data (step S17). The processing of step S17 can take various forms; one specific example is as follows. After storing the target image data in the target folder, the classification registration unit 10a reads from the storage device 11 the feature amounts associated with all the image data stored in the target folder at that time (including the newly stored target image data). As already described, the feature amounts associated with one piece of image data form a vector representing the features of the image. The classification registration unit 10a obtains the centroid of the feature amounts associated with the image data in the target folder, that is, the centroid of the plurality of vectors representing the features of the images, and stores it in the storage device 11 in association with the target folder. The feature amount associated with the target folder may also be obtained by another calculation method. The feature amount may be obtained at any time from immediately after image data is stored in the classification folder 11b until immediately before new image data is stored in the classification folder 11b.
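The centroid example given for step S17 can be sketched as below. This is an illustrative sketch only; the function name `centroid` is hypothetical, and the text itself notes that other ways of computing the folder's feature amount are possible.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the feature vectors of all images in a folder."""
    if not vectors:
        raise ValueError("at least one feature vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all feature vectors must have the same dimension")
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
```

For a folder that holds a single image, the centroid equals that image's own feature vector, which agrees with the empty-folder case of step S13.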
[0017]
On the other hand, if it is determined in step S15 that storing the target image data in the target folder is not appropriate, the warning message "This image data is not suitable for this folder. Do you still want to store it?" is displayed on the display unit 14b (step S18). When the user inputs an instruction responding affirmatively to this warning message (step S19: YES), the classification registration unit 10a performs the processing of steps S16 to S17.
However, when the user inputs an instruction responding negatively to the warning message displayed in step S18 (step S19: NO), the classification registration unit 10a stores the target image data in the unclassified folder 11a (step S20).
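The flow of steps S10 to S20 can be summarized in a short sketch. Everything here is an assumption made for illustration: the in-memory `Store` class stands in for the storage device 11, `extract` stands in for the feature amount extraction unit 10b, `confirm` stands in for the warning dialog of steps S18 and S19, and `max_distance` is the unspecified "predetermined length". Removing the image from a previous classification folder when it is moved is omitted for brevity.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
UNCLASSIFIED = "unclassified"


class Store:
    """Hypothetical in-memory stand-in for the storage device 11."""

    def __init__(self) -> None:
        self.image_features: Dict[str, Vector] = {}
        self.folder_features: Dict[str, Vector] = {}
        self.folder_members: Dict[str, List[str]] = {UNCLASSIFIED: []}


def register(store: Store, image_id: str, folder: str,
             extract: Callable[[str], Vector],
             confirm: Callable[[str, str], bool],
             max_distance: float) -> str:
    """Try to put image_id into folder; return the folder actually used."""
    # S10-S12: compute and cache the image's feature vector if missing.
    if image_id not in store.image_features:
        store.image_features[image_id] = extract(image_id)
    feature = store.image_features[image_id]

    # S13: does the target folder already have a feature vector?
    reference = store.folder_features.get(folder)
    if reference is not None:
        # S14-S15: judge appropriateness by distance.
        appropriate = math.dist(feature, reference) <= max_distance
        # S18-S19: warn the user and ask whether to store anyway.
        if not appropriate and not confirm(image_id, folder):
            # S20: keep the image in the unclassified folder.
            unclassified = store.folder_members[UNCLASSIFIED]
            if image_id not in unclassified:
                unclassified.append(image_id)
            return UNCLASSIFIED

    # S16: store the image in the target folder.
    if image_id in store.folder_members[UNCLASSIFIED]:
        store.folder_members[UNCLASSIFIED].remove(image_id)
    members = store.folder_members.setdefault(folder, [])
    members.append(image_id)

    # S17: update the folder's feature vector (centroid of its members).
    vectors = [store.image_features[m] for m in members]
    store.folder_features[folder] = [sum(c) / len(vectors) for c in zip(*vectors)]
    return folder
```

Passing `confirm` as a callback keeps the sketch independent of any particular user interface; a real device would show the message on the display unit 14b and read the answer from the operation input unit 14a.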
[0018]
Although not shown, the classification registration unit 10a is configured to execute the routine shown in FIG. 8 for image data stored in the unclassified folder 11a when an instruction is given to store that image data in a classification folder 11b. The user can therefore move image data stored in the unclassified folder 11a to an appropriate classification folder 11b.
The same applies to a case where an instruction to store image data already stored in the classification folder 11b in another classification folder 11b is issued.
[0019]
As described above, when the user attempts to store image data in a desired folder, the similarity between the image data already stored in that folder and the image data to be stored is evaluated, and on that basis it is determined whether storing the image data in the desired folder is appropriate. If the result is negative, a warning message is issued to the user. The user can therefore prevent image data from being stored in an inappropriate destination without having to pay close attention to where it is stored.
[0020]
(3. Modified example)
As described above, the embodiments of the present invention have been described, but the present invention can be embodied in various other forms without departing from the main features. Note that, for example, the following modifications are possible.
[0021]
<Modification 1>
In this modification, when attribute information is input from the operation input unit 14a, the control unit 10 includes this attribute information in the job package it creates from the image data input via the image input device 12, and stores the job package in the storage device 11. The attribute information is, for example, information on the name, size, creation date, and data format type of the document data. Then, in the above-described step S27 or step S61, the control unit 10 causes the display unit 14b to display the attribute information in the found job package instead of a thumbnail image based on the image data in that job package.
Further, attribute information associated with image data may be used as a feature amount.
[0022]
<Modification 2>
The image input device 12 may consist of a read/write device that reads and writes information from and to a medium such as a magnetic disk or a memory card, and a control means for it. The image input device 12 may also be a device that generates bitmap image data from data created by application software.
[0023]
<Modification 3>
The image data classification device 1 may be configured to further include an image output device. The image output device may include a printer and a printer control circuit that controls the driving of the printer, and may output an image represented by image data to paper. Further, the image output device 13 may be constituted by a read / write device for reading / writing information from / to a medium such as a magnetic disk or a memory card, and a control unit therefor. Such an image output device 13 outputs an image not to paper but to a medium such as a magnetic disk or a memory card.
Alternatively, the image output device 13 may be a data transfer device that exchanges data via the network interface 15. The image output device 13 may output data converted into a data format such as HTML (Hyper Text Markup Language) instead of outputting an image.
Further, the image output device 13 may be configured by, for example, a CRT (Cathode Ray Tube) and its control means.
[0024]
<Modification 4>
A configuration in which a plurality of image data are collectively stored in one classification folder may be employed. In such a configuration, the control unit 10 acquires the feature amount of each of the plurality of image data, and compares each of these feature amounts with the feature amount associated with the classification folder (target folder) selected by the user as the classification destination. Then, for image data determined to be inappropriate to store in the target folder, a warning message is displayed in the same manner as described above, and image data determined to be appropriate to store in the target folder is stored in the target folder. According to such a configuration, the labor and time required for the user's classification work can be further reduced.
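The batch flow of this modification can be sketched as follows. This is only an illustrative sketch: the function names `distance` and `classify_batch`, the use of Euclidean distance, and the fixed `threshold` are assumptions made for the example, and are not taken from the embodiment.

```python
import math

def distance(a, b):
    # Euclidean distance between two feature vectors (assumed metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_batch(items, folder_feature, threshold):
    """Compare each image's feature amount with the target folder's.

    items: dict mapping an image name to its feature vector.
    folder_feature: feature vector associated with the target folder.
    threshold: hypothetical maximum distance still judged appropriate.
    Returns (stored, warned): names to store and names needing a warning.
    """
    stored, warned = [], []
    for name, feature in items.items():
        if distance(feature, folder_feature) <= threshold:
            stored.append(name)   # appropriate: store in the target folder
        else:
            warned.append(name)   # inappropriate: display a warning message
    return stored, warned

if __name__ == "__main__":
    batch = {"a": [0.1, 0.2], "b": [0.9, 0.9]}
    print(classify_batch(batch, [0.0, 0.2], 0.5))
```

Returning the warned items as a separate list leaves the decision of whether to store them anyway with the user, in line with the warning-message behavior described above.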
[0025]
<Modification 5>
The feature amount calculated from the image data may be any one or more of the color tone, the aspect ratio, the distribution state of the luminance and the color, the distribution state of the edges, and the distribution state of the flat areas. For example, when the area of the image represented by the image data is divided into 24 regions, the color tone and the edge state may be extracted from each region, and the feature amount may be represented as a 144-dimensional vector. In addition, the number of feature values may be reduced as far as possible without impairing the features of the image data by using a technique such as principal component analysis. This reduces the amount of calculation performed by the similarity evaluation unit 10c. Accordingly, neither the number of types of feature amounts to be calculated nor the number of regions into which the image is divided is limited.
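As a rough illustration of the 24-region example, a 144-dimensional vector results when six values are taken from each region (24 × 6 = 144). The sketch below assumes a 4 × 6 grid and, for each region, three color-tone values (mean R, G, B) and three edge values (mean absolute horizontal difference per channel). The grid layout, the choice of these six values, and the function names are all assumptions for the example; the text above specifies only the region count and the vector dimension. The principal component analysis step is not shown.

```python
def region_features(pixels):
    """Return 6 values for one region: mean R, G, B, then the mean
    absolute horizontal difference of each channel (a crude edge measure).

    pixels: list of rows, each row a list of (r, g, b) tuples.
    """
    n = sum(len(row) for row in pixels)
    means = [sum(p[c] for row in pixels for p in row) / n for c in range(3)]
    edges = []
    for c in range(3):
        diffs = [abs(row[x + 1][c] - row[x][c])
                 for row in pixels for x in range(len(row) - 1)]
        edges.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return means + edges

def image_feature_vector(image, rows=4, cols=6):
    """Split the image into rows x cols regions and concatenate the
    per-region features. The image must be at least rows x cols pixels."""
    h, w = len(image), len(image[0])
    vector = []
    for i in range(rows):
        for j in range(cols):
            top, bottom = i * h // rows, (i + 1) * h // rows
            left, right = j * w // cols, (j + 1) * w // cols
            block = [row[left:right] for row in image[top:bottom]]
            vector.extend(region_features(block))
    return vector

if __name__ == "__main__":
    flat = [[(10, 20, 30)] * 12 for _ in range(8)]
    print(len(image_feature_vector(flat)))
```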
[0026]
<Modification 6>
In the above-described embodiment, the features related to the image data are the color tone, the aspect ratio, the distribution state of luminance and color, the distribution state of edges, and the distribution state of flat areas of the entire image. Identifying image data using such feature amounts is very effective when an image occupies all or almost all of the original. However, when characters occupy almost all of the original, as in a document such as a contract, it may not be very effective. Therefore, in the present modification, when the image represented by the image data includes a document, features related to the document format, such as character height, character spacing, line spacing, and vertical/horizontal writing, are treated as the features related to the image data. Hereinafter, a method of obtaining the feature amount related to the document format will be described.
Various conventional techniques can be applied to determine the features related to the document format from an image read by the image input device 12, such as a scanner. An example of such a prior art is disclosed in paragraphs 0009 to 0012 of JP-A-5-108793. In this technique, a line spacing b and a character spacing d are obtained on the assumption that the writing is horizontal. However, in some cases it is not known whether the original is written horizontally or vertically. Therefore, the fact that the line spacing is normally larger than the character spacing is used: the value b obtained as the line spacing and the value d obtained as the character spacing by this technique are compared, the larger one is taken as the line spacing, and the smaller one is taken as the character spacing. The document is then determined to be written horizontally when b is larger and vertically when d is larger.
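The comparison of b and d described above can be sketched as follows. The function name `resolve_writing_direction` is hypothetical, and the handling of the case b = d (treated here as vertical) is an arbitrary choice for the example, since that case is not addressed above.

```python
def resolve_writing_direction(b, d):
    """Resolve spacings and writing direction from b and d.

    b: value obtained as the line spacing, assuming horizontal writing.
    d: value obtained as the character spacing, under the same assumption.
    Returns (line_spacing, character_spacing, direction).
    """
    line_spacing = max(b, d)        # the larger value is the line spacing
    character_spacing = min(b, d)   # the smaller is the character spacing
    # Horizontal if the initial assumption held (b larger), else vertical.
    # A tie (b == d) falls to "vertical" here; this is an assumption.
    direction = "horizontal" if b > d else "vertical"
    return line_spacing, character_spacing, direction

if __name__ == "__main__":
    print(resolve_writing_direction(12.0, 3.0))
    print(resolve_writing_direction(3.0, 12.0))
```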
[0027]
Another method for obtaining the features related to the document format is as follows. The image represented by the image data is divided into a plurality of regions. Next, for each of the divided regions, a feature amount related to the document format, including information such as character height, character spacing, line spacing, and vertical/horizontal writing, is calculated. For example, values such as a character height of "10 pt", a character spacing of "15 pt", a line spacing of "10.5 pt", and a vertical/horizontal writing value of "0" are obtained. Here, "0" is associated with vertical writing, and "1" is associated with horizontal writing.
[0028]
Next, the value that appears most frequently among the feature amounts calculated for all the regions is determined and set as the feature amount of the page. For example, if the character height is "10 pt" in 20 of the 24 regions, "12 pt" in three regions, and "16 pt" in one region, "10 pt" is taken as the feature amount of the page for character height. The same applies to character spacing, line spacing, and vertical/horizontal writing.
[0029]
Generally speaking, titles and headings occupy a smaller proportion of the entire page than the body text that characterizes the document. Therefore, the most frequently occurring value can be presumed to represent the features of the body text. As a result, even if a title or a heading exists in the document, a feature amount can be accurately obtained from the image data.
As described above, one feature vector is obtained from each piece of image data. An example of a feature vector f, represented as a four-dimensional vector having four components, is shown below.
f = (character height, character spacing, line spacing, [vertical writing: 0, horizontal writing: 1])
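The selection of the most frequent value per component, and the assembly of the feature vector f, can be sketched as follows. The function name `page_feature_vector` and the tuple layout are assumptions for the example. When two values are equally frequent, this sketch keeps the one encountered first, which is a choice not specified above.

```python
from collections import Counter

def page_feature_vector(region_features):
    """Build the page feature vector f from per-region features.

    region_features: list of tuples
        (character_height, character_spacing, line_spacing, direction)
    with direction 0 for vertical writing and 1 for horizontal writing.
    Each component of f is the most frequent value across all regions.
    """
    columns = zip(*region_features)
    return tuple(Counter(column).most_common(1)[0][0] for column in columns)

if __name__ == "__main__":
    # 24 regions: character height is 10 pt in 20 of them, 12 pt in three,
    # and 16 pt in one, as in the example in the text.
    regions = ([(10.0, 15.0, 10.5, 0)] * 20
               + [(12.0, 15.0, 10.5, 0)] * 3
               + [(16.0, 15.0, 10.5, 0)])
    print(page_feature_vector(regions))  # (10.0, 15.0, 10.5, 0)
```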
[0030]
[Effect of the Invention]
According to the present invention, the load on the user can be reduced, and the image data can be classified more efficiently.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a hardware configuration of an image data classification device 1 according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a functional configuration of a control unit 10 according to the embodiment.
FIG. 3 is a diagram for explaining a function of a classification registration unit 10a according to the embodiment.
FIG. 4 is a diagram for explaining functions of a classification registration unit 10a according to the embodiment.
FIG. 5 is a diagram for explaining a function of a feature amount extraction unit 10b according to the embodiment.
FIG. 6 is a diagram for explaining a function of a similarity evaluation unit 10c according to the embodiment.
FIG. 7 is a diagram for explaining a function of a similarity evaluation unit 10c according to the embodiment.
FIG. 8 is a flowchart showing a flow of an operation of the image data classification device 1 according to the embodiment.
[Explanation of symbols]
1 ... Image data classification device, 10 ... Control unit, 10a ... Classification registration unit, 10b ... Feature amount extraction unit, 10c ... Similarity evaluation unit, 11 ... Storage device, 12 ... Image input device, 14 ... User interface, 14a ... Operation input unit, 14b ... Display unit, 15 ... Network interface, 16 ... Bus.

Claims

An image data classification device comprising:
a storage device that stores image data and a feature amount of the image data;
an operation input unit;
an output unit; and
a classification registration unit that, when an instruction to include image data stored in the storage device in a certain group is given via the operation input unit, determines whether it is appropriate to include the image data in the group by comparing the feature amount of the image data with the feature amount associated with the group, includes the image data in the group and stores it in the storage device if it is appropriate, and causes the output unit to output a message indicating that it is not appropriate if it is not appropriate.

The image data classification device according to claim 1, wherein, when an instruction to include the image data in the group is given via the operation input unit after the message has been output by the output unit, the classification registration unit includes the image data in the group and stores it in the storage device.

The image data classification device according to claim 1, further comprising a feature amount extraction unit that extracts a feature amount of image data,
wherein the classification registration unit passes the image data specified via the operation input unit to the feature amount extraction unit, stores the feature amount extracted by the feature amount extraction unit in the storage device in association with the image data, and calculates the feature amount associated with a certain group from the feature amounts stored in the storage device in association with the image data belonging to that group.

Receiving, via the operation input unit, an instruction indicating that image data stored in the storage device should be included in a certain group;
Determining whether it is appropriate to include the image data in the group by comparing the feature amount of the image data with the feature amount associated with the group;
Including the image data in the group and storing it in the storage device when it is appropriate to include the image data in the group; and
Outputting, by the output unit, a message indicating that it is not appropriate to include the image data in the group when it is not appropriate.