JP3948616B2 - Image matching device

Info

Publication number: JP3948616B2
Application number: JP2002226055A
Authority: JP
Inventors: 亮一川田; 修杉本; 正裕和田
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2002-08-02
Filing date: 2002-08-02
Publication date: 2007-07-25
Anticipated expiration: 2022-08-02
Also published as: JP2004070490A

Description

【０００１】
【発明の属する技術分野】
本発明は画像のマッチング装置に関し、特に、動き補正テレビ方式変換や動画像符号化を行う場合、あるいはステレオ画像(左眼画像と右目画像からなる１組の静止画または動画)からの奥行抽出処理を行う場合等に好適な、動画像の中の動きを自動推定したり、左眼・右眼用画像からなるステレオ画像の間の対応点を自動検出したりする画像のマッチング装置に関する。
【０００２】
【従来の技術】
従来、テレビ放送やテレビ電話など、動画像の中の動きを自動推定したり、左眼用画像と右眼用画像からなるステレオ画像の間の対応点を自動検出したりする画像のマッチングの処理においてしばしば使用される方式としては、ブロックマッチング法や反復勾配法が挙げられる。これらの方法を説明する文献としては、次の文献を挙げることができる。
川田他：“動き補正テレビ方式変換の改善”、映像情報メディア学会誌、Vol．51，No.9(1997)，ｐｐ．１５７７〜１５８６。
【０００３】
これらの方法は、動き推定の場合、画像を、ある小さいサイズの多数のブロックに分割して、そのブロックごとに現フレームと前フレームとを比較して動きを求めることを基本としている。なお、ステレオマッチングの場合は、前記「現フレーム」と「前フレーム」を、「左眼画像」と「右眼画像」に置き換えて考えればよいので、本願の発明は動き推定の場合を中心に説明し、ステレオマッチングの場合の詳細な説明は省略する。
【０００４】
【発明が解決しようとする課題】
しかしながら、上記の画像のマッチング処理においては、入力映像の絵柄によって、正しくマッチングできる場合とそうでない場合が存在する。例えば反復勾配法の場合、次のように説明できる。
【０００５】
反復勾配法により求まる動きベクトルv(画像内のブロック毎) は、初期偏位動ベクトルをv₀ として、次の式(1)で求められる(上記文献参照)。
v = Δv + v₀ ・・・(1)
ここに、差分ベクトルΔv の水平、垂直成分Δv_x、Δv_y は、画素値の水平勾配Δx、垂直勾配Δy、および初期偏位動ベクトルv₀による動き補正フィールド(フレーム) 間差分Δt によって、下記の式(2)および式(3)のように表される。なお、和は当該ブロック内の全画素について適用される。

ここに、初期偏位動ベクトルv₀は、過去に求められた周辺のブロックの動ベクトルを候補としてマッチングにより決定される(詳細は上記文献参照)。
【０００６】
前記式(2)、(3)においては、特に分母が小さい場合に０による除算に近くなるため、ノイズなどわずかな擾乱要因によっても大きな誤差が発生する恐れがある。
【０００７】
特に問題となる例としては、規則的な繰り返し模様が絵柄に存在する場合である。この場合、多数の動ベクトルで画像のマッチングが取れるため、ノイズ等により実際の動きとは別の動ベクトルが求まり、その結果、方式変換の場合、内挿画像が極端に劣化することがある。
【０００８】
また、反復勾配法においては、画像面の勾配を利用して反復的に動きを求めるため、フレーム間での相関が小さいと、動きが求まりにくくなってしまう。この面から特に問題となるのは、高速シャッターつきで撮影したシーンである。時間的に隣り合う画像間で動物体が離れているため、動きを捉えられなくなる傾向が強まる。
【０００９】
以上のように、規則的な繰り返し模様が絵柄に存在する画像と、フレーム間での相関が小さい画像とのように、マッチング処理がうまくいきやすいかどうかという点で、トレードオフの関係にある画像が存在する。
【００１０】
本発明の目的は、前記した従来技術の問題点に鑑みてなされたものであり、上記のように適切なマッチング処理を両立させにくい絵柄が混在するような場合でも、適切に両立させることを可能とするマッチング装置を提供することにある。
【００１１】
【課題を解決するための手段】
前記した目的を達成するために、本発明は、動画像の中の動きを自動推定したり、左眼用画像と右眼用画像からなるステレオ画像の間の対応点を自動検出したりする画像のマッチング装置において、画像の画素値の水平勾配、垂直勾配および初期偏位ベクトルによる動き補正フィールド（フレーム）間差分を基に求められる差分ベクトルΔｖに、変換パラメータαを乗算し、該乗算結果と前記初期偏位ベクトルｖ０とを加算することによりベクトルｖを算出する、反復勾配法により映像のマッチング処理を行うマッチング手段と、該マッチング手段から出力されるマッチング情報信号(ベクトル) の分布に関する特徴量を抽出する特徴量抽出手段と、前記変換パラメータαは選択可能な大小の２つ以上の値を有し、前記特徴量を基に、前記変換パラメータαの大小を決定する変換パラメータ決定手段とを具備し、前記変換パラメータαは、前記特徴量が予め定められた閾値と比較された結果に基づいてその大小が決定され、前記マッチング手段は、該変換パラメータ決定手段で決定された変換パラメータαを用いて、マッチング処理を行うようにした点に第１の特徴がある。また、前記マッチング手段から出力されるマッチング情報信号(ベクトル) の分布に関する特徴量を抽出する特徴量抽出手段に代えて、映像の内容から特徴量を抽出する特徴量抽出手段を具備した点に第２の特徴がある。
これらの特徴によれば、当該画像に最適なマッチングパラメータを適応的に決定することができるようになり、該最適パラメータを用いてマッチング処理を行うことができるので、より正確なマッチング処理が可能になる。
【００１２】
また、本発明は、前記マッチング手段は、画素値の水平勾配、垂直勾配および初期偏位ベクトルによる動き補正フィールド（フレーム）間差分を基に求められる差分ベクトルに前記変換パラメータ決定手段で決定された変換パラメータを乗算し、該乗算結果と初期偏位ベクトルとを加算することによりベクトルを算出する、反復勾配法により映像のマッチング処理を行うようにした点に第３の特徴がある。また、前記マッチング手段は、前記差分ベクトルに対してある数を加算もしくは減算し、該加算値もしくは減算値と初期偏位ベクトルとを加算することによりベクトルを算出する、反復勾配法により映像のマッチング処理を行うようにした点に第４の特徴がある。
【００１３】
これらの特徴によれば、ベクトルの求まり方(反復勾配法に於けるベクトルの収束速度) が制御可能となる。
【００１４】
また、本発明は、前記マッチング手段は、前記差分ベクトルを計算する際の分母が予め定められた閾値より小さいかどうかを判定する手段を有し、該閾値より小さいときに、前記差分ベクトルの寄与度が小さくなるように前記変換パラメータを決定するようにした点に第５の特徴がある。また、前記マッチング手段は、前記差分ベクトルを計算する際の分母が予め定められた閾値より小さいかどうかを判定する手段を有し、該閾値より小さいときに、前記差分ベクトルの寄与度が小さくなるように前記加算もしくは減算する数を決定するようにした点に第６の特徴がある。
【００１５】
これらの特徴によれば、ノイズなどに起因する誤推定ベクトルの発生を抑えることが可能となる。
【００１６】
【発明の実施の形態】
以下に、図面を参照して、本発明を詳細に説明する。まず、本発明の原理を説明する。
【００１７】
画像の動き推定や画像のステレオマッチングの概要は、次の通りである。すなわち、画像の動き推定は、動き補償予測符号化やテレビの動き補正方式変換において、動画像(映像) の各部の動きを推定する処理である。通常、画面を多数のブロックに分割してそのブロックごとに動きを求める。そのブロックサイズは、１６画素×１６ラインであったり、4 画素×２ラインであったり、さまざまなケースがありうる。
【００１８】
また、画像のステレオマッチングとは、２台のカメラを使用し、左眼画像と右眼画像の１組の画像の組を得る。これは静止画である場合も、動画である場合もある。そして、左眼画像中の各部分が右眼画像中のどの部分に対応しているかをマッチングにより求める。これにより最終的には画像中各部がカメラに対しどのくらい離れているかの奥行きを推定するのがステレオマッチング処理の最終目的となる。このステレオマッチングについては、例えば次の文献が参考となる。
尾上守夫編「画像処理ハンドブック」(昭晃堂)(３９５ページ付近など）。
【００１９】
上記の画像動き推定では現フレームと前フレームとのマッチングを求めるのであるから、これら動き推定とステレオマッチングは、マッチング処理としては全く類似の処理となる。そこで、以下に、画像の動き推定処理を例として、説明を続ける。
【００２０】
画像の動き推定方法を行う代表的な方法として、反復勾配法がある。これについては、前掲の文献の川田他：“動き補正テレビ方式変換の改善”、映像情報メディア学会誌、Vol．51、No．9(1997)などに詳しく書かれている。該反復勾配法により求まる動きベクトルｖは、該文献に記載されているように、式(1)、式(2)、式(3) のように表現される。
【００２１】
さて、前述したように、前記式(2)、式(3)においては、分母が小さい場合に、ノイズなどのわずかな擾乱要因によっても大きな誤差が発生する恐れがある。そこで、本発明では、式(2)、式(3)の分母が小さい場合には、式(1)の右辺第１項のΔｖに、１より小さい変換パラメータαを掛けて、下記の式(4)のようにする。
ｖ＝α・Δｖ＋ｖ_０（ただし、α_ｘ＜１，α_ｙ＜１）・・・（４）
【００２２】
式(4) のような変換パラメータαを設定することによって、処理過程を制御することが可能である。従来これらの変換パラメータは固定であった。これを、画像の絵柄やマッチング結果であるベクトルの解析により、最適なパラメータを動的に求め、よりシーンに適合した正確なマッチング処理を可能にする、というのが本発明の第１の原理である。
【００２３】
次に、反復勾配法においては、式(４) が示すように、あるシーンになったとき、すぐに正しい動ベクトルが求まるのではなく、反復的に収束していくという性質がある。このため、高速シャッターつきで撮影したシーンのように、フレーム間での相関が小さいと、α＜１の場合には動きが求まりにくくなってしまう。したがって、例えばフレーム間での相関が小さい画像に対しても、速やかに最適なパラメータを求め、正確なマッチング処理を可能にする、というのが本発明の第２の原理である。
【００２４】
これらの第１、第２の原理は、反復勾配法に限らず、ブロックマッチング法などの他の動き推定方法についても適用できる。なお、前記の説明では、式(1) における差分ベクトルΔｖに、ある１より小さい変換パラメータαを掛けるものであったが、該差分ベクトルΔｖからある定数を減算または加算するものであってもよい。
【００２５】
次に、本発明の実施形態を、図面を参照しながら説明する。図１は、本発明の第１実施形態の構成を示すブロック図である。
【００２６】
図示されているように、マッチング装置１は、前記反復勾配法等のマッチング部１１、該マッチング部１１から出力されたベクトルの特徴量（分散等）を抽出する特徴量抽出部１２、および該抽出された特徴量に基づいてパラメータαを決定するパラメータ決定部１３から構成されている。該マッチング装置１から得られたマッチング情報信号である出力ベクトルｒは、テレビ方式変換部２に送られる。該テレビ方式変換部２は、該出力ベクトルｒを用いて、例えばＮＴＳＣ方式の入力映像ｐをＰＡＬ方式に変換し、ＰＡＬ方式の出力映像ｑを出力する。なお、該テレビ方式変換部２は一例であり、これに代えて、動き補償符号化部を置き、前記出力ベクトルｒを動き補償符号化に使用してもよい。また、入力映像ｐが左眼画像と右眼画像の１組の画像の場合には、前記出力ベクトルｒはステレオマッチング処理に使用することができる。
【００２７】
次に、本実施形態の動作を、図２のフローチャートを参照して説明する。ステップＳ１では、マッチング部１１に、初期変換パラメータとして、動ベクトルの収束を遅くするパラメータ、例えばα＝（α_ｘ，α_ｙ）＝（０．１，０．２）が設定される。次いで、マッチング装置１に、所定の処理単位、例えばブロック単位、フィールド単位で入力映像ｐが取り込まれると、ステップＳ２で、マッチング部１１は、反復勾配法により該処理単位の動き推定を行う。すなわち、前記αを用い、前記式(4)により動き推定を行う。
【００２８】
次に、ステップＳ３では、特徴量抽出部１２は、前記動き推定により得られた動ベクトルの分布から、特徴量、例えばベクトルの大きさの分散や標準偏差を抽出する。すなわち、演算により求める。ステップＳ４では、前記パラメータ決定部１３が、前記特徴量の値により、次の処理単位に適用される変換パラメータαを決定する。該特徴量が前記分散や標準偏差の場合には、該特徴量が予め定められた閾値以上の時には、より大きな変換パラメータα（例えば、α＝１）に決定される。一方、該閾値より小さいときには、前記初期変換パラメータ値に維持または決定される。
【００２９】
ステップＳ５では、全部の処理単位の動き推定処理が済んだか否かの判断がなされ、この判断が否定の時にはステップＳ６に進んで、次の処理単位の入力映像ｐが取り込まれる。次いで、前記ステップＳ２に戻って、反復勾配法により、該処理単位の動き推定が行われる。
【００３０】
以上の処理が、前記ステップＳ５の判断が肯定になるまで繰り返され、肯定になると、シーン適応型の動的パラメータ制御による動き推定処理は終了する。
【００３１】
したがって、本実施形態によれば、動ベクトルの特徴量に応じて、変換パラメータαを可変することができるので、入力映像ｐがフレーム間で大きな変化がない場合、例えば規則的な繰り返し模様が絵柄に存在する場合等では変換パラメータαは小さく決定され、逆にフレーム間での相関が小さく、隣り合う画像間で動物体が離れている場合等では変換パラメータαは大きく決定される。この結果、適切なマッチング処理を両立させにくい絵柄が混在するような場合でも、適切に両立させることが可能になる。
【００３２】
次に、本発明の第２実施形態を、図３のブロック図を参照して説明する。この実施形態では、マッチング装置３を、マッチング部３１、入力映像ｐの特徴量を抽出する特徴量抽出部３２、および該抽出された特徴量から変換パラメータαを決定するパラメータ決定部３３から構成した点に特徴がある。
【００３３】
この実施形態においては、特徴量抽出部３２は入力映像ｐから特徴量、例えば画素値の明るさの変化、その分散、標準偏差などを抽出する。パラメータ決定部３３は、該特徴量が予め定められた閾値以上であると、変換パラメータを大きく決定し、逆に該閾値より小さいと前記第１実施形態と同様に前記初期変換パラメータ値に維持または決定される。この動作以外は、前記第１実施形態と同様であるので、説明を省略する。
【００３４】
以上のように、この実施形態においても、適切なマッチング処理を両立させにくい絵柄が混在するようなビデオ信号に対しても、マッチング処理を適切に両立させることが可能になる。
【００３５】
次に、本発明の第３実施形態を、図４を参照して説明する。この実施形態は、反復勾配法において、前記式(2)、式(3) の差分ベクトル計算時の分母が小さいかどうかを判断して、これにより適応的にパラメータ制御を行うようにした点に特徴があり、図４は前記マッチング部１１，３１の一具体的構成を示すブロック図である。
【００３６】
本実施形態のマッチング部は、入力映像ｐから前記式(2)、式(3)の分子の計算をする第１、第２演算部４１，４２、それらの式の分母の計算をする第３演算部４３、前記式(2)の除算を行う第４演算部４４、前記式(3)の除算を行う第５演算部４５、前記第３演算部４３で求められた分母の値と前記パラメータ決定部１３からの変換パラメータαを基に変換パラメータα（α_ｘ，α_ｙ）を決定するα_ｘ決定部４６、α_ｙ決定部４７、乗算部４８，４９、および加算部５０，５１から構成されている。
【００３７】
本実施形態では、第３演算部４３により、前記式(2)、式(3)の分母の計算を行い、該分母の値が予め定められた閾値以下であれば、前記α_ｘ決定部４６およびα_ｙ決定部４７は強制的に小さい値の（α_ｘ，α_ｙ）を決定する。これにより、（α_ｘ，α_ｙ）が小さい場合に、ノイズなどのわずかな擾乱要因によって動き推定に大きな誤差が発生するのを防止する。一方、前記分母の値が前記閾値より大きい場合には、α_ｘ決定部４６およびα_ｙ決定部４７はパラメータ決定部１３，３３で決定された変換パラメータαを（α_ｘ，α_ｙ）と決定する。
【００３８】
第４、第５演算部から出力されたΔｖ_ｘ、Δｖ_ｙは、それぞれ乗算部４８，４９で前記α_ｘ決定部４６、α_ｙ決定部４７で決定されたα_ｘ，α_ｙと乗算され、それぞれの乗算結果は、加算部５０，５１で、ｖ_０ｘ、ｖ_０ｙと加算される。この結果、出力ベクトルｒ、すなわち（ｖ_ｘ，ｖ_ｙ）が得られる。
【００３９】
以上のように、本実施形態によれば、規則的な繰り返し模様が絵柄に存在する場合などの入力映像の場合には、強制的に小さな変換パラメータに決定され、擾乱要因の動き推定に対する寄与が小さくされるので、ノイズなどのわずかな擾乱要因による動き推定の誤差の発生を軽減することができるようになる。
【００４０】
なお、前記実施形態の説明では、式(4)をｖ＝α・Δｖ＋ｖ_０（ただし、α_ｘ＜１，α_ｙ＜１）としたが、ｖ＝（Δｖ−Ｐ）＋ｖ_０（Ｐは正の数）、またはｖ＝（Δｖ＋Ｑ）＋ｖ_０（Ｑは正の数）とし、該ＰおよびＱを、前記変換パラメータαと同様に適応的に変えて、動き推定に対するΔｖの寄与度を変えるようにしてもよい。
【００４１】
本発明者は、本発明方式をテレビ方式変換アルゴリズムに組み込み、計算機シミュレーションにより性能評価を行った。
【００４２】
テレビ方式変換では、原画と変換画ではＳＮ比を計算することができない。そこで、まず、６２５ライン、５０枚／秒のテスト動画像を、５２５ライン、６０枚／秒の動画像に変換し、それを再度６２５ライン、５０枚／秒の処理画像に逆変換した。そして、該処理画像と原画のＰＳＮＲを計算した。変換・逆変換のアルゴリズムは、ライン数比やフィールド内挿比に関するパラメータを除いて、全く同一とした。
【００４３】
テスト画像としては、最適変換パラメータの異なる２種類、すなわち、壁に格子模様を有する“Interview”と、高速シャッターにより撮影された“Carousel”を、それぞれ２５フレーム接続して作成して、前半５０フィールドを“Interview”シーン、後半５０フィールドを“Carousel”シーンとした（計５０フレーム）。前記特徴量抽出部１２（図１参照）で抽出する特徴量としては、前フィールドの発生動ベクトルの大きさの標準偏差を使用した（１フィールド毎に１特徴量とした）。また、パラメータ決定部１３では、適当な閾値を設定し、該特徴量が該閾値より大きければ、次フィールドの変換パラメータを動き優先型（式（４）で、α＝（１，１））、小さければ、静止優先型（式（４）で、α＝（０．１，０．２））と決定して、適応的に可変にした。なお、前記“Interview”と“Carousel”の各シーンに適する変換パラメータは、それぞれ、静止優先型（式（４）で、α＝（０．１，０．２））、動き優先型（式（４）で、α＝（１，１））である。
【００４４】
図５に、従来方式１，２でテレビ方式変換をした時の処理画像のＰＳＮＲのグラフを示し、図６に、本発明方式でテレビ方式変換をした時の処理画像のＰＳＮＲのグラフを示す。また、図７は、前記各方式での、各シーン区間における平均のＰＳＮＲを示す。なお、図５の従来方式１は、変換パラメータとして前記動き優先型を固定的に使用したものであり、従来方式２は、変換パラメータとして前記静止優先型を固定的に使用したものである。一方、本発明方式では、該動き優先型と静止優先型とを適応的に使用した。
【００４５】
この実験の結果、図５および図７から分かるように、前記従来方式１では、“Interview”シーンで大きな劣化が発生したが、“Carousel”シーンは良好に変換された。また、前記従来方式２では、“Interview”シーンは良好に変換されたが、“Carousel”シーンにおいて大きな劣化が発生した。その理由は、各劣化シーンでは、不適切な変換パラメータが使用されているからである。
【００４６】
一方、本発明方式では、図６および図７から分かるように、“Carousel”と“Interview”の両シーンにおいて適当な変換パラメータαが自動的に選択され、良好に変換できたことが確認された。図７においても、従来方式１，２よりも良好なＰＳＮＲ（平均）が得られていることが分かる。なお、本発明方式においても、シーンチェンジの直後においては、しばらくの間はＳＮが低下している。これは、変換／逆変換で異なる変換パラメータが選択された部分で、ミスマッチの度合いが大きくなったためであると考えられる。
【００４７】
【発明の効果】
以上の説明から明らかなように、請求項１、２の発明によれば、出力されていくマッチング情報信号(ベクトル)や、入力されてくる画像信号の内容を自動解析してそれらの特徴量を抽出することにより、当該画像に最適な変換（マッチング）パラメータを適応的に決定することができる。この結果、該変換パラメータを用いてマッチング処理を行うことにより、より正確なマッチング処理が可能となる。
【００４８】
また、請求項３，４の発明によれば、差分ベクトルに前記変換パラメータを乗算し、該乗算値に初期偏位ベクトルを加算してベクトルを算出する、または
差分ベクトルにある数を加算もしくは減算し、該加算値又は減算値に初期偏位動ベクトルを加算してベクトルを算出するようにしたので、ベクトルの求まり方(反復勾配法に於けるベクトルの収束速度) が制御可能となる。
【００４９】
また、請求項５，６の発明によれば、差分ベクトルを計算する際の分母が予め定められた閾値より小さいかどうかを判定し、それが小さいときに、前記変換パラメータを１より小さく、あるいは減算する数を大きくもしくは加算する数を小さくするようにしたので、ノイズなどに起因する誤推定ベクトルの発生を押さえることが可能となる。
【００５０】
さらに、請求項７の発明によれば、出力されるマッチング情報信号(ベクトル)の特徴量として、ベクトルの分散を計算するようにしたので、有効なマッチング変換パラメータ制御が可能となる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態の構成を示すブロック図である。
【図２】該第１の実施形態の動作を説明するフローチャートである。
【図３】本発明の第２の実施形態の構成を示すブロック図である。
【図４】本発明の第３の実施形態の要部の構成を示すブロック図である。
【図５】従来方式による処理画像のＰＳＮＲを示すグラフである。
【図６】本発明方式による処理画像のＰＳＮＲを示すグラフである。
【図７】従来方式１，２，および本発明方式における、各シーンに対するＰＳＮＲ［dB］および平均のＰＳＮＲを示す図である。
【符号の説明】
１，３・・・マッチング装置、２・・・テレビ方式変換部、１１，３１・・・マッチング部、１２，３２・・・特徴量抽出部、１３，３３・・・パラメータ決定部、４１〜４５・・・第１〜第５演算部、４６・・・α_ｘ決定部、４７・・・α_ｙ決定部。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image matching device, and in particular to one that automatically estimates motion in a moving image or automatically detects corresponding points between the left-eye and right-eye images of a stereo pair. It is suited to motion-compensated television system conversion, moving image coding, and depth extraction from stereo images (a pair of still or moving images consisting of a left-eye image and a right-eye image).
[0002]
[Prior art]
Conventionally, the block matching method and the iterative gradient method are often used for image matching, that is, for automatically estimating motion in moving images such as television broadcasts and videophone pictures, or for automatically detecting corresponding points between the left-eye and right-eye images of a stereo pair. These methods are described in the following reference.
Kawada et al.: “Improvement of motion compensated TV system conversion”, Journal of the Institute of Image Information and Television Engineers, Vol. 51, No. 9 (1997), pp. 1577-1586.
[0003]
For motion estimation, these methods basically divide the image into many small blocks and, for each block, compare the current frame with the previous frame to find the motion. For stereo matching, “current frame” and “previous frame” can simply be read as “left-eye image” and “right-eye image”, so this description focuses on motion estimation and omits a detailed description of stereo matching.
[0004]
[Problems to be solved by the invention]
However, in the image matching described above, whether matching succeeds depends on the picture content of the input video. Taking the iterative gradient method as an example, this can be explained as follows.
[0005]
The motion vector v obtained by the iterative gradient method (for each block in the image) is given by equation (1) below, where v₀ is the initial displacement motion vector (see the above reference).
v = Δv + v₀ ... (1)
Here, the horizontal and vertical components Δv_x and Δv_y of the difference vector Δv are expressed by equations (2) and (3) in terms of the horizontal gradient Δx and vertical gradient Δy of the pixel values and the motion-compensated inter-field (inter-frame) difference Δt based on the initial displacement motion vector v₀. The sums are taken over all pixels in the block.
[Equations (2) and (3) are not reproduced in this text.]
The initial displacement motion vector v₀ is determined by matching, using previously obtained motion vectors of neighboring blocks as candidates (see the above reference for details).
[0006]
In equations (2) and (3), a small denominator brings the calculation close to division by zero, so even a slight disturbance such as noise can cause a large error.
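The sensitivity to a small denominator can be illustrated with a minimal Python sketch. Since equations (2) and (3) are not available in this text, the sketch assumes the textbook least-squares gradient solution, in which both components share one denominator; the function name `gradient_step` and the sample values are hypothetical and are not taken from the patent or the cited reference.

```python
def gradient_step(dx, dy, dt):
    """Least-squares gradient step for one block (assumed formulation).

    dx, dy: per-pixel horizontal and vertical gradients.
    dt:     per-pixel motion-compensated inter-frame differences.
    Minimizes sum((dx*u + dy*w + dt)**2) over (u, w) and returns
    (dvx, dvy, denominator). Returns a zero step if the denominator is 0.
    """
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxt = sum(a * t for a, t in zip(dx, dt))
    syt = sum(b * t for b, t in zip(dy, dt))
    den = sxx * syy - sxy * sxy  # shared denominator of both components
    if den == 0:
        return 0.0, 0.0, den
    dvx = (sxy * syt - syy * sxt) / den
    dvy = (sxy * sxt - sxx * syt) / den
    return dvx, dvy, den


# Textured block: gradients are large, so the step is well conditioned.
dx, dy = [1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, -1.0]
true_u, true_w = 1.5, -0.5
dt = [-(a * true_u + b * true_w) for a, b in zip(dx, dy)]
print(gradient_step(dx, dy, dt))

# Nearly flat block: tiny gradients, so small noise in dt gives a large step.
print(gradient_step([0.01, 0.0], [0.0, 0.01], [0.05, -0.05]))
```

In the second call the gradients are about 0.01 and the frame difference is only 0.05, yet the resulting step is several pixels long, which is the failure mode the paragraph describes.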
[0007]
A particularly problematic case is a picture that contains a regular repeating pattern. In this case the images match for many different motion vectors, so noise or the like can yield a motion vector that differs from the actual motion. As a result, in system conversion, the interpolated image may be severely degraded.
[0008]
In addition, because the iterative gradient method finds motion iteratively from the gradients of the image surface, motion becomes hard to find when the correlation between frames is small. Scenes shot with a high-speed shutter are particularly problematic in this respect: a moving object is far apart in temporally adjacent images, so the motion is more likely to be missed.
[0009]
As described above, there are images that stand in a trade-off relationship with respect to how easily matching succeeds, such as images containing a regular repeating pattern and images with small correlation between frames.
[0010]
The present invention has been made in view of the above problems of the prior art, and its object is to provide a matching device that can perform appropriate matching even when picture content for which appropriate matching is hard to achieve simultaneously, as described above, is mixed.
[0011]
[Means for Solving the Problems]
To achieve the above object, a first feature of the present invention is an image matching device that automatically estimates motion in a moving image or automatically detects corresponding points between the left-eye and right-eye images of a stereo pair, comprising: matching means that performs video matching by an iterative gradient method in which a difference vector Δv, obtained from the horizontal and vertical gradients of the image pixel values and the motion-compensated inter-field (inter-frame) difference based on an initial displacement vector, is multiplied by a conversion parameter α, and the product is added to the initial displacement vector v₀ to calculate a vector v; feature amount extraction means that extracts a feature amount relating to the distribution of the matching information signal (vectors) output from the matching means; and conversion parameter determination means that determines, based on the feature amount, the magnitude of the conversion parameter α, which has two or more selectable values of different magnitude. The magnitude of α is determined from the result of comparing the feature amount with a predetermined threshold, and the matching means performs matching using the α so determined. A second feature is that, in place of the feature amount extraction means that extracts a feature amount relating to the distribution of the matching information signal (vectors) output from the matching means, feature amount extraction means that extracts a feature amount from the content of the video is provided.
These features make it possible to determine adaptively the matching parameter best suited to the image and to perform matching with that parameter, so that more accurate matching becomes possible.
[0012]
A third feature of the present invention is that the matching means performs video matching by an iterative gradient method in which a difference vector, obtained from the horizontal and vertical gradients of the pixel values and the motion-compensated inter-field (inter-frame) difference based on an initial displacement vector, is multiplied by the conversion parameter determined by the conversion parameter determination means, and the product is added to the initial displacement vector to calculate a vector. A fourth feature is that the matching means performs video matching by an iterative gradient method in which a certain number is added to or subtracted from the difference vector, and the result is added to the initial displacement vector to calculate a vector.
[0013]
According to these features, it is possible to control how the vector is determined (vector convergence speed in the iterative gradient method).
[0014]
A fifth feature of the present invention is that the matching means has means for determining whether the denominator used in calculating the difference vector is smaller than a predetermined threshold, and, when it is smaller, the conversion parameter is determined so that the contribution of the difference vector is reduced. A sixth feature is that the matching means has means for determining whether the denominator used in calculating the difference vector is smaller than a predetermined threshold, and, when it is smaller, the number to be added or subtracted is determined so that the contribution of the difference vector is reduced.
[0015]
According to these features, it is possible to suppress generation of an erroneous estimation vector due to noise or the like.
[0016]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described in detail with reference to the drawings. First, the principle of the present invention will be described.
[0017]
Image motion estimation and image stereo matching are outlined as follows. Image motion estimation is the process of estimating the motion of each part of a moving image (video) in motion-compensated predictive coding or motion-compensated television system conversion. Usually the picture is divided into many blocks and the motion is found for each block. The block size varies from case to case, for example 16 pixels × 16 lines or 4 pixels × 2 lines.
[0018]
In image stereo matching, two cameras are used to obtain a pair of images, a left-eye image and a right-eye image, which may be still or moving images. Matching then determines which part of the right-eye image each part of the left-eye image corresponds to. The ultimate purpose of stereo matching is to estimate the depth, that is, how far each part of the image is from the cameras. For stereo matching, see for example the following reference.
Morio Onoe (ed.), “Image Processing Handbook” (Shokodo), around p. 395.
[0019]
Since image motion estimation finds a match between the current frame and the previous frame, motion estimation and stereo matching are essentially the same as matching processes. The description below therefore continues with image motion estimation as the example.
[0020]
A representative method of image motion estimation is the iterative gradient method. It is described in detail in the reference cited above, Kawada et al.: “Improvement of motion compensated TV system conversion”, Journal of the Institute of Image Information and Television Engineers, Vol. 51, No. 9 (1997), and elsewhere. As described in that reference, the motion vector v obtained by the iterative gradient method is expressed by equations (1), (2), and (3).
[0021]
As described above, in equations (2) and (3) a small denominator means that even a slight disturbance such as noise can cause a large error. In the present invention, therefore, when the denominators of equations (2) and (3) are small, Δv in the first term on the right side of equation (1) is multiplied by a conversion parameter α smaller than 1, giving equation (4) below.
v = α·Δv + v₀ (where α_x < 1, α_y < 1) ... (4)
[0022]
Setting a conversion parameter α as in equation (4) makes it possible to control the process. Conventionally, such conversion parameters were fixed. The first principle of the present invention is to determine the optimum parameters dynamically by analyzing the picture content or the vectors produced by matching, which enables accurate matching better suited to the scene.
[0023]
Next, as equation (4) shows, the iterative gradient method does not find the correct motion vector immediately when a new scene begins, but converges to it iteratively. For this reason, when the correlation between frames is small, as in a scene shot with a high-speed shutter, motion becomes hard to find if α < 1. The second principle of the present invention is therefore to find the optimum parameters quickly even for images with small correlation between frames, for example, and so enable accurate matching.
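The effect of α on convergence speed can be sketched in Python as follows. The update itself is equation (4); everything else is an idealized assumption made only for illustration: the difference vector is taken to be exactly the remaining error (true motion minus the current estimate), and the names `damped_update` and `simulate` are hypothetical.

```python
def damped_update(v0, dv, alpha):
    """Equation (4), applied per component: v = alpha * dv + v0."""
    return (alpha[0] * dv[0] + v0[0], alpha[1] * dv[1] + v0[1])


def simulate(true_motion, alpha, steps):
    """Track a constant true motion, starting from a zero vector.

    Idealization: each step's difference vector is the exact remaining
    error, so the error shrinks by a factor (1 - alpha) per step.
    """
    v = (0.0, 0.0)
    for _ in range(steps):
        dv = (true_motion[0] - v[0], true_motion[1] - v[1])
        v = damped_update(v, dv, alpha)
    return v


# Small alpha: still short of the true motion (4, 2) after 20 steps.
print(simulate((4.0, 2.0), (0.1, 0.2), 20))
# alpha = (1, 1): the true motion is reached in a single step.
print(simulate((4.0, 2.0), (1.0, 1.0), 1))
```

Under this idealization a small α trades tracking speed for robustness, which is why a single fixed value cannot suit both a repeating pattern and a high-speed-shutter scene.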
[0024]
These first and second principles apply not only to the iterative gradient method but also to other motion estimation methods such as the block matching method. In the above description the difference vector Δv in equation (1) is multiplied by a conversion parameter α smaller than 1, but a constant may instead be subtracted from or added to the difference vector Δv.
[0025]
Next, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.
[0026]
As shown in the figure, the matching device 1 comprises a matching unit 11 that uses, for example, the iterative gradient method, a feature amount extraction unit 12 that extracts a feature amount (such as the variance) of the vectors output from the matching unit 11, and a parameter determination unit 13 that determines the parameter α based on the extracted feature amount. The output vector r, which is the matching information signal obtained from the matching device 1, is sent to the television system conversion unit 2. Using the output vector r, the television system conversion unit 2 converts, for example, an NTSC input video p to PAL and outputs a PAL output video q. The television system conversion unit 2 is only an example; a motion-compensated coding unit may be provided in its place and the output vector r used for motion-compensated coding. When the input video p is a pair of images consisting of a left-eye image and a right-eye image, the output vector r can be used for stereo matching.
[0027]
Next, the operation of this embodiment will be described with reference to the flowchart of FIG. 2. In step S1, a parameter that slows the convergence of the motion vector, for example α = (α_x, α_y) = (0.1, 0.2), is set in the matching unit 11 as the initial conversion parameter. Next, when the input video p is taken into the matching device 1 in a predetermined processing unit, for example a block or a field, the matching unit 11 performs motion estimation for that processing unit by the iterative gradient method in step S2. That is, motion estimation is performed using this α and equation (4).
[0028]
Next, in step S3, the feature amount extraction unit 12 computes a feature amount, for example the variance or standard deviation of the vector magnitudes, from the distribution of the motion vectors obtained by the motion estimation. In step S4, the parameter determination unit 13 determines, from the value of the feature amount, the conversion parameter α to be applied to the next processing unit. When the feature amount is the variance or standard deviation and it is equal to or greater than a predetermined threshold, a larger conversion parameter α (for example, α = 1) is chosen. When it is smaller than the threshold, α is kept at, or set to, the initial conversion parameter value.
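Steps S3 and S4 can be sketched in Python as follows. The two α values come from the text; the threshold is left as an argument because the text gives no value for it, and the function names are hypothetical. The population standard deviation is used here as one reasonable reading of "standard deviation".

```python
import math

STATIC_ALPHA = (0.1, 0.2)  # initial, slow-converging value given in the text
MOTION_ALPHA = (1.0, 1.0)  # larger value (alpha = 1) given in the text


def magnitude_std(vectors):
    """Step S3: standard deviation of the motion vector magnitudes."""
    if not vectors:
        return 0.0
    mags = [math.hypot(vx, vy) for vx, vy in vectors]
    mean = sum(mags) / len(mags)
    return math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))


def choose_alpha(feature, threshold):
    """Step S4: pick the larger alpha when the feature reaches the threshold."""
    return MOTION_ALPHA if feature >= threshold else STATIC_ALPHA


# Hypothetical threshold of 2.0, chosen only for this example.
still_field = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
moving_field = [(0.0, 0.0), (6.0, 8.0)]
print(choose_alpha(magnitude_std(still_field), 2.0))
print(choose_alpha(magnitude_std(moving_field), 2.0))
```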
[0029]
In step S5, it is determined whether or not the motion estimation processing has been completed for all processing units. If this determination is negative, the process proceeds to step S6, and the input video p of the next processing unit is captured. Next, returning to step S2, motion estimation of the processing unit is performed by the iterative gradient method.
[0030]
The above process is repeated until the determination in step S5 becomes affirmative. When the determination is affirmative, the motion estimation process by scene adaptive dynamic parameter control ends.
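The loop of steps S1 to S6 can be sketched in Python as below. The motion estimator is passed in as a callable because its internals are outside this sketch; `run_adaptive_matching`, `estimate`, and the threshold value are hypothetical names and values, not part of the patent text.

```python
import math


def run_adaptive_matching(units, estimate, threshold,
                          alpha_small=(0.1, 0.2), alpha_large=(1.0, 1.0)):
    """Process units in order, adapting alpha for the NEXT unit each time.

    units:    iterable of processing units (for example fields).
    estimate: callable (unit, alpha) -> list of (vx, vy) motion vectors.
    Returns (vectors per unit, alpha used for each unit).
    """
    alpha = alpha_small                       # S1: initial parameter
    results, used = [], []
    for unit in units:                        # S5/S6: loop over all units
        vectors = estimate(unit, alpha)       # S2: motion estimation
        results.append(vectors)
        used.append(alpha)
        mags = [math.hypot(vx, vy) for vx, vy in vectors]   # S3: feature
        if mags:
            mean = sum(mags) / len(mags)
            std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
        else:
            std = 0.0
        # S4: decide the parameter applied to the next processing unit
        alpha = alpha_large if std >= threshold else alpha_small
    return results, used


# Stub estimator for illustration: each unit already is its vector list.
fields = [[(0, 0), (0, 0)], [(0, 0), (6, 8)], [(1, 0), (1, 0)]]
_, used = run_adaptive_matching(fields, lambda unit, alpha: unit, 2.0)
print(used)
```

Because the decision in S4 applies to the next processing unit, a change in scene statistics takes effect one unit later; in the stub run only the third field is processed with the larger α.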
[0031]
Thus, according to this embodiment, the conversion parameter α can be varied according to the feature amount of the motion vectors. When the input video p changes little between frames, for example when the picture contains a regular repeating pattern, α is set small; conversely, when the correlation between frames is small and a moving object is far apart in adjacent images, α is set large. As a result, appropriate matching can be achieved even when picture content for which this is otherwise hard to achieve simultaneously is mixed.
[0032]
Next, a second embodiment of the present invention will be described with reference to the block diagram of FIG. 3. This embodiment is characterized in that the matching device 3 comprises a matching unit 31, a feature amount extraction unit 32 that extracts a feature amount of the input video p, and a parameter determination unit 33 that determines the conversion parameter α from the extracted feature amount.
[0033]
In this embodiment, the feature amount extraction unit 32 extracts a feature amount from the input video p, for example the change in brightness of the pixel values, or its variance or standard deviation. The parameter determination unit 33 sets the conversion parameter large when the feature amount is equal to or greater than a predetermined threshold; when it is smaller than the threshold, the parameter is kept at, or set to, the initial conversion parameter value, as in the first embodiment. Apart from this operation, the embodiment is the same as the first embodiment, so further description is omitted.
[0034]
As described above, this embodiment also achieves appropriate matching for video signals in which picture content that is hard to match appropriately at the same time is mixed.
[0035]
Next, a third embodiment of the present invention will be described with reference to FIG. 4. This embodiment is characterized in that, in the iterative gradient method, it is determined whether the denominator in the difference vector calculation of equations (2) and (3) is small, and the parameters are controlled adaptively on that basis. FIG. 4 is a block diagram showing one specific configuration of the matching units 11 and 31.
[0036]
The matching unit of this embodiment comprises first and second arithmetic units 41 and 42 that calculate the numerators of equations (2) and (3) from the input video p, a third arithmetic unit 43 that calculates the denominator of those equations, a fourth arithmetic unit 44 that performs the division of equation (2), a fifth arithmetic unit 45 that performs the division of equation (3), an α_x determination unit 46 and an α_y determination unit 47 that determine the conversion parameter α (α_x, α_y) based on the denominator value obtained by the third arithmetic unit 43 and the conversion parameter α from the parameter determination unit 13, multiplication units 48 and 49, and addition units 50 and 51.
[0037]
In this embodiment, the third arithmetic unit 43 calculates the denominator of equations (2) and (3). If the denominator is equal to or less than a predetermined threshold, the α_x determination unit 46 and the α_y determination unit 47 forcibly set (α_x, α_y) to small values. With (α_x, α_y) small, slight disturbances such as noise are prevented from causing large errors in motion estimation. If the denominator is larger than the threshold, the α_x determination unit 46 and the α_y determination unit 47 set (α_x, α_y) to the conversion parameter α determined by the parameter determination unit 13 or 33.
[0038]
Δv_x and Δv_y output from the fourth and fifth arithmetic units are multiplied in the multiplication units 48 and 49 by the α_x and α_y determined by the α_x determination unit 46 and the α_y determination unit 47, respectively, and the products are added to v_0x and v_0y in the addition units 50 and 51. The result is the output vector r, that is, (v_x, v_y).
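The data flow through units 43 to 51 can be sketched in Python as follows. The numerators and the denominator are taken as inputs because equations (2) and (3) are not available in this text. The forced small value `forced_alpha`, the zero-denominator handling, and the function name are assumptions for illustration; the text only says that a small value is forcibly chosen.

```python
def matching_unit_step(num_x, num_y, den, v0, alpha, den_threshold,
                       forced_alpha=(0.05, 0.05)):
    """One block of the third embodiment (units 43 to 51), as a sketch.

    num_x, num_y: numerators of equations (2) and (3) (units 41, 42).
    den:          shared denominator (unit 43).
    v0:           initial displacement vector (v_0x, v_0y).
    alpha:        parameter from the parameter determination unit.
    """
    # Units 46, 47: force a small alpha when the denominator is small.
    if abs(den) <= den_threshold:
        ax, ay = forced_alpha
    else:
        ax, ay = alpha
    # Units 44, 45: the divisions; a zero denominator is treated as no
    # update, which is an assumption of this sketch.
    dvx = num_x / den if den != 0 else 0.0
    dvy = num_y / den if den != 0 else 0.0
    # Units 48 to 51: multiply by alpha, then add the initial vector.
    return (ax * dvx + v0[0], ay * dvy + v0[1])


# Well-conditioned block: alpha from the parameter determination unit is used.
print(matching_unit_step(2.0, 4.0, 2.0, (1.0, 1.0), (1.0, 1.0), 0.01))
# Small denominator: the forced alpha damps the unreliable step.
print(matching_unit_step(2.0, 4.0, 0.001, (1.0, 1.0), (1.0, 1.0), 0.01))
```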
[0039]
As described above, according to this embodiment, for input video such as a picture containing a regular repeating pattern, the conversion parameter is forcibly set small and the contribution of disturbances to the motion estimate is reduced, so motion estimation errors caused by slight disturbances such as noise can be reduced.
[0040]
In the description of the embodiments above, equation (4) is v = α·Δv + v₀ (where α_x < 1, α_y < 1). Alternatively, v = (Δv − P) + v₀ (P a positive number) or v = (Δv + Q) + v₀ (Q a positive number) may be used, with P and Q varied adaptively in the same way as the conversion parameter α to change the contribution of Δv to the motion estimate.
[0041]
The inventor incorporated the system of the present invention into a television system conversion algorithm, and evaluated the performance by computer simulation.
[0042]
In television system conversion, an SN ratio cannot be calculated between the original picture and the converted picture. Therefore, a test moving image of 625 lines at 50 pictures per second was first converted to a moving image of 525 lines at 60 pictures per second, which was then converted back to a processed image of 625 lines at 50 pictures per second. The PSNR between the processed image and the original was then calculated. The conversion and inverse conversion algorithms were identical except for the parameters relating to the line number ratio and the field interpolation ratio.
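The PSNR step of this round-trip evaluation can be sketched in Python as follows. The sketch uses the usual definition of PSNR from the mean squared error and assumes 8-bit samples with a peak value of 255; the text does not state the sample depth, and the function name `psnr` and the flat pixel lists are hypothetical simplifications.

```python
import math


def psnr(original, processed, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values.

    peak=255 assumes 8-bit samples (an assumption of this sketch).
    Identical inputs give infinity.
    """
    if len(original) != len(processed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


# Original field versus the field after conversion and inverse conversion.
original_field = [100, 100, 100, 100]
round_trip_field = [110, 90, 110, 90]
print(psnr(original_field, round_trip_field))
```

Averaging this value over the fields of each scene would give per-scene figures of the kind the text attributes to FIG. 7.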
[0043]
Two test sequences with different optimum conversion parameters were used: “Interview”, which has a grid pattern on the wall, and “Carousel”, which was shot with a high-speed shutter. Twenty-five frames of each were joined, so that the first 50 fields are the “Interview” scene and the last 50 fields are the “Carousel” scene (50 frames in total). The feature amount extracted by the feature amount extraction unit 12 (see FIG. 1) was the standard deviation of the magnitudes of the motion vectors generated in the previous field (one feature amount per field). The parameter determination unit 13 was given a suitable threshold: if the feature amount was larger than the threshold, the conversion parameter for the next field was set to the motion priority type (α = (1, 1) in equation (4)); if smaller, to the static priority type (α = (0.1, 0.2) in equation (4)). The parameter was thus varied adaptively. The conversion parameters suited to the “Interview” and “Carousel” scenes are the static priority type (α = (0.1, 0.2) in equation (4)) and the motion priority type (α = (1, 1) in equation (4)), respectively.
[0044]
FIG. 5 shows PSNR graphs of the processed images when television system conversion is performed by conventional methods 1 and 2, and FIG. 6 shows a PSNR graph of the processed image when television system conversion is performed by the method of the present invention. FIG. 7 shows the average PSNR in each scene section for each of these methods. Conventional method 1 in FIG. 5 uses the motion priority type as a fixed conversion parameter, and conventional method 2 uses the static priority type as a fixed conversion parameter. The method of the present invention, by contrast, uses the motion priority type and the static priority type adaptively.
[0045]
As a result of this experiment, as can be seen from FIG. 5 and FIG. 7, in the conventional method 1, the “Interview” scene was greatly deteriorated, but the “Carousel” scene was well converted. Further, in the conventional method 2, the “Interview” scene was converted well, but the “Carousel” scene was greatly deteriorated. This is because an inappropriate conversion parameter is used in each deteriorated scene.
[0046]
In the method of the present invention, by contrast, as can be seen from FIG. 6 and FIG. 7, it was confirmed that an appropriate conversion parameter α was selected automatically in both the “Carousel” and “Interview” scenes and that the conversion was satisfactory. FIG. 7 also shows that a better average PSNR is obtained than with conventional methods 1 and 2. Even with the method of the present invention, the SN ratio drops for a while immediately after the scene change. This is considered to be because the degree of mismatch increased in the portion where different conversion parameters were selected for the conversion and the inverse conversion.
[0047]
[Effects of the invention]
As is apparent from the above description, according to the inventions of claims 1 and 2, the feature quantities of the output matching information signal (vector) and of the content of the input image signal are extracted by automatic analysis, so that the conversion (matching) parameter best suited to the image can be determined adaptively. Performing the matching process with that conversion parameter therefore yields a more accurate matching result.
[0048]
According to the third and fourth aspects of the invention, the vector is calculated either by multiplying the difference vector by the conversion parameter and adding the initial displacement vector to the product, or by adding a number to or subtracting a number from the difference vector and adding the initial displacement vector to the resulting sum or difference. This makes it possible to control how the vector is determined (the convergence speed of the vector in the iterative gradient method).
[0049]
According to the fifth and sixth aspects of the present invention, it is determined whether the denominator used in calculating the difference vector is smaller than a predetermined threshold, and when it is smaller, the conversion parameter is set smaller than 1, or the number to be subtracted is increased, or the number to be added is decreased. This makes it possible to suppress the generation of erroneous estimated vectors due to noise or the like.
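A minimal sketch of the denominator check for the multiplicative case is given below. It handles one component (horizontal or vertical) at a time and takes the denominator of the difference vector calculation as an input. The function name, the reduced value passed in, and the choice of α = 1 when the denominator is not small are all assumptions made for illustration.

```python
def component_alpha(denominator: float, threshold: float,
                    alpha_small: float) -> float:
    """Conversion parameter for one component of the difference vector.

    If the denominator is below the threshold, the quotient is close to
    a division by zero and sensitive to noise, so a value smaller than 1
    is returned to reduce the contribution of the difference vector.
    Assumption: otherwise the difference vector is used in full (1.0).
    """
    if not 0.0 <= alpha_small < 1.0:
        raise ValueError("alpha_small must satisfy 0 <= alpha_small < 1")
    return alpha_small if denominator < threshold else 1.0


# Hypothetical denominators for the horizontal and vertical components.
alpha_x = component_alpha(denominator=0.5, threshold=1.0, alpha_small=0.1)
alpha_y = component_alpha(denominator=4.0, threshold=1.0, alpha_small=0.2)
```

Evaluating the two components separately mirrors the separate α_x and α_y determination units listed in the explanation of symbols.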
[0050]
Furthermore, according to the invention of claim 7, since the variance of the vector is calculated as the feature quantity of the matching information signal (vector) to be output, effective matching conversion parameter control can be performed.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a first exemplary embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the first embodiment.
FIG. 3 is a block diagram showing a configuration of a second exemplary embodiment of the present invention.
FIG. 4 is a block diagram showing a configuration of a main part of a third embodiment of the present invention.
FIG. 5 is a graph showing the PSNR of a processed image according to a conventional method.
FIG. 6 is a graph showing the PSNR of a processed image according to the method of the present invention.
FIG. 7 is a diagram showing the PSNR [dB] and the average PSNR for each scene in conventional methods 1 and 2 and in the method of the present invention.
[Explanation of symbols]
1, 3 ... matching device; 2 ... television system conversion unit; 11, 31 ... matching unit; 12, 32 ... feature quantity extraction unit; 13, 33 ... parameter determination unit; 41 to 45 ... first to fifth calculation units; 46 ... α_x determination unit; 47 ... α_y determination unit.

Claims

An image matching device that automatically estimates motion in a moving image or automatically detects corresponding points between the images of a stereo image composed of a left-eye image and a right-eye image, the device comprising:
matching means that multiplies a difference vector Δv, obtained from the horizontal gradient and the vertical gradient of the image pixel values and from the motion-compensated inter-field (inter-frame) difference based on an initial displacement vector v0, by a conversion parameter α, calculates a vector v by adding the initial displacement vector v0 to the product, and performs image matching by an iterative gradient method;
feature quantity extraction means for extracting a feature quantity relating to the distribution of the matching information signal (vector) output from the matching means; and
conversion parameter determination means for determining the magnitude of the conversion parameter α, which has two or more selectable values, based on the feature quantity,
wherein the magnitude of the conversion parameter α is determined based on a result of comparing the feature quantity with a predetermined threshold, and the matching means performs the matching process using the conversion parameter α determined by the conversion parameter determination means.

An image matching device that automatically estimates motion in a moving image or automatically detects corresponding points between the images of a stereo image composed of a left-eye image and a right-eye image, the device comprising:
matching means that multiplies a difference vector Δv, obtained from the horizontal gradient and the vertical gradient of the image pixel values and from the motion-compensated inter-field (inter-frame) difference based on an initial displacement vector v0, by a conversion parameter α, calculates a vector v by adding the initial displacement vector v0 to the product, and performs image matching by an iterative gradient method;
feature quantity extraction means for extracting a feature quantity from the content of the video; and
conversion parameter determination means for determining the magnitude of the conversion parameter α, which has two or more selectable values, based on the feature quantity,
wherein the magnitude of the conversion parameter α is determined based on a result of comparing the feature quantity with a predetermined threshold, and the matching means performs the matching process using the conversion parameter α determined by the conversion parameter determination means.

The image matching device according to claim 1, wherein the matching means determines whether a denominator used in calculating the difference vector, which is obtained from the horizontal gradient and the vertical gradient of the pixel values and from the motion-compensated inter-field (inter-frame) difference based on the initial displacement vector, is smaller than a predetermined threshold, and, when the denominator is smaller than the threshold, the conversion parameter is determined so that the contribution of the difference vector becomes small.

The image matching device according to claim 1, wherein the feature quantity of the matching information signal (vector) is a variance or a standard deviation of the vector.

The image matching device according to claim 2, wherein the feature quantity of the content of the video is a variance or a standard deviation of pixel values.