JPH1013832A

JPH1013832A - Moving picture recognizing method and moving picture recognizing and retrieving method

Info

Publication number: JPH1013832A
Application number: JP16443096A
Authority: JP
Inventors: Junji Yamato; 淳司大和; Hiroshi Murase; 洋村瀬
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-06-25
Filing date: 1996-06-25
Publication date: 1998-01-16

Abstract

PROBLEM TO BE SOLVED: To recognize and retrieve a specific pattern directly from compressed moving picture data by extracting DCT coefficients from the image data, converting the resulting feature vector sequence into a symbol sequence, and using the model parameters of maximum likelihood with respect to that symbol sequence. SOLUTION: In MPEG data 1, one frame of image data is divided into blocks, and DCT (discrete cosine transform) coefficients are obtained for each block. A feature extraction part 2 takes the low-frequency components as the frame feature vector. The feature vector sequence of a series of moving picture data is obtained in the same way and recorded in a feature storage memory 3. A quantization part 4 then vector-quantizes the sequence and records the resulting symbol sequence in a symbol storage memory 5. A model parameter estimating part 6 estimates the parameters of a state transition model that would generate this symbol sequence and records them in a recognition state transition model storage memory 7. A likelihood calculating part 8 computes the likelihood of the model of each category to be recognized and stores the category of the highest-likelihood model in a recognition result memory 9. A specific moving picture pattern is thereby recognized and retrieved from compressed moving picture data.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a moving image recognition method and a moving image recognition and retrieval method, and more particularly to methods that recognize and retrieve a specific moving image pattern from the image data of the frames (screens) that display a series of moving images.

【０００２】[0002]

2. Description of the Related Art
Pattern recognition for moving images has been studied extensively in recent years. One known technique is described in publication (A) below.

【０００３】 (A) Japanese Patent Application Laid-Open No. 5-46583
Publication (A) describes a technique in which mesh features of a moving object, extracted from the image data of each frame of a moving image, are converted into symbols by vector quantization, so that the moving image sequence becomes a symbol sequence. The motions of a moving object such as a human are then recognized by learning and recognizing that symbol sequence.

【０００４】 As information compression technology for storing or transmitting moving image data, which forms a core technology of multimedia, international standard coding schemes such as MPEG (Moving Picture Experts Group; an international standard for moving image compression for integrated media) and MPEG2 are becoming widespread.

【０００５】[0005]

Problems to Be Solved by the Invention
Conventionally, as in the technique described in publication (A) (Japanese Patent Application Laid-Open No. 5-46583), retrieving a specific moving image pattern from a series of moving images using the pattern itself as the search key requires handling large volumes of image data and feature data, so the data processing time increases.

【０００６】 Standard coding schemes such as MPEG and MPEG2 are becoming widespread. When a specific moving image pattern is retrieved from a series of moving images using the pattern itself as the search key, using moving image data compressed by such a scheme is expected to shorten the data processing time.

【０００７】 However, no optimal method for retrieving a specific moving image pattern from a series of moving images, operating on moving image data compressed by a standard coding scheme, has been studied so far.

【０００８】 The present invention has been made to solve the above problems. An object of the invention is to provide a technique that, in a moving image recognition method, uses moving image data compressed by a standard coding scheme or the like and can thereby shorten the data processing time.

【０００９】 Another object of the invention is to provide a technique that, in a moving image recognition and retrieval method, uses moving image data compressed by a standard coding scheme or the like and can thereby shorten the data processing time.

【００１０】 The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

【００１１】[0011]

SUMMARY OF THE INVENTION (Means for Solving the Problems)
A brief outline of representative aspects of the invention disclosed in this application is as follows.

【００１２】 (1) A moving image recognition method for recognizing a moving image pattern in a series of moving images, comprising the steps of: dividing the image data of each frame displaying the series of moving images into M × N blocks and extracting the DCT coefficients of each block; extracting at least one of the DCT coefficients as the feature vector of each frame; learning a stochastic state transition model, for each of a plurality of specific moving image patterns serving as recognition keys, from the time-series feature vector sequence formed by the feature vectors of the frames displaying that pattern; and outputting, as the recognition result, the moving image pattern of the state transition model that, among the plurality of state transition models obtained by the learning, gives the maximum likelihood for the time-series feature vector sequence formed by the feature vectors extracted from the image data of the frames displaying the series of moving images to be recognized.

【００１３】 (2) In the means of (1), the image data of each frame displaying the series of moving images to be recognized is compressed by a standard coding scheme, and some of the DCT coefficients contained in the compressed image data of each frame are used as the feature vector of that frame.

【００１４】 (3) In the means of (1), motion vectors are used together with the DCT coefficients as the feature vector of each frame.

【００１５】 (4) In the means of (3), the image data of each frame displaying the series of moving images to be recognized is compressed by a standard coding scheme, and some of the DCT coefficients contained in the compressed image data of each frame, together with the motion compensation vectors, are used as the feature vector of that frame.

【００１６】 (5) A moving image recognition and retrieval method for extracting, from a series of moving images, a time region containing a specific moving image pattern, comprising the steps of: dividing the image data of each frame displaying the series of moving images into M × N blocks and extracting the DCT coefficients of each block; extracting at least one of the DCT coefficients as the feature vector of each frame; learning a stochastic state transition model from the time-series feature vector sequence formed by the feature vectors of the frames displaying the specific moving image pattern serving as the search key; and outputting, as the search result, a time region that has a high likelihood with respect to the state transition model obtained by the learning, within the time-series feature vector sequence formed by the feature vectors extracted from the image data of the frames displaying the series of moving images to be searched.

【００１７】 (6) In the means of (5), the image data of each frame displaying the series of moving images to be searched is compressed by a standard coding scheme, and some of the DCT coefficients contained in the compressed image data of each frame are used as the feature vector of that frame.

【００１８】 (7) In the means of (5), motion vectors are used together with the DCT coefficients as the feature vector of each frame.

【００１９】 (8) In the means of (7), the image data of each frame displaying the series of moving images to be searched is compressed by a standard coding scheme, and some of the DCT coefficients contained in the compressed image data of each frame, together with the motion compensation vectors, are used as the feature vector of that frame.

【００２０】 According to each of the above means, DCT coefficients, or DCT coefficients and motion compensation vectors, are used as features, and a specific moving image pattern is recognized and retrieved directly from the small volume of moving image data compressed by a standard coding scheme such as MPEG or MPEG2. The data processing time can therefore be reduced.

【００２１】[0021]

DESCRIPTION OF THE EMBODIMENTS
Embodiments of the present invention are described in detail below with reference to the drawings.

【００２２】 In all drawings used to describe the embodiments, elements having the same function are given the same reference numerals, and their repeated description is omitted.

【００２３】 FIG. 1 is a functional block diagram showing the schematic configuration of a moving image recognition and retrieval apparatus to which the moving image recognition method and the moving image recognition and retrieval method of one embodiment of the present invention are applied.

【００２４】 In FIG. 1, 1 denotes MPEG data, 2 a feature extraction unit, 3 a feature storage memory, 4 a quantization unit, 5 a symbol storage memory, 6 a model parameter estimation unit, 7 a recognition state transition model storage memory, 8 a likelihood calculation unit, and 9 a recognition result memory.

【００２５】 Here, an external storage device, for example, is used as the recognition state transition model storage memory 7 and the recognition result memory 9, and the MPEG data 1 is likewise stored in, for example, an external storage device.

【００２６】 The basic operation of this embodiment has two stages, learning and recognition. During learning, the parameters of the recognition state transition models are estimated from training data and stored in the recognition state transition model storage memory 7 for each recognition category (category 1 to category 6 in FIG. 1).

【００２７】 During recognition, the likelihood of the model corresponding to each category, stored in the recognition state transition model storage memory 7 by learning, is calculated, and maximum likelihood estimation is performed: the category corresponding to the model with the highest likelihood is taken as the recognition result.

【００２８】 In the moving image recognition method and the moving image recognition and retrieval method of this embodiment, the processing up to and including quantization is the same during learning and during recognition.

【００２９】 The moving image recognition method and the moving image recognition and retrieval method of this embodiment are described below with reference to FIG. 1.

【００３０】 First, the feature extraction unit 2 extracts DCT coefficients as the feature vector from the MPEG data 1 to be searched.

【００３１】 The MPEG data 1 is briefly described here.

【００３２】 In the MPEG standard coding scheme, data is compressed within a frame by the DCT (discrete cosine transform) coefficients of each 8 × 8 pixel block and quantization, and between frames by using motion compensation vector information.

【００３３】 Each frame of ordinary MPEG data 1 consists of coded data of one of three types: I picture, P picture, or B picture.

【００３４】 An I picture is intra-frame coded, a P picture is forward inter-frame predictive coded, and a B picture is bidirectionally inter-frame predictive coded.

【００３５】 In an ordinary sequence, one GOP (Group of Pictures) starts with an I picture, and P pictures or B pictures are placed at suitable intervals according to the intensity of motion in the image, the required image quality, and so on.

【００３６】 In this embodiment, in order to use the DCT coefficients, every frame is converted into I-picture image data before use.

【００３７】 The conversion of MPEG data 1 composed of I, P, and B pictures into I pictures can be done by operating directly on the coded data, as described for example in reference (B) below.

【００３８】 (B) Shih-Fu Chang and David G. Messerschmitt: "A New Approach to Decoding and Compositing Motion-Compensated DCT-Based Images", Proceedings of ICASSP '93 (1993).
FIG. 2 is a diagram showing the schematic structure of the MPEG data 1 and of its DCT coefficients.

【００３９】 As shown in FIG. 2, in the MPEG data 1 one frame of image data is divided into M × N blocks, each consisting of 8 × 8 pixels, and a DCT is computed for each block. This yields the DCT coefficients indicated by the numbers 1 to 64 in the block at the bottom of FIG. 2.

【００４０】 In this embodiment, a suitable number of low-frequency DCT coefficients (those in region E1 of FIG. 3) are taken from the DCT coefficients of each 8 × 8 pixel block. This is done for all blocks, and the sequence of numbers obtained by arranging all the extracted DCT coefficients is used as the feature vector (f) of the frame.

【００４１】 For example, if a 32 × 32 pixel image is used and i DCT coefficients are taken from each block, there are 16 blocks in total, so the feature vector has 16i dimensions.
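The per-frame feature extraction described above can be sketched as follows. This is an illustrative sketch only, not the implementation of the specification: it recomputes the DCT from pixel values, whereas the embodiment takes the coefficients from the MPEG data, and it assumes that region E1 is the top-left `keep` × `keep` square of each block (FIG. 3 is not reproduced here). The names `dct_2d`, `frame_feature`, and `keep` are hypothetical.

```python
import math

BLOCK = 8  # MPEG intra-coded blocks are 8x8 pixels


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [
        [
            scale(u) * scale(v) * sum(
                block[y][x]
                * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                for y in range(n) for x in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def frame_feature(frame, keep=2):
    """Concatenate the top-left keep x keep DCT coefficients of every 8x8 block.

    Assumes the frame height and width are multiples of 8.
    """
    height, width = len(frame), len(frame[0])
    feature = []
    for top in range(0, height, BLOCK):
        for left in range(0, width, BLOCK):
            block = [row[left:left + BLOCK] for row in frame[top:top + BLOCK]]
            coeffs = dct_2d(block)
            feature.extend(coeffs[u][v] for u in range(keep) for v in range(keep))
    return feature


# A synthetic 32x32 frame: 16 blocks, i = keep * keep = 4 coefficients each.
frame = [[(x + 2 * y) % 256 for x in range(32)] for y in range(32)]
f = frame_feature(frame, keep=2)
print(len(f))  # 64, i.e. 16i with i = 4
```

With `keep=2` the sketch reproduces the 16i dimension count of the 32 × 32 pixel example for i = 4.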

【００４２】 One feature vector (f) is obtained from one frame of the MPEG data 1, so a feature vector sequence (F) is obtained from the image data of the consecutive frames (screens) displaying a series of moving images. This feature vector sequence (F) is recorded in the feature storage memory 3.

【００４３】 Instead of a suitable number of low-frequency DCT coefficients, the DCT coefficients used as the feature vector (f) may be those on the first horizontal line (region E2 in FIG. 3), those on the first vertical line (region E3 in FIG. 3), or those on the diagonal that includes the DC component (region E4 in FIG. 3).

【００４４】 Using the DCT coefficients on the first horizontal line (region E2 in FIG. 3) as the feature vector makes it possible to extract the features of the moving image accurately with few DCT coefficients when the specific pattern is dominated by horizontal motion.

【００４５】 Using the DCT coefficients on the first vertical line (region E3 in FIG. 3) as the feature vector makes it possible to extract the features of the moving image accurately with few DCT coefficients when the specific pattern is dominated by vertical motion.

【００４６】 Using the DCT coefficients on the diagonal that includes the DC component (region E4 in FIG. 3) as the feature vector makes it possible to extract the features of the moving image accurately with few DCT coefficients when the specific pattern contains both horizontal and vertical motion.
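The three alternative coefficient selections (regions E2, E3, and E4) can be illustrated with a small selector. This is a sketch under assumptions: the block is taken to be an 8 × 8 array indexed as `coeffs[u][v]`, with `u` the vertical and `v` the horizontal frequency index, and the number of coefficients per region (`count`) is a free parameter, since the exact extents of the regions in FIG. 3 are not available in this text. The function name and pattern labels are hypothetical.

```python
def select_coefficients(coeffs, pattern, count=4):
    """Pick `count` coefficients from an 8x8 DCT block along one direction."""
    if pattern == "row":        # E2: first horizontal line (starts at DC)
        return [coeffs[0][v] for v in range(count)]
    if pattern == "column":     # E3: first vertical line (starts at DC)
        return [coeffs[u][0] for u in range(count)]
    if pattern == "diagonal":   # E4: diagonal through the DC component
        return [coeffs[k][k] for k in range(count)]
    raise ValueError(f"unknown pattern: {pattern!r}")


# Toy block whose entries encode their own position as 10*u + v.
toy = [[10 * u + v for v in range(8)] for u in range(8)]
print(select_coefficients(toy, "row", 3))       # [0, 1, 2]
print(select_coefficients(toy, "column", 3))    # [0, 10, 20]
print(select_coefficients(toy, "diagonal", 3))  # [0, 11, 22]
```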

【００４７】 Furthermore, DCT coefficients and motion compensation vectors may be used together as the feature vector (f), which allows the features of the moving image to be extracted in more detail.

【００４８】 The feature vector sequence (F) is converted into a symbol sequence (O) by vector quantization in the quantization unit 4 and recorded in the symbol storage memory 5.

【００４９】 That is, each feature vector is converted, on the basis of a list of representative points prepared in advance for quantization, into the symbol corresponding to the representative point vector nearest to it.

【００５０】 This set of representative points is called a codebook. The codebook was created with the LBG algorithm described in reference (C) below, using a portion of the feature vectors extracted from each type of motion image.

【００５１】 (C) Y. Linde, A. Buzo, R. M. Gray: "An Algorithm for Vector Quantizer Design", IEEE Trans. Commun., vol. COM-28 (1980).
The codebook may instead be created with the k-means algorithm described in reference (D) below.
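As an illustration of codebook creation, the following sketch implements plain k-means, the alternative mentioned above; it is not the LBG algorithm of reference (C), which additionally grows the codebook by splitting codewords. The squared Euclidean distance, the evenly spaced initial codewords, and the names `train_codebook` and `nearest` are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, codebook):
    """Index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda j: squared_distance(vector, codebook[j]))


def train_codebook(vectors, size, iterations=20):
    """Plain k-means: return `size` codewords fitted to `vectors`."""
    # Deterministic, evenly spaced initial codewords (an arbitrary choice).
    codebook = [list(vectors[j * len(vectors) // size]) for j in range(size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for vector in vectors:
            clusters[nearest(vector, codebook)].append(vector)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Two well-separated groups of 2-D training vectors.
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = train_codebook(training, size=2)
print(codebook)  # one codeword near (0.33, 0.33), one near (10.33, 10.33)
```

In practice the training vectors would be feature vectors (f) sampled from each type of motion, and the codebook size would be chosen to balance quantization error against the number of symbols the models must handle.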

【００５２】 (D) X. D. Huang, Y. Ariki, M. A. Jack: "Hidden Markov Models for Speech Recognition", Edinburgh University Press (1990).
If the codebook is expressed as in equation (1) below, a feature vector (f) is converted into the symbol (O_t) given by equation (2) below.

【００５３】[0053]

【数１】 $$C = c_1, c_2, \ldots, c_N \qquad (1)$$

【００５４】[0054]

【数２】 $$O_t = v_k, \qquad k = \arg\min_j d(f, c_j) \qquad (2)$$
where d(x, y) is the distance between x and y.

Through the processing up to this point, the feature vector sequence (F) has been converted into the symbol sequence (O), and learning and recognition are performed on this symbol sequence (O) using a state transition model.

【００５５】 The operations up to this point are the same for recognition and for learning.

【００５６】 As the state transition model, the hidden Markov model (hereinafter, HMM) described in reference (D) above or in reference (E) below is used.

【００５７】 (E) Seiichi Nakagawa: "Speech Recognition by Stochastic Models" (確率モデルによる音声認識), IEICE (1990).
During learning, the parameters of the HMM are estimated. During recognition, one HMM per category to be recognized is held in the recognition state transition model storage memory 7, and the likelihood calculation unit 8 calculates, for each of these HMMs, the probability that it generates the feature vector sequence (F) to be recognized.

【００５８】 The HMM is briefly described below.

【００５９】 An HMM is a stochastic state transition model and can be viewed as a model of the source of a time-series phenomenon.

【００６０】 FIG. 4 is a conceptual diagram of an HMM.

【００６１】 As shown in FIG. 4, an HMM has a plurality of states (q_1 to q_5), and a probability (a_ij) of transition from each state (q_1 to q_5) to another state is given.

【００６２】 As time advances, state transitions occur stochastically, and each state stochastically outputs a symbol (O_1 to O_t).

【００６３】 Only this output symbol sequence (O = O_1, O_2, ..., O_t) can be observed; the states cannot be observed directly.

【００６４】 This is the origin of the term "hidden" Markov model.

【００６５】 When the model is applied to motion recognition, each posture during a motion corresponds to a state. The number of states must therefore be chosen appropriately according to the length and complexity of the motions to be recognized.

【００６６】 Also, in motion recognition, the state transition probabilities can be interpreted as describing the time-series pattern of posture changes itself and variations such as its stretching and compression in time, while the symbol output probabilities describe fluctuations in each posture and in the observation of the posture.

【００６７】 An HMM is described by the following parameters.

【００６８】[0068]

【数３】
$S = \{s_t\}$: the set of states; $s_t$ is the t-th state (not observable).
$O = O_1, O_2, \ldots, O_T$: the observed symbol sequence (length T).
$A = \{a_{ij} \mid a_{ij} = \Pr(s_{t+1} = j \mid s_t = i)\}$: the state transition probabilities; $a_{ij}$ is the probability of a transition from state $s_i$ to state $s_j$.
$B = \{b_j(O_t) \mid b_j(O_t) = \Pr(O_t \mid s_t = j)\}$: the symbol output probabilities; $b_j(k)$ is the probability of outputting symbol $v_k$ in state $s_j$.
$\pi = \{\pi_i \mid \pi_i = \Pr(s_1 = i)\}$: the initial state probabilities.

Next, the procedure for learning and recognizing a time-series pattern (symbol sequence (O)) using the HMM is described.
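To make the parameters A, B, and π concrete, the following sketch stores them in a small container and draws a symbol sequence from the model, mirroring the generative view in which hidden states change stochastically and each state emits a symbol. The class name `HMM`, its methods, and the three-state left-to-right example values are illustrative assumptions, not values from the specification.

```python
import random
from dataclasses import dataclass


@dataclass
class HMM:
    A: list   # A[i][j] = Pr(s_{t+1} = j | s_t = i)
    B: list   # B[j][k] = Pr(O_t = v_k | s_t = j)
    pi: list  # pi[i]   = Pr(s_1 = i)

    def check(self, tol=1e-9):
        """Return True if every probability distribution sums to one."""
        rows = self.A + self.B + [self.pi]
        return all(abs(sum(row) - 1.0) < tol for row in rows)

    def sample(self, length, rng):
        """Generate (states, symbols) of the given length from the model."""
        states, symbols = [], []
        state = rng.choices(range(len(self.pi)), weights=self.pi)[0]
        for _ in range(length):
            states.append(state)
            emit = self.B[state]
            symbols.append(rng.choices(range(len(emit)), weights=emit)[0])
            row = self.A[state]
            state = rng.choices(range(len(row)), weights=row)[0]
        return states, symbols


# Hypothetical left-to-right model: 3 states (postures), 3 symbols.
model = HMM(
    A=[[0.7, 0.3, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    B=[[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    pi=[1.0, 0.0, 0.0],
)
states, symbols = model.sample(10, random.Random(0))
print(states)   # hidden in practice; only `symbols` would be observed
print(symbols)
```

A left-to-right transition matrix such as this one is a common choice for motions that pass through postures in a fixed order, but the specification does not prescribe a topology.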

【００６９】 <<Learning procedure>> For the symbol sequences (O) obtained from the multiple training data items given for each category, the model parameter estimation unit 6 estimates the parameters of a state transition model that would generate those symbol sequences, and stores them in the recognition state transition model storage memory 7.

【００７０】 An HMM-based recognition system consists of one HMM per category.

【００７１】 Let the HMM for each category to be recognized be λ_i (= {A_i, B_i, π_i}). Each λ_i is trained using the training patterns of its category.

【００７２】 Here, learning is nothing other than estimating the HMM parameters that make the training patterns likely to be generated, namely the state transition probabilities A_i, the symbol output probabilities B_i, and the initial state probabilities π_i.

【００７３】 To estimate the HMM parameters from the training patterns, the Baum-Welch algorithm described in reference (D) or reference (E) is used.

【００７４】 Specifically, starting from some initial values, HMM parameters with successively higher likelihood are computed repeatedly until the likelihood value and its change indicate sufficient convergence. In other words, the procedure repeatedly derives, from the current HMM parameters, model parameters with a higher likelihood.

【００７５】 At each iteration, convergence can be checked by evaluating the likelihood with the forward algorithm described in reference (D).

【００７６】 Expressed as equations:

【００７７】[0077]

【数４】 (Equation 4)

【００７８】[0078]

【数５】 (Equation 5)

【００７９】[0079]

【数６】 (Equation 6)

【００８０】[0080]

【数７】 (Equation 7)

【００８１】 where

【００８２】[0082]

【数８】 (Equation 8)

【００８３】[0083]

【数９】 (Equation 9)

【００８４】 As for the meaning of the above equations, equation (3) is the re-estimate of a_ij under the HMM λ, and equation (4) is the re-estimate of b_i(k) under the HMM λ.
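For orientation, the Baum-Welch re-estimation step is commonly written in the following textbook form. This is offered as a reference sketch only: the equations of the specification itself are not reproduced in this text, and their notation and numbering may differ. The backward variable $\beta_t(i) = \Pr(O_{t+1}, \ldots, O_T \mid s_t = i, \lambda)$ and the auxiliary quantities $\gamma_t(i)$ and $\xi_t(i,j)$ are symbols introduced here for the sketch; $\alpha_t(i)$ is the forward variable.

```latex
\begin{align}
\gamma_t(i) &= \frac{\alpha_t(i)\,\beta_t(i)}{\Pr(O \mid \lambda)}, \\
\xi_t(i,j) &= \frac{\alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}{\Pr(O \mid \lambda)}, \\
\bar{a}_{ij} &= \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \\
\bar{b}_j(k) &= \frac{\sum_{t:\,O_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \\
\bar{\pi}_i &= \gamma_1(i).
\end{align}
```

Each re-estimate is a ratio of expected counts under the current model: expected transitions from state i to state j over expected visits to state i, and expected emissions of symbol $v_k$ in state j over expected visits to state j.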

【００８５】 Through the above procedure, the parameters of the recognition state transition model corresponding to the training data can be obtained.

【００８６】 The model obtained in this way for each category is used for recognition.

【００８７】 <<Recognition procedure>> Recognition is performed by calculating the likelihood of each HMM and selecting the maximum.

【００８８】 For the pattern to be recognized, the probability (likelihood) Pr(O | λ_i) that λ_i outputs the symbol sequence (O = O_1, O_2, ..., O_t) of that pattern is calculated.

【００８９】 The likelihood can be computed recursively with the forward algorithm described in reference (D), as follows.

【００９０】 That is, the probability Pr(O | λ) that a model λ = {A, B, π} outputs the symbol sequence (O = O_1, O_2, ..., O_t) is

【００９１】[0091]

【数１０】 (Equation 10)

【００９２】where S_F is the set of final states, and α_T(i) is

【００９３】[0093]

【数１１】 (Equation 11)

【００９４】the value defined by the above equation: the probability that the HMM model λ has generated the symbol sequence (O = O₁, O₂, ..., O_t) and is in state S_t = i at time t.

【００９５】これは、This is

【００９６】[0096]

【数１２】 (Equation 12)

【００９７】computed by the recurrence formula above.
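The recursion can be illustrated with a minimal sketch of the standard forward algorithm for a discrete HMM. It assumes the usual textbook form, in which α is initialised as π_i b_i(O₁), updated as α_{t+1}(j) = (Σ_i α_t(i) a_ij) b_j(O_{t+1}), and summed over the final states; the function name and argument layout are hypothetical and are not taken from this publication.

```python
def forward_likelihood(A, B, pi, obs, final_states=None):
    """Return Pr(O | lambda) for a discrete HMM via the forward recursion.

    A[i][j]: transition probability from state i to state j
    B[i][k]: probability of emitting symbol k in state i
    pi[i]:   initial probability of state i
    obs:     list of symbol indices O_1 .. O_T
    final_states: states allowed at the end (assumption: all states if None)
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    if final_states is None:
        final_states = range(n)
    # Termination: sum alpha_T(i) over the final states S_F
    return sum(alpha[i] for i in final_states)


# Toy left-to-right model: state 0 emits symbol 0, state 1 emits symbol 1.
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
pi = [1.0, 0.0]
p_short = forward_likelihood(A, B, pi, [0, 1], final_states=[1])
p_long = forward_likelihood(A, B, pi, [0, 0, 1], final_states=[1])
```

For long sequences the raw probabilities underflow, so practical implementations usually rescale α at each step or work with logarithms.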

【００９８】The model with the maximum likelihood is then selected. That is, from the values Pr(O | λ_i) obtained by equations (1) to (11), the category G_k corresponding to the maximum-likelihood λ_i (k = argmax_i Pr(O | λ_i)) is selected as the recognition result and stored in the recognition result memory 9.
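The selection rule k = argmax_i Pr(O | λ_i) can be sketched as follows. The toy models, the category names used as dictionary keys, and the helper names are hypothetical; the likelihood is computed with a plain forward pass in which every state is treated as a final state, which is an assumption of this sketch.

```python
def likelihood(model, obs):
    """Pr(O | lambda) by the forward recursion; all states count as final."""
    A, B, pi = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def recognize(models, obs):
    """Return the category whose model gives the highest likelihood, plus all scores."""
    scores = {name: likelihood(m, obs) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Two toy categories that share dynamics but prefer different symbols.
A = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
models = {
    "smash": (A, [[0.8, 0.2], [0.8, 0.2]], pi),
    "service": (A, [[0.2, 0.8], [0.2, 0.8]], pi),
}
best, scores = recognize(models, [0, 0, 1, 0])
```

Here the sequence contains mostly symbol 0, so the model that favours symbol 0 wins.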

【００９９】At search time, the MPEG data 1 to be searched is scanned to find which portion of it gives the maximum likelihood for the HMM model corresponding to the search target.

【０１００】In this case, the HMM spotting algorithm described in reference (e) can be used to find the maximum-likelihood portion of the MPEG data 1 efficiently.

【０１０１】As is clear from the above processing flow, recognition with the HMM model is performed by maximum likelihood estimation, and learning is realized as estimation of the HMM model parameters from the training data.

【０１０２】Since the likelihood is computed from the entire symbol sequence, the method has the advantage of being robust to some shift, expansion, and contraction along the time axis, provided that the symbol string pattern specific to the category appears.

【０１０３】Furthermore, a specific time-series pattern can be retrieved by computing the likelihood up to each time point of the time-series pattern of the moving image and applying threshold processing or the like to it.
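A minimal sketch of this idea is given below: a scaled forward pass yields log Pr(O₁, ..., O_t | λ) for every t, and a threshold is then applied. The text does not specify the quantity that is thresholded; normalising by the number of frames, as done here, is an assumption of this sketch, and all function names are hypothetical.

```python
import math


def prefix_log_likelihoods(A, B, pi, obs):
    """Return [log Pr(O_1..O_t | lambda) for t = 1..T] using a scaled forward pass."""
    n = len(pi)
    out, log_p, alpha = [], 0.0, None
    for o in obs:
        if alpha is None:
            alpha = [pi[i] * B[i][o] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            # The sequence is impossible under this model from here on.
            out.extend([float("-inf")] * (len(obs) - len(out)))
            break
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        out.append(log_p)
    return out


def frames_above(log_likes, threshold):
    """Indices t whose per-frame average log-likelihood is at least `threshold`."""
    return [t for t, ll in enumerate(log_likes) if ll / (t + 1) >= threshold]


# One-state toy model that emits two symbols with equal probability.
log_likes = prefix_log_likelihoods([[1.0]], [[0.5, 0.5]], [1.0], [0, 1, 0])
hits = frames_above(log_likes, math.log(0.5) - 1e-9)
```

Without some normalisation the cumulative log-likelihood only decreases as more frames are observed, which is why the sketch compares a per-frame average against the threshold.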

【０１０４】Next, as an example of experimental results based on the embodiment of the present invention, the results of two human motion recognition experiments on tennis motion images are described.

【０１０５】[Experiment 1] FIG. 5 shows an example photograph of the tennis motion images used in Experiment 1 in the embodiment of the present invention.

【０１０６】As shown in the lower part of FIG. 5, the person region was extracted by background subtraction from the tennis motion image shown in the upper part of FIG. 5. MPEG data created from the images in which the person region had been extracted was used as the recognition target, and the recognition performance was evaluated with DCT coefficients as the feature quantity.

【０１０７】To measure the recognition performance, the DCT coefficients of each block (8 × 8 pixels) were extracted one line at a time in order from the low-order components, that is, 1, 3, 6, 10, 15, 21, and 28 coefficients, and an experiment was performed for each setting to obtain the recognition rate.
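The counts 1, 3, 6, 10, 15, 21, 28 are the triangular numbers, which is what results from taking the coefficients of an 8 × 8 block one anti-diagonal at a time from the DC corner. The sketch below assumes that interpretation; the function name and the block[v][u] indexing convention are hypothetical.

```python
def low_frequency_coeffs(block, n_diagonals):
    """Return coefficients block[v][u] with u + v < n_diagonals, lowest frequency first."""
    size = len(block)
    coeffs = []
    for d in range(n_diagonals):      # d = u + v indexes one anti-diagonal
        for v in range(d + 1):
            u = d - v
            if u < size and v < size:
                coeffs.append(block[v][u])
    return coeffs


# Dummy 8 x 8 block whose entries encode their own position (8 * v + u).
block = [[8 * v + u for u in range(8)] for v in range(8)]
counts = [len(low_frequency_coeffs(block, d)) for d in range(1, 8)]
first_three = low_frequency_coeffs(block, 2)
```

With one diagonal only the DC component block[0][0] is returned; a frame feature vector would then be formed by concatenating the selected coefficients of all blocks in the frame.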

【０１０８】When the number of DCT coefficients per block is 1, only the DC component is used.

【０１０９】また、画像サイズは、１６×１６画素（マ
クロブロック単位で１×１ブロック）、３２×３２画素
（マクロブロック単位で２×２ブロック）の２種類とし
た。The image size was of two types, 16 × 16 pixels (1 × 1 block in macroblock units) and 32 × 32 pixels (2 × 2 blocks in macroblock units).

【０１１０】The codebook for quantization was created by the LBG algorithm with a size of 8 per class, 48 in total for the 6 classes. The HMM model has 12 states and 48 symbols.

【０１１１】図６は、本発明の実施の形態の実験１で対
象とするテニス動作画像を示す写真である。FIG. 6 is a photograph showing a tennis operation image targeted in Experiment 1 of the embodiment of the present invention.

【０１１２】As shown in FIG. 6, the target tennis motions fall into six categories: backhand volley (back-volley), backhand stroke (back-stroke), forehand volley (fore-volley), forehand stroke (fore-stroke), smash (smash), and service (service).

【０１１３】For each of the six motion categories, motion image data of 10 trials was collected. Five of these trials were used as training data to estimate the parameters of the HMM model, and the remaining five trials were used as test data in the recognition experiment.

【０１１４】The experiment was repeated with 10 different ways of selecting the 5 training trials out of the 10 trials.

【０１１５】The recognition rate is therefore evaluated as the number of successes out of 5 × 10 × 6 = 300 recognition trials.
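The count of 300 can be reproduced with a short sketch of the evaluation protocol. How the 10 selections were actually chosen is not stated; drawing them at random, as done here, is an assumption, and the names are hypothetical.

```python
import random

CATEGORIES = ["back-volley", "back-stroke", "fore-volley",
              "fore-stroke", "smash", "service"]
N_TRIALS, N_TRAIN, N_SPLITS = 10, 5, 10


def make_splits(seed=0):
    """Return N_SPLITS (train, test) partitions of the trial indices 0..9."""
    rng = random.Random(seed)
    splits = []
    for _ in range(N_SPLITS):
        train = sorted(rng.sample(range(N_TRIALS), N_TRAIN))
        test = [t for t in range(N_TRIALS) if t not in train]
        splits.append((train, test))
    return splits


splits = make_splits()
# 5 test trials per split x 10 splits x 6 categories
total_tests = sum(len(test) for _, test in splits) * len(CATEGORIES)


def recognition_rate(n_correct, n_total=total_tests):
    """Fraction of recognition trials that succeeded."""
    return n_correct / n_total
```

For example, 294 correct results out of the 300 trials would correspond to a recognition rate of 98%.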

【０１１６】この認識実験結果を、表１、表２に示す。Tables 1 and 2 show the results of this recognition experiment.

【０１１７】[0117]

【表１】 [Table 1]

【０１１８】[0118]

【表２】 [Table 2]

【０１１９】As can be seen from Tables 1 and 2, increasing the number of DCT coefficients used as the feature quantity greatly improves the recognition rate, and the DCT coefficients of relatively low frequency components were found to be effective as feature quantities for image recognition of human motion.

【０１２０】Further, even when the target image is relatively small, using DCT coefficients up to the high frequency components gave a recognition rate of 98% or more, showing that a recognition rate comparable to that for the larger image can be achieved.

【０１２１】[Experiment 2] In the embodiment of the present invention, an experiment on application to moving image retrieval was performed on a series of moving image data containing several kinds of motions.

【０１２２】It was examined whether motions can be retrieved by selecting, at each time point, the HMM model with the maximum likelihood up to that point from among the trained HMM models of the motion categories.

【０１２３】The image size was 32 × 32 pixels, and 6 DCT coefficients per block were used as the feature quantity.

【０１２４】図７は、本発明の実施の形態における、実
験２の実験結果を示すグラフである。FIG. 7 is a graph showing the experimental result of Experiment 2 in the embodiment of the present invention.

【０１２５】図７は、各時点までの観測に基づいて、そ
れぞれ６カテゴリのＨＭＭモデルの対数尤度をプロット
したグラフである。FIG. 7 is a graph in which the log likelihood of the HMM model of each of the six categories is plotted based on the observations up to each time point.

【０１２６】The likelihood of each model is therefore expected to reach its maximum at the end of the corresponding motion.

【０１２７】From the graph shown in FIG. 7, it can be confirmed that the HMM models of the target motions attain the maximum likelihood in turn, and that the motion intervals can be segmented by threshold processing.
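One way such a segmentation could be carried out is sketched below: at each time point the category with the highest score is taken, and consecutive time points with the same winner above a threshold are merged into one interval. The score curves are made-up illustrative values, not data from FIG. 7, and the function name is hypothetical.

```python
def segment_by_best_model(curves, threshold):
    """curves: {category: [score at t = 0..T-1]}.

    Returns a list of (category, start, end) runs in which that category has
    the highest score and the score is at least `threshold`.
    """
    length = len(next(iter(curves.values())))
    segments = []
    current = None  # (category, start) of the run being built
    for t in range(length):
        best = max(curves, key=lambda c: curves[c][t])
        label = best if curves[best][t] >= threshold else None
        if current is not None and current[0] != label:
            segments.append((current[0], current[1], t - 1))  # close the run
            current = None
        if label is not None and current is None:
            current = (label, t)                              # open a new run
    if current is not None:
        segments.append((current[0], current[1], length - 1))
    return segments


# Illustrative log-likelihood-like scores for two categories.
curves = {
    "smash": [-9, -2, -1, -8, -9, -9],
    "service": [-9, -9, -9, -9, -2, -1],
}
segments = segment_by_best_model(curves, threshold=-3)
```

In this toy input the two categories win in turn, giving one interval for each.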

【０１２８】This makes it possible to retrieve a specific motion pattern from continuous moving image data.

【０１２９】In the above description of the embodiment of the present invention, MPEG data encoded by a standard encoding method such as MPEG or MPEG2 was used. However, the present invention is not limited to this, and it goes without saying that data encoded by another standard encoding method, such as motion-JPEG, can also be used.

【０１３０】以上、本発明を発明の実施の形態に基づい
て具体的に説明したが、本発明は、前記発明の実施の形
態に限定されるものではなく、その要旨を逸脱しない範
囲において種々変更し得ることはいうまでもない。Although the present invention has been specifically described based on the embodiments of the present invention, the present invention is not limited to the embodiments of the present invention, and various modifications may be made without departing from the gist of the present invention. It goes without saying that it can be done.

【０１３１】[0131]

【発明の効果】本願で開示される発明のうち、代表的な
ものによって得られる効果を簡単に説明すれば、下記の
通りである。The effects obtained by the representative inventions among the inventions disclosed in the present application will be briefly described as follows.

【０１３２】(1) According to the present invention, DCT coefficients, or DCT coefficients and motion compensation vectors, are used as the feature quantity, so that a specific moving image pattern can be recognized and retrieved directly from small-volume moving image data compressed by a standard encoding method such as MPEG or MPEG2.

【０１３３】これにより、データ処理の処理時間を少な
くすることが可能となる。As a result, the processing time of data processing can be reduced.

【０１３４】(2) According to the present invention, the likelihood is computed from the entire feature vector sequence. Therefore, as long as the feature vector sequence pattern specific to a category appears, a specific moving image pattern can be recognized and retrieved accurately even if there is some shift, expansion, or contraction along the time axis.

【０１３５】(3) According to the present invention, using DCT coefficients up to the high frequency components as the feature quantity greatly improves the recognition rate, and the recognition rate can be improved in the same way even when the target image is relatively small.

【０１３６】(4) The present invention is widely applicable to, for example, monitoring of suspicious behavior in banks and stores and extraction of desired motion portions from moving images such as sports footage.

[Brief description of the drawings]

【図１】本発明の一発明の実施の形態である動画像認識
方法および動画像認識検索方法が適用される動画像認識
検索装置の概略構成を示す機能ブロック図である。FIG. 1 is a functional block diagram showing a schematic configuration of a moving image recognition / search apparatus to which a moving image recognition method and a moving image recognition / search method according to an embodiment of the present invention are applied.

【図２】FIG. 2 is a diagram showing the schematic structure of the MPEG data 1 and of the DCT coefficients of the MPEG data 1.

【図３】本発明の実施の形態形態における、ＤＣＴ係数
の抽出方法を説明するための図である。FIG. 3 is a diagram for explaining a method of extracting DCT coefficients according to the embodiment of the present invention.

【図４】FIG. 4 is a conceptual diagram of an HMM (hidden Markov model).

【図５】本発明の実施の形態において、実験１に使用し
たテニス動作画像の例を示すディスプレイ上に表示した
中間調画像である。FIG. 5 is a halftone image displayed on a display showing an example of a tennis operation image used in Experiment 1 in the embodiment of the present invention.

【図６】本発明の実施の形態の実験１で対象とするテニ
ス動作画像を示すディスプレイ上に表示した中間調画像
である。FIG. 6 is a halftone image displayed on a display showing a tennis operation image targeted in Experiment 1 of the embodiment of the present invention.

【図７】本発明の実施の形態の実験２の実験結果を示す
グラフである。FIG. 7 is a graph showing experimental results of Experiment 2 according to the embodiment of the present invention.

[Explanation of symbols]

2: feature extraction unit, 3: feature storage memory, 4: quantization unit, 5: symbol string storage memory, 6: model parameter estimation unit, 7: recognition state transition model storage memory, 8: likelihood calculation unit, 9: recognition result memory.

─────────────────────────────────────────────────────

【手続補正書】[Procedure amendment]

【提出日】平成８年６月２７日[Submission date] June 27, 1996

【手続補正１】[Procedure amendment 1]

【補正対象書類名】明細書[Document name to be amended] Statement

【補正対象項目名】特許請求の範囲[Correction target item name] Claims

【補正方法】変更[Correction method] Change

【補正内容】[Correction contents]

【特許請求の範囲】[Claims]

Claims

1. A moving image recognition method for recognizing a moving image pattern in a series of moving images, comprising the steps of: dividing the image data of each screen displaying the series of moving images into M × N blocks and extracting the DCT coefficients of each block; extracting at least one of the DCT coefficients as the feature vector of each screen; learning a state transition model for each of a plurality of specific moving image patterns serving as recognition keys, using a time-series feature vector sequence composed of the feature vectors of the screens displaying the specific moving image pattern; and outputting, as the recognition result, the moving image pattern of the state transition model for which the likelihood of the time-series feature vector sequence, composed of the feature vectors extracted from the image data of each screen displaying the series of moving images to be recognized, is maximized among the plurality of state transition models obtained by the learning.

2. The moving image recognition method according to claim 1, wherein the image data of each screen displaying the series of moving images to be recognized is compressed by a standard encoding method, and a part of the DCT coefficients included in the image data of each screen compressed by the standard encoding method is used as the feature vector of each screen.

3. The moving image recognition method according to claim 1, wherein a motion vector is used together with the DCT coefficients as the feature vector of each screen.

4. The moving image recognition method according to claim 3, wherein the image data of each screen displaying the series of moving images to be recognized is compressed by a standard encoding method, and a part of the DCT coefficients and the motion compensation vectors included in the image data of each screen compressed by the standard encoding method are used as the feature vector of each screen.

5. The image recognition method according to claim 2 or 4, wherein 3 to 21 DCT coefficients of the low frequency components, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.

6. The image recognition method according to claim 2 or 4, wherein the DCT coefficients on the first line in the horizontal direction, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.

7. The image recognition method according to claim 2 or 4, wherein the DCT coefficients on the first line in the vertical direction, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.

8. The image recognition method according to claim 2 or 4, wherein the DCT coefficients on the diagonal line including the DC component, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.

9. A moving image recognition and retrieval method for extracting, from a series of moving images, a time region including a specific moving image pattern, comprising the steps of: dividing the image data of each screen displaying the series of moving images into M × N blocks and extracting the DCT coefficients of each block; extracting at least one of the DCT coefficients as the feature vector of each screen; learning a stochastic state transition model using a time-series feature vector sequence composed of the feature vectors of the screens displaying a specific moving image pattern serving as a search key; and outputting, as the search result, a time region, within the time-series feature vector sequence composed of the feature vectors extracted from the image data of each screen displaying the series of moving images to be searched, that has a high likelihood for the state transition model obtained by the learning.

10. The moving image recognition and retrieval method according to claim 9, wherein the image data of each screen displaying the series of moving images to be searched is compressed by a standard encoding method, and a part of the DCT coefficients included in the image data of each screen compressed by the standard encoding method is used as the feature vector of each screen.

11. The moving image recognition and retrieval method according to claim 9, wherein a motion vector is used together with the DCT coefficients as the feature vector of each screen.

12. The moving image recognition and retrieval method according to claim 11, wherein the image data of each screen displaying the series of moving images to be searched is compressed by a standard encoding method, and a part of the DCT coefficients and the motion compensation vectors included in the image data of each screen compressed by the standard encoding method are used as the feature vector of each screen.

13. The image recognition and retrieval method according to claim 10 or 12, wherein 3 to 21 DCT coefficients of the low frequency components, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.

14. The image recognition and retrieval method according to claim 10 or 12, wherein the DCT coefficients on the first line in the horizontal direction, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.

15. The image recognition and retrieval method according to claim 10 or 12, wherein the DCT coefficients on the first line in the vertical direction, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.

16. The image recognition and retrieval method according to claim 10 or 12, wherein the DCT coefficients on the diagonal line including the DC component, among the DCT coefficients included in the image data of each screen compressed by the standard encoding method, are used as the feature vector.